Data Warehouse Fundamentals Rabie A. Ramadan, PhD 6

Jan 03, 2016


Data Warehouse Fundamentals

Rabie A. Ramadan, PhD

6


ARCHITECTURAL COMPONENTS


UNDERSTANDING DATA WAREHOUSE ARCHITECTURE

Objective

• We will study the architectural components in the order in which they enable the flow of data from the sources as business intelligence to the end-users.

• We will be able to look at each area of the architecture and examine the functions, procedures, and features in that area.


Architecture: Definitions

The structure that brings all the components of a data warehouse together is known as the architecture.

Example:

• A school building includes the various classrooms, offices, library, corridors, gymnasiums, doors, windows, roof, and a large number of other such components.

• The structure that ties all of the components together is the architecture of the school building.

• Suppose the builders were told to make the classrooms large. So they made the classrooms larger but eliminated the offices altogether, thus constructing the school building with a faulty architecture.

Correct architecture is critical for the success of your data warehouse.


Architecture Factors

Data warehouse architecture includes a number of factors:

1. The integrated data that is the centerpiece. The architecture includes everything that is needed to prepare the data and store it.

2. All the means for delivering information from your data warehouse.

3. The rules, procedures, and functions that enable your data warehouse to work and fulfill the business requirements.

4. Finally, the architecture is made up of the technology that empowers your data warehouse.

It defines the standards, measurements, general design, and support techniques.


Architecture in Three Major Areas

Data acquisition

Data storage

Information delivery


Different Objectives and Scope

With operational data, the user requires a single piece of information.

• Information about a single order

In a data warehouse, the required data is large.

• A view of the whole year's sales divided into quarters

So, the scope is different.

Defining the scope for a data warehouse is also difficult.


What are all the factors you must consider for defining the scope?

• You must consider the number and extent of the data sources.

• How many legacy systems are you going to extract the data from?

• What are the external sources?

• Are you planning to include departmental files, spreadsheets, and private databases?

• What about including the archived data?

• In a data warehouse, data granularity and data volumes are also important considerations.


What is the scope of the election data warehouse?


Data Content

The “read-only” data in the data warehouse is the primary component in the architecture.

Operational data is not “read-only” data.

Data warehouse architecture must support the storing of data grouped by business subjects, not grouped by applications.

Data in the data warehouse does not represent a snapshot of current values only; it also includes historical data.


Complex Analysis and Quick Response

Your data warehouse architecture must support variations for providing analysis.

Users must be able to drill down, roll up, slice and dice data, and play with “what-if” scenarios.

Users must have the capability to review the result sets in different output options.

• Users are no longer content with textual result sets or results displayed in tabular formats.

• Every result set in tabular format must be translated into graphical charts
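
The analysis operations named on this slide (roll up, drill down, slice, and dice) can be illustrated with a minimal Python sketch. The sales rows, the dimension names, and the helper functions `aggregate` and `slice_rows` are all hypothetical and chosen only for illustration; a real data warehouse would run these operations in an OLAP tool or in SQL.

```python
from collections import defaultdict

# Hypothetical sales facts: (year, quarter, region, product, amount)
SALES = [
    (2015, "Q1", "Central", "Widget", 100.0),
    (2015, "Q1", "East", "Widget", 80.0),
    (2015, "Q2", "Central", "Gadget", 50.0),
    (2015, "Q2", "Central", "Widget", 70.0),
    (2015, "Q3", "East", "Gadget", 90.0),
]
DIMENSIONS = ("year", "quarter", "region", "product")


def aggregate(rows, group_by):
    """Sum the amount for each combination of the chosen dimensions."""
    positions = [DIMENSIONS.index(name) for name in group_by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[pos] for pos in positions)
        totals[key] += row[-1]
    return dict(totals)


def slice_rows(rows, **fixed):
    """Keep only the rows whose dimensions equal the fixed values."""
    return [
        row for row in rows
        if all(row[DIMENSIONS.index(name)] == value for name, value in fixed.items())
    ]


by_year = aggregate(SALES, ["year"])                 # roll up to the year level
by_quarter = aggregate(SALES, ["year", "quarter"])   # drill down to quarters
central = slice_rows(SALES, region="Central")        # slice: fix one dimension
central_widgets = slice_rows(SALES, region="Central", product="Widget")  # dice: fix two
```

Rolling up and drilling down are the same aggregation at a coarser or finer grouping, while slicing and dicing only filter the rows before aggregation.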


Complex Analysis and Quick Response

Provision of strategic information is meant for making rapid decisions and dealing with situations quickly.

Example:

• Let us say your vice president of marketing wants to quickly discover the reasons for the drop in sales for three consecutive weeks in the central region and make prompt decisions to remedy the situation.

• Your data warehouse must give him or her the tools and information for a quick response to the problem.


Complex Analysis and Quick Response

If your data warehouse supports real time information retrieval, the architecture has to expand to accommodate real time data capture and the ability to obtain strategic information in real time to make on-the-spot decisions.

Real time data warehousing means delivery of information to a larger number of users both inside and outside the organization.


ARCHITECTURAL FRAMEWORK

Architecture Supporting Flow of Data

• Data warehousing just means:

• Taking all the necessary source data,

• Preparing it,

• Storing it in suitable formats, and then

• Delivering useful information to the end-users.
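
The four steps above can be sketched as a tiny end-to-end pipeline. This is only a minimal illustration under stated assumptions: the source records, the `sales` table, and the function names `extract`, `prepare`, `store`, and `deliver` are hypothetical, and an in-memory SQLite database stands in for the data warehouse repository.

```python
import sqlite3


def extract():
    """Take the source data (hypothetical records from an operational system)."""
    return [
        {"order_id": "1", "region": " central ", "amount": "120.50"},
        {"order_id": "2", "region": "East", "amount": "80"},
    ]


def prepare(records):
    """Prepare the data: trim text, standardize case, convert types."""
    return [
        (int(r["order_id"]), r["region"].strip().title(), float(r["amount"]))
        for r in records
    ]


def store(conn, rows):
    """Store the prepared rows in a format suitable for querying."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def deliver(conn):
    """Deliver information to the end-user: total sales per region."""
    query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    return conn.execute(query).fetchall()


conn = sqlite3.connect(":memory:")
store(conn, prepare(extract()))
report = deliver(conn)
```

Each function corresponds to one stage of the flow, so a stage can be changed or monitored without touching the others.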


Questions That Need to Be Answered

What happens at critical points of the flow of data?

What are the architectural components, and how do these components enable the data flow?


At the Data Source

Source data governs the extraction of data for preparation and storage in the data warehouse.

The data staging architectural component governs the transformation, cleansing, and integration of data.


In the Data Warehouse Repository

The data storage architectural component includes:

• Loading the data from the staging area.

• Storing the data in suitable formats for information delivery.

The metadata architectural component is also a storage mechanism to contain data about the data at every point of the flow of data from beginning to end.


At the User End

The information delivery architectural component includes:

• Dependent data marts,

• Special multidimensional databases, and

• A full range of query and reporting facilities, including dashboards and scorecards.


The Management and Control Module

An overall module managing and controlling the entire data warehouse environment.

This component has two major functions:

• First to constantly monitor all the ongoing operations, and

• Next to step in and recover from problems when things go wrong.
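
The two functions above (monitor, then recover) can be sketched in a few lines of Python. This is a simplified illustration, not a description of any particular tool: the `run_with_recovery` helper, the retry policy, and the `FlakyExtract` job are all hypothetical.

```python
def run_with_recovery(step, max_attempts=3):
    """Run one warehouse job step, logging every attempt and retrying on failure."""
    log = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
        except Exception as exc:
            # Monitoring: record what went wrong, then try again.
            log.append((attempt, f"failed: {exc}"))
        else:
            log.append((attempt, "ok"))
            return result, log
    raise RuntimeError(f"step failed after {max_attempts} attempts: {log}")


class FlakyExtract:
    """Hypothetical extract job that fails once before succeeding."""

    def __init__(self):
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == 1:
            raise ConnectionError("source system unavailable")
        return ["row-1", "row-2"]


rows, attempts = run_with_recovery(FlakyExtract())
```

The log plays the monitoring role and the retry loop plays the recovery role; a production module would typically also alert operators and restart from a checkpoint.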


Management Operations

Relating to data acquisition:

• Data is extracted from the source systems either for full refresh or for incremental loads.

• The data is moved into the staging area, where the data transformation is performed.

• The management and control module manages and controls these data acquisition functions, ensuring that extracts and transformations are carried out correctly and in a timely fashion.


Management Operations

Relating to data storage:

• Manages backing up significant parts of the data warehouse and recovering from failures.

• Monitors the growth and periodically archives data from the data warehouse.

• Governs data security and provides authorized access to the data warehouse.


Management Operations

Relating to end-user information delivery:

• Ensures that information delivery is carried out properly.


TECHNICAL ARCHITECTURE

Technical architecture of a data warehouse is the complete set of functions and services provided within its component structures.

It includes the procedures and rules that are required to perform the functions and provide the services.

It encompasses the data stores needed for each component to provide the services.

The architecture is not the set of tools needed to perform the functions and provide the services.

Tools are the means to implement the technical architecture.


Technical architecture for Data Acquisition

Major architectural components are source data and data staging.


Technical architecture for Data Acquisition

Data Flow

• Begins at the data sources and pauses at the staging area.

• After transformation and integration, the data is ready for loading into the data warehouse repository.

Data Sources

• Enterprise’s operational systems.

• May use an SQL-based language for extracting data.

• For including data from outside sources, you will have to create temporary files to hold the data received from the outside sources.


Technical architecture for Data Acquisition

Intermediary Data Stores

• As data gets extracted from the data sources, it moves through temporary files.

• Sometimes, extracts of homogeneous data from several source applications are pulled into separate temporary files and then merged into another temporary file before moving it to the staging area.


Technical architecture for Data Acquisition

Staging Area

• All the extracted data is put together and prepared for loading into the data warehouse.

• The staging area is like an assembly plant or a construction area.

• In this area, you examine each extracted file, review the business rules, perform the various data transformation functions, sort and merge data, resolve inconsistencies, and cleanse the data.


Functions and Services of the Data Acquisition

Data Extraction

• Select data sources and determine the types of filters to be applied to individual sources.

• Generate automatic extract files from operational systems using replication and other techniques.

• Create intermediary files to store selected data to be merged later.

• Transport extracted files from multiple platforms.

• Provide automated job control services for creating extract files.

• Reformat input from outside sources.

• Reformat input from departmental data files, databases, and spreadsheets.

• Generate common application codes for data extraction.

• Resolve inconsistencies for common data elements from multiple sources.


Functions and Services of the Data Acquisition

Data Transformation

• Map input data to data for data warehouse repository.

• Clean data, deduplicate, and merge/purge.

• Denormalize extracted data structures as required by the dimensional model of the data warehouse.

• Convert data types.

• Calculate and derive attribute values.

• Check for referential integrity.

• Aggregate data as needed.

• Resolve missing values.

• Consolidate and integrate data.
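
Several of the transformation functions listed above (deduplication, data type conversion, deriving attribute values, referential integrity checks, and resolving missing values) can be shown in one small Python sketch. The records, field names, the `transform` function, and the choice of "UNKNOWN" as a default are all assumptions made for illustration.

```python
# Hypothetical extracted records; every value arrives as text.
RAW = [
    {"cust_id": "C1", "qty": "2", "unit_price": "10.00", "region": "central"},
    {"cust_id": "C1", "qty": "2", "unit_price": "10.00", "region": "central"},  # duplicate
    {"cust_id": "C2", "qty": "1", "unit_price": "5.50", "region": ""},          # missing region
    {"cust_id": "C9", "qty": "4", "unit_price": "1.25", "region": "east"},      # unknown customer
]
KNOWN_CUSTOMERS = {"C1", "C2"}  # assumed customer dimension keys


def transform(raw, known_customers, default_region="UNKNOWN"):
    """Return (clean rows, rejected rows) after basic transformation steps."""
    seen = set()
    clean, rejected = [], []
    for rec in raw:
        fingerprint = tuple(sorted(rec.items()))
        if fingerprint in seen:          # deduplicate exact repeats
            continue
        seen.add(fingerprint)
        if rec["cust_id"] not in known_customers:  # referential integrity check
            rejected.append(rec)
            continue
        qty = int(rec["qty"])                      # convert data types
        unit_price = float(rec["unit_price"])
        clean.append({
            "cust_id": rec["cust_id"],
            "qty": qty,
            "unit_price": unit_price,
            "amount": qty * unit_price,            # derive an attribute value
            "region": rec["region"].upper() or default_region,  # resolve missing value
        })
    return clean, rejected


clean, rejected = transform(RAW, KNOWN_CUSTOMERS)
```

Rejected rows are kept rather than silently dropped so that they can be reviewed and corrected before the next load.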


Functions and Services of the Data Acquisition

Data Staging

• Provide backup and recovery for staging area repositories.

• Sort and merge files.

• Create files as input to make changes to dimension tables.

• If data staging storage is a relational database, create and populate database.

• Preserve audit trail to relate each data item in the data warehouse to input source.

• Resolve and create primary and foreign keys for load tables.

• Consolidate datasets and create flat files for loading through DBMS utilities.

• If staging area storage is a relational database, extract load files.
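
Two of the staging services above, creating keys for the load tables and preserving an audit trail back to the input source, can be sketched as follows. This assumes one common convention (integer keys generated in the staging area); the `KeyGenerator` class, the file names, and the field names are hypothetical.

```python
import itertools


class KeyGenerator:
    """Hands out one integer key per distinct natural key."""

    def __init__(self):
        self._counter = itertools.count(1)
        self._assigned = {}

    def key_for(self, natural_key):
        if natural_key not in self._assigned:
            self._assigned[natural_key] = next(self._counter)
        return self._assigned[natural_key]


# Hypothetical staged records, each remembering where it came from.
STAGED = [
    {"cust_id": "C1", "amount": 20.0, "source": "orders.csv", "source_row": 1},
    {"cust_id": "C2", "amount": 5.5, "source": "orders.csv", "source_row": 2},
    {"cust_id": "C1", "amount": 7.0, "source": "web_orders.csv", "source_row": 1},
]


def build_load_rows(staged, customer_keys):
    """Create load rows with a primary key, a foreign key, and audit columns."""
    load_rows = []
    for sale_key, rec in enumerate(staged, start=1):
        load_rows.append({
            "sale_key": sale_key,                                   # primary key
            "customer_key": customer_keys.key_for(rec["cust_id"]),  # foreign key
            "amount": rec["amount"],
            "audit_source": rec["source"],                          # audit trail
            "audit_row": rec["source_row"],
        })
    return load_rows


customer_keys = KeyGenerator()
load_rows = build_load_rows(STAGED, customer_keys)
```

The same customer arriving from two source files receives the same `customer_key`, while the audit columns still identify the exact input record.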


Data Storage

Data storage covers the process of loading the data from the staging area into the data warehouse repository.


Data Storage

Data Flow

• The data flow begins at the data staging area.

• The transformed and integrated data is moved from the staging area to the data warehouse repository.


Data Storage

Data Groups

• The first group is the set of files or tables containing data for a full refresh.

• This group of data is usually meant for the initial loading of the data warehouse.

• The other group of data is the set of files or tables containing ongoing incremental loads.

• Most of these relate to nightly loads.

• Some incremental loads of dimension data may be performed at less frequent intervals.
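
The difference between the two data groups can be sketched with an in-memory SQLite table. This is a minimal illustration under stated assumptions: the `sales` table, the sample rows, and the use of the highest loaded `order_id` as a "high-water mark" for detecting new rows are all hypothetical choices, not a prescribed method.

```python
import sqlite3


def full_refresh(conn, source_rows):
    """Initial load: empty the table and load everything from the source."""
    conn.execute("DELETE FROM sales")
    conn.executemany("INSERT INTO sales (order_id, amount) VALUES (?, ?)", source_rows)
    conn.commit()


def incremental_load(conn, source_rows):
    """Ongoing load: add only the rows newer than what is already loaded."""
    (last_id,) = conn.execute("SELECT COALESCE(MAX(order_id), 0) FROM sales").fetchone()
    new_rows = [row for row in source_rows if row[0] > last_id]
    conn.executemany("INSERT INTO sales (order_id, amount) VALUES (?, ?)", new_rows)
    conn.commit()
    return len(new_rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER PRIMARY KEY, amount REAL)")

day_one = [(1, 10.0), (2, 20.0)]
day_two = day_one + [(3, 30.0)]      # the source after one more night

full_refresh(conn, day_one)
added = incremental_load(conn, day_two)
loaded = conn.execute("SELECT order_id, amount FROM sales ORDER BY order_id").fetchall()
```

The incremental load touches one row here, whereas repeating the full refresh would rewrite the whole table, which is why nightly loads are usually incremental.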


Data Storage

The Data Repository

• Almost all of today’s data warehouse databases are relational databases.

• All the power, flexibility, and ease of use capabilities of the RDBMS become available for the processing of data.


Data Storage

Functions and Services

• Load data for full refreshes of data warehouse tables.

• Perform incremental loads at regular prescribed intervals.

• Support loading into multiple tables at the detailed and summarized levels.

• Optimize the loading process.

• Provide automated job control services for loading the data warehouse.

• Provide backup and recovery for the data warehouse database.

• Provide security.

• Monitor and fine-tune the database.

• Periodically archive data from the database according to preset conditions.


Technical Architecture Information Delivery

Almost all modern data warehouses provide for online analytical processing (OLAP).

In this case, the primary data warehouse feeds data to proprietary multidimensional databases (MDDBs) where summarized data is kept as multidimensional cubes of information.

The users perform complex multidimensional analysis using the information cubes in the MDDBs.
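
The idea of keeping summarized data as a multidimensional cube can be sketched in plain Python by precomputing a total for every combination of dimension values. The facts, dimension names, and the `build_cube` function are hypothetical, and real MDDB products use their own proprietary storage formats rather than a dictionary.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical detailed facts fed from the primary data warehouse.
FACTS = [
    {"quarter": "Q1", "region": "Central", "product": "Widget", "amount": 100.0},
    {"quarter": "Q1", "region": "East", "product": "Widget", "amount": 80.0},
    {"quarter": "Q2", "region": "Central", "product": "Gadget", "amount": 50.0},
    {"quarter": "Q2", "region": "Central", "product": "Widget", "amount": 70.0},
]
DIMS = ("quarter", "region", "product")


def build_cube(facts, dims):
    """Precompute the summed amount for every subset of dimensions."""
    cube = defaultdict(float)
    for fact in facts:
        for size in range(len(dims) + 1):
            for group in combinations(dims, size):
                cell = tuple((dim, fact[dim]) for dim in group)
                cube[cell] += fact["amount"]
    return dict(cube)


cube = build_cube(FACTS, DIMS)

grand_total = cube[()]
central_total = cube[(("region", "Central"),)]
q1_central = cube[(("quarter", "Q1"), ("region", "Central"))]
```

Because every summary is computed in advance, a user query becomes a lookup instead of a scan of the detailed data, at the cost of extra storage.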


Technical Architecture Information Delivery

Data Flow

• Recently, progressive organizations have implemented dashboards and scorecards as part of information delivery.

• Dashboards are real time or near real time information display devices.

• Data flows to the dashboards in real time from the data warehouse.


Technical Architecture Information Delivery

Service Locations

• You may provide query services from the user desktop, from an application server, or from the database itself.

• This will be one of the critical decisions for your architecture design.


Technical Architecture Information Delivery

Data Stores

• You may consider the following intermediary data stores:

• Proprietary temporary stores to hold results of individual queries and reports for repeated use

• Data stores for standard reporting

• Data stores for dashboards

• Proprietary multidimensional databases


Technical Architecture Information Delivery

Functions and Services

• Provide security to control information access.

• Monitor user access to improve service and for future enhancements.

• Allow users to browse data warehouse content.

• Simplify access by hiding internal complexities of data storage from users.

• Automatically reformat queries for optimal execution.

• Enable queries to be aware of aggregate tables for faster results.

• Govern queries and control runaway queries.
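
Making queries aware of aggregate tables, one of the services listed above, can be sketched as a small routing function that picks the smallest table able to answer a query. The table catalog, row counts, and the `choose_table` function are hypothetical; commercial query tools implement this with far more detail.

```python
# Hypothetical catalog: table name -> (dimensions it can group by, approximate rows)
TABLES = {
    "sales_detail": ({"day", "month", "quarter", "region", "product"}, 10_000_000),
    "sales_by_month_region": ({"month", "quarter", "region"}, 5_000),
    "sales_by_quarter": ({"quarter"}, 40),
}


def choose_table(needed_dims, tables=TABLES):
    """Return the smallest table that contains every dimension the query needs."""
    candidates = [
        (row_count, name)
        for name, (dims, row_count) in tables.items()
        if set(needed_dims) <= dims
    ]
    if not candidates:
        raise ValueError(f"no table can answer a query on {sorted(needed_dims)}")
    return min(candidates)[1]


quarterly = choose_table({"quarter"})
regional = choose_table({"quarter", "region"})
by_product = choose_table({"product"})
```

A query on quarters alone is routed to the 40-row summary, while a query that needs product detail falls back to the detailed table.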


ARCHITECTURAL TYPES

Centralized Corporate Data Warehouse


ARCHITECTURAL TYPES

Independent Data Marts

• The data warehouse could be a combination of independent data marts.


ARCHITECTURAL TYPES

Hub-and-Spoke

• Data marts depend on the enterprise data warehouse for data feed
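
The dependency described above can be sketched with two in-memory SQLite databases, where the data mart receives its rows only from the enterprise data warehouse. The table layout, the sample rows, and the `feed_data_mart` function are hypothetical and serve only to show the direction of the data feed.

```python
import sqlite3

# The hub: a hypothetical enterprise data warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (region TEXT, department TEXT, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("Central", "marketing", 100.0),
        ("East", "marketing", 80.0),
        ("Central", "finance", 50.0),
    ],
)
warehouse.commit()


def feed_data_mart(hub, department):
    """Build a dependent data mart holding one department's rows from the hub."""
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = hub.execute(
        "SELECT region, amount FROM sales WHERE department = ? ORDER BY region",
        (department,),
    ).fetchall()
    mart.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    mart.commit()
    return mart


# A spoke: the marketing data mart is fed only from the hub.
marketing_mart = feed_data_mart(warehouse, "marketing")
marketing_total = marketing_mart.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

Because every mart is derived from the same hub, the marts stay consistent with each other, which is the main contrast with independent data marts.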


Assignment is posted
