1 Data Warehouses BUAD/American University Data Warehouses

Jan 02, 2016


Frank Little

Data Warehouses

Definition

• Data Warehouse: An integrated and consistent store of subject-oriented data that is obtained from a variety of sources and formatted into a meaningful context to support decision-making in an organization.

Need for Data Warehousing

• Integrated, company-wide view of high-quality information.

• Separation of operational and informational systems and data

– Operational system: a system that is used to run a business in real time, based on current data

– Informational system: a system designed to support decision making based on stable point-in-time or historical data

Factors Allowing Data Warehousing

• Relational DBMS.

• Advances in hardware: speed and storage capacity.

• End-user computing interfaces and tools.

Data Warehouse Architectures

• Two-level

– source system files containing operational data

– transformed and integrated data warehouse

• Three-level

– Operational data.

– Enterprise data warehouse (EDW) - single source of data for decision making.

– Data marts - limited scope; data selected from EDW; customized decision-support for individual user groups

Generic data warehouse architecture

Three-layer architecture

Reasons for the Three-Level Architecture

• EDW and data marts have different purposes and data architectures.

• Data transformation is complex and is best performed in two steps.

• Data marts provide customized decision support for different groups.

• Architecture

– Operational data, reconciled data, derived data.

Three-layer data architecture

Data Characteristics

• Status vs. Event data.

– A transaction is a business activity that triggers one or more business events; event data capture those events

• Transient vs. Periodic data.

– Transient: data in which changes to existing records are written over previous records, thus destroying previous data content

– Periodic: data that are never physically altered or deleted once added
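The difference between transient and periodic data can be sketched in a few lines of Python. This is only an illustration: the account key, dates, balances, and function names below are made up and do not come from the slides.

```python
from datetime import date

# Transient store: one slot per key, so each update overwrites the old value.
transient = {}

def update_transient(key, balance):
    transient[key] = balance  # the previous balance is destroyed

# Periodic store: append-only rows of (key, effective date, value).
periodic = []

def update_periodic(key, as_of, balance):
    periodic.append((key, as_of, balance))  # earlier rows are never altered

# Apply the same two (hypothetical) balance changes to both stores.
for as_of, balance in [(date(2016, 1, 1), 750), (date(2016, 1, 2), 1000)]:
    update_transient("acct-001", balance)
    update_periodic("acct-001", as_of, balance)

current = transient["acct-001"]
history = [bal for key, _, bal in periodic if key == "acct-001"]
print(current)  # only the latest value survives in the transient store
print(history)  # the periodic store still holds every value
```

The periodic store grows with every change, which is the price paid for being able to answer point-in-time questions later.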

Example of DBMS log entry

Transient operational data

Reconciled Data Characteristics

• Detailed

• Historical

• Normalized

• Enterprise-wide

• Quality controlled

The Data Reconciliation Process

• Capture: capture the relevant data from source files to fill the EDW

– Static - initial load.

– Incremental - ongoing update.

• Scrub or data cleansing

– Missing data, name reconciliation

– Pattern recognition and other artificial intelligence techniques.
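A minimal sketch of the capture and scrub steps, under stated assumptions: the source rows, the `last_modified` column used to detect changes, and the alias table are all hypothetical. Name reconciliation is done here with a plain lookup table; the pattern-recognition techniques mentioned above are not shown.

```python
from datetime import datetime

# Hypothetical source rows: (customer_id, name, city, last_modified)
SOURCE = [
    (1, "ACME Corp.", "Boston", datetime(2016, 1, 1)),
    (2, "acme corporation", None, datetime(2016, 1, 5)),
    (3, "Widget Co", "Denver", datetime(2016, 1, 9)),
]

def capture_static(rows):
    """Static capture: take a full snapshot (used for the initial load)."""
    return list(rows)

def capture_incremental(rows, since):
    """Incremental capture: take only rows changed after the last capture."""
    return [r for r in rows if r[3] > since]

# Hypothetical lookup that maps name variants to one canonical name.
NAME_ALIASES = {
    "acme corp.": "Acme Corporation",
    "acme corporation": "Acme Corporation",
}

def scrub(row):
    """Cleanse one row: reconcile name variants and mark missing data."""
    cust_id, name, city, modified = row
    name = NAME_ALIASES.get(name.strip().lower(), name.strip())
    city = city if city is not None else "UNKNOWN"
    return (cust_id, name, city, modified)

initial = [scrub(r) for r in capture_static(SOURCE)]
delta = [scrub(r) for r in capture_incremental(SOURCE, datetime(2016, 1, 4))]
print(initial)
print(delta)
```

Filtering on a modification timestamp is just one way to detect changes; reading the DBMS log is another, and the slides do not say which is intended.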

Steps in data reconciliation

The Data Reconciliation Process

• Transform

– Convert the data format from the source to the target system.

– Record-Level Functions

• Selection.

• Joining.

• Aggregation (for data marts).

– Field-Level Functions

• Single-field transformation

• Multi-field transformation
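The record-level and field-level functions listed above might look like this on a toy data set. The rows, field names, and the `status` filter are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical input rows and a hypothetical store lookup.
sales = [
    {"store_id": 1, "product_id": 10, "units": 3, "unit_price": 2.50, "status": "OK"},
    {"store_id": 1, "product_id": 11, "units": 1, "unit_price": 4.00, "status": "VOID"},
    {"store_id": 2, "product_id": 10, "units": 5, "unit_price": 2.50, "status": "OK"},
]
stores = {1: {"city": "boston"}, 2: {"city": "denver"}}

# Record-level, selection: keep only the rows that meet a condition.
selected = [s for s in sales if s["status"] == "OK"]

# Record-level, joining: combine rows from two sources on a shared key.
joined = [{**s, **stores[s["store_id"]]} for s in selected]

# Field-level, single-field: one source field becomes one target field.
for row in joined:
    row["city"] = row["city"].title()

# Field-level, multi-field: several source fields become one target field.
for row in joined:
    row["revenue"] = row["units"] * row["unit_price"]

# Record-level, aggregation: summarize detail rows (as for a data mart).
revenue_by_city = defaultdict(float)
for row in joined:
    revenue_by_city[row["city"]] += row["revenue"]

print(dict(revenue_by_city))
```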

The Data Reconciliation Process

• Load and Index

– Refresh Mode

• When the warehouse is first created.

• Static data capture.

– Update Mode

• Ongoing update of the warehouse.

• Incremental data capture.
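The two load modes can be sketched with Python's built-in `sqlite3` module. The table, column names, and sample rows are hypothetical. One assumption is worth flagging: update mode here appends new rows with a load date instead of overwriting old ones, so the warehouse data stay periodic; the slides do not spell this out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT, loaded_on TEXT)")
# The index is created once; SQLite maintains it on every later load.
conn.execute("CREATE INDEX idx_customer_id ON customer (id)")

def refresh_load(conn, snapshot, loaded_on):
    """Refresh mode: rewrite the whole target from a static snapshot."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM customer")
        conn.executemany(
            "INSERT INTO customer VALUES (?, ?, ?)",
            [(cid, name, loaded_on) for cid, name in snapshot],
        )

def update_load(conn, changes, loaded_on):
    """Update mode: append only the rows from an incremental capture."""
    with conn:
        conn.executemany(
            "INSERT INTO customer VALUES (?, ?, ?)",
            [(cid, name, loaded_on) for cid, name in changes],
        )

refresh_load(conn, [(1, "Acme"), (2, "Widget Co")], "2016-01-01")
update_load(conn, [(2, "Widget Company"), (3, "Globex")], "2016-01-02")

rows = conn.execute(
    "SELECT id, name, loaded_on FROM customer ORDER BY id, loaded_on"
).fetchall()
print(rows)
```

After both loads, customer 2 appears twice, once per load date, which is what allows historical queries against the warehouse.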

Derived Data Characteristics

• Type of data

– Detailed, possibly periodic.

– Aggregated.

• Distributed to departmental servers.

• Implemented in star schema.

Star Schema

• Also called the dimensional model.

• Fact and dimension tables.

– Fact table: consists of factual or quantitative data about the business

– Dimension table: holds descriptive data

• Grain of a fact table - time period for each record.
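As a rough illustration of the points above, a small star schema can be built with Python's built-in `sqlite3` module. All table names, column names, and sample rows are hypothetical; the grain chosen here is one fact row per store, product, and period.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold descriptive data.
CREATE TABLE dim_store   (store_id   INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE dim_period  (period_id  INTEGER PRIMARY KEY, month TEXT);

-- The fact table holds quantitative data, keyed by its dimensions.
CREATE TABLE fact_sales (
    store_id     INTEGER REFERENCES dim_store(store_id),
    product_id   INTEGER REFERENCES dim_product(product_id),
    period_id    INTEGER REFERENCES dim_period(period_id),
    units_sold   INTEGER,
    dollars_sold REAL,
    PRIMARY KEY (store_id, product_id, period_id)
);

INSERT INTO dim_store   VALUES (1, 'Boston'), (2, 'Denver');
INSERT INTO dim_product VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO dim_period  VALUES (1, '2016-01'), (2, '2016-02');
INSERT INTO fact_sales  VALUES
    (1, 10, 1, 3, 7.5), (1, 11, 1, 1, 4.0),
    (2, 10, 1, 5, 12.5), (2, 10, 2, 2, 5.0);
""")

# A typical star join: facts summarized by dimension attributes.
query = """
SELECT s.city, p.description, SUM(f.units_sold)
FROM fact_sales AS f
JOIN dim_store   AS s ON s.store_id = f.store_id
JOIN dim_product AS p ON p.product_id = f.product_id
GROUP BY s.city, p.description
ORDER BY s.city, p.description
"""
units_by_city_product = conn.execute(query).fetchall()
print(units_by_city_product)
```

The fact table's primary key is the combination of its dimension keys, so each dimension table joins to it directly, giving the star shape.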

Components of a star schema

Star schema example

Star schema with sample data

Example of snowflake schema

Size of the fact table

• Total number of stores: 1,000

• Total number of products: 10,000

• Total number of periods: 24

• Total rows: 1,000 * 10,000 * 24 = 240,000,000

• On average, 50% of items record sales

– Number of rows = 120,000,000
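The same estimate as a short Python calculation, using the figures from this slide. The variable names are illustrative only.

```python
stores = 1_000
products = 10_000
periods = 24
# Share of store/product/period combinations that actually record a sale.
fraction_with_sales = 0.5

# Upper bound: one row for every possible combination of dimension values.
max_rows = stores * products * periods

# Expected size: only combinations with sales produce a fact row.
expected_rows = int(max_rows * fraction_with_sales)

print(f"{max_rows:,}")
print(f"{expected_rows:,}")
```

The row count scales with the product of the dimension sizes, so adding a dimension or choosing a finer grain multiplies the size of the fact table.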

Types of Data Marts

• Dependent - Populated from the EDW.

• Independent - Data taken directly from the operational databases.

The User Interface

• The role of metadata.

• Traditional query and reporting tools.

• On-line analytical processing (OLAP)

– The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques.

The User Interface

– Slicing a cube.

– Pivot

• Rotate the view for a particular data point to obtain another perspective.

• E.g. take a value from the units column and obtain by-store values.

– Drill-down
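The three operations can be sketched on a tiny in-memory cube. The cube contents, the product-to-category map, and the function names are hypothetical. The slide gives no definition of drill-down, so it is taken here in its usual sense of moving from a summary to a finer level of detail.

```python
from collections import defaultdict

# Hypothetical cube cells: (product, store, month) -> units sold.
cube = {
    ("Widget", "Boston", "2016-01"): 3,
    ("Widget", "Denver", "2016-01"): 5,
    ("Gadget", "Boston", "2016-01"): 1,
    ("Widget", "Boston", "2016-02"): 4,
    ("Manual", "Denver", "2016-02"): 2,
}
# Hypothetical hierarchy used for drill-down.
CATEGORY = {"Widget": "Hardware", "Gadget": "Hardware", "Manual": "Books"}

def slice_cube(cube, month):
    """Slice: fix one dimension value, leaving a product-by-store table."""
    return {(p, s): u for (p, s, m), u in cube.items() if m == month}

def pivot_by_store(cube, product):
    """Pivot: re-view one product's units figure broken out by store."""
    totals = defaultdict(int)
    for (p, s, _), u in cube.items():
        if p == product:
            totals[s] += u
    return dict(totals)

def drill_down(cube, category):
    """Drill-down: break a category total into its individual products."""
    totals = defaultdict(int)
    for (p, _, _), u in cube.items():
        if CATEGORY[p] == category:
            totals[p] += u
    return dict(totals)

jan = slice_cube(cube, "2016-01")
widget_by_store = pivot_by_store(cube, "Widget")
hardware_detail = drill_down(cube, "Hardware")
print(jan)
print(widget_by_store)
print(hardware_detail)
```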

Slicing a data cube