Chapter 2: DATA WAREHOUSING

Feb 11, 2016

calder

Page 1: Chapter 2: DATA WAREHOUSING

Chapter 2: DATA WAREHOUSING

FUNDAMENTALS of DATABASE SYSTEMS, Fifth Edition

Page 2: Chapter 2: DATA WAREHOUSING

Introduction

Who are my customers and what products are they buying?

Which customers are most likely to go to the competition?

What impact will new products/services have on revenue and margins?

What product promotions have the biggest impact on revenue?

Page 3: Chapter 2: DATA WAREHOUSING

Introduction (cont.)

There is a great need for tools that give decision makers the information to make decisions quickly and reliably based on historical data.

This functionality is provided by data warehousing.

A data warehouse is characterized as a subject-oriented, integrated, nonvolatile, time-variant collection of data in support of management's decisions.

Page 4: Chapter 2: DATA WAREHOUSING

Introduction (cont.)

Online analytical processing (OLAP): a term used to describe the analysis of complex data from the data warehouse.

Data mining: the process of knowledge discovery.

Page 5: Chapter 2: DATA WAREHOUSING

Characteristics of Data Warehouses: Subject Oriented

Organized around major subjects, such as product and sales.

Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.

Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision process.

Page 6: Chapter 2: DATA WAREHOUSING

Characteristics of Data Warehouses: Integrated

Constructed by integrating multiple, heterogeneous data sources.

Data cleaning and data integration techniques are applied.

Page 7: Chapter 2: DATA WAREHOUSING

Characteristics of Data Warehouses: Time Variant

Data warehouse data provide information from a historical perspective (e.g., the past 5-10 years).

Every data item in the data warehouse contains an element of time.

Page 8: Chapter 2: DATA WAREHOUSING

Characteristics of Data Warehouses: Nonvolatile

Operational updates of data do not occur in the data warehouse environment.

It does not require transaction processing, recovery, or concurrency control mechanisms.

It requires only two operations for data access: initial loading of data and querying.

Page 9: Chapter 2: DATA WAREHOUSING

Data Warehouse vs. Operational Databases

DW: A large amount of data from multiple sources, which may include different DB models or files acquired from independent systems and platforms.
Traditional DB: Transactional (relational, object-oriented, network, hierarchical).

DW: Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Optimized for retrieval.
Traditional DB: Focuses on daily operations or transaction processing. Optimized for routine transaction processing.

DW: Provides information from a historical perspective (e.g., the past 5-10 years).
Traditional DB: Current-value data.

DW: Nonvolatile.
Traditional DB: Transactions are the agents of change to the database.

DW: Supports DSS, data mining, and OLAP.
Traditional DB: Supports OLTP.

Page 10: Chapter 2: DATA WAREHOUSING

OLTP vs. OLAP

                    OLTP                               OLAP
User                Clerk, IT professional             Decision makers, analysts
Function            Day-to-day operations              Decision support
DB design           Application-oriented (E-R based)   Subject-oriented (star, snowflake)
Data                Current                            Historical
View                Detailed                           Summarized
Access              Read/write                         Read mostly
# Records accessed  Tens                               Millions
# Users             Thousands                          Hundreds
DB size             100 MB to GB                       100 GB to TB
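
The access-pattern rows of the comparison above can be illustrated with two queries against the same data. This is only a sketch using Python's built-in sqlite3 module; the `orders` table, its columns, and its rows are hypothetical and not part of the slides.

```python
import sqlite3

# Hypothetical table and data, used only to contrast the two access styles.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "East", 2014, 100.0), (2, "East", 2015, 150.0), (3, "West", 2015, 80.0)],
)

# OLTP-style access: read/write one current record, located by its key.
conn.execute("UPDATE orders SET amount = 120.0 WHERE order_id = 1")
one_order = conn.execute("SELECT amount FROM orders WHERE order_id = 1").fetchone()

# OLAP-style access: a read-only summary computed over many historical records.
summary = conn.execute(
    "SELECT region, year, SUM(amount) FROM orders "
    "GROUP BY region, year ORDER BY region, year"
).fetchall()

print(one_order)
print(summary)
```

In a real deployment the two workloads would normally run on separate systems; a single in-memory database is used here only to keep the sketch self-contained.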

Page 11: Chapter 2: DATA WAREHOUSING

What is a Data Warehouse? A Practitioner's Viewpoint

“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context.”

Barry Devlin, IBM Consultant

Page 12: Chapter 2: DATA WAREHOUSING

What is a Data Warehouse?

Figure: data sources in Chicago, New York, and Taranto are cleaned, integrated, transformed, loaded, and refreshed into the data warehouse, which clients access through query and analysis tools.

Page 13: Chapter 2: DATA WAREHOUSING

3-D data cube

Page 14: Chapter 2: DATA WAREHOUSING

Example of Querying a Cube

Figure: a data cube with a Products dimension (Apples, Cherries, Grapes, Melons), a Time dimension (Q1, Q2, Q3, Q4), and a Measures dimension (Sales Units, Sales Dollars, Ave Units, Net Price). The value shown for the queried cell is 1000.

Page 15: Chapter 2: DATA WAREHOUSING

From table and spreadsheet to data cubes

A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.

A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.

Dimension tables contain descriptions of the subject of the business, such as item (item_name, brand, type) or time (day, week, month, quarter, year).

Page 16: Chapter 2: DATA WAREHOUSING

From table and spreadsheet to data cubes (cont.)

A fact table contains factual or quantitative data.

It holds the measures (such as dollars_sold) and keys to each of the related dimension tables.

Page 17: Chapter 2: DATA WAREHOUSING

4-D Data cube

Page 18: Chapter 2: DATA WAREHOUSING

Cube: a lattice of cuboids

0-D (apex) cuboid

1-D cuboids

2-D cuboids

3-D cuboids

4-D (base) cuboid
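
The lattice can be enumerated mechanically: ignoring concept hierarchies, a cube with n dimensions has 2^n cuboids, one per subset of its dimensions. A minimal Python sketch follows; the four dimension names are illustrative assumptions, since the slide does not name them.

```python
from itertools import combinations

# Hypothetical dimension names for a 4-D cube (not taken from the slide).
dimensions = ["time", "item", "location", "supplier"]

def cuboid_lattice(dims):
    """Group every subset of dims by its size: 0-D apex up to n-D base."""
    return {k: list(combinations(dims, k)) for k in range(len(dims) + 1)}

lattice = cuboid_lattice(dimensions)
for k, cuboids in lattice.items():
    print(f"{k}-D cuboids: {len(cuboids)}")

# One cuboid per subset of the dimensions, so 2 ** 4 in total.
total = sum(len(cuboids) for cuboids in lattice.values())
print("total cuboids:", total)
```

The single 0-D entry is the apex cuboid (everything aggregated) and the single 4-D entry is the base cuboid (no aggregation).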

Page 19: Chapter 2: DATA WAREHOUSING

Conceptual Modeling of Data Warehouses

Modeling data warehouses: dimensions and measures.

Star schema: a fact table in the middle connected to a set of dimension tables.

Snowflake schema: a refinement of the star schema in which some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake.

Page 20: Chapter 2: DATA WAREHOUSING

Conceptual Modeling of Data Warehouses (cont.)

Fact constellation: multiple fact tables share dimension tables. It can be viewed as a collection of stars and is therefore also called a galaxy schema.

Page 21: Chapter 2: DATA WAREHOUSING

Example of Star Schema

Sales fact table: Time_key, Item_key, Branch_key, Location_key, Units_sold, Dollars_sold, Avg_sales

Measures: Units_sold, Dollars_sold, Avg_sales

time: Time_key, Day, Day_of_the_week, Month, Quarter, year

branch: branch_key, Branch_name, Branch_type

item: item_key, Item_name, brand, type, Supplier_type

location: location_key, street, city, State_or_province, country
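
A star schema like this is typically queried by joining the fact table to the dimension tables it references and aggregating a measure. The sketch below uses Python's built-in sqlite3 module with a trimmed-down version of the schema: only two dimensions are kept, the table names `time_dim`, `item_dim`, and `sales_fact` are assumptions made for illustration, and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Trimmed, hypothetical version of the star schema: two dimensions, one fact table.
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    item_key INTEGER REFERENCES item_dim(item_key),
    units_sold INTEGER,
    dollars_sold REAL
);
""")
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "Q1", 2015), (2, "Q2", 2015)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?)",
                 [(10, "Apples", "BrandA"), (11, "Melons", "BrandB")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 5, 50.0), (1, 11, 2, 40.0), (2, 10, 3, 30.0), (2, 10, 1, 10.0)])

# Star join: the fact table in the middle, one join per dimension table.
star_join = """
SELECT t.quarter, i.item_name, SUM(s.dollars_sold)
FROM sales_fact AS s
JOIN time_dim AS t ON s.time_key = t.time_key
JOIN item_dim AS i ON s.item_key = i.item_key
GROUP BY t.quarter, i.item_name
ORDER BY t.quarter, i.item_name
"""
rows = conn.execute(star_join).fetchall()
print(rows)
```

Each dimension is reached with exactly one join from the fact table, which is what keeps star-schema queries simple.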

Page 22: Chapter 2: DATA WAREHOUSING

Example of Snowflake Schema

Sales fact table: Time_key, Item_key, Branch_key, Location_key, Units_sold, Dollars_sold, Avg_sales

Measures: Units_sold, Dollars_sold, Avg_sales

time: Time_key, Day, Day_of_the_week, Month, Quarter, year

branch: branch_key, Branch_name, Branch_type

item: item_key, Item_name, brand, type, Supplier_key

Supplier: Supplier_key, Supplier_type

location: location_key, street, City_key

city: City_key, city, State_or_province, country
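
Because the snowflake schema normalizes location into separate location and city tables, a query that groups by an attribute such as country needs one more join than it would in the star schema. A minimal sqlite3 sketch, keeping only the location branch of the schema; the table name `sales_fact` and all rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the normalized location branch of the snowflake schema is modeled here.
conn.executescript("""
CREATE TABLE city (city_key INTEGER PRIMARY KEY, city TEXT,
                   state_or_province TEXT, country TEXT);
CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                       city_key INTEGER REFERENCES city(city_key));
CREATE TABLE sales_fact (location_key INTEGER REFERENCES location(location_key),
                         dollars_sold REAL);
""")
conn.executemany("INSERT INTO city VALUES (?, ?, ?, ?)", [
    (1, "Chicago", "Illinois", "USA"),
    (2, "New York", "New York", "USA"),
    (3, "Taranto", "Apulia", "Italy"),
])
conn.executemany("INSERT INTO location VALUES (?, ?, ?)", [
    (100, "1 Main St", 1), (101, "2 Oak Ave", 2), (102, "3 Via Roma", 3),
])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?)", [
    (100, 10.0), (101, 20.0), (102, 5.0), (100, 2.5),
])

# Two joins are needed to reach country: fact -> location -> city.
by_country = conn.execute("""
SELECT c.country, SUM(s.dollars_sold)
FROM sales_fact AS s
JOIN location AS l ON s.location_key = l.location_key
JOIN city AS c ON l.city_key = c.city_key
GROUP BY c.country
ORDER BY c.country
""").fetchall()
print(by_country)
```

The trade-off sketched here is the usual one: normalization removes the repeated state and country values from the location table at the cost of extra joins at query time.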

Page 23: Chapter 2: DATA WAREHOUSING

Example of Fact Constellation

Sales fact table: Time_key, Item_key, Branch_key, Location_key, Units_sold, Dollars_sold, Avg_sales

Measures: Units_sold, Dollars_sold, Avg_sales

Shipping fact table: Time_key, Item_key, Shipper_key, From_location, To_location, Dollars_cost, Units_shipped

time: Time_key, Day, Day_of_the_week, Month, Quarter, year

branch: branch_key, Branch_name, Branch_type

item: item_key, Item_name, brand, type, Supplier_type

location: location_key, street, City_key

shipper: Shipper_key, Shipper_name, Location_key, Shipper_type

Page 24: Chapter 2: DATA WAREHOUSING

Cube definition syntax in DMQL

Page 25: Chapter 2: DATA WAREHOUSING

Defining star schema in DMQL

Page 26: Chapter 2: DATA WAREHOUSING

Defining snowflake in DMQL

Page 27: Chapter 2: DATA WAREHOUSING

Defining fact constellation in DMQL

Page 28: Chapter 2: DATA WAREHOUSING

Measures of a Data Cube: Three Categories

Distributive: the data are partitioned into n sets and the function is applied to each, giving n aggregate values. The function is distributive if the result derived from those n aggregate values is the same as that derived by applying the function to all the data without partitioning. E.g., count(), min().
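
A small check of the distributive property, as a sketch in Python. The values and the way they are partitioned are arbitrary assumptions. Note that for count() the combining step is a sum of the partial counts rather than another count.

```python
# Hypothetical measure values, split into three arbitrary partitions.
data = [7, 3, 9, 4, 8, 1, 6]
partitions = [data[0:2], data[2:5], data[5:7]]

# min(): the min of the per-partition mins equals the min over all the data.
partial_mins = [min(p) for p in partitions]
min_from_partials = min(partial_mins)

# count(): the per-partition counts are combined by summing them.
partial_counts = [len(p) for p in partitions]
count_from_partials = sum(partial_counts)

print(min_from_partials, min(data))
print(count_from_partials, len(data))
```

This is why distributive measures are cheap to maintain in a cube: a higher-level cell can be computed from already aggregated lower-level cells without revisiting the base data.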

Page 29: Chapter 2: DATA WAREHOUSING

Measures of a Data Cube: Three Categories (cont.)

Algebraic: the measure can be computed by an algebraic function with M arguments (where M is a bounded integer), each of which is obtained by applying a distributive aggregate function. E.g., avg().

Holistic: there is no constant bound on the storage size needed to describe a subaggregate. E.g., mode(), rank().
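
The difference between the two categories can be seen on a toy example. In this Python sketch (the data and the `mode` helper are illustrative assumptions), avg() is recovered exactly from M = 2 distributive sub-aggregates per partition, sum and count, while the per-partition modes turn out to say nothing about the global mode.

```python
from collections import Counter

# Hypothetical data, already split into partitions of unequal size.
partitions = [[1, 1, 1, 2, 2], [2, 2, 3, 3, 3], [5]]
data = [x for p in partitions for x in p]

# Algebraic: avg() needs only (sum, count) from each partition.
partials = [(sum(p), len(p)) for p in partitions]
avg_from_partials = sum(s for s, _ in partials) / sum(n for _, n in partials)

def mode(values):
    """Most frequent value; every list used here has a unique mode."""
    return Counter(values).most_common(1)[0][0]

# Holistic: a fixed-size summary per partition (its mode) is not enough.
partition_modes = [mode(p) for p in partitions]
global_mode = mode(data)

print(avg_from_partials, sum(data) / len(data))
print(partition_modes, global_mode)
```

Here the global mode is 2, a value that is not the mode of any single partition, so computing it requires the full value counts rather than a bounded sub-aggregate.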

Page 30: Chapter 2: DATA WAREHOUSING

Typical OLAP Operations

Roll-up (drill-up): summarizes data by climbing up a hierarchy.

Drill-down (roll-down): the reverse of roll-up. Moves from a higher-level summary to a lower-level summary or to detailed data.

Slice and dice: project and select.
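
These operations can be sketched on a tiny cube held in a Python dictionary. Everything here is an assumption made for illustration: the cell values, the city-to-country hierarchy, and the function names. Drill-down is simply the move back from the rolled-up result to the more detailed cells it was computed from.

```python
from collections import defaultdict

# Hypothetical base cuboid: (city, quarter, item) -> units_sold.
cube = {
    ("Chicago", "Q1", "Apples"): 10,
    ("Chicago", "Q2", "Apples"): 20,
    ("New York", "Q1", "Apples"): 5,
    ("New York", "Q1", "Melons"): 7,
    ("Taranto", "Q1", "Melons"): 3,
}
# Hypothetical concept hierarchy on the location dimension.
city_to_country = {"Chicago": "USA", "New York": "USA", "Taranto": "Italy"}

def roll_up_location(cells):
    """Roll-up: climb the location hierarchy from city to country."""
    out = defaultdict(int)
    for (city, quarter, item), units in cells.items():
        out[(city_to_country[city], quarter, item)] += units
    return dict(out)

def slice_quarter(cells, quarter):
    """Slice: select a single value on one dimension (time)."""
    return {key: units for key, units in cells.items() if key[1] == quarter}

def dice(cells, cities, items):
    """Dice: select on two or more dimensions at once."""
    return {key: units for key, units in cells.items()
            if key[0] in cities and key[2] in items}

by_country = roll_up_location(cube)
q1 = slice_quarter(cube, "Q1")
usa_apples = dice(cube, {"Chicago", "New York"}, {"Apples"})
print(by_country)
print(q1)
print(usa_apples)
```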

Page 31: Chapter 2: DATA WAREHOUSING

Typical OLAP Operations (cont.)

Pivot (rotate): reorients the cube for visualization, for example turning a 3-D cube into a series of 2-D planes.

Other operations:

Drill across: involves (goes across) more than one fact table.

Drill through: goes through the bottom level of the cube to its back-end relational tables (using SQL).
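
A rough Python sketch of pivoting, under the assumption that the cube is stored as a dictionary keyed by (city, quarter, item); the data and both function names are hypothetical. The first helper cuts the 3-D cube into 2-D planes and the second rotates one plane by swapping its axes.

```python
from collections import defaultdict

# Hypothetical 3-D cube: (city, quarter, item) -> units_sold.
cube = {
    ("Chicago", "Q1", "Apples"): 10,
    ("Chicago", "Q2", "Apples"): 20,
    ("New York", "Q1", "Melons"): 7,
}

def planes_by(cells, axis):
    """Split a 3-D cube into 2-D planes, one per value on the chosen axis."""
    planes = defaultdict(dict)
    for key, value in cells.items():
        rest = tuple(k for i, k in enumerate(key) if i != axis)
        planes[key[axis]][rest] = value
    return dict(planes)

def pivot(plane):
    """Rotate a 2-D plane by swapping its two axes."""
    return {(b, a): value for (a, b), value in plane.items()}

city_planes = planes_by(cube, 0)              # one (quarter, item) plane per city
chicago_rotated = pivot(city_planes["Chicago"])  # now keyed by (item, quarter)
print(city_planes)
print(chicago_rotated)
```

Pivoting changes only how the cells are laid out for viewing; no values are aggregated or filtered.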

Page 32: Chapter 2: DATA WAREHOUSING

Design of Data Warehouse: A Business Analysis Framework

Four views regarding the design of a data warehouse:

Top-down view: allows selection of the relevant information necessary for the data warehouse.

Data source view: exposes the information being captured, stored, and managed by operational systems.

Data warehouse view: consists of the fact tables and dimension tables.

Page 33: Chapter 2: DATA WAREHOUSING

Design of Data Warehouse: A Business Analysis Framework (cont.)

Business query view: presents the data in the warehouse from the perspective of the end user.

Page 34: Chapter 2: DATA WAREHOUSING

Data Warehouse Design Process

Top-down approach, bottom-up approach, or a combination of both:

Top-down: starts with overall design and planning.

Bottom-up: starts with experiments and prototypes.

From a software engineering point of view:

Waterfall: structured and systematic analysis at each step before proceeding to the next.

Page 35: Chapter 2: DATA WAREHOUSING

Data Warehouse Design Process (cont.)

Spiral: rapid generation of increasingly functional systems, with quick turnaround.

Page 36: Chapter 2: DATA WAREHOUSING

Data Warehouse Design Process (cont.)

Typical data warehouse design process:

Choose a business process to model, e.g., orders, invoices.

Choose the grain (atomic level of data) of the business process.

Choose the dimensions that will apply to each fact table record.

Choose the measures that will populate each fact table record.
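
The four choices above can be written down as a small design record before any tables are created. This is only a sketch: the `FactTableDesign` class, its `columns` helper, and the example values are hypothetical, and the rule that each dimension contributes a `<name>_key` column is an assumption borrowed from the naming used in the schema examples.

```python
from dataclasses import dataclass, field

@dataclass
class FactTableDesign:
    """Records the four design choices for one fact table (illustrative only)."""
    business_process: str                            # step 1: process to model
    grain: str                                       # step 2: atomic level of data
    dimensions: list = field(default_factory=list)   # step 3: dimensions per record
    measures: list = field(default_factory=list)     # step 4: measures per record

    def columns(self):
        """Fact table columns: one key per dimension, then the measures."""
        return [f"{d}_key" for d in self.dimensions] + list(self.measures)

design = FactTableDesign(
    business_process="orders",
    grain="one row per order line item",
    dimensions=["time", "item", "branch", "location"],
    measures=["units_sold", "dollars_sold"],
)
print(design.columns())
```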

Page 37: Chapter 2: DATA WAREHOUSING

Three Data Warehouse Models

Enterprise warehouse: collects all of the information about subjects spanning the entire organization.

Data mart: a subset of corporate-wide data that is of value to a specific group of users.

Virtual warehouse: a set of views over operational databases.
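
A virtual warehouse can be sketched as an ordinary SQL view defined over an operational table. The example below uses Python's built-in sqlite3 module; the `orders` table, the `sales_by_customer` view, and the rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical operational table.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5)],
)

# The "warehouse" is just a summary view; no data is copied.
conn.execute("""
CREATE VIEW sales_by_customer AS
SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
""")

query = "SELECT customer, total FROM sales_by_customer ORDER BY customer"
before = conn.execute(query).fetchall()

# A new operational row is visible through the view immediately.
conn.execute("INSERT INTO orders (customer, amount) VALUES ('bob', 1.0)")
after = conn.execute(query).fetchall()
print(before)
print(after)
```

Because a plain view is recomputed on every query, this approach is easy to build but puts the analytical load on the operational database server.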

Page 38: Chapter 2: DATA WAREHOUSING

Data Warehouse Back-End Tools and Utilities

Data extraction

Data cleaning

Data transformation

Load

Refresh
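
The five back-end steps can be strung together as a small pipeline. The following is a minimal Python sketch, not a description of any particular tool: the source records, the cleaning rules, and the function names are all assumptions chosen for illustration.

```python
# Hypothetical raw records from two sources with inconsistent formats.
source_a = [{"id": 1, "city": " chicago ", "amount": "10.50"}]
source_b = [
    {"id": 2, "city": "NEW YORK", "amount": "7"},
    {"id": 3, "city": "", "amount": "n/a"},
]

def extract(*sources):
    """Data extraction: gather records from multiple sources."""
    return [row for src in sources for row in src]

def clean(rows):
    """Data cleaning: drop records with a missing city or unparseable amount."""
    good = []
    for row in rows:
        try:
            float(row["amount"])
        except ValueError:
            continue
        if row["city"].strip():
            good.append(row)
    return good

def transform(rows):
    """Data transformation: convert records to one warehouse format."""
    return [
        {"id": r["id"], "city": r["city"].strip().title(), "amount": float(r["amount"])}
        for r in rows
    ]

def load(warehouse, rows):
    """Load: store rows by id; reused for the initial load and for refreshes."""
    for r in rows:
        warehouse[r["id"]] = r
    return warehouse

# Initial load.
warehouse = load({}, transform(clean(extract(source_a, source_b))))

# Refresh: propagate new source data into the warehouse.
source_a.append({"id": 4, "city": "taranto", "amount": "3"})
warehouse = load(warehouse, transform(clean(extract(source_a, source_b))))
print(sorted(warehouse))
```

The refresh here simply reruns the whole pipeline; a production system would more likely propagate only the changes made since the last load.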