Page 1: Star Schema

04/08/23 Sudarshan 1

Review / Today

Star Schema: Fact table, Dimensions, Drilling Down & Roll up, Slicing & Dicing

Implementation techniques for OLAP: Bitmap indexes, Join indexes, File org.

Architecture

Characteristics

Relational OLAP

Multidimensional OLAP

ROLAP vs. MOLAP

Page 2: Star Schema


What is Star Schema?

A star schema is a relational database schema for representing multidimensional data.

It is the simplest form of data warehouse schema: one or more fact tables, each referencing a set of dimension tables.

It is called a star schema because its entity-relationship diagram resembles a star: the fact table sits at the center and is connected to multiple dimension tables around it.

The center of the star is a large fact table whose foreign keys point to the dimension tables.

The advantages of a star schema are easy slicing of the data, better query performance, and a structure that is easy to understand.

Page 3: Star Schema


Steps in designing a star schema

1. Identify a business process for analysis (e.g., sales).

2. Identify the measures or facts (e.g., sales dollars).

3. Identify the dimensions for the facts (e.g., product, location, time and organization dimensions).

4. List the columns that describe each dimension (e.g., region name, branch name).

5. Determine the lowest level of summary in the fact table (e.g., sales dollars).

Page 4: Star Schema


Important aspects of star schema & snowflake schema

In a star schema, every dimension has a primary key.

In a star schema, a dimension table has no parent table, whereas in a snowflake schema a dimension table may have one or more parent tables.

In a star schema, the hierarchies for a dimension are stored in the dimension table itself, whereas in a snowflake schema they are broken out into separate tables.

These hierarchies help to drill down the data from the topmost level to the lowest level.

Page 5: Star Schema


Fact

Facts are numeric measurements (values) that represent a specific business activity.

For example, sales figures are numeric measurements that represent product and/or service sales.

Facts commonly used in business data analysis are units, costs, prices and revenues.

Facts are stored in a FACT table, i.e., the center of the star schema.

Page 6: Star Schema


Fact Table

The central table in a star schema is called the FACT table; it contains the facts and is connected to the dimensions. A fact table typically has two types of columns: those that contain facts and those that are foreign keys to dimension tables.

The primary key of a fact table is usually a composite key made up of all of its foreign keys.

A fact table might contain either detail-level facts or facts that have been aggregated (fact tables that contain aggregated facts are often called summary tables instead). A fact table usually contains facts at the same level of aggregation.
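To make the composite-key idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names follow the Sales example used in these slides; the data types and the inserted row are assumptions for illustration. It shows that the composite primary key allows only one row per (market, product, time) combination.

```python
import sqlite3

# Sketch of a fact table: three foreign-key columns plus one fact (measure) column.
# The primary key is the composite of the foreign keys, so each
# (market, product, time) combination can hold at most one measurement.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Sales (
        Market_Id  INTEGER NOT NULL,   -- foreign key to the Market dimension
        Product_Id INTEGER NOT NULL,   -- foreign key to the Product dimension
        Time_Id    INTEGER NOT NULL,   -- foreign key to the Time dimension
        Sales_Amt  REAL,               -- the fact
        PRIMARY KEY (Market_Id, Product_Id, Time_Id)
    )
""")
conn.execute("INSERT INTO Sales VALUES (1, 10, 100, 250.0)")


def insert_duplicate_rejected() -> bool:
    """Return True if a second row with the same composite key is refused."""
    try:
        conn.execute("INSERT INTO Sales VALUES (1, 10, 100, 99.0)")
    except sqlite3.IntegrityError:
        return True
    return False


print(insert_duplicate_rejected())  # True
```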

Page 7: Star Schema


Many OLAP applications are based on a fact table.

For example, a supermarket application might be based on a table

Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

The table can be viewed as multidimensional: Market_Id, Product_Id and Time_Id are the dimensions that represent specific supermarkets, products and time intervals.

Sales_Amt is a function of the other three.

Page 8: Star Schema


Fact Table (Conclusion)

Central table

Mostly raw numeric items

Narrow rows, a few columns at most

Large number of rows (millions to a billion)

Accessed via the dimensions

Page 9: Star Schema


Dimension

Qualifying characteristics that provide additional perspective to a given fact.

For example, sales might be compared by product, from region to region, and from one time period to the next.

Here, sales have product, location and time dimensions.

Such dimensions are stored in DIMENSION TABLES.

Page 10: Star Schema


Dimension Tables

The dimensions of the fact table are further described with dimension tables.

Fact table:

Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

Dimension tables:

Market (Market_Id, City, State, Region)

Product (Product_Id, Name, Category, Price)

Time (Time_Id, Week, Month, Quarter)
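A minimal sketch of this star schema in SQLite, driven from Python's sqlite3 module. The DDL mirrors the column lists above; the data types and the sample rows are assumptions for illustration. The query shows the typical star pattern: join the fact table to the dimensions it needs, filter on a dimension attribute, and aggregate the fact.

```python
import sqlite3

# The Sales fact table and its three dimension tables; row values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Market  (Market_Id  INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
    CREATE TABLE Product (Product_Id INTEGER PRIMARY KEY, Name TEXT, Category TEXT, Price REAL);
    CREATE TABLE Time    (Time_Id    INTEGER PRIMARY KEY, Week INTEGER, Month INTEGER, Quarter INTEGER);
    CREATE TABLE Sales (
        Market_Id  INTEGER REFERENCES Market(Market_Id),
        Product_Id INTEGER REFERENCES Product(Product_Id),
        Time_Id    INTEGER REFERENCES Time(Time_Id),
        Sales_Amt  REAL,
        PRIMARY KEY (Market_Id, Product_Id, Time_Id)
    );
    INSERT INTO Market  VALUES (1, 'Pune', 'Maharashtra', 'West'), (2, 'Chennai', 'Tamil Nadu', 'South');
    INSERT INTO Product VALUES (10, 'Soap', 'Household', 1.5), (11, 'Tea', 'Grocery', 3.0);
    INSERT INTO Time    VALUES (100, 1, 1, 1), (101, 5, 2, 1);
    INSERT INTO Sales   VALUES (1, 10, 100, 30.0), (1, 11, 101, 45.0), (2, 10, 101, 20.0);
""")

# Star query: fact table joined to two dimensions, filtered on Region, grouped by Category.
rows = conn.execute("""
    SELECT P.Category, SUM(S.Sales_Amt)
    FROM Sales S
    JOIN Market  M ON M.Market_Id  = S.Market_Id
    JOIN Product P ON P.Product_Id = S.Product_Id
    WHERE M.Region = 'West'
    GROUP BY P.Category
    ORDER BY P.Category
""").fetchall()
print(rows)  # [('Grocery', 45.0), ('Household', 30.0)]
```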

Page 11: Star Schema


Attributes

Each dimension table contains attributes, which are used to search, filter and classify facts.

For example, for Sales we can identify some attributes for each dimension:

Product dimension: product ID, description, product type

Location dimension: region, state, city

Time dimension: year, quarter, month, week and date

Page 12: Star Schema


Attribute hierarchy

An attribute hierarchy (AH) provides a top-down data organization, used for aggregation and drill-down/roll-up data analysis.

For example, location dimension attributes can be organized in a hierarchy by region, state and city.

An AH provides the capability to perform drill-down and roll-up searches.

It allows the DW and OLAP systems to have a defined path for these operations.
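One way to picture a defined roll-up path is as a chain of child-to-parent maps, applied one level at a time. In this sketch the city, state and region names and the sales figures are hypothetical, and `roll_up` is an illustrative helper rather than part of any OLAP product.

```python
# Hypothetical location hierarchy: city -> state -> region.
CITY_TO_STATE = {"Pune": "Maharashtra", "Mumbai": "Maharashtra", "Chennai": "Tamil Nadu"}
STATE_TO_REGION = {"Maharashtra": "West", "Tamil Nadu": "South"}


def roll_up(totals: dict[str, float], parent_of: dict[str, str]) -> dict[str, float]:
    """Aggregate totals one level up the hierarchy using a child -> parent map."""
    result: dict[str, float] = {}
    for child, amount in totals.items():
        parent = parent_of[child]
        result[parent] = result.get(parent, 0.0) + amount
    return result


city_sales = {"Pune": 30.0, "Mumbai": 50.0, "Chennai": 20.0}
state_sales = roll_up(city_sales, CITY_TO_STATE)      # {'Maharashtra': 80.0, 'Tamil Nadu': 20.0}
region_sales = roll_up(state_sales, STATE_TO_REGION)  # {'West': 80.0, 'South': 20.0}
```

Drilling down is the reverse direction, and it cannot be computed from the region totals alone: it needs the finer-grained city (or fact-level) data.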

Page 13: Star Schema


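A Concept Hierarchy: Dimension (location)

[Figure: a concept hierarchy with levels all, region, country, city and office. Example nodes: regions Europe and North_America; countries Germany, Spain, Canada and Mexico; cities Frankfurt, Vancouver and Toronto; offices L. Chan and M. Wind.]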

Page 14: Star Schema


Multidimensional Data

Sales volume as a function of product, month, and region

[Figure: a cube with axes Product, Region and Month.]

Dimensions: Product, Location, Time

Hierarchical summarization paths:

Product: Industry → Category → Product

Location: Region → Country → City → Office

Time: Year → Quarter → Month → Day, with Week → Day as an alternative path

Page 15: Star Schema


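A Sample Data Cube

[Figure: a cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum) and Country (U.S.A, Canada, Mexico, sum). The "sum" cells hold aggregates; for example, the cell at (TV, sum over dates, U.S.A) holds the total annual sales of TV in U.S.A.]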
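A data cube of this kind can be sketched as every group-by over the dimensions, with a placeholder value standing for a dimension that has been summed away (the "sum" cells in the figure). The facts below are invented, and the marker string "all" is an assumption of this sketch.

```python
from itertools import combinations

# Invented base facts: (product, date, country, sales).
FACTS = [
    ("TV", "1Qtr", "U.S.A", 10), ("TV", "2Qtr", "U.S.A", 12),
    ("PC", "1Qtr", "Canada", 7), ("VCR", "3Qtr", "Mexico", 4),
]
DIMS = ("product", "date", "country")


def data_cube(facts):
    """Compute every group-by over the dimensions; 'all' marks a dimension summed away."""
    cube = {}
    for k in range(len(DIMS) + 1):
        for kept in combinations(range(len(DIMS)), k):  # which dimensions stay in the key
            for *coords, amount in facts:
                key = tuple(coords[i] if i in kept else "all" for i in range(len(DIMS)))
                cube[key] = cube.get(key, 0) + amount
    return cube


cube = data_cube(FACTS)
print(cube[("TV", "all", "U.S.A")])  # 22: total annual sales of TV in U.S.A.
print(cube[("all", "all", "all")])   # 33: grand total
```

With n dimensions this computes 2^n group-bys, which is why warehouses usually precompute only some of them.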

Page 16: Star Schema


Star Schema

A single fact table, and one dimension table for each dimension

Does not capture hierarchies directly

[Figure: a fact table (date, custno, prodno, cityname, ...) linked to the dimension tables time, prod, cust and city.]

Page 17: Star Schema


Example of Star Schema

[Figure 1.6: Example of Star Schema]

Page 18: Star Schema


In the example, the sales fact table is connected to the location, product, time and organization dimensions. The data can be sliced along any of these dimensions and aggregated across several of them. The "sales dollar" measure in the sales fact table can be calculated across each dimension independently or across combinations of dimensions, for example:

Sales dollar value for a particular product

Sales dollar value for a product in a location

Sales dollar value for a product in a year within a location

Sales dollar value for a product in a year within a location, sold or serviced by an employee

Page 19: Star Schema


Example of Star Schema

time (time_key, day, day_of_the_week, month, quarter, year)

item (item_key, item_name, brand, type, supplier_type)

branch (branch_key, branch_name, branch_type)

location (location_key, street, city, province_or_state, country)

Sales Fact Table (time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales)

Measures: units_sold, dollars_sold, avg_sales

Page 20: Star Schema


Aggregation

Many OLAP queries involve aggregation of the data in the fact table.

For example, to find the total sales (over time) of each product in each market, we might use

SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Market_Id, S.Product_Id

The aggregation is over the entire time dimension and thus produces a two-dimensional view of the data.
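The query can be tried directly in SQLite from Python. The four sample rows are invented, chosen so that three of the resulting totals match cells of the table on the next slide (P1 in M1 and M2, P2 in M1). The ORDER BY is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT, Time_Id INTEGER, Sales_Amt REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?, ?, ?)", [
    ("M1", "P1", 1, 1000.0), ("M1", "P1", 2, 2003.0),  # two time periods, summed away
    ("M2", "P1", 1, 1503.0), ("M1", "P2", 1, 6003.0),
])

# Aggregating over Time_Id leaves a two-dimensional (market x product) result.
rows = conn.execute("""
    SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
    FROM Sales S
    GROUP BY S.Market_Id, S.Product_Id
    ORDER BY S.Market_Id, S.Product_Id
""").fetchall()
print(rows)  # [('M1', 'P1', 3003.0), ('M1', 'P2', 6003.0), ('M2', 'P1', 1503.0)]
```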

Page 21: Star Schema


Aggregation Over Time

The output of the previous query: SUM(Sales_Amt), with Product_Id as rows and Market_Id as columns

Product_Id   M1     M2     M3   M4
P1           3003   1503   …
P2           6003   2402   …
P3           4503   3      …
P4           7503   7000   …
P5           …      …      …

Page 22: Star Schema


Typical OLAP Operations

Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction

Drill down (roll down): the reverse of roll-up; go from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions

Slice and dice: project and select

Pivot (rotate): reorient the cube for visualization, e.g., turn a 3D cube into a series of 2D planes

Other operations:

Drill across: involves more than one fact table

Drill through: go through the bottom level of the cube to its back-end relational tables (using SQL)
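Slice, dice and pivot can be sketched on a cube stored as a Python dictionary keyed by coordinates. The cube contents are invented (the dimension values borrow from the slicing-and-dicing slide), and the helper names `slice_`, `dice` and `pivot` are hypothetical.

```python
# A tiny invented cube keyed by (product, region, channel) -> sales.
CUBE = {
    ("Telecomm", "India", "Retail"): 5, ("Telecomm", "Europe", "Direct"): 8,
    ("Video", "India", "Retail"): 3, ("Audio", "Far East", "Special"): 2,
}


def slice_(cube, axis, value):
    """Slice: fix one dimension to a single value and drop it from the key."""
    return {k[:axis] + k[axis + 1:]: v for k, v in cube.items() if k[axis] == value}


def dice(cube, allowed):
    """Dice: keep cells whose coordinate on each listed axis is in its allowed set."""
    return {k: v for k, v in cube.items()
            if all(k[i] in vals for i, vals in allowed.items())}


def pivot(cube, order):
    """Pivot: reorder the dimensions of every key (same cells, new orientation)."""
    return {tuple(k[i] for i in order): v for k, v in cube.items()}


telecomm = slice_(CUBE, 0, "Telecomm")            # the Telecomm slice
sub = dice(CUBE, {1: {"India"}, 2: {"Retail"}})   # sub-cube: India, Retail
rotated = pivot(CUBE, (1, 0, 2))                  # region first, then product, channel
```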

Page 23: Star Schema


Drilling Down and Rolling Up

Some dimension tables form an aggregation hierarchy: Market_Id → City → State → Region

Executing a series of queries that moves down a hierarchy (e.g., from aggregation over regions to aggregation over states) is called drilling down. It requires the fact table or information more specific than the requested aggregation (e.g., cities).

Executing a series of queries that moves up the hierarchy (e.g., from states to regions) is called rolling up.

Page 24: Star Schema


Drilling Down

Drilling down on market: from Region to State

Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

Market (Market_Id, City, State, Region)

1. SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
   FROM Sales S, Market M
   WHERE M.Market_Id = S.Market_Id
   GROUP BY S.Product_Id, M.Region

2. SELECT S.Product_Id, M.State, SUM(S.Sales_Amt)
   FROM Sales S, Market M
   WHERE M.Market_Id = S.Market_Id
   GROUP BY S.Product_Id, M.State
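Both drill-down queries can be run against a toy Market/Sales pair in SQLite. The markets and amounts are invented; the state-level query carries an ORDER BY only so that its output is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Market (Market_Id INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
    CREATE TABLE Sales  (Market_Id INTEGER, Product_Id INTEGER, Time_Id INTEGER, Sales_Amt REAL);
    INSERT INTO Market VALUES (1, 'Pune', 'MH', 'West'), (2, 'Mumbai', 'MH', 'West'),
                              (3, 'Ahmedabad', 'GJ', 'West');
    INSERT INTO Sales VALUES (1, 10, 1, 30.0), (2, 10, 1, 50.0), (3, 10, 1, 20.0);
""")

# Query 1: coarse level (region).
region_rows = conn.execute("""
    SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
    FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.Region""").fetchall()

# Query 2: drill down one level (state), again computed from the fact table.
state_rows = conn.execute("""
    SELECT S.Product_Id, M.State, SUM(S.Sales_Amt)
    FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id
    GROUP BY S.Product_Id, M.State ORDER BY M.State""").fetchall()

print(region_rows)  # [(10, 'West', 100.0)]
print(state_rows)   # [(10, 'GJ', 20.0), (10, 'MH', 80.0)]
```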

Page 25: Star Schema


Rolling Up

Rolling up on market, from State to Region

If we have already created a table, State_Sales, using

1. SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
   FROM Sales S, Market M
   WHERE M.Market_Id = S.Market_Id
   GROUP BY S.Product_Id, M.State

then we can roll up from there to:

2. SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
   FROM State_Sales T, (SELECT DISTINCT State, Region FROM Market) M
   WHERE M.State = T.State
   GROUP BY T.Product_Id, M.Region

Market has one row per market, so joining State_Sales to Market directly would match each state once per market in that state and inflate the totals; the DISTINCT subquery keeps a single (State, Region) pair per state.
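A point to watch when rolling up from State_Sales: a state with several markets appears in Market several times. The sketch below, with invented data, compares joining State_Sales to Market directly against joining it to a de-duplicated (State, Region) list; `region_totals` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Market (Market_Id INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
    CREATE TABLE Sales  (Market_Id INTEGER, Product_Id INTEGER, Time_Id INTEGER, Sales_Amt REAL);
    INSERT INTO Market VALUES (1, 'Pune', 'MH', 'West'), (2, 'Mumbai', 'MH', 'West'),
                              (3, 'Ahmedabad', 'GJ', 'West');
    INSERT INTO Sales VALUES (1, 10, 1, 30.0), (2, 10, 1, 50.0), (3, 10, 1, 20.0);
    -- Query 1: materialize state-level totals (the alias names the summed column).
    CREATE TABLE State_Sales AS
        SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
        FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id
        GROUP BY S.Product_Id, M.State;
""")


def region_totals(state_to_region_source: str):
    """Roll State_Sales up to regions; the argument is a fixed table or subquery string."""
    return conn.execute(f"""
        SELECT T.Product_Id, M.Region, SUM(T.Sales_Amt)
        FROM State_Sales T, {state_to_region_source} M WHERE M.State = T.State
        GROUP BY T.Product_Id, M.Region""").fetchall()


naive = region_totals("Market")  # MH matches two Market rows, so its total is counted twice
fixed = region_totals("(SELECT DISTINCT State, Region FROM Market)")
print(naive)  # [(10, 'West', 180.0)]
print(fixed)  # [(10, 'West', 100.0)], same as aggregating the fact table directly
```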

Page 26: Star Schema


Roll-up and Drill Down

[Figure: a hierarchy Sales Channel → Region → Country → State → Location Address → Sales Representative. Rolling up moves toward a higher level of aggregation; drilling down moves toward low-level details.]

Page 27: Star Schema


“Slicing and Dicing”

[Figure: a cube with dimensions Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special) and Regions (India, Far East, Europe), with the Telecomm slice highlighted.]

Page 28: Star Schema


Snowflake Schema

A snowflake schema is a star schema structure normalized through the use of outrigger tables, i.e., the dimension table hierarchies are broken into simpler tables. In the star schema example we had four dimensions (location, product, time and organization) and a fact table (sales).

Page 29: Star Schema


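Snowflake schema

Represents dimensional hierarchies directly by normalizing the dimension tables.

Easy to maintain and saves storage.

[Figure: a fact table (date, custno, prodno, cityname, ...) with dimension tables time, prod, cust and city, plus an additional region table normalized out of the dimensions.]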

Page 30: Star Schema


Example of Snowflake Schema

Page 31: Star Schema


Example of Snowflake Schema

time (time_key, day, day_of_the_week, month, quarter, year)

item (item_key, item_name, brand, type, supplier_key)

supplier (supplier_key, supplier_type)

branch (branch_key, branch_name, branch_type)

location (location_key, street, city_key)

city (city_key, city, province_or_state, country)

Sales Fact Table (time_key, item_key, branch_key, location_key, units_sold, dollars_sold, avg_sales)

Measures: units_sold, dollars_sold, avg_sales
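In the snowflake version, a query by country has to go through the extra city table. This SQLite sketch keeps only the tables that query needs (sales, location, city); the column types and the rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE city     (city_key INTEGER PRIMARY KEY, city TEXT, province_or_state TEXT, country TEXT);
    CREATE TABLE location (location_key INTEGER PRIMARY KEY, street TEXT,
                           city_key INTEGER REFERENCES city(city_key));
    CREATE TABLE sales    (time_key INTEGER, item_key INTEGER, branch_key INTEGER,
                           location_key INTEGER REFERENCES location(location_key),
                           units_sold INTEGER, dollars_sold REAL);
    INSERT INTO city VALUES (1, 'Vancouver', 'British Columbia', 'Canada'),
                            (2, 'Toronto', 'Ontario', 'Canada');
    INSERT INTO location VALUES (10, 'Main St', 1), (11, 'King St', 2), (12, 'Bay St', 2);
    INSERT INTO sales VALUES (1, 1, 1, 10, 3, 30.0), (1, 1, 1, 11, 2, 20.0), (1, 1, 1, 12, 5, 50.0);
""")

# Country now lives one join further away: sales -> location -> city.
by_country = conn.execute("""
    SELECT c.country, SUM(s.dollars_sold)
    FROM sales s
    JOIN location l ON l.location_key = s.location_key
    JOIN city     c ON c.city_key     = l.city_key
    GROUP BY c.country
""").fetchall()
print(by_country)  # [('Canada', 100.0)]
```

The trade-off is the one the slides describe: city attributes are stored once instead of being repeated in every location row, at the cost of an extra join per query.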

Page 32: Star Schema


Indexing Techniques

Exploiting indexes to reduce scanning of data is of crucial importance.

Bitmap indexes

Join indexes

Other issues:

Text indexing

Parallelizing and sequencing of index builds and incremental updates

Page 33: Star Schema


Indexing Techniques

Bitmap index:

An index on a particular column

Each value in the column has a bit vector; bit operations are fast

The length of each bit vector is the number of records in the base table

The i-th bit is set if the i-th row of the base table has that value in the indexed column

Not suitable for high-cardinality domains
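A bitmap index can be sketched with Python integers serving as bit vectors, where bit i stands for row i of the base table. This is an illustrative in-memory structure, not how any particular DBMS lays out its bitmaps; leading zero bits are implicit in a Python int, so a stored vector is only as long as its highest set bit.

```python
from collections import defaultdict


def build_bitmap_index(column: list[str]) -> dict[str, int]:
    """Map each distinct value to a bit vector; bit i is set when row i has that value."""
    index: dict[str, int] = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bits: int) -> list[int]:
    """Decode a bit vector back into the row positions whose bits are set."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


sex = ["M", "F", "F", "F", "F", "M"]
index = build_bitmap_index(sex)
print(rows_of(index["F"]))  # [1, 2, 3, 4]
```

One vector per distinct value is also why the technique suits low-cardinality columns: the number of vectors grows with the number of distinct values.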

Page 34: Star Schema


Bitmap Indexes

Example: the attribute sex has values M and F.

A table of 100 million people needs 2 lists of 100 million bits.
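For scale, assuming 1 MB = 10^6 bytes, the two lists together take:

```latex
2 \times 10^{8}\ \text{bits} = \frac{2 \times 10^{8}}{8}\ \text{bytes} = 2.5 \times 10^{7}\ \text{bytes} \approx 25\ \text{MB} \quad (12.5\ \text{MB per list})
```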

Page 35: Star Schema


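Customer

Query: select * from customer where gender = 'F' and vote = 'Y'

[Figure: a Customer table with gender values M, F, F, F, F, M and vote values Y, Y, Y, N, N, N, and its bitmap index: the bit vectors for gender = 'F' and vote = 'Y' are ANDed to answer the query.]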

Page 36: Star Schema


Bit Map Index

Base Table

Cust   Region   Rating
C1     N        H
C2     S        M
C3     W        L
C4     W        H
C5     S        L
C6     W        L
C7     N        H

Region Index

Row ID   N   S   E   W
1        1   0   0   0
2        0   1   0   0
3        0   0   0   1
4        0   0   0   1
5        0   1   0   0
6        0   0   0   1
7        1   0   0   0

Rating Index

Row ID   H   M   L
1        1   0   0
2        0   1   0
3        0   0   1
4        1   0   0
5        0   0   1
6        0   0   1
7        1   0   0

Query: customers where Region = W and Rating = M
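The Region = W and Rating = M query reduces to one bitwise AND of two vectors. This sketch builds the vectors from the base table above (Python ints as bit vectors are an assumption of the sketch) and also runs Rating = L to show a non-empty result.

```python
# Base table: (customer, region, rating).
CUSTOMERS = [("C1", "N", "H"), ("C2", "S", "M"), ("C3", "W", "L"), ("C4", "W", "H"),
             ("C5", "S", "L"), ("C6", "W", "L"), ("C7", "N", "H")]


def bitmap(column: int, value: str) -> int:
    """Bit vector over the base table: bit i is set when row i has `value` in `column`."""
    bits = 0
    for i, row in enumerate(CUSTOMERS):
        if row[column] == value:
            bits |= 1 << i
    return bits


def customers(bits: int) -> list[str]:
    """Translate set bits back to customer ids."""
    return [CUSTOMERS[i][0] for i in range(len(CUSTOMERS)) if bits >> i & 1]


print(customers(bitmap(1, "W") & bitmap(2, "M")))  # []: no customer matches
print(customers(bitmap(1, "W") & bitmap(2, "L")))  # ['C3', 'C6']
```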

Page 37: Star Schema


Bitmap Indexes

Comparison, join and aggregation operations are reduced to bit arithmetic, with dramatic improvement in processing time.

Significant reduction in space and I/O (30:1).

Adapted for higher-cardinality domains as well.

Compression (e.g., run-length encoding) is exploited.

Products that support bitmaps: Model 204, TargetIndex (Red Brick), IQ (Sybase), Oracle 7.3
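Run-length encoding pays off because bitmaps for selective values are mostly zeros. A minimal encoder and decoder over bit strings follows; production systems use more elaborate byte- or word-aligned variants, which this sketch does not attempt.

```python
def run_length_encode(bits: str) -> list[tuple[str, int]]:
    """Compress a bit string into (bit, run length) pairs."""
    runs: list[tuple[str, int]] = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1] = (b, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((b, 1))              # start a new run
    return runs


def run_length_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (bit, run length) pairs back into the original bit string."""
    return "".join(b * n for b, n in runs)


sparse = "0" * 20 + "1" + "0" * 30
print(run_length_encode(sparse))  # [('0', 20), ('1', 1), ('0', 30)]
```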

Page 38: Star Schema


Join Indexes

Pre-computed joins.

A join index between a fact table and a dimension table correlates a dimension tuple with the fact tuples that have the same value on the common dimensional attribute. For example, a join index on the city dimension of a calls fact table correlates, for each city, the calls (in the calls table) from that city.

Page 39: Star Schema


Join Indexes

Join indexes can also span multiple dimension tables, e.g., a join index on the city and time dimensions of a calls fact table.
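A join index can be sketched as a precomputed map from dimension values to the row ids of matching fact rows, covering both the single-dimension (city) and the multi-dimension (city and time) cases. The Calls rows are invented, and keying by attribute value rather than by dimension-table row id is a simplification of this sketch.

```python
from collections import defaultdict

# Invented Calls fact rows: (city, time, minutes); list position serves as the row id.
CALLS = [("Pune", "Jan", 5), ("Delhi", "Jan", 3), ("Pune", "Feb", 7), ("Pune", "Jan", 2)]


def join_index(facts, *columns):
    """Precompute, for each value combination of the given columns, the matching fact row ids."""
    index = defaultdict(list)
    for row_id, row in enumerate(facts):
        index[tuple(row[c] for c in columns)].append(row_id)
    return dict(index)


by_city = join_index(CALLS, 0)          # single-dimension join index on city
by_city_time = join_index(CALLS, 0, 1)  # join index spanning city and time
print(by_city[("Pune",)])               # [0, 2, 3]
print(by_city_time[("Pune", "Jan")])    # [0, 3]
```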

Page 40: Star Schema


Star Join Processing

Use join indexes to join the dimension and fact tables.

[Figure: the Calls fact table is joined with Time (C+T), the result with Location (C+T+L), and that result with Plan (C+T+L+P).]

Page 41: Star Schema


Bitmapped Join Processing

[Figure: three bitmaps over the Calls table, one each from the Time, Location and Plan dimensions (shown as 101, 001 and 110), are combined with AND.]
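Bitmapped join processing can be sketched the same way: each dimension predicate yields a bitmap over the Calls rows, and the bitmaps are ANDed instead of joining tables. The dimension values per call row are invented, and `to_bitmap` is a hypothetical helper.

```python
# Invented dimension values that each Calls row joins to (one list entry per call row).
CALL_TIME = ["Q1", "Q2", "Q1", "Q1"]
CALL_LOCATION = ["Pune", "Pune", "Delhi", "Pune"]
CALL_PLAN = ["Gold", "Gold", "Gold", "Silver"]


def to_bitmap(values, wanted):
    """Bitmap join index entry: bit i is set when call row i joins to the wanted value."""
    return sum(1 << i for i, v in enumerate(values) if v == wanted)


# One bitmap per dimension predicate, combined with AND.
hits = to_bitmap(CALL_TIME, "Q1") & to_bitmap(CALL_LOCATION, "Pune") & to_bitmap(CALL_PLAN, "Gold")
matches = [i for i in range(len(CALL_TIME)) if hits >> i & 1]
print(matches)  # [0]
```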

Page 42: Star Schema


Nigel Pendse, Richard Creath - The OLAP Report

OLAP Is FASMI

Fast Analysis of Shared Multidimensional Information

Page 43: Star Schema


Warehouse Products

Computer Associates -- CA-Ingres
Hewlett-Packard -- Allbase/SQL
Informix -- Informix, Informix XPS
Microsoft -- SQL Server
Oracle -- Oracle7, Oracle Parallel Server
Red Brick -- Red Brick Warehouse
SAS Institute -- SAS
Software AG -- ADABAS
Sybase -- SQL Server, IQ, MPP

Page 44: Star Schema


Warehouse Server Products

Oracle 8

Informix
  Online Dynamic Server
  XPS -- Extended Parallel Server
  Universal Server for object-relational applications

Sybase
  Adaptive Server 11.5
  Sybase MPP
  Sybase IQ

Page 45: Star Schema


Warehouse Server Products

Red Brick Warehouse

Tandem Nonstop

IBM
  DB2 MVS
  Universal Server
  DB2 400

Teradata