Page 1: olap

Data Warehousing: Data Models and OLAP Operations

By Kishore Jaladi

Page 2: olap

Topics Covered

1. Understanding the term “Data Warehousing”
2. Three-tier Decision Support Systems
3. Approaches to OLAP servers
4. Multi-dimensional data model
5. ROLAP
6. MOLAP
7. HOLAP
8. Which to choose: Compare and Contrast
9. Conclusion

Page 3: olap

Understanding the term Data Warehousing

• Data Warehouse:
The term Data Warehouse was coined by Bill Inmon in 1990. He defined it as follows: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process". He defined the terms in the sentence as follows:

• Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing operations.

• Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

• Time-variant: All data in the data warehouse is identified with a particular time period.

• Non-volatile: Data is stable in a data warehouse. More data is added, but data is never removed. This enables management to gain a consistent picture of the business.

Page 4: olap

Data Warehouse Architecture

Page 5: olap

Other important terminology

• Enterprise Data Warehouse: collects all information about subjects (customers, products, sales, assets, personnel) that span the entire organization

• Data Mart: departmental subsets that focus on selected subjects

• Decision Support System (DSS): information technology to help the knowledge worker (executive, manager, analyst) make faster and better decisions

• Online Analytical Processing (OLAP): an element of decision support systems (DSS)

Page 6: olap

Three-Tier Decision Support Systems

• Warehouse database server
  – Almost always a relational DBMS, rarely flat files

• OLAP servers
  – Relational OLAP (ROLAP): extended relational DBMS that maps operations on multidimensional data to standard relational operators
  – Multidimensional OLAP (MOLAP): special-purpose server that directly implements multidimensional data and operations

• Clients
  – Query and reporting tools
  – Analysis tools
  – Data mining tools

Page 7: olap

The Complete Decision Support System

[Figure: Information sources (operational DBs, semistructured sources) are extracted, transformed, loaded, and refreshed into the Data Warehouse Server (Tier 1: data warehouse and data marts). Tier 1 serves the OLAP Servers (Tier 2: e.g., MOLAP, ROLAP), which in turn serve the Clients (Tier 3: OLAP, query/reporting, data mining).]

Page 8: olap

Approaches to OLAP Servers

Three possibilities for OLAP servers

(1) Relational OLAP (ROLAP)
  – Relational and specialized relational DBMS to store and manage warehouse data
  – OLAP middleware to support missing pieces

(2) Multidimensional OLAP (MOLAP)
  – Array-based storage structures
  – Direct access to array data structures

(3) Hybrid OLAP (HOLAP)
  – Storing detailed data in RDBMS
  – Storing aggregated data in MDBMS
  – User access via MOLAP tools

Page 9: olap

The Multi-Dimensional Data Model

Example queries:
• “Sales by product line over the past six months”
• “Sales by store between 1990 and 1995”

[Figure: a fact table for measures, surrounded by dimension tables.]
Fact table columns: Prod Code, Time Code, Store Code (key columns joining the fact table to the dimension tables); Sales, Qty (numerical measures)
Dimension tables: Store Info, Product Info, Time Info, . . .
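As a rough illustration of how a query such as “Sales by store between 1990 and 1995” might run against this kind of layout, here is a minimal sketch using Python's built-in sqlite3 module. The table names (sales_fact, store_info, time_info), the column names, and the sample rows are all assumptions made for this example; none of them come from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical dimension tables (names and columns are assumed).
    CREATE TABLE store_info (store_code INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE time_info  (time_code  INTEGER PRIMARY KEY, year INTEGER);

    -- Hypothetical fact table: key columns plus numerical measures.
    CREATE TABLE sales_fact (
        prod_code  INTEGER,
        time_code  INTEGER REFERENCES time_info(time_code),
        store_code INTEGER REFERENCES store_info(store_code),
        sales      REAL,
        qty        INTEGER
    );

    INSERT INTO store_info VALUES (1, 'Downtown'), (2, 'Airport');
    INSERT INTO time_info  VALUES (10, 1989), (11, 1992), (12, 1995);
    INSERT INTO sales_fact VALUES
        (100, 10, 1, 500.0, 5),
        (100, 11, 1, 200.0, 2),
        (101, 12, 1, 300.0, 3),
        (100, 11, 2, 150.0, 1);
""")

# Join the fact table to the dimensions it needs, filter on a dimension
# attribute (year), and aggregate the measure per store.
sales_by_store = conn.execute("""
    SELECT s.store_name, SUM(f.sales)
    FROM sales_fact AS f
    JOIN store_info AS s ON s.store_code = f.store_code
    JOIN time_info  AS t ON t.time_code  = f.time_code
    WHERE t.year BETWEEN 1990 AND 1995
    GROUP BY s.store_name
    ORDER BY s.store_name
""").fetchall()

print(sales_by_store)  # [('Airport', 150.0), ('Downtown', 500.0)]
```

The 1989 row is filtered out through the time dimension, so only the key columns of the fact table are needed to reach the descriptive attributes.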

Page 10: olap

ROLAP: Dimensional Modeling Using Relational DBMS

• Special schema design: star, snowflake

• Special indexes: bitmap, multi-table join

• Proven technology (relational model, DBMS) that tends to outperform specialized MDDBs, especially on large data sets

• Products
  – IBM DB2, Oracle, Sybase IQ, RedBrick, Informix

Page 11: olap

Star Schema (in RDBMS)

Page 12: olap

Star Schema Example

Page 13: olap

The “Classic” Star Schema

• A single fact table, with detail and summary data
• Fact table primary key has only one key column per dimension
• Each key is generated
• Each dimension is a single table, highly de-normalized

Benefits: Easy to understand, easy to define hierarchies, reduces the number of physical joins, low maintenance, very simple metadata

[Figure: star schema with one fact table and three dimension tables.]
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY; Dollars, Units, Price
Store Dimension: STORE KEY; Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr., Level
Product Dimension: PRODUCT KEY; Product Desc., Brand, Color, Size, Manufacturer, Level
Time Dimension: PERIOD KEY; Period Desc, Year, Quarter, Month, Day, Current Flag, Resolution, Sequence

Page 14: olap

Star Schema with Sample Data

Page 15: olap

The “Snowflake” Schema

[Figure: snowflake schema for the store dimension.]
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY; Dollars, Units, Price
Store Dimension: STORE KEY; Store Description, City, State, District ID, Region_ID, Regional Mgr.
District attribute table: District_ID; District Desc., Region_ID
Region attribute table: Region_ID; Region Desc., Regional Mgr.

Page 16: olap

Aggregation in a Single Fact Table

Drawbacks: Summary data in the fact table yields poorer performance for summary levels; huge dimension tables are a problem

[Figure: the same star schema as on the “Classic” Star Schema slide. One fact table (STORE KEY, PRODUCT KEY, PERIOD KEY; Dollars, Units, Price) holds both detail and summary rows; the Store and Product dimensions carry a Level column and the Time dimension carries a Resolution column.]

Page 17: olap

The “Fact Constellation” Schema

[Figure: fact constellation schema. The Store, Product, and Time dimension tables (here without Level or Resolution columns) are shared by three fact tables.]
Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY; Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY; Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY; Dollars, Units, Price
Store Dimension: STORE KEY; Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
Product Dimension: PRODUCT KEY; Product Desc., Brand, Color, Size, Manufacturer
Time Dimension: PERIOD KEY; Period Desc, Year, Quarter, Month, Day, Current Flag, Sequence

Page 18: olap

Aggregations using “Snowflake” Schema and Multiple Fact Tables

• No LEVEL in dimension tables

• Dimension tables are normalized by decomposing at the attribute level

• Each dimension table has one key for each level of the dimension's hierarchy

• The lowest level key joins the dimension table to both the fact table and the lower level attribute table

How does it work? The best way is for the query to be built by understanding which summary levels exist, and finding the proper snowflaked attribute tables, constraining there for keys, then selecting from the fact table.

[Figure: snowflaked store dimension with one fact table per level.]
Store Dimension: STORE KEY; Store Description, City, State, District ID, District Desc., Region_ID, Region Desc., Regional Mgr.
District attribute table: District_ID; District Desc., Region_ID
Region attribute table: Region_ID; Region Desc., Regional Mgr.
Store Fact Table: STORE KEY, PRODUCT KEY, PERIOD KEY; Dollars, Units, Price
District Fact Table: District_ID, PRODUCT_KEY, PERIOD_KEY; Dollars, Units, Price
Region Fact Table: Region_ID, PRODUCT_KEY, PERIOD_KEY; Dollars, Units, Price
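The "How does it work?" steps can be sketched roughly as follows, using Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the lowercase table and column names, the sample rows, and the helper dollars_for_region are hypothetical, and a real aggregate-aware query layer would choose the summary level automatically instead of hard-coding the region level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked attribute table for the region level (assumed names).
    CREATE TABLE region (
        region_id    INTEGER PRIMARY KEY,
        region_desc  TEXT,
        regional_mgr TEXT
    );
    -- Pre-aggregated fact table at region level.
    CREATE TABLE region_fact (
        region_id   INTEGER REFERENCES region(region_id),
        product_key INTEGER,
        period_key  INTEGER,
        dollars     REAL,
        units       INTEGER
    );
    INSERT INTO region VALUES (1, 'East', 'Lee'), (2, 'West', 'Kim');
    INSERT INTO region_fact VALUES
        (1, 100, 1, 900.0, 9),
        (1, 101, 1, 100.0, 1),
        (2, 100, 1, 400.0, 4);
""")


def dollars_for_region(region_desc):
    """Hypothetical helper: constrain the attribute table for keys,
    then select from the fact table that matches that summary level."""
    # Step 1: constrain the snowflaked attribute table to obtain keys.
    keys = [row[0] for row in conn.execute(
        "SELECT region_id FROM region WHERE region_desc = ?", (region_desc,))]
    if not keys:
        return 0.0
    # Step 2: select from the region-level fact table using those keys.
    placeholders = ",".join("?" * len(keys))
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(dollars), 0) FROM region_fact "
        f"WHERE region_id IN ({placeholders})", keys).fetchone()
    return total


east_total = dollars_for_region("East")
print(east_total)  # 1000.0
```

Because the region-level totals are already stored, the query never touches the store-level fact table.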

Page 19: olap

Aggregation Contd …

Advantage: Best performance when queries involve aggregation

Disadvantage: Complicated maintenance and metadata, explosion in the number of tables in the database

[Figure: the same snowflaked store dimension with Store, District, and Region fact tables as on the previous slide.]

Page 20: olap

Aggregates

sale:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

• Add up amounts for day 1
• In SQL: SELECT sum(amt) FROM SALE WHERE date = 1

Result: 81

Page 21: olap

Aggregates

• Add up amounts by day
• In SQL: SELECT date, sum(amt) FROM SALE GROUP BY date

sale:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

ans:
date  sum
1     81
2     48

Page 22: olap

Another Example

• Add up amounts by day, product
• In SQL: SELECT date, prodId, sum(amt) FROM SALE GROUP BY date, prodId

sale (detail):
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

sale (by day, product):
prodId  date  amt
p1      1     62
p2      1     19
p1      2     48

rollup: from the detail table to the summary table
drill-down: from the summary table back to the detail table
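The grouping queries on the sale table can be tried out with Python's built-in sqlite3 module. The sketch below loads the six rows from the slide, selects prodId alongside date so that each output row is labelled, and then checks that rolling up the by-day, by-product result gives the by-day totals. The ORDER BY clauses and the variable names are additions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (prodId TEXT, storeId TEXT, date INTEGER, amt INTEGER)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [
        ("p1", "s1", 1, 12),
        ("p2", "s1", 1, 11),
        ("p1", "s3", 1, 50),
        ("p2", "s2", 1, 8),
        ("p1", "s1", 2, 44),
        ("p1", "s2", 2, 4),
    ],
)

# Finer level: one row per (day, product).
by_day_product = conn.execute(
    "SELECT date, prodId, sum(amt) FROM sale "
    "GROUP BY date, prodId ORDER BY date, prodId"
).fetchall()

# Coarser level: one row per day.
by_day = conn.execute(
    "SELECT date, sum(amt) FROM sale GROUP BY date ORDER BY date"
).fetchall()

# Rolling up the finer result by hand reproduces the coarser one.
rolled_up = {}
for day, _prod, amt in by_day_product:
    rolled_up[day] = rolled_up.get(day, 0) + amt

print(by_day_product)  # [(1, 'p1', 62), (1, 'p2', 19), (2, 'p1', 48)]
print(by_day)          # [(1, 81), (2, 48)]
```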

Page 23: olap

Points to be noticed about ROLAP

• Defines complex, multi-dimensional data with a simple model

• Reduces the number of joins a query has to process

• Allows the data warehouse to evolve with relatively low maintenance

• Can contain both detailed and summarized data

• ROLAP is based on familiar, proven, and already selected technologies

BUT!!!

• SQL has to be used for multi-dimensional manipulation of calculations, a task it is poorly suited to.

Page 24: olap

MOLAP: Dimensional Modeling Using the Multi-Dimensional Model

• MDDB: a special-purpose data model

• Facts stored in multi-dimensional arrays

• Dimensions used to index array

• Sometimes on top of relational DB

• Products
  – Pilot, Arbor Essbase, Gentia

Page 25: olap

The MOLAP Cube

Fact table view:

sale:
prodId  storeId  amt
p1      s1       12
p2      s1       11
p1      s3       50
p2      s2       8

Multi-dimensional cube (dimensions = 2):

      s1   s2   s3
p1    12        50
p2    11   8
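A small sketch of the step from the fact table view to the cube view, in plain Python. The dimension members become array indexes and the amounts become cell values. The variable names are chosen for this example, and None is used here to mark an empty cell; real MOLAP servers use their own, typically compressed, array storage.

```python
# Fact table view: (prodId, storeId, amt) rows from the slide.
facts = [
    ("p1", "s1", 12),
    ("p2", "s1", 11),
    ("p1", "s3", 50),
    ("p2", "s2", 8),
]

# Each dimension maps its members to array positions.
products = sorted({prod for prod, _store, _amt in facts})
stores = sorted({store for _prod, store, _amt in facts})
p_index = {prod: i for i, prod in enumerate(products)}
s_index = {store: i for i, store in enumerate(stores)}

# Cube view: a 2-D array indexed by [product][store]; None = empty cell.
cube = [[None] * len(stores) for _ in products]
for prod, store, amt in facts:
    cube[p_index[prod]][s_index[store]] = amt

# Direct access to a cell through the dimension indexes.
cell = cube[p_index["p1"]][s_index["s3"]]

print(cube)  # [[12, None, 50], [11, 8, None]]
print(cell)  # 50
```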

Page 26: olap

3-D Cube

Fact table view:

sale:
prodId  storeId  date  amt
p1      s1       1     12
p2      s1       1     11
p1      s3       1     50
p2      s2       1     8
p1      s1       2     44
p1      s2       2     4

Multi-dimensional cube (dimensions = 3):

day 1:
      s1   s2   s3
p1    12        50
p2    11   8

day 2:
      s1   s2   s3
p1    44   4
p2

Page 27: olap

Example

[Figure: a 3-D cube with axes Time (M, T, W, Th, F, S, S), Product (Juice, Milk, Coke, Cream, Soap, Bread), and Store (NY, SF, LA). Sample cell values shown: 10, 34, 56, 32, 12, 56. Highlighted cell: 56 units of bread sold in LA on M.]

Dimensions: Time, Product, Store

Attributes: Product (upc, price, …), Store, …

Hierarchies:
  Product → Brand → …
  Day → Week → Quarter
  Store → Region → Country

Roll-ups shown: roll-up to week, roll-up to brand, roll-up to region

Page 28: olap

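Cube Aggregation: Roll-up

Example: computing sums

day 1:
      s1   s2   s3
p1    12        50
p2    11   8

day 2:
      s1   s2   s3
p1    44   4
p2

Rolled up over day:
      s1   s2   s3
p1    56   4    50
p2    11   8

Rolled up over day and product:
      s1   s2   s3
sum   67   12   50

Rolled up over day and store:
      sum
p1    110
p2    19

Rolled up over all dimensions: 129

rollup: toward the sums; drill-down: back toward the detailed cube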
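The chain of roll-ups can be reproduced with a short Python sketch. The cube is held as a dictionary keyed by (store, product, day), and roll_up is a hypothetical helper written for this example that sums out every dimension not listed in keep.

```python
from collections import defaultdict

# 3-D cube cells from the slide, keyed by (store, product, day).
cells = {
    ("s1", "p1", 1): 12,
    ("s1", "p2", 1): 11,
    ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8,
    ("s1", "p1", 2): 44,
    ("s2", "p1", 2): 4,
}


def roll_up(cube, keep):
    """Sum out every key position that is not listed in `keep`."""
    out = defaultdict(int)
    for key, amt in cube.items():
        out[tuple(key[i] for i in keep)] += amt
    return dict(out)


# Roll up over day: keep (store, product).
by_store_product = roll_up(cells, (0, 1))
# Roll up further: keep only store, or only product.
by_store = roll_up(by_store_product, (0,))
by_product = roll_up(by_store_product, (1,))
# Roll up over everything: a single grand total.
grand_total = roll_up(by_store, ())[()]

print(by_store)     # {('s1',): 67, ('s3',): 50, ('s2',): 12}
print(by_product)   # {('p1',): 110, ('p2',): 19}
print(grand_total)  # 129
```

Each coarser result is computed from the previous one rather than from the base cells, which is what makes stored aggregates useful for further roll-ups.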

Page 29: olap

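Cube Operators for Roll-up

day 1:
      s1   s2   s3
p1    12        50
p2    11   8

day 2:
      s1   s2   s3
p1    44   4
p2

Rolled up over day:
      s1   s2   s3
p1    56   4    50
p2    11   8

Rolled up over day and product:
      s1   s2   s3
sum   67   12   50

Rolled up over day and store:
      sum
p1    110
p2    19

Rolled up over all dimensions: 129

Operator notation, with * standing for "all values" of a dimension:
sale(s1,*,*) = 67
sale(s2,p2,*) = 8
sale(*,*,*) = 129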

Page 30: olap

Extended Cube

day 1:
      s1   s2   s3   *
p1    12        50   62
p2    11   8         19
*     23   8    50   81

day 2:
      s1   s2   s3   *
p1    44   4         48
p2
*     44   4         48

* (all days):
      s1   s2   s3   *
p1    56   4    50   110
p2    11   8         19
*     67   12   50   129

sale(*,p2,*) = 19
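One way to picture the extended cube is as the base cube plus a "*" member on every dimension, with every "*" cell precomputed. The sketch below builds it in plain Python: each base cell contributes to all eight combinations of "keep the value" or "replace it with *". The function name sale follows the slide's notation; the data layout and the argument order (store, product, day) are assumptions made for this example.

```python
from collections import defaultdict
from itertools import product as cartesian

# Base cube cells, keyed by (store, product, day).
CELLS = {
    ("s1", "p1", 1): 12,
    ("s1", "p2", 1): 11,
    ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8,
    ("s1", "p1", 2): 44,
    ("s2", "p1", 2): 4,
}

# Extended cube: every base cell is added into each of its 2**3
# generalizations, where any subset of coordinates is replaced by "*".
extended = defaultdict(int)
for key, amt in CELLS.items():
    for mask in cartesian((False, True), repeat=len(key)):
        ext_key = tuple(
            "*" if starred else value for starred, value in zip(mask, key)
        )
        extended[ext_key] += amt


def sale(store, product, day):
    """Look up a cell of the extended cube; empty cells count as 0."""
    return extended.get((store, product, day), 0)


print(sale("s1", "*", "*"))   # 67
print(sale("s2", "p2", "*"))  # 8
print(sale("*", "p2", "*"))   # 19
print(sale("*", "*", 1))      # 81
print(sale("*", "*", "*"))    # 129
```

Lookups become single dictionary accesses, at the cost of storing many more cells than the six base facts.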

Page 31: olap

Aggregation Using Hierarchies

Hierarchy: store → region → country
(store s1 in Region A; stores s2, s3 in Region B)

day 1:
      s1   s2   s3
p1    12        50
p2    11   8

day 2:
      s1   s2   s3
p1    44   4
p2

Rolled up to region (over all days):
      region A   region B
p1    56         54
p2    11         8
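Rolling up along a hierarchy amounts to replacing each member by its parent before summing. A minimal Python sketch with the store-to-region mapping given on the slide; the names CELLS, REGION_OF, and by_region_product are chosen for this example.

```python
from collections import defaultdict

# Base cube cells, keyed by (store, product, day).
CELLS = {
    ("s1", "p1", 1): 12,
    ("s1", "p2", 1): 11,
    ("s3", "p1", 1): 50,
    ("s2", "p2", 1): 8,
    ("s1", "p1", 2): 44,
    ("s2", "p1", 2): 4,
}

# One level of the store hierarchy: store -> region.
REGION_OF = {"s1": "A", "s2": "B", "s3": "B"}

# Replace each store by its region, drop the day, and sum.
by_region_product = defaultdict(int)
for (store, prod, _day), amt in CELLS.items():
    by_region_product[(REGION_OF[store], prod)] += amt

print(dict(by_region_product))
# {('A', 'p1'): 56, ('A', 'p2'): 11, ('B', 'p1'): 54, ('B', 'p2'): 8}
```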

Page 32: olap

Points to be noticed about MOLAP

• Pre-calculating or pre-consolidating transactional data improves speed.

  BUT: When fully pre-consolidating incoming data, MDDs require an enormous amount of overhead, both in processing time and in storage. An input file of 200MB can easily expand to 5GB.

  MDDs are great candidates for <50GB department data marts.

• Rolling up and drilling down through aggregate data.

• With MDDs, application design is essentially the definition of dimensions and calculation rules, while the RDBMS requires that the database schema be a star or snowflake.

Page 33: olap

Hybrid OLAP (HOLAP)

• HOLAP = Hybrid OLAP:
  – Best of both worlds
  – Storing detailed data in RDBMS
  – Storing aggregated data in MDBMS
  – User access via MOLAP tools

Page 34: olap

Data Flow in HOLAP

[Figure: data flow between client, MDBMS server, and RDBMS server.]
Client: Multidimensional Viewer, Relational Viewer
MDBMS Server: multi-dimensional data
RDBMS Server: user data, meta data, derived data
Labeled links: multi-dimensional access, SQL-Read, SQL-Reach Through

Page 35: olap

When deciding which technology to go for, consider:

1) Performance:

• How fast will the system appear to the end-user?

• MDD server vendors believe this is a key point in their favor.

2) Data volume and scalability:

• While MDD servers can handle up to 50GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.

Page 36: olap

An experiment with Relational and the Multidimensional models on a data set

The analysis of the author’s example illustrates the following differences between the best Relational alternative and the Multidimensional approach.

                                                        Relational   Multi-dimensional   Improvement
Disk space requirement (Gigabytes)                      17           10                  1.7
Retrieve the corporate measures Actual vs Budget,
  by month (I/Os)                                       240          1                   240
Calculation of Variance Budget/Actual for the whole
  database (I/O time in hours)                          237          2*                  110*

* This may include the calculation of many other derived data without any additional I/O.

Reference: http://dimlab.usc.edu/csci599/Fall2002/paper/I2_P064.pdf

Page 37: olap

What-if analysis

IF
A. You require write access
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You’re developing a general-purpose application for inventory movement or assets management
THEN
Consider an MDD/MOLAP solution for your data mart.

IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN
Consider an RDBMS/ROLAP solution for your data mart.

IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN
Consider a HOLAP solution for your data mart.

Page 38: olap

Examples

• ROLAP
  – Telecommunication startup: call data records (CDRs)
  – ECommerce Site
  – Credit Card Company

• MOLAP
  – Analysis and budgeting in a financial department
  – Sales analysis

• HOLAP
  – Sales department of a multi-national company
  – Banks and Financial Service Providers

Page 39: olap

Tools available

• ROLAP:
  – ORACLE 8i
  – ORACLE Reports; ORACLE Discoverer
  – ORACLE Warehouse Builder
  – MicroStrategy’s DSS server

• MOLAP:
  – ORACLE Express Server
  – ORACLE Express Clients (C/S and Web)
  – Arbor Software’s Essbase
  – Platinum Technologies’ Platinum InfoBeacon

• HOLAP:
  – ORACLE 8i
  – ORACLE Express Server
  – ORACLE Relational Access Manager
  – ORACLE Express Clients (C/S and Web)

Page 40: olap

Conclusion

• ROLAP: RDBMS -> star/snowflake schema

• MOLAP: MDD -> Cube structures

• ROLAP or MOLAP: The data models used play a major role in performance differences

• MOLAP: for summarized and relatively smaller volumes of data (10-50GB)

• ROLAP: for detailed and larger volumes of data

• Both storage methods have strengths and weaknesses

• The choice is requirement specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.

Page 41: olap

References

• http://dimlab.usc.edu/csci599/Fall2002/paper/I2_P064.pdf
  – OLAP, Relational, and Multidimensional Database Systems, by George Colliat, Arbor Software Corporation

• http://www.donmeyer.com/art3.html
  – Data Warehousing Services, Data Mining & Analysis, LLC

• http://www.cs.man.ac.uk/~franconi/teaching/2001/CS636/CS636-olap.ppt
  – Data Warehouse Models and OLAP Operations, by Enrico Franconi

• http://www.promatis.com/mediacenter/papers
  – ROLAP, MOLAP, HOLAP: How to determine which technology is appropriate, by Holger Frietch, PROMATIS Corporation