DW and the Multidimensional Model (based on the slides for the book Data Mining: Concepts and Techniques)


SAD Tagus 2004/05 H. Galhardas

Outline

What is a data warehouse?
A multi-dimensional data model
OLAP operations


What is a Data Warehouse?

Defined in many different ways, but not rigorously:

A decision support database that is maintained separately from the organization's operational database.

Supports information processing by providing a solid platform of consolidated, historical data for analysis.

"A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)

Data warehousing:

The process of constructing and using data warehouses.


Data Warehouse: Subject-Oriented

Organized around major subjects, such as customer, product, and sales.

Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.

Provides a simple and concise view of particular subject issues by excluding data that are not useful in the decision support process.


Data Warehouse: Integrated

Constructed by integrating multiple, heterogeneous data sources: relational databases, flat files, on-line transaction records.

Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, etc. among the different data sources.
E.g., hotel price: currency, tax, whether breakfast is covered, etc.

When data is moved to the warehouse, it is converted.


Data Warehouse: Time-Variant

The time horizon for the data warehouse is significantly longer than that of operational systems.
Operational database: current-value data.
Data warehouse data: provide information from a historical perspective (e.g., the past 5-10 years).

Every key structure in the data warehouse contains an element of time, explicitly or implicitly, but the key of operational data may or may not contain a "time element".


Data Warehouse: Non-Volatile

A physically separate store of data transformed from the operational environment.

Operational update of data does not occur in the data warehouse environment:
It does not require transaction processing, recovery, and concurrency control mechanisms.
It requires only two operations in data accessing: initial loading of data and access of data.


Data Warehouse vs. Heterogeneous DBMS

Traditional heterogeneous DB integration:
Build wrappers/mediators on top of heterogeneous databases.
Query-driven approach: when a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set.
Complex information filtering; queries compete for resources.

Data warehouse: update-driven, high performance.
Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis.


Data Warehouse vs. Operational DBMS

OLTP (on-line transaction processing):
Major task of traditional relational DBMSs.
Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.

OLAP (on-line analytical processing):
Major task of data warehouse systems.
Data analysis and decision making.

Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market.
Data contents: current, detailed vs. historical, consolidated.
Database design: ER + application vs. star + subject.
View: current, local vs. evolutionary, integrated.
Access patterns: update vs. read-only but complex queries.


OLTP vs. OLAP

                    OLTP                                  OLAP
users               clerk, IT professional                knowledge worker
function            day-to-day operations                 decision support
DB design           application-oriented                  subject-oriented
data                current, up-to-date, detailed,        historical, summarized, multidimensional,
                    flat relational, isolated             integrated, consolidated
usage               repetitive                            ad-hoc
access              read/write, index/hash on prim. key   lots of scans
unit of work        short, simple transaction             complex query
# records accessed  tens                                  millions
# users             thousands                             hundreds
DB size             100MB-GB                              100GB-TB
metric              transaction throughput                query throughput, response


Why a Separate Data Warehouse?

High performance for both systems:
DBMS: tuned for OLTP (access methods, indexing, concurrency control, recovery).
Warehouse: tuned for OLAP (complex OLAP queries, multidimensional view, consolidation).

Different functions and different data:
Missing data: decision support (DS) requires historical data, which operational DBs do not typically maintain.
Data consolidation: DS requires consolidation (aggregation, summarization) of data from heterogeneous sources.
Data quality: different sources typically use inconsistent data representations, codes, and formats, which have to be reconciled.

Note: more and more systems perform OLAP analysis directly on relational databases.







From Tables and Spreadsheets to Data Cubes

A data warehouse is based on a multidimensional data model, which views data in the form of a data cube.

A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions.

A data cube is defined by:
Dimension tables, such as item (item_name, brand, type) or time (day, week, month, quarter, year).
A fact table, which contains (usually numerical) measures or facts (such as dollars_sold) and keys to each of the related dimension tables.


[Figure: 2-D view of sales data according to the dimensions time and item]

[Figure: 3-D view of sales data according to the dimensions time, item, and location]

[Figure: 3-D data cube representation]

[Figure: 4-D view of sales data according to the dimensions time, item, location, and supplier]


Data Cube

In the data warehousing literature, an n-D base cube is called a base cuboid. The topmost 0-D cuboid, which holds the highest level of summarization, is called the apex cuboid. The lattice of cuboids forms a data cube.

Any n-D data can be displayed as a series of (n-1)-D cubes.

A data cube is a logical representation; its physical storage may be different.


Cube: A Lattice of Cuboids

0-D (apex) cuboid: all
1-D cuboids: time; item; location; supplier
2-D cuboids: time,item; time,location; time,supplier; item,location; item,supplier; location,supplier
3-D cuboids: time,item,location; time,item,supplier; time,location,supplier; item,location,supplier
4-D (base) cuboid: time,item,location,supplier
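The lattice above is the power set of the dimension set, so a cube over n dimensions has 2^n cuboids. The following is a minimal Python sketch that enumerates the lattice level by level. The function name `cuboid_lattice` is illustrative and does not come from the source.

```python
from itertools import combinations


def cuboid_lattice(dimensions):
    """Map each dimensionality k to the list of k-D cuboids (subsets of dimensions)."""
    return {k: list(combinations(dimensions, k)) for k in range(len(dimensions) + 1)}


dims = ("time", "item", "location", "supplier")
lattice = cuboid_lattice(dims)

for k, cuboids in lattice.items():
    # k == 0 is the apex cuboid; k == len(dims) is the base cuboid.
    tag = {0: " (apex)", len(dims): " (base)"}.get(k, "")
    print(f"{k}-D{tag}: {cuboids}")

total = sum(len(cuboids) for cuboids in lattice.values())
print(total)  # 16 == 2 ** 4
```

Adding a fifth dimension doubles the lattice to 32 cuboids. This growth is why full materialization of a cube quickly becomes expensive.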


Conceptual Modeling of DW

A multidimensional model can exist in the following forms:
Star schema: a fact table without redundancy in the middle, connected to a set of dimension tables.
Snowflake schema: a refinement of the star schema where some dimensions are normalized, thereby splitting the data into additional tables. The resulting schema graph forms a shape similar to a snowflake.
Fact constellation or galaxy schema: multiple fact tables share dimension tables; viewed as a collection of stars.


Example of a Star Schema

Sales fact table: time_key, item_key, branch_key, location_key (keys); units_sold, dollars_sold, avg_sales (measures)

Dimension tables:
time (time_key, day, day_of_the_week, month, quarter, year)
item (item_key, item_name, brand, type, supplier_type)
branch (branch_key, branch_name, branch_type)
location (location_key, street, city, province_or_state, country)


Example of Snowflake Schema

Sales fact table: time_key, item_key, branch_key, location_key (keys); units_sold, dollars_sold, avg_sales (measures)

Dimension tables:
time
item (item_key, item_name, brand, type, supplier_key)
supplier (supplier_key, supplier_type)
branch
location (location_key, street, city_key)
city (city_key, city, province_or_state, country)


Example of Fact Constellation

Sales fact table: time_key, item_key, branch_key, location_key (keys); units_sold, dollars_sold, avg_sales (measures)
Shipping fact table: time_key, item_key, shipper_key, from_location, to_location (keys); dollars_cost, units_shipped (measures)

Dimension tables:
time
item (item_key, item_name, brand, type, supplier_type)
branch
location (location_key, street, city, province_or_state, country)
shipper (shipper_key, shipper_name, location_key, shipper_type)


Choosing the Fact and Dimension Tables

Analyze the queries:
GROUP BY attributes indicate the dimensions.
Aggregated attributes refer to the measures or facts.
WHERE attributes are attributes of the dimension tables.

Example:

select s.time_key, s.location_key, sum(s.units_sold)
from item i, sales s
where i.item_key = s.item_key and i.item_name = 'clothes'
group by s.time_key, s.location_key
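The following is a minimal sketch that runs the example query on an in-memory SQLite database. It assumes a trimmed-down item/sales pair taken from the star schema. The rows are made up purely for illustration. The comments map each part of the query to the rule it illustrates.

```python
import sqlite3

# Toy subset of the star schema; the table contents are hypothetical.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE item  (item_key INTEGER PRIMARY KEY, item_name TEXT, brand TEXT, type TEXT);
CREATE TABLE sales (time_key INTEGER, item_key INTEGER, location_key INTEGER,
                    units_sold INTEGER, dollars_sold REAL);
INSERT INTO item VALUES (1, 'clothes', 'A', 'apparel'), (2, 'shoes', 'B', 'footwear');
INSERT INTO sales VALUES (10, 1, 100, 5, 50.0), (10, 1, 100, 3, 30.0),
                         (10, 2, 100, 7, 140.0), (11, 1, 200, 4, 40.0);
""")

# WHERE attribute (item_name)      -> attribute of a dimension table
# GROUP BY attributes              -> dimensions (time, location)
# aggregated attribute (units_sold) -> measure stored in the fact table
rows = con.execute("""
    SELECT s.time_key, s.location_key, SUM(s.units_sold)
    FROM item i, sales s
    WHERE i.item_key = s.item_key AND i.item_name = 'clothes'
    GROUP BY s.time_key, s.location_key
    ORDER BY s.time_key, s.location_key
""").fetchall()
print(rows)  # [(10, 100, 8), (11, 200, 4)]
```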


Categories of Measures

Distributive: if the result derived by applying the function to n aggregate values is the same as that derived by applying the function on all the data without partitioning.
Ex: count(), sum(), min(), max().

Algebraic: if it can be computed by an algebraic function with M arguments (where M is a constant), each of which is obtained by applying a distributive aggregate function.
Ex: avg(), min_N(), standard_deviation().

Holistic: if there is no algebraic function with M arguments that characterizes the computation.
Ex: median(), mode(), rank().
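The three categories can be illustrated by splitting a small list of numbers into partitions and trying to rebuild each aggregate from per-partition results. This is a sketch with arbitrary sample numbers. It illustrates the definitions rather than proving them.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]
partitions = [data[:2], data[2:5], data[5:]]  # [4, 8], [15, 16, 23], [42]

# Distributive: combining per-partition results gives the same answer as
# applying the function to all the data at once.
assert sum(sum(p) for p in partitions) == sum(data)
assert max(max(p) for p in partitions) == max(data)
assert sum(len(p) for p in partitions) == len(data)  # count()

# Algebraic: avg() from M = 2 distributive sub-aggregates (sum, count) per partition.
partial = [(sum(p), len(p)) for p in partitions]
avg = sum(s for s, _ in partial) / sum(c for _, c in partial)
assert avg == statistics.mean(data)

# Holistic: median() cannot be assembled from a fixed number of per-partition
# values; e.g., the median of the partition medians differs from the true median.
median_of_medians = statistics.median(statistics.median(p) for p in partitions)
print(median_of_medians, statistics.median(data))  # 16 15.5
```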


Concept Hierarchy

Defines a sequence of mappings from a set of low-level concepts to a set of higher-level concepts.

Examples: the location dimension and the time dimension.


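Location Dimension

[Figure: concept hierarchy for location, from the most general level to the most specific]
all: all
region: Europe, ..., North_America
country: Germany, ..., Spain, Canada, ..., Mexico
city: Frankfurt, ..., Vancouver, ..., Toronto
office: L. Chan, ..., M. Wind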
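Climbing such a hierarchy amounts to mapping each member to its parent and summing. The following is a minimal sketch that assumes a hypothetical child-to-parent fragment of the location hierarchy and made-up sales figures. The helper name `climb` is illustrative.

```python
from collections import defaultdict

# Hypothetical fragment of the location hierarchy (child -> parent).
city_to_country = {"Frankfurt": "Germany", "Vancouver": "Canada", "Toronto": "Canada"}
country_to_region = {"Germany": "Europe", "Canada": "North_America"}


def climb(totals, parent_of):
    """Climb one level of the concept hierarchy by summing children into parents."""
    rolled = defaultdict(int)
    for member, value in totals.items():
        rolled[parent_of[member]] += value
    return dict(rolled)


sales_by_city = {"Frankfurt": 120, "Vancouver": 80, "Toronto": 45}  # made-up figures
sales_by_country = climb(sales_by_city, city_to_country)
sales_by_region = climb(sales_by_country, country_to_region)
print(sales_by_country)  # {'Germany': 120, 'Canada': 125}
print(sales_by_region)   # {'Europe': 120, 'North_America': 125}
```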


Time Dimension

[Figure: concept hierarchy for the time dimension, with the levels day, week, month, quarter, and year]


Multidimensional Data

Sales volume as a function of product, month, and region.

[Figure: data cube with the axes Product, Region, and Month]

Dimensions: Product, Location, Time
Hierarchical summarization paths:
Product: Industry > Category > Product
Location: Region > Country > City > Office
Time: Year > Quarter > Month/Week > Day


A Star-Net Query Model

[Figure: a star-net query model. Each radial line is a dimension, and the circles along it mark abstraction levels:
Shipping Method: AIR-EXPRESS, TRUCK
Customer Orders: CONTRACTS, ORDER
Customer
Product: PRODUCT GROUP, PRODUCT LINE, PRODUCT ITEM
Organization: SALES PERSON, DISTRICT, DIVISION
Promotion
Location: CITY, COUNTRY, REGION
Time: DAILY, QTRLY, ANNUALLY]

Each circle is called a footprint.







OLAP Queries (1)

Pivoting: aggregation on selected dimensions.
E.g., pivoting on Location and Time yields this cross-tabulation:

       WI     CA     Total
1995   63     81     144
1996   38     107    145
1997   75     35     110
Total  176    223    399

[Figure: star schema with the fact table SALES (pid, timeid, locid, sales) and the dimension tables PRODUCTS (pid, pname, category, price), LOCATIONS (locid, city, state, country), and TIMES (timeid, date, week, month, quarter, year, holiday_flag)]


OLAP Queries (2)

Roll-up: aggregating at different levels of a dimension hierarchy.
Given total sales by city, we can roll up to get sales by state.


SQL Group By Operator

[Figure: a table partitioned by its grouping values; Sum() is applied to each partition to produce the aggregate values]

The GROUP BY relational operator partitions a table into groups. Each group is then aggregated by a function. The aggregation function summarizes some column of each group, returning one value per group.
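The partition-then-aggregate behavior just described can be sketched in a few lines of Python. The sketch uses six Model/Year/Color/Sales rows, which are the 1990 rows of the Chevy/Ford SALES data used in this section. The `group_by` helper is hypothetical and is not a library function.

```python
from collections import defaultdict

# (Model, Year, Color, Sales) rows: the 1990 rows of the SALES data.
sales = [
    ("Chevy", 1990, "red", 5), ("Chevy", 1990, "white", 87), ("Chevy", 1990, "blue", 62),
    ("Ford", 1990, "red", 64), ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
]


def group_by(rows, key_cols, value_col, agg=sum):
    """Partition rows by their grouping values, then aggregate each partition."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[tuple(row[i] for i in key_cols)].append(row[value_col])
    return {key: agg(values) for key, values in partitions.items()}


print(group_by(sales, key_cols=[0], value_col=3))  # {('Chevy',): 154, ('Ford',): 189}
print(group_by(sales, key_cols=[2], value_col=3))  # {('red',): 69, ('white',): 149, ('blue',): 125}
```

Passing a different `agg` (e.g. `max`) swaps the aggregation function without changing the partitioning step.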


SQL to Express OLAP Queries

The cross-tabulation can be computed using a collection of SQL queries, e.g.:

SELECT T.year, L.state, SUM(S.sales)
FROM Sales S, Times T, Locations L
WHERE S.timeid = T.timeid AND S.locid = L.locid
GROUP BY T.year, L.state

SELECT T.year, SUM(S.sales)
FROM Sales S, Times T
WHERE S.timeid = T.timeid
GROUP BY T.year

SELECT L.state, SUM(S.sales)
FROM Sales S, Locations L
WHERE S.locid = L.locid
GROUP BY L.state
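These queries can be checked with a sketch that loads hypothetical data into SQLite. The data has one Sales row per cell of the WI/CA by 1995-1997 cross-tabulation shown earlier, and the tables are trimmed to the columns the queries need. Each query fills one part of the cross-tab. The grand total needs a fourth query with no GROUP BY at all.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Times     (timeid INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE Locations (locid INTEGER PRIMARY KEY, state TEXT);
CREATE TABLE Sales     (pid INTEGER, timeid INTEGER, locid INTEGER, sales INTEGER);
INSERT INTO Times VALUES (1, 1995), (2, 1996), (3, 1997);
INSERT INTO Locations VALUES (1, 'WI'), (2, 'CA');
-- Hypothetical data: one row per cell of the cross-tabulation.
INSERT INTO Sales VALUES (1, 1, 1, 63), (1, 1, 2, 81), (1, 2, 1, 38),
                         (1, 2, 2, 107), (1, 3, 1, 75), (1, 3, 2, 35);
""")

cells = con.execute("""
    SELECT T.year, L.state, SUM(S.sales) FROM Sales S, Times T, Locations L
    WHERE S.timeid = T.timeid AND S.locid = L.locid
    GROUP BY T.year, L.state ORDER BY T.year, L.state""").fetchall()
year_totals = con.execute("""
    SELECT T.year, SUM(S.sales) FROM Sales S, Times T
    WHERE S.timeid = T.timeid GROUP BY T.year ORDER BY T.year""").fetchall()
state_totals = con.execute("""
    SELECT L.state, SUM(S.sales) FROM Sales S, Locations L
    WHERE S.locid = L.locid GROUP BY L.state ORDER BY L.state""").fetchall()
# The grand total corner of the cross-tab: no GROUP BY at all.
grand_total = con.execute("SELECT SUM(S.sales) FROM Sales S").fetchone()[0]

print(year_totals)   # [(1995, 144), (1996, 145), (1997, 110)]
print(state_totals)  # [('CA', 223), ('WI', 176)]
print(grand_total)   # 399
```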


Problems with Group By

Histograms, e.g.:

SELECT day, nation, MAX(Temp)
FROM Weather
GROUP BY Day(Time) AS day, Nation(Latitude, Longitude) AS nation;

Roll-up totals and sub-totals for drill-downs

Cross-tabulations


Roll-up Totals and Sub-totals for Drill-downs

SALES

Model  Year  Color  Sales
Chevy  1990  red    5
Chevy  1990  white  87
Chevy  1990  blue   62
Chevy  1991  red    54
Chevy  1991  white  95
Chevy  1991  blue   49
Chevy  1992  red    31
Chevy  1992  white  54
Chevy  1992  blue   71
Ford   1990  red    64
Ford   1990  white  62
Ford   1990  blue   63
Ford   1991  red    52
Ford   1991  white  9
Ford   1991  blue   55
Ford   1992  red    27
Ford   1992  white  62
Ford   1992  blue   39


Sales Roll Up by Model by Year by Color

(1) = Sales by Model by Year by Color; (2) = Sales by Model by Year; (3) = Sales by Model

Model  Year  Color  (1)   (2)   (3)
Chevy  1994  black  50
             white  40
                          90
       1995  black  85
             white  115
                          200
                                290


Sales Roll-Up by Model by Year by Color: Relational Representation

Model  Year  Color  Sales  Sales by Model by Year  Sales by Model
Chevy  1994  black  50     90                      290
Chevy  1994  white  40     90                      290
Chevy  1995  black  85     200                     290
Chevy  1995  white  115    200                     290


An Excel Pivot Table Representation with Ford Sales Data Included

Sum of Sales  1994   1994   1994   1995   1995   1995   Grand
Model         black  white  Total  black  white  Total  Total
Chevy         50     40     90     85     115    200    290
Ford          50     10     60     85     75     160    220
Grand Total   100    50     150    170    190    360    510


Sales Summary: Relational Representation of the Pivot Table

Model  Year  Color  Units
Chevy  1994  black  50
Chevy  1994  white  40
Chevy  1994  ALL    90
Chevy  1995  black  85
Chevy  1995  white  115
Chevy  1995  ALL    200
Chevy  ALL   ALL    290


SQL code:

SELECT 'ALL', 'ALL', 'ALL', SUM(Sales)
FROM Sales
WHERE Model = 'Chevy'
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales)
FROM Sales
WHERE Model = 'Chevy'
GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales)
FROM Sales
WHERE Model = 'Chevy'
GROUP BY Model, Year
UNION
SELECT Model, Year, Color, SUM(Sales)
FROM Sales
WHERE Model = 'Chevy'
GROUP BY Model, Year, Color;
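The following sketch runs this four-way UNION on SQLite. It loads the Chevy and Ford figures from the pivot table into a hypothetical `Sales(Model, Year, Color, Sales)` table. The result reproduces the Sales Summary rows, including the ALL rows. Because UNION does not guarantee row order, the rows are compared as a set.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Sales (Model TEXT, Year INTEGER, Color TEXT, Sales INTEGER);
INSERT INTO Sales VALUES
    ('Chevy', 1994, 'black', 50), ('Chevy', 1994, 'white', 40),
    ('Chevy', 1995, 'black', 85), ('Chevy', 1995, 'white', 115),
    ('Ford', 1994, 'black', 50), ('Ford', 1994, 'white', 10),
    ('Ford', 1995, 'black', 85), ('Ford', 1995, 'white', 75);
""")

query = """
SELECT 'ALL', 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy'
UNION
SELECT Model, 'ALL', 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model
UNION
SELECT Model, Year, 'ALL', SUM(Sales) FROM Sales WHERE Model = 'Chevy' GROUP BY Model, Year
UNION
SELECT Model, Year, Color, SUM(Sales) FROM Sales WHERE Model = 'Chevy'
GROUP BY Model, Year, Color
"""
rows = set(con.execute(query).fetchall())  # UNION row order is not guaranteed
print(len(rows))  # 8
```

The query scans the table four times, once per grouping level. This repetition is part of why the slides that follow argue for a dedicated operator.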


Cross Tab: Symmetric 2D Aggregation

Table 6.a: Chevy Sales Cross Tab

Chevy        1994  1995  total (ALL)
black        50    85    135
white        40    115   155
total (ALL)  90    200   290


Problems with the SQL Representation

Expressing roll-up and cross-tab queries with SQL is daunting.

Such queries are too complex to analyze for optimization.

Proposal: an extension of the relational GROUP BY, the CUBE operator.


Cube Operator

[Figure: the data cube and the sub-space aggregates. Aggregate: a single Sum. Group By (with total): Sum by Color (RED, WHITE, BLUE). Cross Tab: Sum by Make (Chevy, Ford) and by Color. Data cube: sums by Make, by Year, by Color, by Make & Year, by Make & Color, by Color & Year, and the overall Sum]




Data Cube for Sales

Model  Year  Color  Sales
Chevy  1990  blue   62
Chevy  1990  red    5
Chevy  1990  white  87
Chevy  1990  ALL    154
Chevy  1991  blue   49
Chevy  1991  red    54
Chevy  1991  white  95
Chevy  1991  ALL    198
Chevy  1992  blue   71
Chevy  1992  red    31
Chevy  1992  white  54
Chevy  1992  ALL    156
Chevy  ALL   blue   182
...


Semantics of the CUBE Operator

Aggregates over all the <select list> attributes in the GROUP BY clause, as an ordinary GROUP BY does.

Then unions in each super-aggregate of the global cube, substituting ALL for the aggregated columns.

Ex. for Sales:

select model, year, color, sum(sales) as sales
from sales
where model in ('Ford', 'Chevy')
and year between 1990 and 1992
group by cube(model, year, color)
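These semantics can be emulated in plain Python. The following is a minimal sketch; the function name `cube` is hypothetical and the sketch does not model any engine's implementation. For every SALES row, it adds the measure to each of the 2^3 cells obtained by replacing any subset of (model, year, color) with ALL. With 87 as the Chevy/1990/white value from the SALES table, the Chevy/1990/ALL cell comes out as 154.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"


def cube(rows, n_dims):
    """Emulate GROUP BY CUBE: for every subset of the n_dims grouping columns,
    aggregate the measure, writing ALL in the columns that were summed away."""
    cells = defaultdict(int)
    for *dims, measure in rows:
        for k in range(n_dims + 1):
            for kept in combinations(range(n_dims), k):
                key = tuple(d if i in kept else ALL for i, d in enumerate(dims))
                cells[key] += measure
    return dict(cells)


sales = [
    ("Chevy", 1990, "red", 5), ("Chevy", 1990, "white", 87), ("Chevy", 1990, "blue", 62),
    ("Chevy", 1991, "red", 54), ("Chevy", 1991, "white", 95), ("Chevy", 1991, "blue", 49),
    ("Chevy", 1992, "red", 31), ("Chevy", 1992, "white", 54), ("Chevy", 1992, "blue", 71),
    ("Ford", 1990, "red", 64), ("Ford", 1990, "white", 62), ("Ford", 1990, "blue", 63),
    ("Ford", 1991, "red", 52), ("Ford", 1991, "white", 9), ("Ford", 1991, "blue", 55),
    ("Ford", 1992, "red", 27), ("Ford", 1992, "white", 62), ("Ford", 1992, "blue", 39),
]
sales_cube = cube(sales, 3)
print(sales_cube[("Chevy", 1990, ALL)])   # 154
print(sales_cube[("Chevy", ALL, "blue")])  # 182
print(len(sales_cube))                    # 48 == (2 + 1) * (3 + 1) * (3 + 1)
```

The cell count illustrates that a cube over dimensions with C1, C2, C3 distinct values has (C1 + 1)(C2 + 1)(C3 + 1) cells, since each column can also take the value ALL.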




The CUBE Operator (SQL:1999)

Generalizing the previous example: if there are k dimensions, there are 2^k possible SQL GROUP BY queries, one for each subset of the dimensions.

CUBE pid, locid, timeid BY SUM Sales
This is equivalent to rolling up Sales on all eight subsets of the set {pid, locid, timeid}. Each roll-up corresponds to an SQL query of the form:

SELECT SUM(S.sales)
FROM Sales S
GROUP BY grouping-list

In SQL:1999 syntax, the same request is written GROUP BY CUBE (pid, locid, timeid).

Lots of work has gone into optimizing the CUBE operator!
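The expansion just described can be made concrete by generating the 2^k query strings. The following is a minimal sketch; the function name `cube_queries` is hypothetical. It only builds the SQL text, finest grouping first, and does not execute anything.

```python
from itertools import combinations


def cube_queries(dims, measure="S.sales", table="Sales S"):
    """Return the 2**len(dims) GROUP BY queries that CUBE stands for, finest first."""
    queries = []
    for k in range(len(dims), -1, -1):
        for subset in combinations(dims, k):
            select = ", ".join([*subset, f"SUM({measure})"])
            group_by = f" GROUP BY {', '.join(subset)}" if subset else ""
            queries.append(f"SELECT {select} FROM {table}{group_by}")
    return queries


queries = cube_queries(["S.pid", "S.locid", "S.timeid"])
for q in queries:
    print(q)
print(len(queries))  # 8
```

Evaluating these eight queries independently would scan the fact table eight times. Cube optimizations typically avoid this by computing coarser groupings from finer ones.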


A Sample Data Cube

[Figure: a 3-D sales cube with the dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum); the highlighted cell holds the total annual sales of TVs in the U.S.A.]


Cuboids Corresponding to the Cube

0-D (apex) cuboid: all
1-D cuboids: product; date; country
2-D cuboids: product,date; product,country; date,country
3-D (base) cuboid: product,date,country


Typical OLAP Operations

Roll up (drill-up): summarize data by climbing up a hierarchy or by dimension reduction.

Drill down (roll down): the reverse of roll-up; go from a higher-level summary to a lower-level summary or detailed data, or introduce new dimensions.

Slice and dice: slice performs a selection on one dimension of the cube; dice defines a subcube by performing a selection on two or more dimensions.

Pivot (rotate): reorient the cube for visualization, e.g., turn a 3D cube into a series of 2D planes.

Drill across: involves (goes across) more than one fact table.

Drill through: goes through the bottom level of the cube to its back-end relational tables (using SQL).
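Slice, dice, and roll-up by dimension reduction can be sketched on a cube stored as a dictionary from cell coordinates to values. The following sketch assumes hypothetical cells keyed by (product, quarter, country), loosely modeled on the sample data cube, with made-up values. The function names are illustrative.

```python
# Hypothetical cube cells keyed by (product, quarter, country); values are made up.
DIMS = ("product", "quarter", "country")
cells = {
    ("TV", "1Qtr", "U.S.A"): 10, ("TV", "2Qtr", "U.S.A"): 12,
    ("PC", "1Qtr", "U.S.A"): 20, ("PC", "1Qtr", "Canada"): 7,
    ("TV", "1Qtr", "Canada"): 4, ("VCR", "2Qtr", "Mexico"): 3,
}


def drop(key, i):
    """Remove position i from a cell key."""
    return key[:i] + key[i + 1:]


def slice_cube(cube, dim, value):
    """Slice: select one value on one dimension, which disappears from the result."""
    i = DIMS.index(dim)
    return {drop(k, i): v for k, v in cube.items() if k[i] == value}


def dice(cube, **allowed):
    """Dice: keep the subcube whose members lie in the allowed sets on two or more dimensions."""
    return {k: v for k, v in cube.items()
            if all(k[DIMS.index(d)] in members for d, members in allowed.items())}


def roll_up(cube, dim):
    """Roll up by dimension reduction: sum the given dimension away."""
    i = DIMS.index(dim)
    out = {}
    for k, v in cube.items():
        out[drop(k, i)] = out.get(drop(k, i), 0) + v
    return out


print(slice_cube(cells, "quarter", "1Qtr"))
print(dice(cells, product={"TV", "PC"}, country={"Canada"}))
print(roll_up(cells, "quarter"))
```

Rolling up along a concept hierarchy (e.g. quarter to year) would instead map each member to its parent before summing.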


Roll-up vs. Cube Operator

Fact table:
Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)

Dimension tables:
Market (Market_Id, City, State, Region)
Product (Product_Id, Name, Category, Price)
Time (Time_Id, Week, Month, Quarter)


The fact and dimension relations can be displayed in an E-R diagram, which suggests a star and is called a star schema.

[Figure: Star Schema]


Aggregation

Many OLAP queries involve aggregation of the data in the fact table.

For example, to find the total sales (over time) of each product in each market, we might use:

SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
FROM Sales S
GROUP BY S.Market_Id, S.Product_Id

The aggregation is over the entire time dimension and thus produces a two-dimensional view of the data.


Aggregation over Time

[Figure: the output of the previous query, a grid of SUM(Sales_Amt) values with Market_Id (M1, M2, M3, M4) along one axis and Product_Id (P1 to P5) along the other]


Drilling Down and Rolling Up

Some dimension tables represent an aggregation hierarchy:
Market_Id -> City -> State -> Region

When we execute a series of queries that moves down a hierarchy (e.g., from aggregation over regions to aggregation over states), we are said to be drilling down.
This requires the fact table or information more specific than the requested aggregation (e.g., cities).

When we move up the hierarchy (from states to regions), we are said to be rolling up.
The aggregates can be calculated from a prior query.


Drilling Down

Drilling down on regions:

SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
FROM Sales S, Market M
WHERE M.Market_Id = S.Market_Id
GROUP BY M.Region, S.Product_Id

SELECT S.Product_Id, M.State, SUM(S.Sales_Amt)
FROM Sales S, Market M
WHERE M.Market_Id = S.Market_Id
GROUP BY M.State, S.Product_Id

Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
Market (Market_Id, City, State, Region)


Rolling Up

Rolling up on regions:

If we have already created a table, State_Sales, containing the result of

SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
FROM Sales S, Market M
WHERE M.Market_Id = S.Market_Id
GROUP BY M.State, S.Product_Id

then we can roll up from there to:

SELECT T.Product_Id, R.Region, SUM(T.Sales_Amt)
FROM State_Sales T,
     (SELECT DISTINCT M.Region, M.State
      FROM Market M) AS R (Region, State)
WHERE R.State = T.State
GROUP BY R.Region, T.Product_Id
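The following sketch checks this roll-up in SQLite, using hypothetical markets and sales. It materializes State_Sales once, rolls it up to regions, and confirms that the result matches aggregating the fact table directly. It omits the derived column list `(Region, State)` because the subquery's columns already carry those names.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Market (Market_Id INTEGER PRIMARY KEY, City TEXT, State TEXT, Region TEXT);
CREATE TABLE Sales  (Market_Id INTEGER, Product_Id INTEGER, Time_Id INTEGER, Sales_Amt INTEGER);
-- Hypothetical sample rows
INSERT INTO Market VALUES (1, 'Boston', 'MA', 'East'), (2, 'Worcester', 'MA', 'East'),
                          (3, 'Albany', 'NY', 'East'), (4, 'Austin', 'TX', 'South');
INSERT INTO Sales VALUES (1, 1, 1, 10), (2, 1, 1, 5), (3, 1, 2, 7), (4, 1, 2, 8), (4, 2, 1, 3);

-- Materialize the state-level aggregate once.
CREATE TABLE State_Sales AS
    SELECT S.Product_Id, M.State, SUM(S.Sales_Amt) AS Sales_Amt
    FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id
    GROUP BY M.State, S.Product_Id;
""")

# Roll up from the prior result instead of rescanning the fact table.
rolled_up = con.execute("""
    SELECT T.Product_Id, R.Region, SUM(T.Sales_Amt)
    FROM State_Sales T, (SELECT DISTINCT M.Region, M.State FROM Market M) AS R
    WHERE R.State = T.State
    GROUP BY R.Region, T.Product_Id
    ORDER BY R.Region, T.Product_Id""").fetchall()

direct = con.execute("""
    SELECT S.Product_Id, M.Region, SUM(S.Sales_Amt)
    FROM Sales S, Market M WHERE M.Market_Id = S.Market_Id
    GROUP BY M.Region, S.Product_Id
    ORDER BY M.Region, S.Product_Id""").fetchall()

print(rolled_up)  # [(1, 'East', 22), (1, 'South', 8), (2, 'South', 3)]
assert rolled_up == direct
```

The shortcut works here because each state belongs to exactly one region. If a state could map to several regions, the join would double-count its sales.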


Pivoting

When we view the data as a multidimensional cube and group on a subset of the axes, we are said to be performing a pivot on those axes.
Pivoting uses GROUP BY; aggregation is applied over the remaining attributes.
Example: pivot on product and time, and aggregate over Market_Id:

Sales (Market_Id, Product_Id, Time_Id, Sales_Amt)
Time (Time_Id, Week, Month, Quarter)

SELECT S.Product_Id, T.Quarter, SUM(S.Sales_Amt)
FROM Sales S, Time T
WHERE T.Time_Id = S.Time_Id
GROUP BY T.Quarter, S.Product_Id


ROLLUP

ROLLUP is similar to CUBE except that, instead of aggregating all subsets of the arguments, it creates subsets by dropping arguments one at a time from right to left.

ROLLUP is also in SQL:1999.


Example of ROLLUP Operator

SELECT S.Market_Id, S.Product_Id, SUM(S.Sales_Amt)
FROM Sales S
GROUP BY ROLLUP (S.Market_Id, S.Product_Id)

First aggregates with the finest granularity:
GROUP BY S.Market_Id, S.Product_Id

Then with the next level of granularity:
GROUP BY S.Market_Id

Then the grand total is computed with the empty GROUP BY clause:
GROUP BY ()
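SQLite does not provide ROLLUP, so the following sketch spells out the three grouping levels listed above as a UNION ALL of ordinary GROUP BY queries. The sketch uses a small hypothetical Sales table. NULL marks a column that has been aggregated away, which is also how engines with native ROLLUP report super-aggregate rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Sales (Market_Id TEXT, Product_Id TEXT, Time_Id INTEGER, Sales_Amt INTEGER);
INSERT INTO Sales VALUES ('M1', 'P1', 1, 10), ('M1', 'P1', 2, 4),
                         ('M1', 'P2', 1, 6), ('M2', 'P1', 1, 5);
""")

# Emulation of GROUP BY ROLLUP (Market_Id, Product_Id): finest level first,
# then Market_Id alone, then the grand total.
rollup_rows = con.execute("""
    SELECT Market_Id, Product_Id, SUM(Sales_Amt) FROM Sales GROUP BY Market_Id, Product_Id
    UNION ALL
    SELECT Market_Id, NULL, SUM(Sales_Amt) FROM Sales GROUP BY Market_Id
    UNION ALL
    SELECT NULL, NULL, SUM(Sales_Amt) FROM Sales
""").fetchall()
for row in rollup_rows:
    print(row)
```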


ROLLUP vs. CUBE

By contrast, the same query with CUBE:

First aggregates with the finest granularity:
GROUP BY S.Market_Id, S.Product_Id

Then with the next level of granularity (both subsets):
GROUP BY S.Market_Id
GROUP BY S.Product_Id

Then computes the grand total with:
GROUP BY ()
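The difference between the two operators comes down to which grouping sets each one produces. ROLLUP over n columns produces n + 1 prefixes, and CUBE produces all 2^n subsets. The following is a minimal sketch; the helper names are illustrative.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP: drop columns one at a time from the right; n + 1 grouping sets."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE: every subset of the columns; 2**n grouping sets."""
    return [c for k in range(len(cols), -1, -1) for c in combinations(cols, k)]


cols = ("S.Market_Id", "S.Product_Id")
print(rollup_sets(cols))
# [('S.Market_Id', 'S.Product_Id'), ('S.Market_Id',), ()]
print(cube_sets(cols))
# [('S.Market_Id', 'S.Product_Id'), ('S.Market_Id',), ('S.Product_Id',), ()]
```

Argument order matters for ROLLUP but not for CUBE. For ROLLUP, columns should be listed from the coarsest to the finest level of a hierarchy, e.g. Region, State, City.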


Bibliography

(Book) Data Mining: Concepts and Techniques, J. Han & M. Kamber, Morgan Kaufmann, 2001 (Sections 2.1 and 2.2)

(Book) Database Management Systems, R. Ramakrishnan & J. Gehrke, 3rd Ed. (Ch. 25)

(Paper) An Overview of Data Warehousing and OLAP Technology, S. Chaudhuri & U. Dayal, SIGMOD Record, March 1997

(Paper) Data Cube: A Relational Aggregation Operator Generalizing Group-By, Cross-Tab, and Sub-Totals, J. Gray et al., Tech. Report MSR-TR-97-32, 1997
