Online Analytical Processing (OLAP): An Overview. Kian Win Ong, Nicola Onose, Mar 3rd 2006.

Online Analytical Processing (OLAP)

An Overview

Kian Win Ong, Nicola Onose

Mar 3rd 2006

Overview

• Motivation
• Multi-Dimensional Data Model
• Research Areas
• Optimizations
  – Materializing multiple aggregates simultaneously
  – Materialization strategy

Motivation

• Aggregation, summarization and exploration
• Of historical data
• To help management make informed decisions

Different Goal

• Aggregation, summarization and exploration
• Of historical data
• To help management make informed decisions

Product             Branch         Time                 Price
Coke (0.5 gallon)   Convoy Street  2006-03-01 09:00:01  $1.00
Pepsi (0.5 gallon)  UTC            2006-03-01 09:00:01  $1.03
Coke (1 gallon)     UTC            2006-03-01 09:00:02  $1.50
Altoids             Costa Verde    2006-03-01 09:01:33  $0.30

• Find the total sales for each product and month
• Find the percentage change in the total monthly sales for each product
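The two example queries can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not the presenters' code: the table name sales_history is borrowed from later slides, while the column names and the February rows are assumptions added so that a month-over-month change exists.

```python
import sqlite3

# Hypothetical schema and rows, loosely modeled on the transaction table above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_history (product TEXT, branch TEXT, sold_at TEXT, price REAL)"
)
conn.executemany(
    "INSERT INTO sales_history VALUES (?, ?, ?, ?)",
    [
        ("Coke", "Convoy Street", "2006-02-14 10:00:00", 1.00),
        ("Coke", "UTC", "2006-03-01 09:00:01", 1.00),
        ("Coke", "UTC", "2006-03-01 09:00:02", 1.50),
        ("Pepsi", "UTC", "2006-02-20 12:30:00", 1.03),
        ("Pepsi", "UTC", "2006-03-01 09:00:01", 1.03),
    ],
)

# Query 1: total sales for each product and month.
monthly = conn.execute(
    """
    SELECT product, strftime('%Y-%m', sold_at) AS month, SUM(price) AS total
    FROM sales_history
    GROUP BY product, month
    ORDER BY product, month
    """
).fetchall()

# Query 2: percentage change in total monthly sales for each product.
# Each row is compared with the previous row of the same product, which
# assumes every product has a row for every consecutive month.
changes = []
for prev, cur in zip(monthly, monthly[1:]):
    if prev[0] == cur[0]:
        pct = (cur[2] - prev[2]) / prev[2] * 100.0
        changes.append((cur[0], cur[1], round(pct, 1)))

for row in monthly:
    print(row)
for row in changes:
    print(row)
```

Both queries scan and summarize many rows rather than fetching a few records by primary key, which is the contrast with transactional workloads.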

Different Requirements

                     OLTP                            OLAP
Tasks                Day to day operation            High level decision support
Size of database     Gigabytes                       Terabytes
Time span            Recent, up-to-date              Spanning over months / years
Size of working set  Tens of records, accessed       Consolidated data from
                     through primary keys            multiple databases
Workload             Structured / repetitive         Ad-hoc, exploratory queries
Performance          Transaction throughput          Query latency

• OLTP – On-Line Transaction Processing
• OLAP – On-Line Analytical Processing


Query Language Extensions

• In the real world, data is stored in RDBs.

• How to express N-dimensional problems using 2D tables?

• Can we combine OLAP and SQL queries?

• Jim Gray et al.: Data Cube: A Relational Aggregation Operator, 1997

Problems with GROUP BY

1. histograms

SELECT sales, prod_name, population
FROM sales_history
GROUP BY Population(City, State) AS population

1. histograms
2. rollup/drilldown

Product   Product   Month  Sales  Sales by Cat.,  Sales
Category  Name                    by Name         by Cat.
Drinks    Coke      Feb     30.3
                    Mar     93.9  124.2
          Heineken  Feb     34.8
                    Mar    123.8  158.6           282.8

non-relational representation

1. histograms
2. rollup/drilldown

Product   Product   Month  Sales  Sales by Cat.,  Sales
Category  Name                    by Name         by Cat.
Drinks    Coke      Feb     30.3  124.2           282.8
Drinks    Coke      Mar     93.9  124.2           282.8
Drinks    Heineken  Feb     34.8  158.6           282.8
Drinks    Heineken  Mar    123.8  158.6           282.8

relational, but the rollup is huge

1. histograms
2. rollup/drilldown
3. cross tabulations

2-D aggregation is more compact and more natural:

Drinks    Feb    Mar    Total
Coke      30.3   93.9   124.2
Heineken  34.8   123.8  158.6
Total     65.1   217.7  282.8
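A cross tabulation like the one above can be sketched in a few lines of Python. This is an illustrative sketch only; the function name cross_tab and the dictionary layout are assumptions, and the figures are the Coke and Heineken sales from the slide.

```python
from collections import defaultdict

# Sales figures copied from the Drinks example: (product, month, sales).
rows = [
    ("Coke", "Feb", 30.3),
    ("Coke", "Mar", 93.9),
    ("Heineken", "Feb", 34.8),
    ("Heineken", "Mar", 123.8),
]

def cross_tab(rows):
    """Return {(row_label, col_label): total}, including 'Total' margins."""
    table = defaultdict(float)
    for product, month, sales in rows:
        table[(product, month)] += sales      # inner cell
        table[(product, "Total")] += sales    # row margin
        table[("Total", month)] += sales      # column margin
        table[("Total", "Total")] += sales    # grand total
    # Round to one decimal to hide floating-point noise in the sums.
    return {key: round(value, 1) for key, value in table.items()}

tab = cross_tab(rows)
for product in ("Coke", "Heineken", "Total"):
    print(product, [tab[(product, m)] for m in ("Feb", "Mar", "Total")])
```

Every input row contributes to four cells at once, which is why the margins come for free in a single scan.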

Query Language Extensions

Reducing the number of attributes

• introduce a new value: “ALL”

Product   Product   Month  Sales
Category  Name
Drinks    Coke      Mar     93.9
Drinks    Coke      ALL    124.2
Drinks    Heineken  Feb     34.8
Drinks    Heineken  Mar    123.8
Drinks    Heineken  ALL    158.6
Drinks    ALL       ALL    282.8
Drinks    ALL       Feb     65.1
Drinks    ALL       Mar    217.7

Reducing the number of attributes

“ALL” = the set over which we aggregate

Drinks       Feb    Mar    Total (ALL)
Coke         30.3   93.9   124.2
Heineken     34.8   123.8  158.6
Total (ALL)  65.1   217.7  282.8

General approach

• GROUP BY (1D)

Sales by Product Name

          Feb    Mar
Coke      30.3   93.9
Heineken  34.8   123.8
SUM       65.1   217.7

General approach

• GROUP BY (1D)
• Cross Tab (2D)

Drinks    Feb    Mar    ALL
Coke      30.3   93.9   124.2
Heineken  34.8   123.8  158.6
ALL       65.1   217.7  282.8

the corresponding relation:

Product   Product   Month  Sales
Category  Name
Drinks    Heineken  Feb     34.8
Drinks    Heineken  Mar    123.8
Drinks    Heineken  ALL    158.6
Drinks    ALL       Feb     65.1
Drinks    ALL       Mar    217.7
Drinks    ALL       ALL    282.8

General approach

• GROUP BY (1D)
• Cross Tab (2D)
• Cube (3D)

Product   Product   Month  Sales
Category  Name
…         …         …      …
Snacks    Doritos   Feb    123.8
Snacks    Doritos   Mar    158.6
Snacks    Doritos   ALL     65.1
…         …         …      …
ALL       ALL       ALL    964.0

Aggregates in the cube:
• By cat. and month
• By cat. and name (does it make sense?)
• By month and name

General approach

• GROUP BY (1D)
• Cross Tab (2D)
• Cube (3D)
• Any hypercube can be represented as a relation!

• a CUBE relation, with aggregation function f(.)

(x1, x2, …, xn-1, xn, f())
…
(x1, x2, …, xn-1, ALL, f())
…
(x1, x2, …, ALL, xn, f())
…

• after ROLLUP, reduce to a linear number (n + 1) of tuple patterns

(x1, x2, …, xn-1, xn, f())
…
(x1, x2, …, xn-1, ALL, f())
…
(x1, x2, …, ALL, ALL, f())
…
(ALL, ALL, …, ALL, ALL, f())
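The difference between the two operators can be counted directly: CUBE aggregates over every subset of the n dimensions, while ROLLUP only replaces trailing dimensions with ALL. The sketch below enumerates both sets of tuple patterns; the function names are hypothetical, and the dimension names are taken from the slides' example queries.

```python
from itertools import combinations

def cube_grouping_sets(dims):
    """All 2^n subsets of the dimensions; omitted dimensions become ALL."""
    return [
        subset
        for r in range(len(dims), -1, -1)
        for subset in combinations(dims, r)
    ]

def rollup_grouping_sets(dims):
    """The n + 1 prefixes of the dimension list, most detailed first."""
    return [tuple(dims[:i]) for i in range(len(dims), -1, -1)]

def as_pattern(dims, grouping_set):
    """Render a grouping set as a tuple pattern with ALL placeholders."""
    return tuple(d if d in grouping_set else "ALL" for d in dims)

dims = ["prod_category", "prod_name", "month"]
for gs in rollup_grouping_sets(dims):
    print(as_pattern(dims, gs))
print(len(cube_grouping_sets(dims)), len(rollup_grouping_sets(dims)))
```

For three dimensions this gives 8 patterns for CUBE and 4 for ROLLUP, so the gap grows exponentially with the number of dimensions.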

Query Language Extensions

The new operators: CUBE, ROLLUP

SELECT prod_category, prod_name, month, SUM(sales) AS sales
FROM sales_history
GROUP BY CUBE prod_category, prod_name, month

Product   Product   Month  Sales
Category  Name
…         …         …      …
Drinks    ALL       Feb     99.8
…         …         …      …
ALL       ALL       ALL    964.0

Idea:
• Group by the CUBE list.
• Union the aggregates.
• Introduce the ALL values.
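The three-step idea (group by, union, introduce ALL) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the operator's actual implementation: the function name cube and the row layout are hypothetical, and the data is the small Drinks example from the earlier slides rather than the full table shown here.

```python
from collections import defaultdict
from itertools import combinations

ALL = "ALL"

def cube(rows, dims, measure):
    """Union of SUM(measure) over every subset of dims, with ALL placeholders."""
    result = defaultdict(float)
    for r in range(len(dims) + 1):
        for kept in combinations(dims, r):      # one GROUP BY per subset
            for row in rows:
                # Dimensions that are aggregated away are replaced by ALL.
                key = tuple(row[d] if d in kept else ALL for d in dims)
                result[key] += row[measure]
    return {key: round(total, 1) for key, total in result.items()}

rows = [
    {"prod_category": "Drinks", "prod_name": "Coke", "month": "Feb", "sales": 30.3},
    {"prod_category": "Drinks", "prod_name": "Coke", "month": "Mar", "sales": 93.9},
    {"prod_category": "Drinks", "prod_name": "Heineken", "month": "Feb", "sales": 34.8},
    {"prod_category": "Drinks", "prod_name": "Heineken", "month": "Mar", "sales": 123.8},
]
dims = ["prod_category", "prod_name", "month"]
result = cube(rows, dims, "sales")

print(result[("Drinks", ALL, "Feb")])
print(result[(ALL, ALL, ALL)])
```

This version rescans the rows once per subset, which is simple but wasteful; computing several aggregates in one scan is the subject of the optimization slides.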

Query Language Extensions

The new operators: CUBE, ROLLUP

SELECT prod_category, month, day, state, prod_name, SUM(sales) AS sales
FROM sales_history
GROUP BY prod_category
  ROLLUP month, day
  CUBE state, prod_name

Product   Month  Day  State  Product   Sales
Category                     Name
Drinks    Feb    26   CA     Coke      12.3
          Feb    26   CA     Heineken   5.4
          …      …    …      …         …
          Feb    26   CA     ALL       30.4
          Feb    26   ALL    Coke      …
          …      …    …      …
Snacks    Feb    26   CA     Doritos   12.0
          …      …    …      …


Research Areas

• SQL language extensions

• Server architecture

• Parallel processing

• Index structures

• Materialized views


Simultaneous Multi-Dimensional Aggregates

• Optimization to calculate multiple aggregates simultaneously

• Useful for materialization of aggregate views

• Y. Zhao, P. Deshpande, J. Naughton: An Array-Based Algorithm for Simultaneous Multidimensional Aggregates, SIGMOD 1997

Multiple Aggregates

Product  City         Month   Sales
Coke     San Diego    Feb 06  12
Pepsi    Los Angeles  Feb 06  13
Doritos  San Diego    Mar 06  72
Altoids  San Diego    Mar 06  65

Aggregate on…

Month / Product  Feb  Mar  Total
Altoids           36  131  167
Coke              37  138  175
Doritos           21  136  157
Heineken          44  110  154
Pepsi             31  122  153
Pringles          37  126  164
Total            206  764  970

Multiple Aggregates

Aggregate on…

City / Product  San Diego  Los Angeles  Total
Altoids          90         77          167
Coke             89         86          175
Doritos          74         83          157
Heineken         74         80          154
Pepsi            68         85          153
Pringles         73         90          164
Total           469        501          970

Month / City  Feb  Mar  Total
Los Angeles   112  358  469
San Diego      95  407  501
Total         206  764  970

Month / Product  Feb  Mar  Total
Altoids           36  131  167
Coke              37  138  175
Doritos           21  136  157
Heineken          44  110  154
Pepsi             31  122  153
Pringles          37  126  164
Total            206  764  970

Multiple Aggregates

Aggregate on…

1. Sales by Product / City
2. Sales by Product / Month
3. Sales by Month / City
4. Sales by Product
5. Sales by City
6. Sales by Month
7. Sales (Total)

Is it possible to
• make a single pass over the transactional table?
• calculate multiple aggregates simultaneously?
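In principle the answer is yes: each row can update one running total per aggregate as it is read. The sketch below shows this for the seven aggregates listed, using the four sample rows from the transactional table; the function name aggregate_all and the row layout are assumptions, and the sketch ignores the memory concerns that the array-based algorithm addresses.

```python
from collections import defaultdict
from itertools import combinations

DIMS = ("product", "city", "month")

def aggregate_all(transactions):
    """One pass over the rows; one running total per group-by and group key."""
    # Subsets of size 0, 1 and 2 give the seven aggregates listed above.
    group_bys = [c for r in range(len(DIMS)) for c in combinations(DIMS, r)]
    totals = {gb: defaultdict(int) for gb in group_bys}
    for row in transactions:
        for gb in group_bys:
            key = tuple(row[d] for d in gb)
            totals[gb][key] += row["sales"]
    return totals

transactions = [
    {"product": "Coke", "city": "San Diego", "month": "Feb 06", "sales": 12},
    {"product": "Pepsi", "city": "Los Angeles", "month": "Feb 06", "sales": 13},
    {"product": "Doritos", "city": "San Diego", "month": "Mar 06", "sales": 72},
    {"product": "Altoids", "city": "San Diego", "month": "Mar 06", "sales": 65},
]
totals = aggregate_all(transactions)

print(dict(totals[("city",)]))
print(dict(totals[("month",)]))
print(totals[()][()])
```

Holding every aggregate in memory at once does not scale to large cubes, which motivates chunking the data and writing out aggregates as soon as they are complete.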

Chunking

Partition transactional data into array chunks

[Figure: a three-dimensional array over Dimension A, Dimension B and Dimension C, partitioned into array chunks; the chunks on the front face are numbered 1 to 16]

Naïve Algorithm

• Pivot on AB, aggregate on all C
• Pivot on AC, aggregate on all B
• Pivot on BC, aggregate on all A

[Figure: array chunks numbered 1 to 16 over Dimension A, Dimension B and Dimension C]

Single Pass Algorithm

• Make a single pass over data
• Simultaneously maintain multiple aggregates
• Write out completed aggregates
• Only allocate memory that is necessary

[Figure: array chunks numbered 1 to 16 over Dimension A and Dimension C, read in chunk order while partial aggregates are maintained alongside]
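The four points above can be sketched for a small three-dimensional array. This is a simplified illustration, not the algorithm as published: the array size, chunk size and the cell function are made-up assumptions, the dictionary named written stands in for disk output, and chunks are read with dimension A varying fastest.

```python
from itertools import product

N, CHUNK = 4, 2  # hypothetical sizes: a 4 x 4 x 4 array in 2 x 2 x 2 chunks

def cell(a, b, c):
    """Stand-in for the stored measure of cell (a, b, c)."""
    return a + 10 * b + 100 * c

def chunks():
    """Yield chunk origins with A varying fastest, then B, then C."""
    steps = range(0, N, CHUNK)
    for c0, b0, a0 in product(steps, steps, steps):
        yield a0, b0, c0

def single_pass():
    ab, ac, bc = {}, {}, {}                    # in-memory partial aggregates
    written = {"AB": {}, "AC": {}, "BC": {}}   # stands in for disk output
    for a0, b0, c0 in chunks():
        for a, b, c in product(range(a0, a0 + CHUNK),
                               range(b0, b0 + CHUNK),
                               range(c0, c0 + CHUNK)):
            v = cell(a, b, c)
            ab[(a, b)] = ab.get((a, b), 0) + v
            ac[(a, c)] = ac.get((a, c), 0) + v
            bc[(b, c)] = bc.get((b, c), 0) + v
        if a0 + CHUNK == N:
            # A full run of chunks along A is done: this BC chunk is complete.
            written["BC"].update(bc)
            bc.clear()
            if b0 + CHUNK == N:
                # A full slab along B is done: these AC chunks are complete.
                written["AC"].update(ac)
                ac.clear()
    written["AB"].update(ab)  # AB is complete only after the last chunk
    return written

written = single_pass()
print(written["BC"][(1, 2)], written["AC"][(3, 1)], written["AB"][(0, 0)])
```

Because BC is flushed after every run along A, it never holds more than one chunk's worth of cells, while AB must stay in memory for the whole scan.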

Single Pass Algorithm

Minimum memory spanning tree

[Figure: array chunks read in order, with the memory held for each aggregate]

Array Chunk

ABC: 4 x 4 x 4
AB: 16 x 4 x 4
AC: 4 x 4 x 4
BC: 4 x 4
A: 4 x 4
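The memory figures can be reproduced with a small calculation. The sketch below assumes the following allocation rule: when chunks are read in a fixed dimension order, a group-by needs the full extent of every dimension in the longest prefix of that order that it retains, and only one chunk's extent for its other dimensions. The function name memory_cells and the 16-cell dimension size are assumptions chosen to match four chunks of four cells per dimension.

```python
from math import prod

def memory_cells(dim_sizes, chunk_sizes, order, group_by):
    """Cells of memory needed to aggregate `group_by` while reading in `order`.

    Assumed rule: dimensions in the longest prefix of `order` retained by
    `group_by` need their full extent; other retained dimensions need only
    one chunk's extent.
    """
    prefix_len = 0
    for d in order:
        if d not in group_by:
            break
        prefix_len += 1
    prefix = order[:prefix_len]
    return prod(dim_sizes[d] if d in prefix else chunk_sizes[d] for d in group_by)

dim_sizes = {"A": 16, "B": 16, "C": 16}    # hypothetical: 16 cells per dimension
chunk_sizes = {"A": 4, "B": 4, "C": 4}     # 4 x 4 x 4 array chunks

for gb in ("AB", "AC", "BC"):
    print(gb, memory_cells(dim_sizes, chunk_sizes, "ABC", gb))
```

Under these assumptions AB needs 16 chunk faces of 4 x 4 cells, AC needs 4, and BC needs 1, so the reading order determines which aggregates are cheap to maintain.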

Multi Pass Algorithm

ABC ABD ACD BCD

AB AC BC AD BD CD

A B C D

Recursively aggregate
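Recursive aggregation over this lattice means that each group-by is computed from an already computed group-by with one more dimension, rather than from the base data. A minimal sketch follows; the function names, the choice of the first available parent, and the three sample cells are assumptions for illustration.

```python
from collections import defaultdict
from itertools import combinations

def aggregate_from_parent(parent_dims, parent, child_dims):
    """Sum a parent group-by down to child_dims (a subset of parent_dims)."""
    keep = [parent_dims.index(d) for d in child_dims]
    child = defaultdict(int)
    for key, total in parent.items():
        child[tuple(key[i] for i in keep)] += total
    return dict(child)

def build_lattice(dims, base):
    """Compute every group-by level by level, each from one parent above it."""
    lattice = {tuple(dims): base}
    for r in range(len(dims) - 1, -1, -1):
        for child_dims in combinations(dims, r):
            # Any parent with one extra dimension works; take the first found.
            parent_dims = next(
                p for p in lattice
                if len(p) == r + 1 and set(child_dims) <= set(p)
            )
            lattice[child_dims] = aggregate_from_parent(
                parent_dims, lattice[parent_dims], child_dims
            )
    return lattice

dims = ("A", "B", "C", "D")
base = {(0, 0, 0, 0): 1, (0, 1, 0, 1): 2, (1, 1, 0, 1): 3}  # made-up cells
lattice = build_lattice(dims, base)

print(len(lattice))
print(lattice[("B", "D")])
print(lattice[()])
```

A real implementation would pick the parent that minimizes cost, for example the smallest one, instead of the first one found.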


Implementing Data Cubes

• Biggest problem for data warehouses: the size

• Space / time trade-off: accelerate queries by materializing the cube

• The size of the relations gets even bigger!

• M(ultidimensional)OLAP: good query performance, but bad scalability

• R(elational)OLAP: very scalable; query performance improved by materializing (partial) results

• V. Harinarayan, A. Rajaraman, J.D. Ullman: Implementing Data Cubes Efficiently, SIGMOD 1996

Presents a materialization strategy for the cells of the cube.

[Figure: two groupings of the data, one by (Time Id, City Id, Product Id) and one by (Week, City Id, Product Id)]
