Top Banner
Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onos Mar 3 rd 2006
50

Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Jan 19, 2016

Download

Documents

Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Online Analytical Processing (OLAP)

An Overview

Kian Win Ong, Nicola OnoseMar 3rd 2006

Page 2: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Overview• Motivation

• Multi-Dimensional Data Model

• Research Areas

• Optimizations– Materializing multiple aggregates simultaneously– Materialization strategy

Page 3: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Motivation• Aggregation, summarization and exploration• Of historical data• To help management make informed decisions

Page 4: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Different Goal• Aggregation, summarization and exploration• Of historical data• To help management make informed decisions

Product Branch Time Price

Coke (0.5 gallon) Convoy Street 2006-03-01 09:00:01 $1.00

Pepsi (0.5 gallon) UTC 2006-03-01 09:00:01 $1.03

Coke (1 gallon) UTC 2006-03-01 09:00:02 $1.50

Altoids Costa Verde 2006-03-01 09:01:33 $0.30

...

• Find the total sales for each product and month• Find the percentage change in the total monthly

sales for each product

Page 5: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Different Requirements

OLTP OLAP

Tasks Day to day operation High level decision support

Size of database Gigabytes Terabytes

Time span Recent, up-to-date Spanning over months / years

Size of working set Tens of records, accessed through primary keys

Consolidated data from multiple databases

Workload Structured / repetitive Ad-hoc, exploratory queries

Performance Transaction throughput Query latency

• OLTP – On-Line Transaction Processing• OLAP – On-Line Analytical Processing

Page 6: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Overview• Motivation

• Multi-Dimensional Data Model

• Research Areas

• Optimizations– Materializing multiple aggregates simultaneously– Materialization strategy

Page 7: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• In the real world, data is stored in RDBs.

Page 8: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• In the real world, data is stored in RDBs.

• How to express N-dimensional problems using 2D tables?

Page 9: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• In the real world, data is stored in RDBs.

• How to express N-dimensional problems using 2D tables?

• Can we combine OLAP and SQL queries?

• Jim Gray et al:Data Cube: A Relational Aggregation Operator1997

Page 10: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

1.histograms

Problems with GROUP BY

SELECT sales, prod_name, population FROM sales_history GROUP BY Population(City, State) as population

Page 11: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

1.histograms

2.rollup/drilldown

Problems with GROUP BY

ProductCategory

Product Name

Month Sales Sales by Cat.,

by Name

Salesby Cat.

Drinks Coke Feb 30.3

Mar 93.9 124.2

Heineken Feb 34.8

Mar 123.8 158.6 282.8

non relational representation

Page 12: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

1.histograms

2.rollup/drilldown

Problems with GROUP BY

Product Category

Product Name

Month Sales Sales by Cat.,

by Name

Salesby Cat.

Drinks Coke Feb 30.3 124.2 282.8

Drinks Coke Mar 93.9 124.2 282.8

Drinks Heineken Feb 34.8 158.6 282.8

Drinks Heineken Mar 123.8 158.6 282.8

relational, but the rollup is huge

Page 13: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

1.histograms

2.rollup/drilldown

3.cross tabulations

Problems with GROUP BY

2-D aggregation is more compact and more natural:

Drinks Feb Mar Total

Coke 30.3 93.9 124.2

Heineken 34.8 123.8 158.6

Total 65.1 217.7 282.8

Page 14: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language ExtensionsReducing the number of attributes

Product Category

Product Name

Month Sales

Drinks Coke Feb 30.3

Drinks Coke Mar 93.9

Drinks Coke ALL 124.2

Drinks Heineken Feb 34.8

Drinks Heineken Mar 123.8

Drinks Heineken ALL 158.6

Drinks ALL ALL 282.8

Drinks ALL Feb 65.1

Drinks ALL Mar 217.7

Page 15: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• introduce a new value: “ALL”

Reducing the number of attributes

“ALL” = the set over which we aggregate

Drinks Feb Mar Total (ALL)

Coke 30.3 93.9 124.2

Heineken 34.8 123.8 158.6

Total (ALL) 65.1 217.7 282.8

Page 16: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• GROUP BY (1D)

General approach

Sales by Product Name

Feb Mar

Coke 30.3 93.9

Heineken 34.8 123.8

SUM 65.1 217.7

Page 17: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• GROUP BY (1D)

• Cross Tab (2D)

General approach

Drinks Feb Mar ALL

Coke 30.3 93.9 124.2

Heineken 34.8 123.8 158.6

ALL 65.1 217.7 282.8

Product Category

Product Name

Month Sales

Drinks Coke Feb 30.3

Drinks Coke Mar 93.9

Drinks Coke ALL 124.2

Drinks Heineken Feb 34.8

Drinks Heineken Mar 123.8

Drinks Heineken ALL 158.6

Drinks ALL Feb 65.1

Drinks ALL Mar 217.7

Drinks ALL ALL 282.8

the corresponding relation:

Page 18: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• GROUP BY (1D)

• Cross Tab (2D)

• Cube (3D)

General approachProduct

CategoryProduct Name

Month Sales

Drinks Coke Feb 30.3

Drinks Coke Mar 93.9

Drinks Coke ALL 124.2

… … … …

Snacks Doritos Feb 123.8

Snacks Doritos Mar 158.6

Snacks Doritos ALL 65.1

… … … …

ALL ALL ALL 964.0

By cat.andmonth

By cat. andname (does it make sense?)

By monthand name

Page 19: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• GROUP BY (1D)

• Cross Tab (2D)

• Cube (3D)

• Any hypercube can be represented as a relation!

General approach

Page 20: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language Extensions

• a CUBE relation, with aggregation function f(.)(x1, x2, …, xn-1, xn, f() )

……………………………

(x1, xn-1, …, xn, ALL, f() )

……………………………

(x1, x2, …, ALL, xn, f() )

……………………………

• after ROLLUP , reduce to a linear # of tuples(x1, x2, …, xn-1, xn, f() )

…………………………………

(x1, xn-1, …, xn, ALL, f() )

…………………………………

(x1, x2, …, ALL, ALL, f() )

…………………………………

(ALL, ALL, …, ALL, ALL, f() )

General approach

Page 21: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language ExtensionsThe new operators: CUBE, ROLLUP

SELECT prod_category, prod_name, month, SUM(sales) AS sales FROM sales_history GROUP BY CUBE prod_category, prod_name, month

Product Category

Product Name

Month Sales

Drinks Coke Feb 30.3

Drinks Coke Mar 93.9

Drinks Coke ALL 124.2

… … … …

Drinks ALL Feb 99.8

… … … …

ALL ALL ALL 964.0

Idea:Group by the CUBE list.Union the aggregates.Introduce the ALL values.

Page 22: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Query Language ExtensionsThe new operators: CUBE, ROLLUP

SELECT prod_category, month, day, state, prod_name, SUM(sales) AS sales FROM sales_history GROUP BY prod_category ROLLUP month, day CUBE city, state

Product Category

Month Day State Product Name

Sales

Drinks Feb 26 CA Coke 12.3

Feb 26 CA Heineken 5.4

… … … … …

Feb 26 CA ALL 30.4

Feb 26 ALL Coke …

… … … …

Snacks Feb 26 CA Doritos 12.0

… … … …

Page 23: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Overview• Motivation

• Multi-Dimensional Data Model

• Research Areas

• Optimizations– Materializing multiple aggregates simultaneously– Materialization strategy

Page 24: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Research Areas• SQL language extensions

• Server architecture

• Parallel processing

• Index structures

• Materialized views

Page 25: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Overview• Motivation

• Multi-Dimensional Data Model

• Research Areas

• Optimizations– Materializing multiple aggregates simultaneously– Materialization strategy

Page 26: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Simultaneous

AggregatesMulti-Dimensional

• Optimization to calculate multiple aggregates simultaneously

• Useful for materialization of aggregate views

• Y. Zhao, P. Deshpande, J. NaughtonAn Array-Based Algorithm for Simultaneous Multidimensional AggregatesSIGMOD 1997

Page 27: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Multiple Aggregates

Month /Product

Feb Mar Total

Altoids 36 131 167

Coke 37 138 175

Doritos 21 136 157

Heineken 44 110 154

Pepsi 31 122 153

Pringles 37 126 164

Total 206 764 970

Product City Month Sales

Coke San Diego Feb 06 12

Pepsi Los Angeles Feb 06 13

Doritos San Diego Mar 06 72

Altoids San Diego Mar 06 65

...

Aggregate on…

Page 28: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Multiple Aggregates

City /Product

San Diego Los Angeles Total

Altoids 90 77 167

Coke 89 86 175

Doritos 74 83 157

Heineken 74 80 154

Pepsi 68 85 153

Pringles 73 90 164

Total 469 501 970

Month /City

Feb Mar Total

Los Angeles 112 358 469

San Diego 95 407 501

Total 206 764 970

Month /Product

Feb Mar Total

Altoids 36 131 167

Coke 37 138 175

Doritos 21 136 157

Heineken 44 110 154

Pepsi 31 122 153

Pringles 37 126 164

Total 206 764 970

Product City Month Sales

Coke San Diego Feb 06 12

Pepsi Los Angeles Feb 06 13

Doritos San Diego Mar 06 72

Altoids San Diego Mar 06 65

...

Aggregate on…

Page 29: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Multiple Aggregates

Product City Month Sales

Coke San Diego Feb 06 12

Pepsi Los Angeles Feb 06 13

Doritos San Diego Mar 06 72

Altoids San Diego Mar 06 65

...

1. Sales by Product / City2. Sales by Product / Month3. Sales by Month / City4. Sales by Product5. Sales by City6. Sales by Month7. Sales (Total)

Is it possible to• make a single pass over the transactional table?• calculate multiple aggregates simultaneously?

Aggregate on…

Page 30: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Chunking

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

64

20

36

42Dimension B

Dimension A

Dimension C

1

Product City Month Sales

Coke San Diego Feb 06 12

12

Array Chunk

Product

City

Month

Partition transactional data into array chunks

Page 31: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Naïve Algorithm

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

64

20

36

42

Dimension A

Dimension C

Pivot on ABaggregate on all C

Dimension A

Dimension B

Page 32: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Naïve Algorithm

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

64

20

36

42

Dimension A

Dimension C

Pivot on ABaggregate on all C

Pivot on ACaggregate on all B

Pivot on BCaggregate on all A

Dimension B

Page 33: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Single Pass Algorithm

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

64

20

36

42

Dimension A

Dimension C

B

1 2 3 4

1 2 3 4

1 2 3 4

AB

AC

BC

Make a single pass over data

Page 34: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Single Pass Algorithm

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

64

20

36

42

Dimension A

Dimension C

B

13

9 10 11 12

5 6 7 8

1 2 3 4

1 5 9 13

2 6 10 3 7 11 4 5 12

13

9 10 11 12

5 6 7 8

1 2 3 4

AB

AC

BC

Simultaneously maintain multiple aggregates

Page 35: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Single Pass Algorithm

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

64

20

36

42

Dimension A

Dimension C

B

13

9 10 11 12

5 6 7 8

1 2 3 4

1 5 9 13

2 6 10 3 7 11 4 5 12

13

9 10 11 12

5 6 7 8

1 2 3 4

AB

AC

BC

Write out completed aggregates

Page 36: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Single Pass Algorithm

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 16

64

20

36

42

Dimension A

Dimension C

B

13

9 10 11 12

5 6 7 8

1 2 3 4

1 5 9 13

2 6 10 3 7 11 4 5 12

13

AB

AC

BC

Only allocate memory that is necessary

Page 37: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Single Pass Algorithm13

9 10 11 12

5 6 7 8

1 2 3 4

1 5 9 13

2 6 10 3 7 11 4 5 12

13

AB

AC

BC

Array Chunk

ABC4 x 4 x 4

AB16 x 4 x 4

AC4 x 4 x 4

BC4 x 4

A4 x 4

B4

C4

all1

Minimum memory spanning tree

Page 38: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Multi Pass Algorithm

ABCD

ABC ABD ACD BCD

AB AC BC AD BD CD

A B C D

all

Recursively aggregate

Page 39: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Overview• Motivation

• Multi-Dimensional Data Model

• Research Areas

• Optimizations– Materializing multiple aggregates simultaneously– Materialization strategy

Page 40: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• Biggest problem for data warehouses: the size

• Space / time trade-off:accelerate queries by materializing the cube

Page 41: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• Biggest problem for data warehouses: the size

• Space / time trade-off:accelerate queries by materializing the cube

• The size of the relations gets even bigger!

Page 42: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• Biggest problem for data warehouses: the size

• Space / time trade-off:accelerate queries by materializing the cube

• The size of the relations gets even bigger!

• M(ultidimensional)OLAP: good query performance, but bad scalability

• R(elational)OLAP: very scalable; query performance improved by materializing (partial) results

Page 43: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• V. Harinarayan, A. Rajaraman, J.D. Ullman:Implementing Data Cubes EfficientlySIGMOD 1996

Presents a materialization strategy for the cells of the cube.

Page 44: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

Time Id

City Id

Product Id

Sales

Day

Month

WeekCity Id

City

State

Product Id

Name

Category

Week

Month

Year Year

Category Id

Category Name

Page 45: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• casted as particular case of the rewriting using views problem

• what cells to materialize what SQL views to materialize

Page 46: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• casted as particular case of the rewriting using views problem

• what cells to materialize what SQL views to materialize

p = productt = timec = city

• simple idea: Q1 depends on Q2 (Q1≤Q2) if Q1 can be fully answered using the results of Q2

ptc

pt tc pc

t p c

none

Page 47: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• but cube dimensions are usually hierarchical

product_name

product_category

none

day

week month

year

none

city

state

none

X X

• direct-product latticep = productt = timec = city

ptc

pt tc pc

ptspwc

pyc

pmc

ps

pcatt

… ……

Page 48: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• Def. cost of answering Q = # of rows in the table of ancestor(Q)

• It can be estimated w/o materializing the views

• Assume that all queries are identical to some view in the lattice

Page 49: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Implementing Data Cubes

• For a set S and a view vB(v,S) = ∑w≤v, (w not in S) max{cost(w)-cost(v), 0}

• Greedy algorithm for selecting k views to materialize from the lattice:

1. S := {top view}

2. For i=1 to k, add v to S s.t. B(v,S) is maximized

• The greedy algorithm is an (e-1)/e ≈ 0.63 approx. of the optimum.

Page 50: Online Analytical Processing (OLAP) An Overview Kian Win Ong, Nicola Onose Mar 3 rd 2006.

Discussion

• Questions from the audience…