
Dec 14, 2015

Evaluating XML-Extended OLAP Queries Based on a Physical Algebra

Xuepeng Yin and Torben B. Pedersen

Department of Computer Science Aalborg University

Page 2: Problem

• OLAP systems are good for complex analysis queries
  - Easy to use
  - Fast
  - Business, science, ...
• Problems with physical integration in existing OLAP systems
  - Integrating new data requires a (partial) cube rebuild, which is too slow
• Problems arise with dynamic data
  - Stock quotes, competitors' prices, disease info, ...
• Data will often be available in Extensible Markup Language (XML) format
  - Weather data, map info, price lists, ...

Page 3: Solution

• Allows the use of external XML data as virtual dimensions
  - Decoration (extra info): type information
  - Selection: condition on XML data
  - Grouping: categories by XML data

[Figure: logical federation of OLAP and XML. Labels: OLAP/XML query, OLAP query, XML query, OLAP, XML.]

• Goal: flexible access to XML data from OLAP systems

Page 4: Overview

• Contributions
• Architecture of the federation
• Linking OLAP and XML
• The federation query semantics
  - The logical algebra
  - The physical algebra
  - Conversion from logical to physical plans
• Plan execution
• Query optimization
  - The query optimizer
  - Execution of an optimized plan
• Performance
• Conclusion

Page 5: Contributions of This Paper

• Previous OLAP-XML federation efforts
  - A logical algebra
  - A partial, straightforward implementation
• Problems with previous work
  - The logical algebra does not accurately reflect query execution tasks
  - Query optimization operates at an abstract level
  - Implementation is very limited
• Novelties of this paper
  - A physical algebra and simplified query semantics
  - Practical query optimization techniques
  - A fully functional, robust query engine
  - Experiments with the query engine

Page 6: Architecture of the federation

• OLAP and XML components
• Auxiliary components
• Query engine
  - Query analyzer
  - Query optimizer
  - Query evaluator

Page 7: Linking OLAP and XML

• Links
  - Relation between a set of dimension values and a set of XML nodes
  - Nlink = {(DK, n1), (CN, n2), (UK, n3)}
• Level expressions
  - <level>/<link>/<XPath expression> specifies a concrete link usage
  - Nation/Nlink/Population links nations to populations

[Figure: cube schema connected to an XML document through Nlink. Labels: Time, Orders, EC, Suppliers, Year, Quarter, Month, Customer, Order, Region, Nation, Supplier, Man., Brand, Part, Quantity.]

XML document shown in the figure:

  <Nations>
    <Nation>
      <NationName>Denmark</NationName>
      <Population>5.3</Population>
    </Nation>
  </Nations>
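The link and level-expression definitions above can be sketched in a few lines of Python. This is only an illustration, not the federation's implementation: the `NationName` entries for China and the United Kingdom, the assignment of nodes n1 to n3 by document order, and the helper name `eval_level_expression` are assumptions; the population values are the ones shown later in the slides.

```python
import xml.etree.ElementTree as ET

# Sample document; the entries for CN and UK are hypothetical extensions
# of the single <Nation> element shown on the slide.
NATIONS_XML = """
<Nations>
  <Nation><NationName>Denmark</NationName><Population>5.3</Population></Nation>
  <Nation><NationName>China</NationName><Population>1264.5</Population></Nation>
  <Nation><NationName>United Kingdom</NationName><Population>19.1</Population></Nation>
</Nations>
"""

root = ET.fromstring(NATIONS_XML)
n1, n2, n3 = root.findall("Nation")

# A link is a relation: a list of (dimension value, XML node) pairs.
nlink = [("DK", n1), ("CN", n2), ("UK", n3)]


def eval_level_expression(link, xpath):
    """Evaluate <level>/<link>/<xpath>: map each dimension value to the
    text of the nodes that xpath selects below its linked nodes."""
    result = {}
    for value, node in link:
        result.setdefault(value, []).extend(m.text for m in node.findall(xpath))
    return result


# Nation/Nlink/Population
populations = eval_level_expression(nlink, "Population")
print(populations)
```

Keeping the link as a list of pairs rather than a dictionary allows one dimension value to be linked to several XML nodes.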

Page 8: The Federation Query Semantics

• The logical algebra
  - Decoration
  - Federation selection
  - Federation generalized projection
• The federation query language: SQLXM

  SELECT SUM(Quantities), Brand(Part), Nation/Nlink/Population
  FROM TC
  WHERE Nation/Nlink/Population<30
  GROUP BY Brand(Part), Nation/Nlink/Population

Logical plan for the query, top to bottom (N/Nl/P abbreviates Nation/Nlink/Population, Q the measure):

  Π_Fed[Brand(Part), N/Nl/P]<SUM(Q)>
    σ_Fed[N/Nl/P<30]
      Decoration[N/Nl/P]
        F_TC

Page 9: The Physical Algebra

• Includes data retrieval and manipulation operators
• A physical plan models real execution tasks
  - i.e., when, where and how data is processed
• Nine physical operators
  - Querying the OLAP component: cube selection and cube generalized projection
  - Data transfer between components: fact-, dimension- and XML-transfer operators
  - Temporary data manipulations: decoration, federation selection and federation generalized projection
  - Inlining XML data: inlining

Page 10: Querying the OLAP Component

• Cube selection
  - Has no references to XML data
  - Performs selection over the OLAP cube
  - Intuitively, a SQL SELECT statement
• Cube generalized projection
  - Has no references to XML data
  - Rolls up dimensions and aggregates the specified measures at the specified levels
  - Intuitively, a SQL SELECT statement with a GROUP BY clause
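The two SQL intuitions above can be tried out with an in-memory SQLite database. This is a minimal sketch, assuming the cube is stored as a small star schema; the table names `fact`, `supplier_dim` and `part_dim` are hypothetical, and the rows are the sample facts used later in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stands in for the OLAP component
conn.executescript("""
CREATE TABLE fact(quantity INTEGER, supplier TEXT, part TEXT);
CREATE TABLE supplier_dim(supplier TEXT, nation TEXT);
CREATE TABLE part_dim(part TEXT, brand TEXT);
INSERT INTO fact VALUES (17,'S1','P3'),(28,'S2','P4'),(2,'S3','P3'),(26,'S4','P2');
INSERT INTO supplier_dim VALUES ('S1','DK'),('S2','DK'),('S3','CN'),('S4','UK');
INSERT INTO part_dim VALUES ('P2','B2'),('P3','B3'),('P4','B4');
""")

# Cube selection: a plain SELECT whose predicate only mentions
# dimension levels and constants (no XML references).
selected = conn.execute("""
SELECT f.quantity, f.supplier, f.part
FROM fact f JOIN supplier_dim s ON f.supplier = s.supplier
WHERE s.nation = 'DK' OR s.nation = 'UK'
ORDER BY f.supplier
""").fetchall()

# Cube generalized projection: roll up Supplier -> Nation and
# Part -> Brand, aggregating the Quantity measure.
rolled_up = conn.execute("""
SELECT SUM(f.quantity), s.nation, p.brand
FROM fact f
JOIN supplier_dim s ON f.supplier = s.supplier
JOIN part_dim p ON f.part = p.part
GROUP BY s.nation, p.brand
ORDER BY s.nation, p.brand
""").fetchall()

print(selected)
print(rolled_up)
```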

Page 11: Data Transfer Between Components

• Fact-transfer
  - Transfers the OLAP fact data to the temporary component
  - The temporary facts can then be decorated
  - Intuitively, a SQL SELECT INTO statement
• Dimension-transfer
  - Transfers dimension data to the temporary component
  - Used when higher-level dimension data is required in the temporary component
• XML-transfer
  - Transfers XML data to the temporary component
  - Uses XPath expressions to identify XML nodes with decoration values
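The three transfer operators can be sketched as copies into a separate temporary database. This is an illustration under stated assumptions: two in-memory SQLite databases stand in for the OLAP and temporary components, the table names and the `copy_rows` helper are hypothetical, and the link is modelled as a lookup by `NationName` (the China and United Kingdom entries are invented for the example).

```python
import sqlite3
import xml.etree.ElementTree as ET

olap = sqlite3.connect(":memory:")  # stands in for the OLAP component
temp = sqlite3.connect(":memory:")  # stands in for the temporary component

olap.executescript("""
CREATE TABLE fact(quantity INTEGER, supplier TEXT);
CREATE TABLE supplier_dim(supplier TEXT, nation TEXT);
INSERT INTO fact VALUES (17,'S1'),(28,'S2'),(2,'S3'),(26,'S4');
INSERT INTO supplier_dim VALUES ('S1','DK'),('S2','DK'),('S3','CN'),('S4','UK');
""")

NATIONS_XML = """
<Nations>
  <Nation><NationName>Denmark</NationName><Population>5.3</Population></Nation>
  <Nation><NationName>China</NationName><Population>1264.5</Population></Nation>
  <Nation><NationName>United Kingdom</NationName><Population>19.1</Population></Nation>
</Nations>
"""
# Hypothetical link: dimension value -> NationName of the linked node.
LINK = {"DK": "Denmark", "CN": "China", "UK": "United Kingdom"}


def copy_rows(query, target_table, columns):
    """Run query in the OLAP component and store its rows in a new
    table of the temporary component (the SELECT INTO intuition)."""
    rows = olap.execute(query).fetchall()
    temp.execute(f"CREATE TABLE {target_table}({', '.join(columns)})")
    placeholders = ", ".join("?" for _ in columns)
    temp.executemany(f"INSERT INTO {target_table} VALUES ({placeholders})", rows)


# Fact-transfer
copy_rows("SELECT quantity, supplier FROM fact", "t_fact", ["quantity", "supplier"])
# Dimension-transfer: the higher level (Nation) is needed for decoration
copy_rows("SELECT supplier, nation FROM supplier_dim",
          "t_supplier_nation", ["supplier", "nation"])

# XML-transfer: an XPath expression picks the decoration value per linked node
root = ET.fromstring(NATIONS_XML)
temp.execute("CREATE TABLE t_nation_population(nation, population)")
for nation, name in LINK.items():
    node = root.find(f"Nation[NationName='{name}']")
    temp.execute("INSERT INTO t_nation_population VALUES (?, ?)",
                 (nation, float(node.findtext("Population"))))

transferred = temp.execute(
    "SELECT nation, population FROM t_nation_population ORDER BY nation").fetchall()
print(transferred)
```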

Page 12: Temporary Data Manipulations

• Decoration
  - Decorates the cube by adding a new dimension
  - Intuitively, adds a table with dimension and decoration XML data

    SELECT * FROM t(supplier, nation) t1, t(nation, population) t2
    WHERE t1.nation = t2.nation

• Federation selection
  - Performs selection over the cube in the temporary component
  - Intuitively, a SQL selection over the temporary tables

    SELECT t1.* FROM tfact t1, t(supplier, population) t2
    WHERE t1.supplier = t2.supplier AND population < 30

• Federation generalized projection
  - Rolls up and aggregates the cube in the temporary component
  - Intuitively, a SQL selection with a GROUP BY clause

    SELECT SUM(Quantity), t2.population FROM tfact t1, t(supplier, population) t2
    WHERE t1.supplier = t2.supplier GROUP BY t2.population
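The three statements above use schematic table names such as t(supplier, nation). A runnable version, assuming hypothetical concrete names (`t_fact`, `t_supplier_nation`, `t_nation_population`, `t_supplier_population`) and the sample rows from the slides, could look as follows.

```python
import sqlite3

temp = sqlite3.connect(":memory:")  # stands in for the temporary component
temp.executescript("""
CREATE TABLE t_fact(quantity INTEGER, supplier TEXT);
CREATE TABLE t_supplier_nation(supplier TEXT, nation TEXT);
CREATE TABLE t_nation_population(nation TEXT, population REAL);
INSERT INTO t_fact VALUES (17,'S1'),(28,'S2'),(2,'S3'),(26,'S4');
INSERT INTO t_supplier_nation VALUES ('S1','DK'),('S2','DK'),('S3','CN'),('S4','UK');
INSERT INTO t_nation_population VALUES ('DK',5.3),('CN',1264.5),('UK',19.1);

-- Decoration: attach each supplier to the decoration value of its nation.
CREATE TABLE t_supplier_population AS
  SELECT t1.supplier, t2.population
  FROM t_supplier_nation t1, t_nation_population t2
  WHERE t1.nation = t2.nation;
""")

# Federation selection: keep facts whose decoration value is below 30.
selected = temp.execute("""
SELECT t1.* FROM t_fact t1, t_supplier_population t2
WHERE t1.supplier = t2.supplier AND t2.population < 30
ORDER BY t1.supplier
""").fetchall()

# Federation generalized projection: aggregate Quantity per population value.
projected = temp.execute("""
SELECT SUM(t1.quantity), t2.population
FROM t_fact t1, t_supplier_population t2
WHERE t1.supplier = t2.supplier
GROUP BY t2.population
ORDER BY t2.population
""").fetchall()

print(selected)
print(projected)
```

As on the slide, the projection here runs over all temporary facts; in a full plan it would run over the output of the federation selection.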

Page 13: Inlining XML Data

• Denoted as
• Comparing federated data in the temporary component is expensive
• Inlining refers to integrating XML data into the OLAP selections
• A resulting predicate
  - Only references dimension levels and constants
  - Can be evaluated in the OLAP component

Example: the predicate Nation/Nlink/Population<30 combined with the XML data

  Nation  Population
  DK      5.3
  CN      1264.5
  UK      19.1

gives the inlined predicate Nation='DK' OR Nation='UK'.
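The rewriting in this example can be sketched as a small function. The name `inline_predicate`, its signature, and the `1=0` fallback for an empty match are assumptions made for illustration; only the "less than" comparison from the slide is handled.

```python
def inline_predicate(level, decoration, limit):
    """Rewrite `level/link/xpath < limit` into a predicate that only
    references the dimension level and constants.

    decoration maps each dimension value to its XML decoration value.
    """
    matches = [value for value, dec in decoration.items() if dec < limit]
    if not matches:
        return "1=0"  # assumption: no value qualifies, so nothing is selected
    return " OR ".join(f"{level}='{value}'" for value in matches)


# Decoration values from the slide (Nation/Nlink/Population)
population = {"DK": 5.3, "CN": 1264.5, "UK": 19.1}
predicate = inline_predicate("Nation", population, 30)
print(predicate)
```

The resulting string can be handed to the OLAP component as an ordinary selection condition, so no comparison has to be made in the temporary component.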

Page 14: From Logical to Physical Plans

(N/Nl/P abbreviates Nation/Nlink/Population; S, N, P, B and Q abbreviate Supplier, Nation, Part, Brand and Quantity.)

Logical plan, top to bottom:

  Π_Fed[B(P), N/Nl/P]<SUM(Q)>
    σ_Fed[N/Nl/P<30]
      Decoration[N/Nl/P]
        F_TC

Physical plan operators:

  Π_Fed[B(P), N/Nl/P]<SUM(Q)>
  σ_Fed[N/Nl/P<30]
  Decoration[N/Nl/P]
  XML-transfer[N/Nl/P]
  Dimension-transfer[S, N]
  Dimension-transfer[P, B]
  Fact-transfer of F_TC,ext

Page 15: Plan Execution

1. Fact-transfer of F_TC,ext: the empty temporary component {} receives the fact table R_F.

  Quantity  ExtPrice  Supplier  Part  Order  Day
  17        17954     S1        P3    11     2/12/96
  28        29983     S2        P4    42     30/3/94
  2         2388      S3        P3    4      8/12/96
  26        26374     S4        P2    20     10/11/93

2. XML-transfer[N/Nl/P] and dimension-transfer[S, N]: the temporary component now also holds the XML data and the table R_[S,N].

  Nation  Population
  DK      5.3
  CN      1264.5
  UK      19.1

  Supplier  Nation
  S1        DK
  S2        DK
  S3        CN
  S4        UK

3. Decoration[N/Nl/P]: joins the two tables on Nation and adds a fourth temporary table.

  Supplier  Population
  S1        5.3
  S2        5.3
  S3        1264.5
  S4        19.1

4. σ_Fed[N/Nl/P<30]: selects the facts whose supplier has a population below 30.

  Quantity  ExtPrice  Population  Part  Order  Day
  17        17954     5.3         P3    11     2/12/96
  28        29983     5.3         P4    42     30/3/94
  26        26374     19.1        P2    20     10/11/93

5. Dimension-transfer[P, B]: adds the table R_[P,B] as a fifth temporary table.

  Part  Brand
  P2    B2
  P3    B3
  P4    B4

6. Π_Fed[B(P), N/Nl/P]<SUM(Q)>: produces the final result {R_F}.

  Quantity  Population  Brand
  17        5.3         B3
  28        5.3         B4
  26        19.1        B2

Page 16: The Query Optimizer

[Figure: optimizer stages between the initial plan and the final execution plan: Plan Rewriting, Logical Plan Conversion, Cost Estimation, Plan Space Pruning. The stages pass on, in turn, a set of logical plans, pairs of logical and physical plans, and the same pairs annotated with estimated costs.]

• Based on the Volcano optimizer
• Four optimization phases in one stage
  - Enumeration of logically equivalent plans
  - One-to-one conversion from logical to physical plans
  - Cost estimation for physical plans: t_plan = t_root + Max(t_1, ..., t_n)
  - Cost-based pruning of the plan space
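The cost estimation and pruning phases can be illustrated with a short sketch. It assumes the cost formula reads as the root operator's cost plus the maximum cost of its subplans, i.e. that sibling subplans run in parallel; the `PlanNode` class, the operator costs and the two candidate plans are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    cost: float                        # estimated cost of this operator alone
    children: list = field(default_factory=list)


def plan_cost(node):
    """t_plan = t_root + Max(t_1, ..., t_n) over the child subplans."""
    if not node.children:
        return node.cost
    return node.cost + max(plan_cost(child) for child in node.children)


# Hypothetical candidate 1: transfer everything, then select in the
# temporary component.
unoptimized = PlanNode("fed-selection", 2.0, [
    PlanNode("decoration", 1.0, [
        PlanNode("fact-transfer", 5.0),
        PlanNode("xml-transfer", 3.0),
        PlanNode("dimension-transfer", 0.5),
    ]),
])

# Hypothetical candidate 2: inline the XML data and select in the cube,
# so fewer facts are transferred.
optimized = PlanNode("decoration", 1.0, [
    PlanNode("fact-transfer", 1.5, [
        PlanNode("cube-selection", 1.0, [PlanNode("xml-transfer", 3.0)]),
    ]),
])

# Cost-based pruning keeps the cheapest candidate.
best = min([unoptimized, optimized], key=plan_cost)
print(plan_cost(unoptimized), plan_cost(optimized), best.name)
```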

Page 17: An Optimized Query Plan

(N/Nl/P abbreviates Nation/Nlink/Population; N, P, B and Q abbreviate Nation, Part, Brand and Quantity.)

Optimized logical plan, top to bottom:

  Π_Fed[B(P), N/Nl/P]<SUM(Q)>
    Decoration[N/Nl/P]
      Π_Fed[B(P), N]<SUM(Q)>
        (σ_Fed[N/Nl/P<30])'
          F_TC

Optimized physical plan, top to bottom:

  Π_Fed[B(P), N/Nl/P]<SUM(Q)>
    Decoration[N/Nl/P]
      Π_Cube[B, N]<SUM(Q)>
        (σ_Cube[N/Nl/P<30])'
          Inlining[N/Nl/P<30]
            XML-transfer[N/Nl/P]
          F_TC,ext

Page 18: Execution of the Optimized Plan

1. XML-transfer[N/Nl/P]: the empty temporary component {} receives the XML data.

  Nation  Population
  DK      5.3
  CN      1264.5
  UK      19.1

2. Inlining[N/Nl/P<30]: the predicate becomes Nation='DK' OR Nation='UK'.

3. σ_Cube[Nation='DK' OR Nation='UK']: evaluated in the OLAP component.

  Quantity  ExtPrice  Supplier  Part  Order  Day
  17        17954     S1        P3    11     2/12/96
  28        29983     S2        P4    42     30/3/94
  26        26374     S4        P2    20     10/11/93

4. Π_Cube[B, N]<SUM(Q)>: evaluated in the OLAP component.

  Quantity  Nation  Brand
  17        DK      B3
  28        DK      B4
  26        UK      B2

5. Fact-transfer: the aggregated facts are copied to the temporary component, which now holds the XML data and R_F.

6. Decoration[N/Nl/P]: the transferred facts are joined with the Nation and Population table.

7. Π_Fed[B(P), N/Nl/P]<SUM(Q)>: produces the final result {R_F}.

  Quantity  Population  Brand
  17        5.3         B3
  28        5.3         B4
  26        19.1        B2
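The steps of the optimized plan can be traced in plain Python. This is a sketch under stated assumptions, not the query engine itself: the function names are hypothetical, the cube is reduced to the Quantity, Supplier and Part columns, and the XML-transfer result is given directly as a dictionary.

```python
FACTS = [  # (quantity, supplier, part) from the sample cube
    (17, "S1", "P3"), (28, "S2", "P4"), (2, "S3", "P3"), (26, "S4", "P2"),
]
SUPPLIER_NATION = {"S1": "DK", "S2": "DK", "S3": "CN", "S4": "UK"}
PART_BRAND = {"P2": "B2", "P3": "B3", "P4": "B4"}
XML_POPULATION = {"DK": 5.3, "CN": 1264.5, "UK": 19.1}  # XML-transfer result


def inline(population, limit):
    """Inlining: turn Population < limit into a set of qualifying nations."""
    return {nation for nation, pop in population.items() if pop < limit}


def cube_select(facts, nations):
    """Cube selection on the Nation level, inside the OLAP component."""
    return [f for f in facts if SUPPLIER_NATION[f[1]] in nations]


def cube_rollup(facts):
    """Cube generalized projection to (Nation, Brand) with SUM(Quantity)."""
    totals = {}
    for quantity, supplier, part in facts:
        key = (SUPPLIER_NATION[supplier], PART_BRAND[part])
        totals[key] = totals.get(key, 0) + quantity
    return totals


def decorate_and_project(totals, population):
    """Decoration followed by the federation generalized projection."""
    result = {}
    for (nation, brand), quantity in totals.items():
        key = (population[nation], brand)
        result[key] = result.get(key, 0) + quantity
    return sorted((q, pop, brand) for (pop, brand), q in result.items())


nations = inline(XML_POPULATION, 30)
rolled_up = cube_rollup(cube_select(FACTS, nations))
result = decorate_and_project(rolled_up, XML_POPULATION)
print(result)
```

Only the three aggregated rows cross from the OLAP component to the temporary component here, instead of all four fact rows as in the unoptimized plan.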

Page 19: Performance

[Figure: two charts of evaluation time (in seconds, logarithmic scale) per query type (1 to 16); legend: Federated, Cached, Integrated.]

• One experiment compared:
  a. Our federated solution
  b. Physical integration
  c. Federating cached XML data
• Data
  - 100M fact data based on the TPC-H benchmark
  - 11MB and 2KB XML data
• Queries
• Result:
  - Comparable to b for small amounts of data
  - Use c for large amounts of data

Page 20: Related Work

• Generic data integration
  - Relational, XML, semi-structured, OO, ... and combinations
  - Do not consider OLAP DB properties such as automatic aggregation, dimension hierarchies and correct aggregation
• OLAP-object federations
  - Current solution offers much more general use of external data
  - Current solution is not restricted to rigid object schemas
  - Current solution allows irregular data
• Previous OLAP-XML federation efforts
  - A logical algebra
  - A partial, straightforward implementation

Page 21: Conclusion

• OLAP handles schema changes and dynamic data poorly
• Solutions
  - Logical federation of OLAP and XML
  - A physical algebra models actual execution tasks
  - Optimized query evaluation
• Experiments suggest feasibility
• Future work
  - More optimization techniques
  - Advanced evaluation techniques
  - Co-operative development with OLAP query tool vendor