
Jan 15, 2016


Introduction to XML Algebra

CS561


Data Model

A data model ~ the core data structures and data types supported by a DBMS

Relational database: a table (set-oriented) data model

XML format: a tree-structured hierarchical model


Why Query Algebra (for XML)?

It is common to translate a query language into an algebra.

First, the algebra is used to give a semantics for the query language.

Second, the algebra is used to support query optimization.


NIAGARA

Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation

By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier

Univ. of Wisconsin


Outline

Concepts of Niagara Algebra

Operations

Optimization


Goals of Niagara Algebra

Be independent of schema information

Query on both structure and content

Generate simple, flexible, yet powerful algebraic expressions

Allow re-use of traditional optimization techniques


Example: XML Source Documents

Invoice.xml

<Invoice_Document>
  <invoice No="1">
    <account_number>2 </account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1 </account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    <account_number>1 </account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.75</total>
  </invoice>
</Invoice_Document>

Customer.xml

<Customer_Document>
  <customer>
    <account>1 </account>
    <name>Tom </name>
  </customer>
  <customer>
    <account>2 </account>
    <name>George </name>
  </customer>
</Customer_Document>


XML Data Model and Tree Graph

Example:

<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
</Invoice_Document>

[Figure: tree graph. Invoice_Document is the root with Invoice children; each Invoice has number, carrier, and total children, with leaf values 2, AT&T, $0.25 and 1, Sprint, $1.20.]

Ordered tree graph, semi-structured data


XML Data Model (for Querying)

SQL: relations in, relation out.

Relational Algebra: relations in, relation out.

XQuery: XML doc in, XML docs out

XML Algebra: ??


XML Data Model [GVDNM01]

Collection of bags of vertices.

Vertices in a bag have no order.

Example:

Root invoice.xml | invoice                                      | invoice.account_number
                 | <invoice> Invoice-element-content </invoice> | <account_number> element-content </account_number>

[Root “invoice.xml”, invoice, invoice.account_number]
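
The bag model above can be sketched in a few lines of Python. This is only an illustration, not the Niagara implementation: the `Vertex` class, its two fields, and the use of a `frozenset` to express "no order" are assumptions made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """One XML vertex: its path annotation plus a placeholder for its content.

    Hypothetical structure for illustration only.
    """
    annotation: str  # e.g. "invoice.account_number"
    content: str     # serialized element content (placeholder text here)


# A bag is an unordered group of vertices; a frozenset models "no order".
bag = frozenset({
    Vertex("Root invoice.xml", "document root"),
    Vertex("invoice", "<invoice> Invoice-element-content </invoice>"),
    Vertex("invoice.account_number",
           "<account_number> element-content </account_number>"),
})

# The data model is a collection of such bags.
collection = [bag]


def annotations(b):
    """Return the annotations in a bag, sorted only so they print stably."""
    return sorted(v.annotation for v in b)


print(annotations(bag))
```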


Data Model

Bag elements are reachable by path expressions.

A path expression consists of two parts:

An entry point

A relative forward part

Example: account_number:invoice (relative forward part account_number, entry point invoice)
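
As a rough sketch, the entry point notation can be split mechanically. The helper names `parse_path` and `annotation_of` are hypothetical; the mapping to the dotted annotation (`invoice.carrier`) is inferred from the Follow example on a later slide.

```python
def parse_path(expr):
    """Split 'forward:entry' into (entry_point, forward_part)."""
    forward, sep, entry = expr.partition(":")
    forward, entry = forward.strip(), entry.strip()
    if not sep or not forward or not entry:
        raise ValueError(f"expected 'forward:entry', got {expr!r}")
    return entry, forward


def annotation_of(expr):
    """Annotation of the vertex reached by the path, e.g. 'invoice.carrier'."""
    entry, forward = parse_path(expr)
    return f"{entry}.{forward}"


print(parse_path("account_number:invoice"))
print(annotation_of("carrier:invoice"))
```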


Outline

Concepts of Niagara Algebra

Operations

Optimization


Operators

Source (S), Follow, Expose, Vertex,

Select, Join, Rename,

Group, Union, Intersection, Difference (-), Cartesian Product.


Source Operator S

Input: a list of documents

Output: a collection of singleton bags

Examples:

S (*): all known XML documents

S (invoice*.xml): all XML documents whose filenames match “invoice*.xml”

S (*, schema.dtd): all known XML documents that conform to schema.dtd
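
A minimal sketch of the Source operator, assuming the "known documents" are simply a list of filenames and that the pattern is a shell-style wildcard. The catalog `KNOWN_DOCUMENTS` and the dict-shaped bag are hypothetical; the DTD-conformance variant is not modeled.

```python
from fnmatch import fnmatch

# Hypothetical catalog of documents known to the system.
KNOWN_DOCUMENTS = ["invoice.xml", "invoice_2015.xml", "customer.xml"]


def source(pattern="*", known=KNOWN_DOCUMENTS):
    """Return a collection of singleton bags, one per matching document."""
    return [{"Root": name} for name in known if fnmatch(name, pattern)]


print(source("invoice*.xml"))
print(len(source("*")))
```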


Follow operator

Input: a path expression in entry point notation

Functionality: extracts vertices reachable by the path expression

Output: a new bag that consists of the extracted vertex + all contents of the original bag (in case of unnesting follow)


Follow operator (Example*)

Input: {[Root invoice.xml, invoice]}

Root invoice.xml | invoice
                 | <invoice> Invoice-element-content </invoice>

Operator: Φμ(carrier:invoice)

Output: {[Root invoice.xml, invoice, invoice.carrier]}

Root invoice.xml | invoice                                      | invoice.carrier
                 | <invoice> Invoice-element-content </invoice> | <carrier> carrier-element-content </carrier>

* Unnesting Follow
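
The unnesting Follow can be sketched with Python's standard `xml.etree.ElementTree`. The dict-shaped bag and the function name `follow` are assumptions for illustration, and the sketch assumes that a bag whose entry vertex has no matching child produces no output bag.

```python
import xml.etree.ElementTree as ET


def follow(collection, expr):
    """Unnesting Follow for 'step:entry'.

    Emits one new bag per <step> child of the entry vertex; each new bag
    keeps the original contents and adds the extracted vertex.
    """
    step, entry = expr.split(":")
    out = []
    for bag in collection:
        for child in bag[entry].findall(step):
            out.append({**bag, f"{entry}.{step}": child})
    return out


invoice = ET.fromstring(
    "<invoice><carrier>Sprint</carrier><total>$1.20</total></invoice>"
)
bags = [{"invoice": invoice}]
result = follow(bags, "carrier:invoice")
print(sorted(result[0]))
```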


Select operator

Input: a set of bags

Functionality: filters the bags of a collection using a predicate

Output: a set of bags that conform to the predicate

Predicate: logical operators (∧, ∨, ¬) or simple qualifications (>, <, ≥, ≤, =, ≠)


Select operator (Example)

Predicate: invoice.carrier = Sprint

Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}

Output: {[Root invoice.xml, invoice], …}

Each bag pairs Root invoice.xml with one invoice vertex: <invoice> Invoice-element-content </invoice>
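
A minimal sketch of Select, assuming bags are plain dicts keyed by annotation and the predicate is an ordinary Python function. The sample values come from the Invoice.xml slide.

```python
def select(collection, predicate):
    """Keep only the bags that satisfy the predicate."""
    return [bag for bag in collection if predicate(bag)]


bags = [
    {"invoice.carrier": "AT&T", "invoice.total": "$0.25"},
    {"invoice.carrier": "Sprint", "invoice.total": "$1.20"},
    {"invoice.carrier": "AT&T", "invoice.total": "$0.75"},
]

sprint = select(bags, lambda b: b["invoice.carrier"] == "Sprint")
print(sprint)
```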


Join operator

Input: two collections of bags

Functionality: joins the two collections based on a predicate

Output: the concatenation of pairs of bags that satisfy the predicate


Join operator (Example)

Left input: {[Root invoice.xml, invoice]}

Right input: {[Root customer.xml, customer]}

Predicate: account_number:invoice = account:customer

Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}

Vertex contents: invoice is <invoice> Invoice-element-content </invoice>; customer is <customer> customer-element-content </customer>
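
The Join can be sketched as a nested loop over two collections of dict-shaped bags. The nested-loop strategy and the dict representation are assumptions for illustration; the slides do not say how the join is evaluated.

```python
def join(left, right, predicate):
    """Concatenate every (left, right) pair of bags that satisfies the predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


invoices = [
    {"invoice.account_number": "2", "invoice.carrier": "AT&T"},
    {"invoice.account_number": "1", "invoice.carrier": "Sprint"},
]
customers = [
    {"customer.account": "1", "customer.name": "Tom"},
    {"customer.account": "2", "customer.name": "George"},
]

joined = join(
    invoices,
    customers,
    lambda l, r: l["invoice.account_number"] == r["customer.account"],
)
print(joined)
```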


Expose operator

Input: a list of path expressions of vertices to be exposed

Output: a set of bags that contain the vertices named in the parameter list, in the same order as the list


Expose operator (Example)

Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}

Parameter list: (bill_period, carrier)

Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}

The output keeps only the listed vertices, in parameter-list order: bill_period first, then carrier.
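
A small sketch of Expose over dict-shaped bags. Keeping the root entry in the output mirrors the slide's example bag; treating any key that starts with "Root" as the root entry is an assumption made only for this sketch.

```python
def expose(collection, paths):
    """Keep the root entry plus the listed vertices, in the order of `paths`."""
    out = []
    for bag in collection:
        # Assumption: the document root stays in the bag, as in the slide's output.
        kept = {k: v for k, v in bag.items() if k.startswith("Root")}
        kept.update({p: bag[p] for p in paths})
        out.append(kept)
    return out


bags = [{
    "Root invoice.xml": "document root",
    "invoice": "invoice vertex",
    "invoice.carrier": "carrier vertex",
    "invoice.bill_period": "bill_period vertex",
}]

exposed = expose(bags, ["invoice.bill_period", "invoice.carrier"])
print(list(exposed[0]))
```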


Vertex operator

Creates the actual XML vertex that will encompass everything created by an expose operator

Example:

(Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]
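
One possible reading of the example above, sketched with `xml.etree.ElementTree`: a new Customer_invoice element wraps an account element built from invoice.account_number and an inv_total element built from invoice.total. The function name `vertex` and the (tag, text) pair representation are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def vertex(tag, children):
    """Create a new element <tag> whose children are (new_tag, text) pairs."""
    node = ET.Element(tag)
    for child_tag, text in children:
        ET.SubElement(node, child_tag).text = text
    return node


# Values exposed from one bag (sample data from the invoice slides).
bag = {"invoice.account_number": "1", "invoice.total": "$1.20"}

result = vertex("Customer_invoice", [
    ("account", bag["invoice.account_number"]),
    ("inv_total", bag["invoice.total"]),
])
xml_text = ET.tostring(result, encoding="unicode")
print(xml_text)
```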


Other operators

Group: used for arbitrary grouping of elements based on their values

Aggregate functions can be used with the group operator (e.g., average)

Rename: changes the entry point annotation of elements of a bag

Example: (invoice.bill_period, date)
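
Both operators can be sketched over dict-shaped bags. The function names and the choice to store totals as integer cents are assumptions for illustration; the grouping below averages invoice totals per account number using the sample data.

```python
from collections import defaultdict


def rename(collection, old, new):
    """Change one annotation in every bag, leaving the vertex itself untouched."""
    return [{(new if k == old else k): v for k, v in bag.items()}
            for bag in collection]


def group_average(collection, key, value):
    """Group bags by `key` and average `value` within each group."""
    groups = defaultdict(list)
    for bag in collection:
        groups[bag[key]].append(bag[value])
    return {k: sum(v) / len(v) for k, v in groups.items()}


renamed = rename([{"invoice.bill_period": "Jan"}], "invoice.bill_period", "date")

invoices = [
    {"account": "2", "total_cents": 25},
    {"account": "1", "total_cents": 120},
    {"account": "1", "total_cents": 75},
]
averages = group_average(invoices, "account", "total_cents")
print(renamed, averages)
```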


Example: XML Source Documents

Invoice.xml

<Invoice_Document>
  <invoice>
    <account_number>2 </account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1 </account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    <account_number>1 </account_number>
    <total>$0.75</total>
  </invoice>
  <auditor> maria </auditor>
</Invoice_Document>

Customer.xml

<Customer_Document>
  <customer>
    <account>1 </account>
    <name>Tom </name>
  </customer>
  <customer>
    <account>2 </account>
    <name>George </name>
  </customer>
</Customer_Document>


XQuery Example

List account number, customer name, and invoice total for all invoices that have carrier = “Sprint”.

for $i in doc("invoices.xml")//invoice,
    $c in doc("customers.xml")//customer
where $i/carrier = "Sprint" and
      $i/account_number = $c/account
return
  <Sprint_invoices>{
    $i/account_number,
    $c/name,
    $i/total
  }</Sprint_invoices>


Example: XQuery output

<Sprint_invoices>
  <account_number>1 </account_number>
  <name>Tom </name>
  <total>$1.20</total>
</Sprint_invoices>


Algebra Tree Execution

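[Figure: algebra tree, evaluated from the sources up to Expose. Each step is listed with the bags it produces.]

1. Source (Invoices.xml), then Follow (*.invoice): invoice(1), invoice(2), invoice(3)

2. Source (customers.xml), then Follow (*.customer): customer(1), customer(2)

3. Select (carrier = “Sprint”) on the invoice bags: invoice(2)

4. Join (*.invoice.account_number = *.customer.account): invoice(2) customer(1)

5. Expose (*.account_number, *.name, *.total): account_number, name, total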
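
The whole plan can be traced end to end with a short script. This is a sketch under assumptions, not the Niagara executor: the documents are inlined as strings, bags are dicts holding `ElementTree` elements, and the helper `text` is hypothetical.

```python
import xml.etree.ElementTree as ET

INVOICES = """<Invoice_Document>
  <invoice><account_number>2 </account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
  <invoice><account_number>1 </account_number><carrier>Sprint</carrier><total>$1.20</total></invoice>
  <invoice><account_number>1 </account_number><total>$0.75</total></invoice>
  <auditor> maria </auditor>
</Invoice_Document>"""

CUSTOMERS = """<Customer_Document>
  <customer><account>1 </account><name>Tom </name></customer>
  <customer><account>2 </account><name>George </name></customer>
</Customer_Document>"""


def text(elem, tag):
    """Stripped text of the first <tag> child, or None if the child is missing."""
    child = elem.find(tag)
    return child.text.strip() if child is not None and child.text else None


# Source + Follow: one bag per invoice / customer element.
invoices = [{"invoice": e} for e in ET.fromstring(INVOICES).iter("invoice")]
customers = [{"customer": e} for e in ET.fromstring(CUSTOMERS).iter("customer")]

# Select (carrier = "Sprint").
sprint = [b for b in invoices if text(b["invoice"], "carrier") == "Sprint"]

# Join (invoice.account_number = customer.account).
joined = [
    {**i, **c}
    for i in sprint
    for c in customers
    if text(i["invoice"], "account_number") == text(c["customer"], "account")
]

# Expose (account_number, name, total).
rows = [
    (text(b["invoice"], "account_number"),
     text(b["customer"], "name"),
     text(b["invoice"], "total"))
    for b in joined
]
print(rows)
```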


Outline

Concepts of Niagara Algebra

Operations

Optimization


Optimization with Niagara

Optimizer based on Niagara algebra:

Use the operations more efficiently

Produce simpler expressions by combining operations


Language Convention

A and B are path expressions

A < B --- path expression A is a prefix of B

AnB --- common prefix of paths A and B

AńB --- greatest common prefix of paths A and B

┴ --- null path expression


Heuristics using Rewrite Rules

Allow optimization based on path selectivity

When applying un-nesting with operation Φμ


Interchangeability of Follow operation

Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]

TRUE or FALSE?

TRUE when there exists C such that C < A and C < B and C = AńB,

or when AnB = ┴
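
The condition can be sketched as a small predicate. Several points are assumptions rather than statements from the slides: "<" is read as proper prefix, a path in entry point notation is compared as entry point followed by forward steps, and multi-step forward parts are assumed to be dot-separated.

```python
def as_steps(expr):
    """'carrier:invoice' -> ['invoice', 'carrier'] (entry point first)."""
    forward, entry = expr.split(":")
    return [entry, *forward.split(".")]


def greatest_common_prefix(a, b):
    """Longest list of leading steps shared by both paths."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return prefix


def follows_commute(expr_a, expr_b):
    """True when the two unnesting Follows may be swapped under the rule."""
    a, b = as_steps(expr_a), as_steps(expr_b)
    c = greatest_common_prefix(a, b)
    if not c:
        return True  # null common prefix: the paths are unrelated
    # C must be a proper prefix of both A and B.
    return len(c) < len(a) and len(c) < len(b)


print(follows_commute("acc_Num:invoice", "carrier:invoice"))
print(follows_commute("acc_Num:invoice", "acc_Num:customer"))
print(follows_commute("carrier:invoice", "carrier.code:invoice"))
```

In the third call the first path is a prefix of the second, so the second Follow depends on the first and the sketch reports that the two cannot be swapped.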


Application of Rule on Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]

==

Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] ?

TRUE or FALSE?


Application of Rule on Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]

=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]

TRUE because both share common prefix “invoice”.

Case AńB = invoice


Benefit of Rule Application

NOTE: Assume acc_Num is required for each invoice element, while carrier is not.

THEN:

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]

==

Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]

Then what algebra tree do we prefer?


Discussion

Reduction of input size on the first sub-operation:

Φμ(carrier:invoice)

vs.

Φμ(acc_Num:invoice)
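
The effect on intermediate sizes can be illustrated with a toy count. The simplified `follow` below works on dict-shaped invoices and assumes that a bag lacking the requested element yields no output bag; both the representation and that behavior are assumptions for this sketch.

```python
def follow(collection, field):
    """Simplified unnesting Follow: bags that lack `field` produce no output."""
    return [bag for bag in collection if bag.get(field) is not None]


# acc_Num is present on every invoice; carrier is optional.
invoices = [
    {"acc_Num": "2", "carrier": "AT&T"},
    {"acc_Num": "1", "carrier": "Sprint"},
    {"acc_Num": "1"},
]

carrier_first = follow(invoices, "carrier")   # feeds 2 bags to the next Follow
acc_first = follow(invoices, "acc_Num")       # feeds 3 bags to the next Follow

print(len(carrier_first), len(acc_first))
print(follow(carrier_first, "acc_Num") == follow(acc_first, "carrier"))
```

Both orders end with the same bags, but applying the more selective Follow first hands fewer bags to the second operator.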


Can we apply the rule below?

Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]


Example

“acc_Num:invoice” and “acc_Num:customer” are two totally different paths.

Case is: AnB = ┴

So yes, the rule is valid.


Summary

XML Algebra

Operations

Optimization