Introduction To XML Algebra

Wan Liu, Bintou Kane
Advanced Database
Instructor: Elka

2/11/2002

Outline

- Reasons for XML algebra
- Niagara algebra
- AT&T algebra

Data Model and Design

- We need a clear framework to design a database.
- A data model is like creating different data structures for appropriate programming usage. It is a type system; it is abstract.
- A relational database is implemented by tables; XML is a new format for information integration.

Why XML Algebra?

- It is common to translate a query language into an algebra.
- First, the algebra is used to give a semantics for the query language.
- Second, the algebra is used to support query optimization.

XML Algebra History

- Lore Algebra (August 1999) -- Stanford University
- IBM Algebra (September 1999) -- Oracle; IBM; Microsoft Corp
- YAT Algebra (May 2000)
- AT&T Algebra (June 2000) -- AT&T; Bell Labs
- Niagara Algebra (2001) -- University of Wisconsin-Madison

NIAGARA

Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation

By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier

Outline

- Concepts of Niagara Algebra
- Operations
- Optimization

Goals of Niagara Algebra

- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic expressions
- Allow re-use of traditional optimization techniques

Example: XML Source Documents

Invoice.xml

<Invoice_Document>
  <invoice>
    <account_number>2</account_number>
    …
  </invoice>
  <invoice>
    …
    <carrier>Sprint</carrier>
    …
  </invoice>
  <invoice>
    …
  </invoice>
</Invoice_Document>

Customer.xml

<Customer_Document>
  <customer>
    …
  </customer>
  <customer>
    <name>George</name>
    …
  </customer>
</Customer_Document>

XML Data Model and Tree Graph

Example: Invoice_Document

[Figure: tree rooted at Invoice_Document with invoice children; each invoice has number, carrier, and total leaves (2, AT&T, $0.25 and 1, Sprint, $1.20).]

<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>Sprint</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
</Invoice_Document>

Ordered tree graph, semistructured data

XML Data Model [GVDNM01]

- A collection of bags of vertices.
- Vertices in a bag have no order.

Example: the bag [Root "invoice.xml", invoice, invoice.account_number] holds the document root, an invoice vertex (<invoice>invoice-element-content</invoice>), and its account_number vertex (<account_number>element-content</account_number>).

Data Model

- Bag elements are reachable by path expressions.
- A path expression consists of two parts: an entry point and a relative forward part.
- Example: account_number:invoice (entry point invoice, relative forward part account_number)

Operators

Source (S), Follow (Φ), Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product

Source Operator S

- Input: a list of documents
- Output: a collection of singleton bags
- Examples:
  S (*): all known XML documents
  S (invoice*.xml): all XML documents whose filename matches "invoice*.xml"
  S (*,schema.dtd): all known XML documents that conform to schema.dtd

Follow operator

- Input: a path expression in entry point notation
- Functionality: extracts the vertices reachable by the path expression
- Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of unnesting follow)

Follow operator (Example)

Φμ(carrier:invoice), an unnesting follow

Input: {[Root invoice.xml, invoice]}
Output: {[Root invoice.xml, invoice, invoice.carrier]}

where invoice is <invoice>…</invoice> and invoice.carrier is <carrier>carrier-element-content</carrier>.
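A minimal Haskell sketch may help make the bag semantics of the unnesting follow concrete. The `Vertex` and `Bag` types, the name `followUnnest`, and the sample invoice are assumptions made for this illustration; they are not taken from the Niagara paper.

```haskell
-- Hypothetical in-memory model; all names here are illustrative only.
data Vertex = Vertex { tag :: String, text :: String, kids :: [Vertex] }
  deriving (Eq, Show)

-- A bag pairs each vertex with its entry-point annotation, e.g. "invoice.carrier".
type Bag = [(String, Vertex)]

-- Unnesting follow for the path expression field:entry.
-- Each matching child yields its own output bag: the original bag plus that child.
-- Bags whose entry vertex has no such child produce no output.
followUnnest :: String -> String -> [Bag] -> [Bag]
followUnnest field entry bags =
  [ bag ++ [(entry ++ "." ++ field, child)]
  | bag <- bags
  , (name, v) <- bag
  , name == entry
  , child <- kids v
  , tag child == field
  ]

leaf :: String -> String -> Vertex
leaf t s = Vertex t s []

-- Invented sample invoice.
invoice1 :: Vertex
invoice1 = Vertex "invoice" "" [leaf "number" "1", leaf "carrier" "Sprint"]

main :: IO ()
main = mapM_ (print . map fst)
             (followUnnest "carrier" "invoice" [[("invoice", invoice1)]])
```

Running `main` prints the annotations of each output bag, so the extra `invoice.carrier` entry is visible next to the original `invoice` entry.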

Select operator

- Input: a set of bags
- Functionality: filters the bags of a collection using a predicate
- Output: the set of bags that conform to the predicate
- Predicate: logical operators (∧, ∨, ¬) or simple qualifications (>, <, ≥, ≤, =, ≠)

Select operator (Example)

Predicate: invoice.carrier = Sprint

Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], …}
Output: {[Root invoice.xml, invoice], …}

where each invoice is <invoice>invoice-element-content</invoice>.

Join operator

- Input: two collections of bags
- Functionality: joins the two collections based on a predicate
- Output: the concatenation of the pairs of bags that satisfy the predicate

Join operator (Example)

Predicate: account_number:invoice = number:customer

Input: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}

where invoice is <invoice>…</invoice> and customer is <customer>customer-element-content</customer>.

Expose operator

- Input: a list of path expressions of the vertices to be exposed
- Output: a set of bags that contain the vertices in the parameter list, in the same order

Expose operator (Example)

Parameter list: (bill_period, carrier)

Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}

Vertex operator

Creates the actual XML vertex that will encompass everything created by an expose operator

Example :

(Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]

Other operators

- Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g. average) can be used with the group operator.
- Rename: changes the entry point annotation of the elements of a bag. Example: (invoice.bill_period, date)
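Grouping with an aggregate function can be sketched in a few lines of Haskell. This is only an illustration under simplifying assumptions: bags are flattened to hypothetical (carrier, total) rows, and the names `groupAggregate` and `average` as well as the sample values are invented here.

```haskell
import Data.Function (on)
import Data.List (groupBy, sortOn)

-- Hypothetical flattened bags: (carrier, invoice total).
type Row = (String, Double)

-- Group rows by carrier, then apply an aggregate function to each group's totals.
-- Sorting first makes equal keys adjacent, which Data.List.groupBy requires.
groupAggregate :: ([Double] -> Double) -> [Row] -> [(String, Double)]
groupAggregate agg rows =
  [ (k, agg (map snd g))
  | g@((k, _) : _) <- groupBy ((==) `on` fst) (sortOn fst rows)
  ]

-- Arithmetic mean; only ever called on non-empty groups above.
average :: [Double] -> Double
average xs = sum xs / fromIntegral (length xs)

-- Invented sample rows.
sampleRows :: [Row]
sampleRows = [("Sprint", 1.0), ("AT&T", 0.25), ("Sprint", 2.0)]

main :: IO ()
main = print (groupAggregate average sampleRows)
```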

Example: XML Source Documents

Invoice.xml

<Invoice_Document>
  <invoice>
    …
  </invoice>
  <invoice>
    …
    <carrier>Sprint</carrier>
    …
  </invoice>
  <invoice>
    …
  </invoice>
  <auditor>maria</auditor>
</Invoice_Document>

Customer.xml

<Customer_Document>
  <customer>
    …
  </customer>
  <customer>
    <name>George</name>
    …
  </customer>
</Customer_Document>

XQuery Example

List the account number, customer name, and invoice total for all invoices that have carrier = "Sprint".

for $i in doc("invoices.xml")//invoice,
    $c in doc("customers.xml")//customer
where $i/carrier = "Sprint" and
      $i/account_number = $c/account
return
  <Sprint_invoices>{
    $i/account_number,
    $c/name,
    $i/total
  }</Sprint_invoices>

Example: XQuery output

<Sprint_Invoice>
  …
</Sprint_Invoice>

Algebra Tree Execution

[Figure: algebra tree, evaluated from the leaves up]

1) Source (Invoices.xml), then Follow (*.invoice): yields invoice (1), invoice (2), invoice (3)
2) Source (customers.xml), then Follow (*.customer): yields customer (1), customer (2)
3) Select (carrier = "Sprint") on the invoices: yields invoice (2)
4) Join (*.invoice.account_number = *.customer.account): yields invoice (2) customer (1)
5) Expose (*.account_number, *.name, *.total): yields account_number, name, total
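The plan above can be imitated over in-memory data. The sketch below is an assumption-laden simplification rather than Niagara's implementation: a bag is reduced to a list of (annotation, text value) pairs, the Source and Follow steps are replaced by invented sample data, and the function names are chosen only for this example.

```haskell
-- Hypothetical simplification: a bag is a list of (annotation, text value) pairs.
type Bag = [(String, String)]

-- Select: keep only the bags that satisfy the predicate.
select :: (Bag -> Bool) -> [Bag] -> [Bag]
select = filter

-- Join: concatenate every pair of bags that satisfies the predicate.
join :: (Bag -> Bag -> Bool) -> [Bag] -> [Bag] -> [Bag]
join p ls rs = [ l ++ r | l <- ls, r <- rs, p l r ]

-- Expose: keep only the listed annotations, in the order given.
expose :: [String] -> [Bag] -> [Bag]
expose names bags =
  [ [ (n, v) | n <- names, Just v <- [lookup n bag] ] | bag <- bags ]

-- True when both annotations are present and carry the same value.
sameValue :: String -> String -> Bag -> Bag -> Bool
sameValue a b l r = case (lookup a l, lookup b r) of
  (Just x, Just y) -> x == y
  _                -> False

-- Invented sample data standing in for Source + Follow on each document.
invoices, customers :: [Bag]
invoices =
  [ [("account_number", "1"), ("carrier", "AT&T"),   ("total", "$0.25")]
  , [("account_number", "2"), ("carrier", "Sprint"), ("total", "$1.20")]
  , [("account_number", "3"), ("carrier", "AT&T"),   ("total", "$0.75")]
  ]
customers =
  [ [("account", "2"), ("name", "George")]
  , [("account", "3"), ("name", "Alice")]
  ]

-- Select before join, then expose the three requested values.
plan :: [Bag]
plan =
  expose ["account_number", "name", "total"]
    (join (sameValue "account_number" "account")
      (select (\i -> lookup "carrier" i == Just "Sprint") invoices)
      customers)

main :: IO ()
main = mapM_ print plan
```

Because the selection runs before the join, only the Sprint invoice is paired against the customer bags.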

Optimization with Niagara

- The optimizer is based on the Niagara algebra
- Use operations more efficiently
- Produce simpler expressions by combining operations

Language Convention

- A and B are path expressions
- A < B -- path expression A is a prefix of B
- AnB -- common prefix of paths A and B
- AńB -- greatest common prefix of paths A and B
- ┴ -- null path expression

Use of Rule 8.5

- Take advantage of Rule 8.5: it allows optimization based on path selectivity when applying the unnesting follow operation Φμ.
- Interchangeability of follow operations: Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]
- True when there exists C such that C < A and C < B with C = AńB, or when AnB = ┴.

Application of 8.5 with Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]     (*)

?= Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]  (**)

Both share the common prefix invoice: this is the case AńB = invoice.

Benefit of Rule Application

Note that if
- acc_Num is required for each invoice element, and
- carrier is not required for an invoice element,

then using (*)

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]

makes more sense than (**). Why? The first sub-operation, Φμ(carrier:invoice), reduces the input size: only invoices that have a carrier reach the second follow.

Can we, and should we, apply Rule 8.5 to the expression below?

Φμ(acc_Num:invoice)[Φμ(acc_Num:Customer)]

Yes. acc_Num:invoice and acc_Num:Customer are entirely different paths, so this is the case AnB = ┴.
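The applicability test for Rule 8.5 can be sketched as a small predicate on paths. This is one reading of the condition on the slides, not the paper's formal statement: a path expression field:entry is modelled as a list of steps starting at the entry point, "<" is read as proper prefix, and the empty list stands for the null path. All names are hypothetical.

```haskell
import Data.List (isPrefixOf)

-- A path as a list of steps, entry point first: acc_Num:invoice ~ ["invoice", "acc_Num"].
type Path = [String]

-- Greatest common prefix of two paths; [] plays the role of the null path.
commonPrefix :: Path -> Path -> Path
commonPrefix (x:xs) (y:ys) | x == y = x : commonPrefix xs ys
commonPrefix _ _ = []

-- Proper prefix: a prefix that is strictly shorter (assumed reading of "<").
properPrefixOf :: Path -> Path -> Bool
properPrefixOf a b = a `isPrefixOf` b && length a < length b

-- Two unnesting follows may be swapped when the paths share no prefix at all,
-- or when their greatest common prefix is a proper prefix of both.
interchangeable :: Path -> Path -> Bool
interchangeable a b = null c || (c `properPrefixOf` a && c `properPrefixOf` b)
  where c = commonPrefix a b

main :: IO ()
main = do
  print (interchangeable ["invoice", "acc_Num"] ["invoice", "carrier"])
  print (interchangeable ["invoice", "acc_Num"] ["customer", "acc_Num"])
  print (interchangeable ["invoice"] ["invoice", "carrier"])
```

The third case is rejected because the second follow starts from the vertex that the first one produces, so the two cannot be reordered.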

Rules 8.7, 8.9, 8.11

- Interesting: they help identify when and where to use selection to decrease the input size of subsequent operations.
- Example: in the algebra tree on slide 28, selection is applied before the join.
- A useful addition would be a procedure that detects automatically when a rule can be applied in a given case and then applies it.

AT&T Algebra

AT&T Algebra Introduction

The algebra is derived from the nested relational algebra.

AT&T algebra makes heavy use of list comprehensions, a standard notation in the functional programming community.

AT&T algebra uses the functional programming language Haskell as a notation for presenting the algebra.

AT&T data model

The data model merges attribute and element nodes, and eliminates comments.

Declare basic type: Node.

text :: String -> Node
elem :: Tag -> [Node] -> Node
ref  :: Node -> Node

<bib>
  <book year="1999">
    <title>Data on the Web</title>
    <year>1999</year>
  </book>
</bib>

elem "bib" [
  elem "book" [
    elem "@year" [ text "1999" ],
    elem "title" [ text "Data on the Web" ] ]]

Basic Type Declarations

To find the type of a node:
isText :: Node -> Bool
isElem :: Node -> Bool
isRef  :: Node -> Bool

For a text node:
string :: Node -> String

For an element node:
1) tag :: Node -> Tag
2) children :: Node -> [Node]

For a reference node:
dereference :: Node -> Node
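The signatures above can be given one possible concrete implementation. The slides leave Node abstract, so the data constructors below (`TextNode`, `ElemNode`, `RefNode`) are an assumption made for this sketch; only the function names and types come from the slides.

```haskell
import Prelude hiding (elem)  -- the slides reuse the name `elem`

type Tag = String

-- One possible representation; the constructor names are hypothetical.
data Node = TextNode String | ElemNode Tag [Node] | RefNode Node
  deriving (Eq, Show)

text :: String -> Node
text = TextNode

elem :: Tag -> [Node] -> Node
elem = ElemNode

ref :: Node -> Node
ref = RefNode

isText, isElem, isRef :: Node -> Bool
isText (TextNode _) = True
isText _            = False
isElem (ElemNode _ _) = True
isElem _              = False
isRef (RefNode _) = True
isRef _           = False

-- Accessors are partial: each is defined for one kind of node only.
string :: Node -> String
string (TextNode s) = s
string _            = error "string: not a text node"

tag :: Node -> Tag
tag (ElemNode t _) = t
tag _              = error "tag: not an element node"

children :: Node -> [Node]
children (ElemNode _ ns) = ns
children _               = error "children: not an element node"

dereference :: Node -> Node
dereference (RefNode n) = n
dereference _           = error "dereference: not a reference node"

bib0 :: Node
bib0 = elem "bib"
  [ elem "book" [ elem "@year" [ text "1999" ]
                , elem "title" [ text "Data on the Web" ] ] ]

main :: IO ()
main = print (map tag (concatMap children (children bib0)))
```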

Nested relational algebra…

In the nested relational approach, data is composed of tuples and lists. Tuple values and tuple types are written in round brackets.

(1999, "Data on the Web", ["Abiteboul"]) :: (Int, String, [String])

Decompose values:
year :: (Int, String, [String]) -> Int
year (x, y, l) = x

Nested relational algebra…

Comprehensions: list comprehensions can be used to express fundamental query operations: navigation, cartesian product, nesting, and joins.

Example:
[ value x | x <- children book0, is "author" x ]
==> [ "Abiteboul" ]

General form: [ exp | qual1, ..., qualn ], where each qualifier is either a filter (bool-exp) or a generator (pat <- list-exp).

Nested relational algebra…

Using comprehensions to write queries.

Navigate:
follow :: Tag -> Node -> [Node]
follow t x = [ y | y <- children x, is t y ]

Cartesian product:
[ (value y, value z) | x <- follow "book" bib0,
                       y <- follow "title" x,
                       z <- follow "author" x ]
==> [ ("Data on the Web", "Abiteboul") ]
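The navigation and cartesian-product queries can be run as ordinary Haskell once the helper functions have definitions. The sketch below supplies assumed ones: the slides do not define `is` or `value`, so the meanings given here (tag test, and text directly under a node) are guesses that reproduce the result shown on the slide.

```haskell
import Prelude hiding (elem)

-- Simplified node type for this sketch (no reference nodes).
data Node = Text String | Elem String [Node]
  deriving (Eq, Show)

text :: String -> Node
text = Text

elem :: String -> [Node] -> Node
elem = Elem

-- Total variant for convenience: text nodes have no children.
children :: Node -> [Node]
children (Elem _ ns) = ns
children _           = []

-- Assumed meaning: x is an element with tag t.
is :: String -> Node -> Bool
is t (Elem t' _) = t == t'
is _ _           = False

-- Assumed meaning: the text directly under x.
value :: Node -> String
value x = concat [ s | Text s <- children x ]

follow :: String -> Node -> [Node]
follow t x = [ y | y <- children x, is t y ]

bib0 :: Node
bib0 = elem "bib"
  [ elem "book"
      [ elem "@year"  [ text "1999" ]
      , elem "title"  [ text "Data on the Web" ]
      , elem "author" [ text "Abiteboul" ]
      ]
  ]

-- Every (title, author) combination within each book.
titleAuthorPairs :: [(String, String)]
titleAuthorPairs =
  [ (value y, value z)
  | x <- follow "book" bib0, y <- follow "title" x, z <- follow "author" x ]

main :: IO ()
main = print titleAuthorPairs
```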

Nested relational algebra…

Joins.

reviews0 =
  elem "reviews" [
    elem "book" [
      elem "title"  [ text "Data on the Web" ],
      elem "review" [ text "This is great!" ] ]]

bib0 =
  elem "bib" [
    elem "book" [
      elem "@year" [ text "1999" ],
      elem "title" [ text "Data on the Web" ] ]]

[ (value y, int (value z), value w) | x <- follow "book" bib0,
                                      y <- follow "title" x,
                                      z <- follow "@year" x,
                                      u <- follow "book" reviews0,
                                      v <- follow "title" u,
                                      w <- follow "review" u,
                                      y == v ]
==> [ ("Data on the Web", 1999, "This is great!") ]

Nested relational algebra…

Regular expression matching.

match :: Reg a -> Node -> [a]

reg0 :: Reg (Node, Node, [Node])
reg0 = [ (x, y, u) | x <- item "@year", y <- item "title", u <- rep (item "author") ]

Result:
match reg0 book0
==> [ ( elem "@year" [ text "1999" ],
        elem "title" [ text "Data on the Web" ],
        [ elem "author" [ text "Abiteboul" ],
          elem "author" [ text "Buneman" ],
          elem "author" [ text "Suciu" ] ] ) ]

Nested relational algebra…

Sorting:
sortBy :: (a -> a -> Bool) -> [a] -> [a]
sortBy (<=) [3,1,2,1] ==> [1,1,2,3]

Grouping:
groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
groupBy (==) [3,1,2,1] ==> [[2],[1,1],[3]]
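The sorting and grouping functions on this slide differ from the ones in Haskell's standard `Data.List`: the library `sortBy` takes a comparison returning `Ordering`, and the library `groupBy` only groups adjacent elements. A sketch of slide-style versions follows, under the hypothetical names `sortByLe` and `groupAll`. The order of the groups here follows first occurrence, which is not the order shown on the slide.

```haskell
import Data.List (partition)

-- Insertion sort driven by a Boolean "less than or equal" test.
sortByLe :: (a -> a -> Bool) -> [a] -> [a]
sortByLe le = foldr insert []
  where
    insert x [] = [x]
    insert x (y:ys)
      | x `le` y  = x : y : ys
      | otherwise = y : insert x ys

-- Group all equal elements together, not only adjacent ones.
groupAll :: (a -> a -> Bool) -> [a] -> [[a]]
groupAll _  []     = []
groupAll eq (x:xs) = (x : same) : groupAll eq rest
  where (same, rest) = partition (eq x) xs

main :: IO ()
main = do
  print (sortByLe (<=) [3, 1, 2, 1 :: Int])
  print (groupAll (==) [3, 1, 2, 1 :: Int])
```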

Cross Comparison of the Algebras

- Niagara and AT&T are both standalone XML algebras.
- Niagara
  -- proposed after W3C had selected its proposed standard
  -- has operators that operate on sets of bags
- AT&T algebra
  -- chosen as the proposed standard by W3C
  -- expressions resemble a high-level query language
  -- the latest version of the document is referred to as "Semantics of XML Query Language XQuery"

Future Work

- More evaluation strategies are needed to allow for flexible query plans.
- Develop physical operators that take advantage of physical storage structures, and generate a mapping from the query tree to a physical query plan.
