1 XQuery to SQL by XAT Xin Zhang Thanks: Brian, Mukesh, Maged, Lily, Elke.

Post on 21-Dec-2015

219 Views

Category:

Documents

1 Downloads

Preview:

Click to see full reader

Transcript

1

XQuery to SQL by XAT

Xin ZhangThanks:

Brian, Mukesh, Maged, Lily, Elke

2

Outline Merged algebra proposed based on

Niagara XPERANTO

One thorough example of XQuery SQL

3

Data Model An Ordered Table in two dimensions

Tuple order Column order.

Every cell has its own domain Every column binds to one variable. The domain can be:

SQL domains. XML Fragment.

Can be a list of XML elements. Comparison are done by values

4

Data Model Examples Table of XML Fragments. Explicit Naming

E.g. variable bindings Implicit Naming

E.g. XPath notations. Reduce complexity of many

internal variables.

<carrier> $carrier</carrier

invoice_id

carrier

carrier_entry

carriers <carrier> $carrier</carrier<carrier> $carrier</carrier>……….

/ /invoice /invoice/account_number

$rate

5

Naming of Columns Implicit:

SQL operators Navigate

Explicit ( “name”): Variable binding:

Holding a set of values. Variable name ($name) is name of a column

Rename Distinguish in one operator where,

same “names” from different sources Abbreviate a very long “name”.

Create a new name for creation operators Need to used with those operators. E.g. Tagger

6

Operators SQL like (9):

Project, Select, Join (Theta, Outer, Semi), Groupby, Orderby, Union (Node, Outer), COp.

XML like (4): Tagger, Navigate, is(Element, Text),

Aggregate. Special:

SQL, Function, Source, Name, FOR

7

SQL like Operators (9)NiagaraNiagara XPERANTOXPERANTO

Project Expose Project

Select Select Select

Theta Join Join Theta Join

Outer Join N/A Outer Join

Semi Join N/A N/A

Groupby Group Groupby

Orderby N/A Orderby

Union Union Union

Outer Union

Union Outer Union

COp N/A Correlated Join

8

XML like OperatorsNiagarNiagaraa

XPERANTOXPERANTO

Tagger*(pattern)

Vertex Project:cr8(Elem, AttList, Att, XMLFragList),

Navigate(from, path)

Follow Project:get(TagName, Attributes, Contents, AttName, AttValue), Unnest

Is N/A Select:is(Element, Text),

Aggregate Group AggXMLFrags

9

Special Operators

NiagarNiagaraa

XPERANTXPERANTOO

DescriptionDescription

SQL N/A Input Denote a SQL query.

Function

N/A Function Used to represent recursive query

Source Source Table, View

Identify a data source.

Name Rename

N/A Naming of columns.

FOR N/A N/A FOR iteration.

10

Operator Specification Description

Input Specification. Output Specification. Logic description.

Illustrative Example

11

Naming Operator Syntax:

Name(“from_name”, “to_name”) Simplified Syntax:

to_name := from_name

12

Steps in Translation XQuery XML Algebra Tree User View XML Algebra Tree View Composition Computation Pushdown Optimization

13

<?xml version=”1.0” encoding=”US-ASCII” ?> <!DOCTYPE invoice [<!ELEMENT invoice (account_number,

bill_period, carrier+, itemized_call*, total)>

<!ELEMENT account_number (#PCDATA)><!ELEMENT bill_period (#PCDATA)><!ELEMENT carrier (#PCDATA)><!ELEMENT itemized_call EMPTY><!ATTLIST itemized_call

no ID #REQUIREDdate CDATA #REQUIREDnumber_called CDATA #REQUIREDtime CDATA #REQUIREDrate (NIGHT|DAY) #REQUIREDmin CDATA #REQUIREDamount CDATA #REQUIRED>

<!ELEMENT total (#PCDATA)>]>

<invoice>

<account_number>555 777-3158 573 234 3</account_number>

<bill_period>Jun 9 - Jul 8, 2000</bill_period>

<carrier>Sprint</carrier>

<itemized_call no=”1” date=”JUN 10” number_called=”973 555-8888” time=”10:17pm” rate=”NIGHT” min=”1” amount=”0.05” />

<itemized_call no=”2” date=”JUN 13” number_called=”973 650-2222” time=”10:19pm” rate=”DAY” min=”1” amount=”0.15” />

<itemized_call no=”3” date=”JUN 15” number_called=”206 365-9999” time=”10:25pm” rate=”NIGHT” min=”3” amount=”0.15” />

<total>$0.35</total>

</invoice>

Example of Telephone Bill

14

Example XQueryUser XQuery: <summary>{

FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate)

LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]

WHERE $itemized_call/@number_called LIKE ‘973%’

RETURN<rate>$rate</rate><number_of_calls>count($itemized_call)</number_of_calls>

}</summary>

Count number of itemized_calls in calling area 973 grouped by the calling rate.

15

XQuery XML Algebra Tree Divide into query blocks Convert each query block into XML

Algebra Tree (XAT). Identify Correlated Operators Combine into one XML Algebra

Tree. Query decorrelation

16

Query BlocksUser XQuery: <summary>{

FOR $rate IN distinct(document(“invoice”)/invoice/itemized_call@rate)

LET $itemized_call := document(“invoice”)/invoice/itemized_call[@rate=$rate]

WHERE $itemized_call/@number_called LIKE ‘973%’

RETURN<rate>$rate</rate><number_of_calls>count($itemized_call)</number_of_calls>

}</summary>

B1: Construct summary from the result from B2B2: Get all the distinct rate and iterate through it.

B1

B2

B3

B3: Count itemized call for a given rate.

The block identification is

arbitrary (wrong).

17

XAT of B1

B1B1<summary>

</summary>

B2B2

XAT:

Tagger(<summary>[V1] </summary>)

B2

[V2] it is a name instead of a part of pattern.

Name(“Tagger(<summary>[V1]</summary>)”, “V2”)

18

XAT of B2

B3B3{FOR

$rate IN distinct(document(“invoice”)/invoice/itemized_call@rate)

}

B3B3

XAT:

Select(distinct(“invoice/itemized_call/@rate:/”))

B3

Source(“invoice.xml”)

Navigate(“/”, invoice/itemized_call/@rate)

Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”)

FOR($rate)

Aggregate

19

XAT of B3

B4B4LET

$itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate]

WHERE $itemized_call /@number_called LIKE ‘973%’

RETURN<rate>$rate</rate><number_of_calls> count($itemized_call) </number_of_calls>

XAT:

Navigate(“$itemized_call”, @rate)

Source(“invoice.xml”)

Navigate(“/”, invoice/itemized_call)

B2

Select(“@rate:$itemized_call” = “$rate”)

Name(“invoice/itemized_call:/”, “$itemized_call”)

20

XAT of B3 (Cont.)

B4B4LET

$itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate]

WHERE $itemized_call /@number_called LIKE ‘973%’

RETURN<rate>$rate</rate><number_of_calls> count($itemized_call) </number_of_calls>

XAT:

Navigate(“$itemized_call”, @number_called)

Select(“@number_called:$itemized_call” like ‘973%’)

21

XAT of B3 (Cont.)

B4B4LET

$itemized_call := document(“invoice”) /invoice/itemized_call [@rate=$rate]

WHERE $itemized_call /@number_called LIKE ‘973%’

RETURN<rate>$rate</rate><number_of_calls> count($itemized_call) </number_of_calls>

XAT:

Tagger(<rate>[$rate]</rate><number_of_calls>

[count($itemized_call)]</number_of_calls>)

Select(count(“$itemized_call”))B2

Name(“Tagger(<rate>[$rate]</rate> <number_of_calls>[count($itemized_call)]</number_of_calls>)”, “V1”)

22

Put it Together

Select(count(“$itemized_call”))

Navigate(“$itemized_call”, @number_called)

Select(“@number_called:$itemized_call” like ‘973%’)

Source(“invoice.xml”)

Navigate(“/”, invoice/itemized_call)

Select(“@rate:$itemized_call” = “$rate”)

Name(“Tagger(<summary>[V1]</summary>)”, “V2”)

Select(distinct(“invoice/itemized_call/@rate:/”))

Source(“invoice.xml”)

Navigate(“/”, invoice/itemized_call/@rate)

Name(“invoice/itemized_call:/”, “$itemized_call”)

Navigate(“$itemized_call”, @rate)

B1

B2

B3

FOR($rate)

Name(“distinct(invoice/itemized_call/@rate:/)”, “$rate”)

Aggregate()Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)

Name(“Tagger(<rate>[$rate]</rate> <number_of_calls>[count($itemized_call)]</number_of_calls>)”, “V1”)

Tagger(<summary>[V1] </summary>)

23

Syntax Suger

Select(count(“$itemized_call”))

Navigate(“$itemized_call”, @number_called)

Select(“@number_called:$itemized_call” like ‘973%’)

Source(“invoice.xml”)

$itemized_call := Navigate(“/”, invoice/itemized_call)

Select(“@rate:$itemized_call” = “$rate”)

V2 := Tagger(<summary>[V1] </summary>)

$rate := Select(distinct(“invoice/itemized_call/@rate:/”))

Source(“invoice.xml”)

Navigate(“/”, invoice/itemized_call/@rate)Navigate(“$itemized_call”, @rate)

B1

B2

B3

FOR($rate)

Aggregate()

V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)

24

Query Decorrelation for COp Top-down approach over XAT Tree. Approach:

Correlated Binding (CB) Op1[COp(CB, Op2)[Op3[Correlated

Operator[A],B]]] Op1[ROJ(CB)[Op2[Groupby(CB, Op3[]) [Operator[Cartesian[A,B]]]], B]]

For example: Correlated Join Outer Join with Groupby

with Cartesian

25

Query Decorrelation for FOR Top-down approach over XAT Tree. Approach:

Correlated Binding (CB) Op1[FOR(CB)[Op2[Correlated

Operator[A],B]]] Op1[Groupby(CB, Op2[]) [Operator[Cartesian[A,B]]]]

Differences: SQL Decorrelation: Return Outer Query XQuery Decorrelation: Return Inner Query CO: Return both Outer/Inner Query

26

FOR Decorrelation Example

Source(“invoice.xml”)

Select(“@rate:$itemized_call” = “$rate”)

…1

Source(“invoice.xml”)

…3

B2

B3

FOR($rate)

…2

Source(“invoice.xml”)Select(“@rate:$itemized_call” = “$rate”)

Groupby(“$ratel”, )

Cartesian

Source(“invoice.xml”)

…3

B1 B1B2

B3

AggregateAggregate

…2

…1

27

Default XML View<invoice>

<row><id> 1 </id><account_number>555 777-3158 573 234

3</account_number><bill_period> Jun 9 – Jun 8, 2000 </bill_period><total>$0.35</total>

</row>

</invoice><carrier>

<row><invoice_id> 1 </invoice_id><carrier>Sprint</carrier>

</row>

</carrier>...

id account_number bill_period total

1 555 777-3158 573 234 3

Jun 9 – Jun 8, 2000

$0.35

invoice

invoice_id carrier

1 Sprint

carrier

invoice_id

no

date number_called

time rate min

amount

1 1 JUN 10

973 555-8888

10:17pm

NIGHT

1 0.05

1 2 JUN 13

973 650-2222

10:19am

DAY 1 0.15

1 3 JUN 15

206 365-9999

10:25pm

NIGHT

3 0.15

itemized_call

28

User Defined XML View<invoice>

<account_number>555 777-3158 573 234 3</account_number>

<bill_period>Jun 9 - Jul 8, 2000</bill_period>

<carrier>Sprint</carrier>

<itemized_call no=”1” date=”JUN 10” number_called=”973 555-8888” time=”10:17pm” rate=”NIGHT” min=”1” amount=”0.05” />

<itemized_call no=”2” date=”JUN 13” number_called=”973 650-2222” time=”10:19pm” rate=”DAY” min=”1” amount=”0.15” />

<itemized_call no=”3” date=”JUN 15” number_called=”206 365-9999” time=”10:25pm” rate=”NIGHT” min=”3” amount=”0.15” />

<total>$0.35</total>

</invoice>

<invoice><row>

<id> 1 </id><account_number>555 777-3158 573 234

3</account_number><bill_period> Jun 9 – Jun 8, 2000 </bill_period><total>$0.35</total>

</row>

</invoice><carrier>

<row><invoice_id> 1 </invoice_id><carrier>Sprint</carrier>

</row>

</carrier><itemized_call>

<row><invoice_id>1</invoice_id><no>1</no>…

</row>…

</itemized_call>

29

User Defined XML View Cont.

Create view invoice as (FOR

$invoice IN view(“default”)/invoice/row

RETURN<invoice>

<account_number>$invoice/account_number/text()</account_number><bill_period>$invoice/bill_period/text()</bill_period>FOR

$carrier in view(“default”)/carrier/rowWHERE

$carrier/invoice_id = $invoice/idRETURN

<carrier>$carrier/carrier/text()</carrier>FOR

$itemized_call in view(“default”)/itemized_call/rowWHERE

$itemized_call/invoice_id = $invoice/idRETURN

<itemized_call no=$itemized_call/no/text() date=$itemized_call/date/text() number_called=$itemized_call/number_called/text() time=$itemized_call/time/text() rate=$itemized_call/rate/text() min=$itemized_call/min/text() amount=$itemized_call/amount /text()>

SORTBY (@no)<total>$invoice/total/text()</total>

</invoice>

)

30

User Defined XML View Block

Create view invoice as (FOR

$invoice IN view(“default”)/invoice/row

RETURN<invoice>

<account_number>$invoice/account_number/text()</account_number><bill_period>$invoice/bill_period/text()</bill_period>FOR

$carrier in view(“default”)/carrier/rowWHERE

$carrier/invoice_id = $invoice/idRETURN

<carrier>$carrier/carrier/text()</carrier>FOR

$itemized_call in view(“default”)/itemized_call/rowWHERE

$itemized_call/invoice_id = $invoice/idRETURN

<itemized_call no=$itemized_call/no/text() date=$itemized_call/date/text() number_called=$itemized_call/number_called/text() time=$itemized_call/time/text() rate=$itemized_call/rate/text() min=$itemized_call/min/text() amount=$itemized_call/amount /text()>

SORTBY (@no)<total>$invoice/total/text()</total>

</invoice>

)

B4

B5

B6

31

XML View XAT

V4 := Tagger(<invoice><account_number>[$invoice/account_number/text()]</account_number><bill_period>[$invoice/bill_period/text()</bill_period>…[V3]<total>[$invoice/total/text()]</total></invoice>)

V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/>

Aggregate()Aggregate()

Source(“default..xml”)

$invoice := Navigate(“/”,invoice/row )

FOR($invoice/id)

Source(“default.xml”)

$itemized_call := Navigate(“/”, itemized_call/row)

Navigate($itemized_call, no/text())

Navigate($itemized_call, invoice_id)

Select(“$itemized_call/invoice_id”=“$invoice/id”)

Navigate(“$invoice”, id)

B5

Navigate($itemized_call, amount/text())

32

3-Way Correlation

…2

Source(“invoice.xml”)

B4

FOR($invoice/id)

…1

B5B6

33

3-Way Decorrelation

…2

Source(“default.xml”)

B4

JOIN($invoice/id)

…1

B5 with CartesianB6 with Cartesian

GB($invoice/id, …) GB($invoice/id, …)

…2

Source(“default.xml”)

34

View XAT After Decorrelation

V4 := Tagger(<invoice><account_number>[$invoice/account_number/text()]</account_number><bill_period>[$invoice/bill_period/text()</bill_period>…[V3]<total>[$invoice/total/text()]</total></invoice>)

V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/>

Aggregate()Groupby($invoice/id, Aggregate())

Source(“default..xml”)

$invoice := Navigate(“/”,invoice/row )

Join($invoice/id)

Source(“default.xml”)

$itemized_call := Navigate(“/”, itemized_call/row)

Navigate($itemized_call, no/text())

Navigate($itemized_call, invoice_id)

Join(“$itemized_call/invoice_id”=“$invoice/id”)

Navigate(“$invoice”, id) B5

Navigate($itemized_call, amount/text())

Groupby($invoice/id…)

Source(“default..xml”)

$invoice := Navigate(“/”,invoice/row )

Navigate(“$invoice”, id)

35

View Composition Input:

User Query XAT + User View XAT Output:

Simplified composite XAT Approach:

XAT Cutting: Remove un-referenced columns and operators.

Pushdown Navigation By using the commutative rules

Cancel out the navigation operators By using the composition rules

36

XAT Cutting Cut Query Blocks

User query only require itemized_call. B5 is cut, Invoice is cut

B4 is simplified. B6 is simplified.

Cut Columns User query only used

itemized_call@rate.

37

View XAT After B5 is Cut.

V4 := Tagger(<invoice><account_number>[$invoice/account_number/text()]</account_number><bill_period>[$invoice/bill_period/text()</bill_period[V3]<total>[$invoice/total/text()]</total></invoice>)

V3 := Tagger(<itemized_call no=[] date=[]number_called=[] time=[] rate=[] min=[] amount=[]/>

Aggregate()Groupby($invoice/id, Aggregate())

Source(“default..xml”)

$invoice := Navigate(“/”,invoice/row )

Source(“default.xml”)

$itemized_call := Navigate(“/”, itemized_call/row)

Navigate($itemized_call, no/text())

Navigate($itemized_call, invoice_id)

Join(“$itemized_call/invoice_id”=“$invoice/id”)

Navigate(“$invoice”, id)

Navigate($itemized_call, amount/text())

38

View After Columns are Cut.

V4 := Tagger(<invoice>[V3] </invoice>)

V3 := Tagger(<itemized_call number_called=[] rate=[] />

Aggregate()

Groupby($invoice/id, Aggregate())

Source(“default..xml”)

$invoice := Navigate(“/”,invoice/row )

Source(“default.xml”)

$itemized_call := Navigate(“/”, itemized_call/row)

Navigate($itemized_call, number_called/text())

Navigate($itemized_call, invoice_id)

Join(“$itemized_call/invoice_id”=“$invoice/id”)

Navigate(“$invoice”, id)

Navigate($itemized_call, rate/text())

39

Navigation Cancel Out Navigation Pushdown

Based on some transformation rules. E.g. commutative of navigation and

other operators. Navigation + Tagger Cancel Out

Composition Rules. The cancellation result is “renaming”

40

Query XAT Navi. Pushdown

Select(count(“$itemized_call”))

Navigate(“$itemized_call”, @number_called)

Select(“@number_called:$itemized_call” like ‘973%’)

Source(“invoice.xml”)

$itemized_call := Navigate(“/”, invoice/itemized_call)

Select(“@rate:$itemized_call” = “$rate”)

Navigate(“$itemized_call”, @rate)

B3V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)

Select(count(“$itemized_call”))

Navigate(“$itemized_call”, @number_called)

Select(“@number_called:$itemized_call” like ‘973%’)

Source(“invoice.xml”)

$itemized_call := Navigate(“/”, invoice/itemized_call)

Select(“@rate:$itemized_call” = “$rate”)

Navigate(“$itemized_call”, @rate)

V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)

41

Navi. Tagger Cancel Out

Navigate(“$itemized_call”, @number_called)

Source(“invoice.xml”)

$itemized_call := Navigate(“/”, invoice/itemized_call)

Navigate(“$itemized_call”, @rate)

B3…1

V4 := Tagger(<invoice>[V3] </invoice>)

V3 := Tagger(<itemized_call number_called=[] rate=[] />

Aggregate()

Groupby($invoice/id, Aggregate())

Navigate($itemized_call, number_called/text())

Navigate($itemized_call, rate/text())

…2

42

The Result of Cancel Out

…1

$itemized_call@rate := Navigate($itemized_call, rate/text())

$Itemized_call@number_called := Navigate($itemized_call, number_called/text())

…2

43

Computation Pushdown Goal: XAT SQL operators + XML

operators Step 0: Navigation Pushdown. Step 1: XML Default View SQL

Operators Renaming columns

Step 2: SQL Computation Pushdown. By commutative and composition rules. E.g: predicates pushdown.

44

Navigation Pushdown.

Source(“default..xml”)

$invoice := Navigate(“/”,invoice/row )Source(“default.xml”)

$itemized_call := Navigate(“/”, itemized_call/row)

Navigate($itemized_call, invoice_id)

Join(“$itemized_call/invoice_id”=“$invoice/id”)

Navigate(“$invoice”, id)

$itemized_call@rate := Navigate($itemized_call, rate/text())

$Itemized_call@number_called := Navigate($itemized_call, number_called/text())

Source(“default.xml”)

$itemized_call := Navigate(“/”, itemized_call/row)

$itemized_call@rate := Navigate($itemized_call, rate/text())

$Itemized_call@number_called := Navigate($itemized_call, number_called/text())

Join(“$itemized_call/invoice_id”=“$invoice/id”)

45

XML Default View SQL

Source(“default.xml”)

$itemized_call := Navigate(“/”, itemized_call/row)

$itemized_call@rate := Navigate($itemized_call, rate/text())

$Itemized_call@number_called := Navigate($itemized_call, number_called/text())

Source(“itemized_call”)

Project(rate, number_called)

$itemized_call@rate := rate

$Itemized_call@number_called := number_called

46

Computation Pushdown

A SQL Block

Select(count(“$itemized_call”))

Select(“@number_called:$itemized_call” like ‘973%’)

Select(“@rate:$itemized_call” = “$rate”)

V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)

B3

Select(“@number_called:$itemized_call” like ‘973%’)

Select(count(“$itemized_call”))

Select(“@rate:$itemized_call” = “$rate”)

V1:=Tagger(<rate>[$rate]</rate><number_of_calls>[count($itemized_call)]</number_of_calls>)

A SQL Block

47

Result of the Transformation

Tagger(<summary>[V1]</summary>)

V1 := Aggregate

Tagger(<rate>[rate]</rate><number_of_calls>[count(*)]</number_of_calls>)

SQL:SELECT rate, count(*)FROM itemized_call, invoiceWHERE number_called LIKE ‘973%’ AND invoice.id = itemized_call.invoice_idGROUPBY rate

48

Optimization Efficient Publishing XML Views

Sorted Outer Union. Special Tagger implementation

A lot More!

49

Summary XQuery XAT

Query Block Identification Query Decorrelation

View Composition XAT Cutting Navigation Pushdown Navigation Cancel Out

Computation Pushdown Navigation Pushdown XML Default View SQL Operators Computation Pushdown

Optimization

top related