1 CS 561 Presentation: Indexing and Querying XML Data for Regular Path Expressions A Paper by Quanzhong Li and Bongki Moon Presented by Ming Li

Dec 18, 2015


Amy Foster

CS 561 Presentation:

Indexing and Querying XML Data for Regular Path Expressions

A Paper by Quanzhong Li and Bongki Moon

Presented by Ming Li


Our Objective

• Developing a system that will enable us to perform XML data queries efficiently.


XML Query Languages

• Used for retrieving data from XML files.

• Use a regular path expression syntax.

• e.g. XPath, XQuery.


Queries Today - Inefficient

• Usually XML tree traversals, which are inefficient:
– Top-Down Approach
– Bottom-Up Approach

• An example, the query:

/chapter/_*/figure

(finding all figures in all chapters.)
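To make the cost of a traversal concrete, here is a minimal sketch of the top-down approach in Python. It reads the query as "every figure that is a descendant of a chapter", following the slide's paraphrase; the sample document and the function name `figures_in_chapters` are hypothetical. Note that it visits every node of the tree, whether or not the node can contribute to the answer.

```python
import xml.etree.ElementTree as ET


def figures_in_chapters(root):
    """Top-down traversal: collect every <figure> nested inside a <chapter>."""
    found = []

    def walk(node, inside_chapter):
        if inside_chapter and node.tag == "figure":
            found.append(node)
        for child in node:
            # Once we are below a chapter, every deeper node is "inside" it.
            walk(child, inside_chapter or node.tag == "chapter")

    walk(root, False)
    return found


# Hypothetical sample document (not from the slides).
doc = ET.fromstring(
    "<book>"
    "<chapter><section><figure id='1'/></section><figure id='2'/></chapter>"
    "<figure id='3'/>"
    "</book>"
)
figure_ids = [f.get("id") for f in figures_in_chapters(doc)]
```

The figure with id 3 is skipped because it is not below any chapter, but the traversal still had to visit it to find that out.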


Our Objective - Refined

• Developing a system that will enable us to perform XML data queries efficiently.

• Developing such a system consists of:
– Developing a way to efficiently store XML data.
– Developing efficient algorithms for processing regular path expressions (e.g. XQuery expressions).


Storing XML Documents - XISS

• XISS - XML Indexing and Storage System.

• Provides us with ways to:
– efficiently find all elements or attributes with the same name string, grouped by the document to which they belong.
– quickly determine the ancestor-descendant relationship between elements and/or attributes in the hierarchy of XML data.


Determining Ancestor-Descendant Relationship

• According to Dietz's numbering scheme: for two given nodes x and y of a tree T, x is an ancestor of y iff x occurs before y in the preorder traversal and after y in the postorder traversal.

• Example:
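A minimal sketch of Dietz's test in Python, assuming a small hypothetical tree (book, chapter, figure, appendix) that is not taken from the slides. Each node gets a preorder rank and a postorder rank, and the ancestor test is then two integer comparisons.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    pre: int = 0   # rank in the preorder traversal
    post: int = 0  # rank in the postorder traversal


def number_tree(root):
    """Assign preorder and postorder ranks (starting at 1) to every node."""
    pre_counter = 0
    post_counter = 0

    def visit(node):
        nonlocal pre_counter, post_counter
        pre_counter += 1
        node.pre = pre_counter
        for child in node.children:
            visit(child)
        post_counter += 1
        node.post = post_counter

    visit(root)


def is_ancestor(x, y):
    """Dietz's test: x precedes y in preorder and follows y in postorder."""
    return x.pre < y.pre and x.post > y.post


# Hypothetical tree: book -> (chapter -> figure), appendix
figure = Node("figure")
chapter = Node("chapter", [figure])
appendix = Node("appendix")
book = Node("book", [chapter, appendix])
number_tree(book)
```

Here `is_ancestor(chapter, figure)` holds, while `is_ancestor(appendix, figure)` does not, because appendix comes after figure in both traversals.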


Determining Ancestor-Descendant Relationship – cont.

• Advantage: the ancestor-descendant relationship can be determined in constant time.

• Disadvantage: a lack of flexibility.
– e.g. inserting a new node requires recomputation of many tree nodes.


Determining Ancestor-Descendant Relationship – cont.

• A new numbering scheme:
– Each node is associated with an <order, size> pair.
– For a tree node y and its parent x:

[order(y), order(y) + size(y)] ⊂ (order(x), order(x) + size(x)]

(the parent's interval is exclusive at its left end, so order(x) < order(y)).

– For two sibling nodes x and y, if x is the predecessor of y in preorder traversal, then:

order(x) + size(x) < order(y).


Determining Ancestor-Descendant Relationship – cont.

• Fact: for two given nodes x and y of a tree T, x is an ancestor of y iff:

order(x) < order(y) ≤ order(x) + size(x)
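The test above can be sketched in a few lines of Python. The `Label` type and the function name are illustrative choices, and the four labels are borrowed from the EA-Join example later in the deck (Ele <1,3>, Att <2,0>, Ele <3,1>, Att <4,0>).

```python
from typing import NamedTuple


class Label(NamedTuple):
    order: int
    size: int


def is_ancestor(x: Label, y: Label) -> bool:
    """x is an ancestor of y iff order(x) < order(y) <= order(x) + size(x)."""
    return x.order < y.order <= x.order + x.size


outer_ele = Label(1, 3)
outer_att = Label(2, 0)
inner_ele = Label(3, 1)
inner_att = Label(4, 0)
```

With these labels, the outer Ele is an ancestor of all three other nodes, and the inner Ele is an ancestor of the attribute at order 4 but not of the attribute at order 2.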


Determining Ancestor-Descendant Relationship – cont.

• Properties:
– the ancestor-descendant relationship can be determined in constant time.
– flexibility: node insertion usually doesn't require recomputation of tree nodes.
– an element can be uniquely identified in a document by its order value.
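The flexibility property can be illustrated with a small sketch. The slides do not say how size values are chosen, so the `gap` parameter below (spare numbers reserved around each node) is purely an assumption for illustration, as are the function names `assign` and `append_leaf`. With `gap=0` the labels match those in the EA-Join example later in the deck; with a positive gap, a new last child fits into the reserved range without renumbering any existing node.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    order: int = 0
    size: int = 0


def assign(node, order=1, gap=0):
    """Label a subtree with <order, size>, leaving `gap` unused numbers
    before the first child and after each child subtree (assumed policy)."""
    node.order = order
    nxt = order + 1 + gap
    for child in node.children:
        assign(child, nxt, gap)
        nxt = child.order + child.size + 1 + gap
    node.size = nxt - 1 - order


def append_leaf(parent, name):
    """Add a new last child using a reserved number.
    Returns None when no spare number is left (renumbering would be needed)."""
    last_used = parent.order
    if parent.children:
        last = parent.children[-1]
        last_used = last.order + last.size
    new_order = last_used + 1
    if new_order > parent.order + parent.size:
        return None
    leaf = Node(name, order=new_order, size=0)
    parent.children.append(leaf)
    return leaf


# Attributes are listed before child elements, as the deck recommends.
inner = Node("Ele", [Node("Att")])
outer = Node("Ele", [Node("Att"), inner])
assign(outer, gap=0)  # tight numbering: no room for insertions
```

Calling `assign(outer, gap=2)` instead leaves room at the end of the outer element's range, so `append_leaf(outer, "New")` succeeds and every existing label stays as it was.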


XISS System Overview


Name Index and Value Table

• Objective: minimizing the storage and computation overhead by eliminating replicated strings and string comparisons.

• Name Index - mapping distinct name strings into unique name identifiers (nid).

• Value Table - mapping distinct value strings (i.e. attribute value and text value) into unique value identifiers (vid).

• Both implemented as a B+-tree.
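A minimal sketch of the idea behind both tables: each distinct string is stored once and replaced everywhere else by a small integer. For brevity this sketch swaps the B+-tree for an in-memory Python dictionary, and the class name `StringTable` and its methods are hypothetical.

```python
class StringTable:
    """Maps distinct strings to integer identifiers and back.
    Stands in for the Name Index (nid) or the Value Table (vid)."""

    def __init__(self):
        self._ids = {}      # string -> identifier
        self._strings = []  # identifier -> string

    def intern(self, text):
        """Return the identifier for `text`, allocating one on first use."""
        if text not in self._ids:
            self._ids[text] = len(self._strings)
            self._strings.append(text)
        return self._ids[text]

    def lookup(self, ident):
        """Return the string stored under `ident`."""
        return self._strings[ident]


names = StringTable()
ele_nid = names.intern("Ele")
att_nid = names.intern("Att")

values = StringTable()
a1_vid = values.intern("A1")
```

After this step, comparing two names is an integer comparison on nid values rather than a string comparison.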


The Element Index

• Objective: quickly finding all elements with the same name string.

• Structure:


The Attribute Index

• Objective: quickly finding all attributes with the same name string.

• Structure:
– Same structure as the Element Index, except that each record in the Attribute Index has a value identifier (vid), which is the key used to obtain the attribute value from the Value Table.


The Structure Index

• Objectives:
– Finding the parent element and child elements (or attributes) for a given element.
– Finding the parent element for a given attribute.

• Structure:


The Structure Index – cont.

• Structure:
– B+-tree using document identifier (did) as a key.
– Leaf nodes: linear arrays with records for all elements and attributes from an XML document.
– Each record: {nid, <order,size>, Parent order, Child order, Sibling order, Attribute order}.
– Records are ordered by order value.
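The record layout can be sketched as follows. This is an illustration only: the slides list the field names but not their exact meaning, so the readings in the comments (first child, next sibling, first attribute) are assumptions, and the class names are hypothetical. Because records are kept sorted by order value, a record can be located by binary search.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructRecord:
    nid: int
    order: int
    size: int
    parent_order: Optional[int]
    child_order: Optional[int]      # assumed: order of the first child element
    sibling_order: Optional[int]    # assumed: order of the next sibling
    attribute_order: Optional[int]  # assumed: order of the first attribute


class DocumentStructure:
    """Records of one document, sorted by order (one leaf array of the index)."""

    def __init__(self, records):
        self.records = sorted(records, key=lambda r: r.order)
        self._orders = [r.order for r in self.records]

    def find(self, order):
        """Binary search for the record with the given order value."""
        i = bisect.bisect_left(self._orders, order)
        if i < len(self._orders) and self._orders[i] == order:
            return self.records[i]
        return None

    def parent_of(self, record):
        if record.parent_order is None:
            return None
        return self.find(record.parent_order)


ELE, ATT = 0, 1  # hypothetical name identifiers
doc = DocumentStructure([
    StructRecord(ELE, 1, 3, None, 3, None, 2),
    StructRecord(ATT, 2, 0, 1, None, None, None),
    StructRecord(ELE, 3, 1, 1, None, None, 4),
    StructRecord(ATT, 4, 0, 3, None, None, None),
])
```

For example, `doc.parent_of(doc.find(4))` returns the inner element at order 3.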


Querying Method

• Decomposing path expressions into simple path expressions.

• Applying algorithms on simple path expressions and their intermediate results.


Decomposition of Path Expressions

• The main idea:
– A complex path expression is decomposed into several simple path expressions.
– Each simple path expression produces an intermediate result that can be used in the subsequent stage of processing.
– The results of the simple path expressions are then combined or joined together to obtain the final result of the given query.


Basic Subexpressions - Example

Decomposition of

(E1/E2)*/E3/((E4[@a=V]) | (E5/_*/E6)):

(1) Single Element/Attribute

(2) Element-Attribute

(3) Element-Element

(4) Kleene Closure

(5) Union

[Figure: decomposition tree of the expression. Its leaves are seven type (1) subexpressions; the internal nodes are four of type (3) and one each of types (2), (4) and (5).]
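One way to picture the decomposition is as an expression tree whose nodes are the five subexpression types. The sketch below is an assumption about how the tree is grouped, not the paper's data structure: the class names are hypothetical, the value test `=V` is left out, and `E5/_*/E6` is treated as a single Element-Element join.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Single:      # (1) single element or attribute
    name: str


@dataclass
class ElemAttr:    # (2) element-attribute
    element: object
    attribute: object


@dataclass
class ElemElem:    # (3) element-element
    left: object
    right: object


@dataclass
class Closure:     # (4) Kleene closure
    body: object


@dataclass
class UnionExpr:   # (5) union
    left: object
    right: object


KIND = {Single: 1, ElemAttr: 2, ElemElem: 3, Closure: 4, UnionExpr: 5}


def count_kinds(expr, counts=None):
    """Count how many subexpressions of each type the tree contains."""
    counts = Counter() if counts is None else counts
    counts[KIND[type(expr)]] += 1
    for child in vars(expr).values():
        if not isinstance(child, str):  # skip the name stored in a leaf
            count_kinds(child, counts)
    return counts


# (E1/E2)*/E3/((E4[@a=V]) | (E5/_*/E6))
expr = ElemElem(
    ElemElem(Closure(ElemElem(Single("E1"), Single("E2"))), Single("E3")),
    UnionExpr(
        ElemAttr(Single("E4"), Single("@a")),
        ElemElem(Single("E5"), Single("E6")),
    ),
)
kinds = count_kinds(expr)
```

Under this grouping the counts come out as seven of type (1), one of type (2), four of type (3), and one each of types (4) and (5).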


Example: EA-Join: Element and Attribute Join


EA-Join: Element and Attribute Join

Input:

{E1,…,Em}: Ei is a set of elements having a common document identifier (did);

{A1,…,An}: Aj is a set of attributes having a common document identifier (did);

Output:

A set of (e,a) pairs such that the element e is the parent of the attribute a.


EA-Join: Element and Attribute Join

The Algorithm:

// Sort-merge {Ei} and {Aj} by did.

(1) foreach Ei and Aj with the same did do

    // Sort-merge Ei and Aj by
    // PARENT-CHILD relationship.

(2)   foreach e ∈ Ei and a ∈ Aj do

(3)     if (e is a parent of a) then output (e,a)

      end

    end
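The algorithm above can be sketched in Python as a single merge pass. Two simplifications are assumptions of this sketch rather than the paper's method: the grouped inputs {Ei} and {Aj} are flattened into two lists sorted by (did, order), and the parent-child test is done by comparing an element's order with a parent order stored on each attribute (the Structure Index record lists a Parent order field). The type and function names are hypothetical.

```python
from typing import NamedTuple


class Element(NamedTuple):
    did: int
    order: int
    size: int


class Attribute(NamedTuple):
    did: int
    order: int
    parent_order: int  # assumed to be available from the index record
    value: str


def ea_join(elements, attributes):
    """Merge two lists sorted by (did, order) and return (e, a) pairs
    where element e is the parent of attribute a."""
    pairs = []
    i = 0
    for a in attributes:
        key = (a.did, a.parent_order)
        # Skip elements that precede this attribute's parent.
        while i < len(elements) and (elements[i].did, elements[i].order) < key:
            i += 1
        # Do not advance past a match: an element may own several attributes.
        if i < len(elements) and (elements[i].did, elements[i].order) == key:
            pairs.append((elements[i], a))
    return pairs


# Labels from the deck's example document, stored under did 1.
eles = [Element(1, 1, 3), Element(1, 3, 1)]
atts = [Attribute(1, 2, 1, "A1"), Attribute(1, 4, 3, "A2")]
pairs = ea_join(eles, atts)
matches = [e for e, a in pairs if a.value == "A1"]
```

Neither input is rescanned, which relies on attributes being numbered before their sibling elements so that attributes sorted by order are also sorted by parent order.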


EA-Join – Example

• Consider the XML document:

<Ele Att="A1">

<Ele Att="A2"> </Ele>

</Ele>

• And the query: /Ele[@Att="A1"]

[Figure: the document tree with <order, size> labels. Outer Ele <1,3> with Att <2,0>; inner Ele <3,1> with Att <4,0>.]


EA-Join – Querying /Ele[@Att="A1"]

<Ele Att="A1">

<Ele Att="A2"> </Ele>

</Ele>

[Figure: the labeled document tree: Ele <1,3>, Att <2,0>, Ele <3,1>, Att <4,0>.]

• Sort-merging the "Ele"s and "Att"s by parent-child relationship gives the list: <1,3>, <2,0>, <3,1>, <4,0>

• Finding the "Ele" elements that have a child attribute "Att" with the value "A1" in the resulting list is then easy using the information in the Element Record.


EA-Join – Comments

• Only a two-stage sort-merge operation without additional cost of sorting:
– First merge: by did.
– Second merge: by examining parent-child relationship.

• This merge is based on the order values of the element and attribute as defined by the numbering scheme.

• Attributes should be placed before their sibling elements in the order of the numbering scheme.
– This guarantees that elements and attributes with the same did can be merged in a single scan.


Conclusions

• XISS can efficiently process regular path expression queries.

• Performance improvement over the conventional methods by up to an order of magnitude.

• Future work: finding the optimal page size or the break-even point between the two criteria.


Thank you so much!