VAMANA (Talk 2) (vǎ - mǎ - nǎ): An Efficient XPath Query Engine Exploiting the MASS Index. Venkatesh Raghavan & Prof. Elke Rundensteiner, DSRG Talk, 1st May 2003.

Dec 21, 2015

Page 1:

VAMANA (Talk 2)

(vǎ - mǎ - nǎ)

Venkatesh Raghavan & Prof. Elke Rundensteiner

DSRG Talk

1st May 2003

An Efficient XPath Query Engine Exploiting the MASS Index

Page 2:

Introduction

Purpose of the talk.

Generation of Execution Tree

Execution

Running Example 1. Running Example 2.

XPath Expression Execution. Cost Estimation. Heuristics and Transformation.

Page 3:

Running Examples

E.g. 1: //name/parent::person/descendant::watch

E.g. 2: //name[text() = "Klemens Pelz"]/parent::person

Sample data, fragment 1 (closing tags not shown):

<people>
  <person id="person1">
    <name> Klemens Pelz </name>

Sample data, fragment 2 (closing tags not shown):

<people>
  <person id="person1">
    <name> Hayato Cappelletti </name>
    <watches>
      <watch open_auction="open_auction82" />
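To make the expected answers of the two running examples concrete, here is a minimal Python sketch that evaluates them by hand over a small document. The document is an assumption modelled on the fragments above (the id "person0", the closing tags, and the trimmed name text are all made up for the illustration), and the helper names are hypothetical. It uses the standard library's ElementTree, not VAMANA or MASS.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document modelled on the slide's fragments.
DOC = """
<people>
  <person id="person0">
    <name>Klemens Pelz</name>
  </person>
  <person id="person1">
    <name>Hayato Cappelletti</name>
    <watches>
      <watch open_auction="open_auction82"/>
    </watches>
  </person>
</people>
"""

root = ET.fromstring(DOC)
# ElementTree keeps no parent pointers, so build a child -> parent map once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def unique(nodes):
    """Drop repeated nodes while keeping first-seen order."""
    seen, out = set(), []
    for node in nodes:
        if id(node) not in seen:
            seen.add(id(node))
            out.append(node)
    return out


def parents_named(nodes, tag):
    """parent::tag for every node in nodes."""
    return unique(parent_of[n] for n in nodes
                  if n in parent_of and parent_of[n].tag == tag)


def example1(root):
    """//name/parent::person/descendant::watch"""
    persons = parents_named(root.iter("name"), "person")
    # iter() also visits the person itself, which is harmless here
    # because a person element is never named "watch".
    return [w for p in persons for w in p.iter("watch")]


def example2(root):
    """//name[text() = "Klemens Pelz"]/parent::person"""
    names = [n for n in root.iter("name")
             if (n.text or "").strip() == "Klemens Pelz"]
    return parents_named(names, "person")


print([w.get("open_auction") for w in example1(root)])
print([p.get("id") for p in example2(root)])
```

The text comparison strips surrounding whitespace, which is a simplification: a strict XPath string comparison would not.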

Page 4:

Bigger Picture

[Figure: system architecture. Components: MASS (A Multi-Axis Storage Structure for Large XML Documents); VAMANA (XPath Query Engine); XQuery Engine (future development). Labels: XPath Expression; XPath Processor; Execution Tree; MASS Interface; Node Set.]

Page 5:

How many “ROOT(s)” are there?

Root of the document: we call it “Document Root”.

Root of the expression //name/parent::person/descendant::watch: we call it “First Location Step”.

Root of the execution tree: we call it “ROOT”.

Page 6:

XPath Processor

[Pipeline labels: XPath Expression; XPath Processor; Execution Tree]

E.g. 2: //name[text() = "Klemens Pelz"]/parent::person

Phase I: Parse Tree

[Figure: parse tree. Node labels: ROOT; CONTEXT; // name; Parent person; PRED; BIPRED =; OPERAND child text; OPERAND LITERAL “Klemens Pelz”.]

Page 7:

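Contd.

[Pipeline labels: XPath Expression; XPath Processor; Execution Tree]

Phase I: Parse Tree

[Figure: parse tree. Node labels: ROOT; CONTEXT; // name; Parent person; PRED; BIPRED =; OPERAND child text; OPERAND LITERAL “Klemens Pelz”.]

Phase II: Transformed Parse Tree

[Figure: transformed parse tree. Surviving node labels: PRED; BIPRED =; OPERAND child text; OPERAND LITERAL “Klemens Pelz”.]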

Page 8:

Phase III: Execution Tree Generation

[Pipeline labels: XPath Expression; XPath Processor; Execution Tree]

Phase II: Transformed Parse Tree

[Figure: transformed parse tree. Node labels: ROOT; CONTEXT; // name; Parent person; PRED; BIPRED =; OPERAND child text; OPERAND LITERAL “Klemens Pelz”.]

Phase III: VAMANA Execution Tree

[Figure: execution tree. Node labels: “person” X: Parent; “name” X: //; BI_PREDICATE “EQ”; “” X: child; “Klemens Pelz”.]
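As a rough illustration of what the Phase III tree for E.g. 2 might look like as a data structure, here is a minimal Python sketch. The `VNode` dataclass, its field names, and `render` are assumptions made for this example; the slides only name the node labels and, later, a context side and an expression side. Attaching the predicate to the "name" step follows the XPath expression itself.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class VNode:
    kind: str                               # e.g. "MASS", "BI_PREDICATE", "LITERAL"
    label: str = ""                         # node test, operator, or literal value
    axis: Optional[str] = None              # the "X:" field shown on the slides
    context: Optional["VNode"] = None       # context side (where input comes from)
    expression: Optional["VNode"] = None    # expression side (predicate, if any)
    operands: Tuple["VNode", ...] = ()      # operands of a binary predicate


def render(node, depth=0):
    """Return the tree as indented lines, operands and predicate before context."""
    pad = "  " * depth
    axis = f" X: {node.axis}" if node.axis else ""
    lines = [f'{pad}{node.kind} "{node.label}"{axis}']
    for operand in node.operands:
        lines += render(operand, depth + 1)
    if node.expression:
        lines += render(node.expression, depth + 1)
    if node.context:
        lines += render(node.context, depth + 1)
    return lines


# //name[text() = "Klemens Pelz"]/parent::person
predicate = VNode("BI_PREDICATE", "EQ",
                  operands=(VNode("MASS", "", "child"),
                            VNode("LITERAL", "Klemens Pelz")))
name_step = VNode("MASS", "name", "//", expression=predicate)
tree = VNode("MASS", "person", "parent", context=name_step)

print("\n".join(render(tree)))
```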

Page 9:

VAMANA Nodes (VNode)

Node Base

VRootNode

MassNode

VBinaryPredicateNode

VExistPredicateNode

VJoinNode

VLiteralNode


Page 10:

VNode Structure

[Figure: a VNode with a Context Side and an Expression Side. Other labels: Root Node; child.]


Page 11:

VNode Flow Structure

Data-flow style of querying, as used by most commercial relational database systems.

The nodes are arranged so that data “flows” from one node to another in a producer-consumer fashion.

Correctness.

Each node performs some operation on the data that flows through it. The result is produced by the last node in the dataflow chain.

IN SHORT: Data flows upwards. Control flows downwards.

Iterative.
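The "data flows upwards, control flows downwards" idea can be sketched with a tiny pull-based pipeline in Python. This is a generic illustration of the producer-consumer style, not VAMANA's code: the class names `Scan` and `Filter`, the `next()` method, and the use of `None` as an end-of-stream marker are all assumptions.

```python
class Scan:
    """Bottom of the chain: produces one item per request."""

    def __init__(self, items, log):
        self._items = iter(items)
        self._log = log

    def next(self):
        self._log.append("Scan.next")
        return next(self._items, None)  # None marks "no more data"


class Filter:
    """Consumes from its child and passes on only the items it keeps."""

    def __init__(self, child, keep, log):
        self._child = child
        self._keep = keep
        self._log = log

    def next(self):
        self._log.append("Filter.next")
        while True:
            item = self._child.next()   # control goes down the chain
            if item is None or self._keep(item):
                return item             # data comes back up


log = []
top = Filter(Scan(["name", "person", "name"], log),
             lambda tag: tag == "name", log)
results = list(iter(top.next, None))    # keep asking the top node until None

print(results)
print(log)
```

Each result is requested from the top node only when it is needed, so no intermediate list of nodes is ever built.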


Page 12:

Contd.

Iterative.

Currently VAMANA executes nodes iteratively, so no copies of the data are made.

IS IT A PROBLEM?

MASS produces nodes in document order, so it is not a problem.

But there are some expressions that work in sibling order.

Work in progress.


Page 13:

Execution Tree

E.g. 1: //name/parent::person/descendant::watch

[Figure: execution tree. Node labels: Root Node; “name” X: //; “watch” X: AXIS_DESCENDANT; “person” X: AXIS_PARENT. The nodes are linked along the Context Side.]


Page 14:

How Do We EXECUTE?

//name/parent::person/descendant::watch

Step 1: Set the context node of the root of the expression. In this example the root of the expression is the root of the document.

Step 2: Ask the VAMANA Root Node for nodes.


Page 15:

Step 1: Setting Context for the “First Location Step”

“watch”X: AXIS_DESCENDANT

“person”X: AXIS_PARENT

“name”X: //

//name/parent::person/descendant::watch

Page 16:

//name/parent::person/descendant::watch

[Figure: execution trace, part 1. Node states shown: INITIAL; FETCHING; OUT OF NODE. Nodes: “watch” X: AXIS_DESCENDANT; “person” X: AXIS_PARENT; “name” X: //. Keys shown: b.i.c.c; b.i.c; b.i.c.m.c.]

Page 17:

//name/parent::person/descendant::watch

[Figure: execution trace, part 2. Nodes and the key shown next to each: “watch” X: AXIS_DESCENDANT (b.i.c); “person” X: AXIS_PARENT (b.i.c.c); “name” X: //. Other keys shown: b.i.c; b.i.c.c; b.i.c.m.c; b.i.c.m.e.]

Page 18:

//name/parent::person/descendant::watch

[Figure: execution trace, part 3. Nodes: “watch” X: AXIS_DESCENDANT; “person” X: AXIS_PARENT; “name” X: //. Keys shown: b.i.c; b.i.c.m.e; b.i.c.c; b.i.i; b.i.i.c; b.i.i.m.c.]
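The keys in the trace suggest that a node's key extends its parent's key by one component (name b.i.c.c, its parent person b.i.c, and the watch b.i.c.m.c below that person). Assuming that prefix relationship, the trace for E.g. 1 can be sketched in Python as below. The `STORE` dictionary is a stand-in for the MASS index, the "watches" entries are made up to fill the gap between person and watch, and none of the function names come from VAMANA or MASS.

```python
# key -> element name; a toy stand-in for the MASS index (assumed layout)
STORE = {
    "b.i.c": "person", "b.i.c.c": "name", "b.i.c.m": "watches",
    "b.i.c.m.c": "watch", "b.i.c.m.e": "watch",
    "b.i.i": "person", "b.i.i.c": "name", "b.i.i.m": "watches",
    "b.i.i.m.c": "watch",
}


def in_document_order(keys):
    """Sort keys component by component, so a parent precedes its children."""
    return sorted(keys, key=lambda k: k.split("."))


def parent_key(key):
    """Drop the last component; None when there is no parent component."""
    head, _, _ = key.rpartition(".")
    return head or None


def scan(tag):
    """//tag: every node with this name, in document order."""
    for key in in_document_order(STORE):
        if STORE[key] == tag:
            yield key


def parent_step(context, tag):
    """parent::tag for each context key pulled from the step below."""
    for key in context:
        parent = parent_key(key)
        if parent is not None and STORE.get(parent) == tag:
            yield parent


def descendant_step(context, tag):
    """descendant::tag: keys that extend a context key and carry this name."""
    for ctx in context:
        for key in in_document_order(STORE):
            if STORE[key] == tag and key.startswith(ctx + "."):
                yield key


# //name/parent::person/descendant::watch, evaluated lazily from the top.
pipeline = descendant_step(parent_step(scan("name"), "person"), "watch")
watches = list(pipeline)
print(watches)
```

Because each step is a generator, the top step pulls one person at a time, which pulls one name at a time, matching the order in which keys appear in the trace.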

Page 19:

IO Operation

[Figure: step labels //y and /z. Keys shown: a.a, a.b, a.c and a.a.a, a.b.a, a.b.b, a.c.a, a.c.a, a.c.b.]

** Please see handout

Page 20:

Example 2

//name[text() = "Klemens Pelz"]/parent::person

[Figure: execution tree with a Context Side and an Expression Side. Node labels: “person” X: AXIS_PARENT; “name” X: //; BI_PREDICATE EQ; “ ” X: AXIS_CHILD; “Klemens Pelz”.]

Page 21:

//name[text() = "Klemens Pelz"]/parent::person

[Figure: execution trace for Example 2. Nodes: “person” X: AXIS_PARENT; BI_PREDICATE EQ; “name” X: //; “ ” X: AXIS_CHILD; “Klemens Pelz”. Keys and values shown: b.i.e.c; b.i.e.c.b; Klemens Pelz; b.i.e.]

Page 22:

Determining Selectivity

Each VNode is annotated with: NodeType : NodeTest : X : Count : IN : OUT : I_Tuples

Count. The exact number of nodes in the MASS storage structure that match that particular node test.

IN. The number of tuples fetched from the child VNode.

OUT. The number of tuples produced by the VNode.

I_Tuples. The total number of tuples processed up to that VNode, including the current node.

Page 23:

Example 1: //name/parent::person/emailaddress

NodeType: MASS | NodeTest: name | X: // | Count: 482 | IN: 482 | OUT: 482

NodeType: MASS | NodeTest: person | X: AXIS_PARENT | Count: 255 | IN: 482 | OUT: ?

Page 24:

Worst Case – Costing

Categorize the axes into three divisions.

Division 1: child | descendant | descendant-or-self

[Figure: two linked VNodes, X and Y, each annotated NodeType : NodeTest : X : Count : IN : OUT.]

Cases:

1. #X > #Y

2. #Y > #X: #X

Page 25:

Contd.

Division 2: parent, ancestor, ancestor-or-self, following, following-sibling, preceding, preceding-sibling

[Figure: two linked VNodes, X and Y, each annotated NodeType : NodeTest : X : Count : IN : OUT.]

Cases:

1. #X > #Y

2. #Y > #X: #Y

Page 26:

Contd.

Division 3: self

For example: //*/self::X and Y/self::*

[Figure: two linked VNodes, X and Y, each annotated NodeType : NodeTest : X : Count : IN : OUT.]

Cases:

1. #X > #Y: #Y

2. #Y > #X: #X
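The surviving cases of the three divisions can be collected into one small lookup, sketched here in Python. The function name and the convention of returning `None` are assumptions for this illustration. The transcript does not say what the estimate is for case 1 of Divisions 1 and 2, nor exactly which node each of X and Y stands for, so the sketch leaves those cases unanswered instead of guessing.

```python
DIVISION_1 = {"child", "descendant", "descendant-or-self"}
DIVISION_2 = {"parent", "ancestor", "ancestor-or-self", "following",
              "following-sibling", "preceding", "preceding-sibling"}
DIVISION_3 = {"self"}


def worst_case_out(axis, x, y):
    """Worst-case estimate for the slide's #X and #Y figures.

    Returns None where the transcript gives no result for the case.
    """
    if axis in DIVISION_3:
        return min(x, y)            # case 1 gives #Y, case 2 gives #X
    if axis in DIVISION_1:
        return x if y > x else None  # only case 2 (#Y > #X) is recorded
    if axis in DIVISION_2:
        return y if y > x else None  # only case 2 (#Y > #X) is recorded
    raise ValueError(f"unknown axis: {axis}")


print(worst_case_out("self", 7, 3))
print(worst_case_out("descendant", 5, 9))
print(worst_case_out("parent", 255, 482))
print(worst_case_out("child", 9, 5))
```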

Page 27:

NodeType: MASS | NodeTest: name | X: // | Count: 482 | IN: 482 | OUT: 482 | I_Tuples: 482

NodeType: MASS | NodeTest: person | X: AXIS_PARENT | Count: 255 | IN: 482 | OUT: 482 | I_Tuples: 737

NodeType: MASS | NodeTest: watch | X: AXIS_DESCENDANT | Count: 488 | IN: 482 | OUT: 488 | I_Tuples: 1225
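The I_Tuples figures above are reproduced by a running total of the Count values (482, then 482 + 255 = 737, then 737 + 488 = 1225). The short Python check below shows that arithmetic; whether the engine actually accumulates Count or some other figure is not stated in the slides, so treat this only as an observation about the numbers.

```python
from itertools import accumulate

# (NodeTest, Count) for the three steps, bottom of the tree first.
steps = [("name", 482), ("person", 255), ("watch", 488)]

# Running total of Count up to and including each step.
i_tuples = list(accumulate(count for _, count in steps))

for (node_test, _), total in zip(steps, i_tuples):
    print(node_test, total)
```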

Page 28:

What about Binary Operators?

Cost the expression sides w.r.t. the child.

Operator = AND | OR | EQ: ALL go out.

Arithmetic operators: ALL go out, because the result cannot be predicted before execution.

Page 29:

Contd.

Page 30:

Heuristics

The higher the ratio, the better the selectivity.

Generate a multimap <scaled(IN/OUT), VNode>. The rules that apply to each optimizable node can then be applied to it.

Ratio = IN/OUT

Scaled Ratio = IN/OUT scaled to the range 0..1
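A minimal Python sketch of the multimap follows. The slides do not say how the ratio is scaled to 0..1, so min-max scaling is assumed here, as is the rule that all nodes get 1.0 when every ratio is equal. The function name is hypothetical, OUT is assumed to be greater than zero, and the input figures are the IN and OUT values from the three-step example earlier in the talk.

```python
from collections import defaultdict


def build_multimap(nodes):
    """nodes: list of (node_test, IN, OUT) with OUT > 0.

    Returns {scaled ratio: [node_test, ...]} using min-max scaling (assumed).
    """
    ratios = [(name, in_ / out) for name, in_, out in nodes]
    low = min(r for _, r in ratios)
    high = max(r for _, r in ratios)
    multimap = defaultdict(list)
    for name, ratio in ratios:
        # Assumption: when all ratios are equal, every node scales to 1.0.
        scaled = 1.0 if high == low else (ratio - low) / (high - low)
        multimap[scaled].append(name)
    return dict(multimap)


nodes = [("name", 482, 482), ("person", 482, 482), ("watch", 482, 488)]
multimap = build_multimap(nodes)

# Visit nodes from the highest scaled ratio to the lowest.
ranked = [name for key in sorted(multimap, reverse=True)
          for name in multimap[key]]
print(multimap)
print(ranked)
```

A dictionary of lists is used because several nodes can share the same scaled ratio, which is what makes the structure a multimap.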

Page 31:

Transformation Rule 1: Binary Predicate with text comparison to Value Index

[Figure, before: “name” X: //; “person” X: AXIS_PARENT; BI_PREDICATE EQ; “ ” X: AXIS_CHILD; “Klemens Pelz”.]

[Figure, after: “name” X: //; “Klemens Pelz” X: AXIS_VALUE; “Klemens Pelz”; “name” X: AXIS_PARENT.]

Page 32:

Transformation Rule 2: Mass Node to Join

//name/parent::person/descendant::watch

[Figure, before: Root Node; “name” X: //; “watch” X: AXIS_DESCENDANT; “person” X: AXIS_PARENT.]

[Figure, after: “name” X: //; “person” X: AXIS_PARENT; “watch” X: AXIS_DESCENDANT; JOIN X: AXIS_DESCENDANT.]

Page 33:

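“*” Removal

Rule: p/descendant::*/child::n ≡ p/descendant::n

where p is a path expression.

Need for this rule: a step with "*" as its node test can be the spoilsport during cost estimation.

Note: read strictly, the left-hand side can miss n elements that are direct children of the nodes selected by p.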

Page 34:

“Axis::self” Removal

Rule: p/descendant::*/self::m ≡ p/descendant::m

Rule: p/descendant-or-self::*/self::m ≡ p/descendant-or-self::m

Need for this rule: a “self” step in combination with * or a node test is not necessary.
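The two “Axis::self” rules can be sketched as a rewrite over a list of location steps. The representation of a step as an (axis, node test) tuple and the function name are assumptions for this illustration, not VAMANA's internal form.

```python
def remove_self_steps(steps):
    """Fold  descendant::*/self::m  into  descendant::m  (same for
    descendant-or-self). steps is a list of (axis, node_test) tuples."""
    rewritten = []
    for axis, test in steps:
        previous = rewritten[-1] if rewritten else None
        if (axis == "self" and previous is not None
                and previous[0] in ("descendant", "descendant-or-self")
                and previous[1] == "*"):
            # Keep the previous axis, take over the self step's node test.
            rewritten[-1] = (previous[0], test)
        else:
            rewritten.append((axis, test))
    return rewritten


before = [("child", "people"), ("descendant", "*"), ("self", "person")]
after = remove_self_steps(before)
print(after)
```

Steps that do not match the pattern, such as child::a/self::m, are passed through unchanged.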

Page 35:

Reverse Axes Rules

Rule: p/descendant::n/parent::m ≡ p/descendant-or-self::m[child::n]

Rule: p/descendant::n/m ≡ p/descendant::m[parent::n]

Rule: /descendant::m/preceding::n ≡ /descendant::n[following::m]

From Paper: Symmetry in XPath by Dan Olteanu, Holger Meuss, Tim Furche, Francois Br

Page 36:

Predicate Axis Rules

Rule: p/descendant::*[child::n] ≡ p[descendant::n]/descendant::*

Predicate Node to Join.

Page 37:

Conclusion

Work in progress in THREE main areas:

Framework for XPath expression execution.

Selectivity determination.

Transformation rules.


Page 39:

References

1. James Clark and Steve DeRose. XML Path Language (XPath), http://www.w3.org/TR/xpath, 2002.

2. S. Boag, D. Chamberlin, Mary F. Fernandez, D. Florescu, J. Robie and J. Siméon. XQuery 1.0: An XML Query Language. W3C Working Draft, http://www.w3.org/TR/xquery/, 2002.

3. Kurt W. Deschler and Elke Rundensteiner. MASS: Multi Axis Storage Structure. Technical Report in progress, 2002.

4. T. Milo and D. Suciu. Index Structures for Path Expressions. In Proceedings of the 7th International Conference on Database Theory, 1999, pages 277-295.

5. Flavio Rizzolo and Alberto Mendelzon. Indexing XML Data with ToXin. WebDB, Santa Barbara, USA, 2001, pages 49-54.

6. Q. Li and B. Moon. Indexing and Querying XML Data for Regular Path Expressions. In Proceedings of the 27th International Conference on Very Large Data Bases (VLDB 2001), Rome, Italy, September 2001, pages 361-370.

7. XMark - The XML Benchmark project. http://monetdb.cwi.nl/xml/.