XPath Query Processing DBPL9 Tutorial, Sept. 8, 2003, Part 2 Georg Gottlob, TU Wien Christoph Koch, U. Edinburgh Based on joint work with R. Pichler


Dec 14, 2015


XPath Query Processing DBPL9 Tutorial, Sept. 8, 2003, Part 2

Georg Gottlob, TU Wien

Christoph Koch, U. Edinburgh

Based on joint work with R. Pichler


Contents

Part 1
• XPath Basics
• Axis Evaluation
• Experiments with current systems
• Polynomial-time evaluation of Core XPath
• Core XPath and datalog
• Polynomial-time evaluation of full XPath

Part 2
• Context simplification and efficient evaluation of XPath
• Parallel complexity of XPath
• Automata-based techniques:
  – XPath on Streaming XML
  – Expressive queries and automata.
• Further relevant work


Context Simplification and Efficient Evaluation of XPath


Time and space bound

Bottom-up evaluation based on CVT:
– Time O(|data|^5 * |query|^2), space O(|data|^4 * |query|^2).

Space bound (n … number of nodes in input document):
• Contexts are at most triples: at most n^3 contexts.
• Sizes of values:
  – Node sets: at most O(n)
  – Strings, numbers: at most O(|data| * |query|) (iterated concatenation of strings, multiplication of numbers)
• Each CVT is of size O(|data|^4 * |query|).

Time bound: most expensive computation is O(n^2)
– Relational operation “=” on node sets (e.g. a/b//c[d//e/f/g = h/i//j])
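
One way to see where a quadratic cost can come from: in XPath 1.0, “=” on two node sets is existential, i.e. true if some node on the left and some node on the right have equal string values. A minimal Python sketch of the naive pairwise check follows; the function name and the use of plain strings as stand-ins for nodes are assumptions for illustration, not part of the tutorial.

```python
from itertools import product

def nodeset_equal(left, right, string_value=str):
    """XPath 1.0 '=' on two node sets: true iff some pair of nodes,
    one from each set, has equal string values.
    Naive check over all |left| * |right| pairs."""
    return any(string_value(x) == string_value(y)
               for x, y in product(left, right))

print(nodeset_equal(["x", "y"], ["z", "y"]))  # True
print(nodeset_equal(["x"], ["z"]))            # False
print(nodeset_equal([], ["z"]))               # False
```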


Alternative context representation

• Contexts represented as (“previous context node”, “current context node”) rather than (“context node”, “position”, “size”).

• Need to recompute “position” and “size” on demand.

• Complexity lowered to time O(|data|^4 * |query|^2), space O(|data|^3 * |query|^2).

Example: //a/b[position() + 1 = size()]

[Figure: document tree. Root 0:c has children 1:a and 5:a; 1:a has children 2:b, 3:b, 4:b; 5:a has children 6:b, 7:b.]

child::b … { (1,2), (1,3), (1,4), (5,6), (5,7) }

child::b[position()+1=size()] … { (1,3), (5,6) }
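
The pair representation can be sketched in Python as below. The dictionary encoding of the example tree and the helper names are assumptions made for illustration; position and size are not stored but recomputed from each (previous, current) pair when the predicate needs them.

```python
# Hypothetical encoding of the example tree: node id -> (label, children).
TREE = {
    0: ("c", [1, 5]),
    1: ("a", [2, 3, 4]),
    5: ("a", [6, 7]),
    2: ("b", []), 3: ("b", []), 4: ("b", []),
    6: ("b", []), 7: ("b", []),
}

def child_step(prev, label):
    """child::label from `prev`, as (previous context node, current node) pairs."""
    return [(prev, n) for n in TREE[prev][1] if TREE[n][0] == label]

def position_and_size(pair, label):
    """Recompute position and size on demand from the pair alone."""
    prev, cur = pair
    siblings = [n for n in TREE[prev][1] if TREE[n][0] == label]
    return siblings.index(cur) + 1, len(siblings)

a_nodes = [n for n, (lab, _) in sorted(TREE.items()) if lab == "a"]
pairs = [p for a in a_nodes for p in child_step(a, "b")]
selected = [p for p in pairs
            if position_and_size(p, "b")[0] + 1 == position_and_size(p, "b")[1]]
print(pairs)     # [(1, 2), (1, 3), (1, 4), (5, 6), (5, 7)]
print(selected)  # [(1, 3), (5, 6)]
```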


Context Simplification Technique

1. Only materialize relevant context.

2. Core XPath evaluation algorithm for outermost and innermost paths //a/b/c//d[…]/e[…(a/b/c)].

3. Treating “position” and “size” in a loop.
• Because of tree shape of query, loops never have to be nested.

[Figure: query tree with the fragments /child::b[…], child::b[…], position() + 1 = last(), and position() = count(descendant::a). Path steps are annotated (cn); the predicates are annotated “(cn, cp, cs) loop”, with the note “Compute node set for which child::b[…] is true”.]


Linear Space Fragment

• “Wadler Fragment” [Wadler, 1999]: Core XPath + position(), last(), and arithmetic.

• Evaluation in quadratic time and linear space.

Example: //a/b[position() * 2 = last() and c/d[position()*3 = last()]]//e

• For x ∈ [[//a]], compute contexts (y,p,n) ∈ x.[[b]]. Compute Y = { y | (y,p,n) ∈ x.[[b]] and p*2 = n }.
• Similarly, compute Z = { z | z.[[ d[position()*3 = last()] ]] is true }.
• Compute X = { x | z ∈ Z, x ∈ z.[[child::c]]^-1 } in linear time.
• Result is { w | v ∈ X ∩ Y, w ∈ v.[[descendant::e]] }.

[Figure: query tree of the example; path steps are annotated (cn), the two position predicates are annotated (cn,cp,cs).]


Summary

Full XPath
• Bottom-up algorithm based on CVT
  – Time O(|data|^5 * |query|^2), space O(|data|^4 * |query|^2).
• Top-down evaluation
  – Time O(|data|^4 * |query|^2), space O(|data|^3 * |query|^2).
• Context-reduction technique
  – Time O(|data|^4 * |query|^2), space O(|data|^2 * |query|^2).

Wadler fragment
– Time O(|data|^2 * |query|^2), space O(|data| * |query|).

Core XPath
– Time and space O(|data| * |query|).


Parallel Complexity of XPath


Parallel Complexity of XPath

• Known: XPath is in P w.r.t. combined complexity [G., K., and Pichler, VLDB 2002].

• P-hardness => unlikely that there is an efficient parallel algorithm (conjecture: P ≠ NC).

• Even quite restrictive fragments of XPath are P-hard:
  – Core XPath using only child, parent, and descendant axes, no “branching” of tree patterns.
  – Proof by encoding circuits, somewhat involved!

• But: without negation, Core XPath is in LOGCFL (⊆ NC^2, highly parallelizable!).


PF – Path Query Fragment

• PF = Core XPath without conditions.
• E.g. //a/b//c/parent::d//f/g/ancestor::a/*

• Theorem: PF is NL-complete w.r.t. combined complexity (under L-reductions).

• Membership: paths easy to guess and check in NL.
• NL-hardness by reduction from Graph Reachability …

Where can we go from v2 in one step?

[Animation over several slides: a graph encoded as an XML tree, together with a path expression that simulates one step along a graph edge. The expression is garbled in this transcript; the recoverable pieces are steps over the child, parent and descendant axes with node tests c, e and *, and repetition counts involving |V|.]

• Reachable from v2 in one step: v1, v3!


PF is NL-hard.

• Reachability in precisely m steps:

[Formula garbled in this transcript. Recoverable pieces: a step /descendant::v_i at the start, the one-step path expression repeated m times, and a final step self::v_j.]

• Add loop at each node to graph => reachability in at most m steps.
• Set m = |E|.
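
The self-loop trick can be checked on a small graph. The Python sketch below works on plain edge lists rather than on the XML encoding used in the reduction; the function names and the edge v3 → v4 are assumptions for illustration (v2 → v1 and v2 → v3 follow the example above).

```python
def reachable_in_exactly(edges, source, target, m):
    """True iff there is a walk of exactly m edges from source to target."""
    frontier = {source}
    for _ in range(m):
        frontier = {w for (v, w) in edges if v in frontier}
    return target in frontier

def reachable(edges, nodes, source, target):
    """Plain reachability, phrased as 'exactly m steps' on the graph
    with a self-loop added at every node, with m = |E|."""
    looped = set(edges) | {(v, v) for v in nodes}
    return reachable_in_exactly(looped, source, target, len(edges))

nodes = ["v1", "v2", "v3", "v4"]
edges = [("v2", "v1"), ("v2", "v3"), ("v3", "v4")]
print(reachable_in_exactly(edges, "v2", "v4", 3))  # False: the only walk has 2 edges
print(reachable(edges, nodes, "v2", "v4"))         # True
print(reachable(edges, nodes, "v1", "v4"))         # False
```

Choosing m = |E| is enough because a simple path never uses more than |E| edges, and the self-loops let shorter paths be padded to exactly m steps.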


Further fragments with low parallel complexity

Combined complexity of Core XPath is in L if:

1. Only one-step axes are used (child, parent; self).

2. Only transitive downward axes are used (descendant, descendant-or-self, …).


Increasing the Size of the LOGCFL Fragment

• “Positive Wadler fragment” [Wadler, 2000]: just like positive Core XPath, but with position arithmetic in conditions.
  – child::a[position()+1 = last()] … get the second-last child labeled “a”.
  – No iteration of predicates: child::a[…][…].

• Theorem (combined complexity): the positive WF is
  – LOGCFL-complete;
  – with iterated predicates (already when iterated at most twice), it is P-complete.


Increasing the Size of the LOGCFL Fragment

• pXPath: “positive”/parallel XPath.
  1. No negation.
  2. No iterated predicates […][…].
  3. Depth of nesting of arithmetic operations inside a predicate is bounded by some constant.
  4. Forbidden built-in functions: count, sum, string, local-name, name, namespace-uri, string-length, normalize-space.
  5. Forbidden: relational operations on booleans.

• Theorem. pXPath is LOGCFL-complete (combined complexity).

• Maximal parallelizable fragment of XPath, unless P = NC.
  – Adding any of the features excluded by (1) – (5) leads to P-hardness.


Combined Complexity of XPath


Data and Query Complexity

• Theorem. PF is L-complete under NC1-reductions (data complexity).

• Theorem. XPath w/o multiplication, concatenation is in L w.r.t. query complexity.

• Surprisingly, data complexity and query complexity are low; combined complexity is higher!

[Figure: data complexity diagram with the labels XPath, PF, L, and L-complete (NC1-reductions).]


Processing XPath on Streams using Finite Automata


FSA on Streams

• Translate XPath path query into FSA, process stream of (e.g.) SAX events.
  – Very good scalability, low memory consumption (stack needed).

• Selective dissemination of information (SDI) / publish-subscribe (cf. XFilter [Altinel and Franklin, VLDB 2000], XTrie [Chan et al., ICDE 2002]).
  – Boolean queries.
  – Extensions to support branching tree patterns, condition predicates, backward axes, …
  – Goal is to evaluate multiple queries at once (10^4 – 10^6 queries).


Example: $x in //a/b

[Figure: NFA and DFA for //a/b, with transitions labeled a and b and the selecting state marked $x. The DFA states are the NFA state sets (0), (01) and (02).]

[Animation over 15 slides: the stack of DFA states while the stream is processed. A state is pushed on each start tag and popped on each end tag; $x is bound whenever (02) is pushed. Successive stack contents:]

(0)
(0)(01)
(0)(01)(01)
(0)(01)(01)(02)   $x
(0)(01)(01)
(0)(01)
(0)(01)(01)
(0)(01)
(0)(01)(02)   $x
(0)(01)(02)(01)
(0)(01)(02)(01)(02)   $x
(0)(01)(02)(01)
(0)(01)(02)
(0)(01)
(0)
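
The run above can be reproduced with a small Python sketch. Everything concrete here is an assumption made for illustration: the NFA is the usual construction for //a/b (state 0 loops on every tag, 0 goes to 1 on a, 1 goes to 2 on b), the event tuples stand in for SAX callbacks, and the document is one that is consistent with the stack contents listed above. DFA transitions are computed on first use and cached.

```python
# NFA for //a/b (an assumed textbook construction): state 0 loops on any tag,
# 0 -a-> 1, 1 -b-> 2; state 2 is the selecting state that binds $x.
def nfa_step(state, tag):
    targets = set()
    if state == 0:
        targets.add(0)
        if tag == "a":
            targets.add(1)
    elif state == 1 and tag == "b":
        targets.add(2)
    return targets

dfa_cache = {}  # lazily filled DFA transition table

def dfa_step(states, tag):
    key = (states, tag)
    if key not in dfa_cache:
        dfa_cache[key] = frozenset(t for s in states for t in nfa_step(s, tag))
    return dfa_cache[key]

def run(events):
    """events: ('start', tag) / ('end', tag) pairs, as a SAX parser would emit."""
    stack = [frozenset({0})]
    selected, trace = 0, []
    for kind, tag in events:
        if kind == "start":
            stack.append(dfa_step(stack[-1], tag))
            if 2 in stack[-1]:
                selected += 1          # this element is a match for $x
        else:
            stack.pop()
        trace.append("".join("(" + "".join(map(str, sorted(s))) + ")"
                             for s in stack))
    return selected, trace

# Hypothetical document: <a><a><b/></a><a/><b><a><b/></a></b></a>
doc = [("start", "a"), ("start", "a"), ("start", "b"), ("end", "b"), ("end", "a"),
       ("start", "a"), ("end", "a"), ("start", "b"), ("start", "a"), ("start", "b"),
       ("end", "b"), ("end", "a"), ("end", "b"), ("end", "a")]
selected, trace = run(doc)
print(selected)   # 3
print(trace[2])   # (0)(01)(01)(02)
```

The stack never grows beyond the document depth plus one, and only the DFA transitions that the document actually exercises are ever built.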


Size of DFAs

//a/*/*/b


• Exponential in the size of XPath statement, but
  – Only exponential in number of occurrences of “*”.
  – In case of automaton for multiple queries, exponential in number of occurrences of “//”.

• Lazy evaluation of DFA
  – Computation of states and transitions only on demand.
  – Saves much time and space in practice: documents usually from quite restrictive language.

[Green, Miklau, Onizuka, Suciu, ICDT 2003]
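
The growth in the number of “*” steps can be observed with a short subset-construction sketch in Python. The NFA layout, the three-symbol alphabet (a, b, and one stand-in for every other tag) and the function name are assumptions for illustration; the sketch builds the DFA eagerly, which is exactly what lazy evaluation avoids.

```python
def dfa_size(k):
    """Number of reachable DFA states for //a/*/.../*/b with k wildcard steps."""
    accept = k + 2

    def step(states, tag):
        out = set()
        for s in states:
            if s == 0:
                out.add(0)               # '//' keeps state 0 alive on every tag
                if tag == "a":
                    out.add(1)
            elif s <= k:
                out.add(s + 1)           # a '*' step matches any tag
            elif s == k + 1 and tag == "b":
                out.add(accept)
        return frozenset(out)

    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        cur = todo.pop()
        for tag in ("a", "b", "other"):
            nxt = step(cur, tag)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)

print([dfa_size(k) for k in range(5)])
```

For k = 0, 1, 2 this construction gives 3, 6 and 12 states: each extra “*” forces the DFA to remember one more position at which an a may have occurred.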


Extensions

• Branching tree patterns.
• Condition predicates.
• Backward axes.

• Boolean queries (“Can tree pattern be embedded into XML document?”)
  – Rather than node-selecting queries.


Highly Expressive Queries and Automata


Motivation

• Scalability in databases = (all three points at the same time)
  – Strictly linear time.
  – Little main memory required (DB in secondary storage).
  – Little jumping around in the data, sequential scans of disk preferred (streaming).
    • Paged sequential reading much faster than random access.
• Node-selecting queries on unranked trees (XML)
  – Higher expressiveness than what is possible with single pass.

• Folklore: unary MSO queries can be evaluated in two passes through the tree.


The Arb Query Processor

Evaluates node-selecting queries
– In two sequential scans of the data.
– Memory requirements: O(depth(tree)), otherwise independent of size of DB.
– Highly parallelizable.
– Tree automata-based.
– High expressiveness: unary Monadic Second-Order Logic (MSO).
– Succinct representation of automata.

[Frick, Grohe, K., LICS 2003; K., VLDB 2003]


Selecting Tree Automata (STAs)

• STA: Nondeterministic bottom-up tree automata with a set of selecting states.
• Select a node if it is assigned a selecting state in all (or one) accepting runs.

[Formulas missing in this transcript: the two alternative selection conditions, joined by “or”.]

• Expressive power: unary MSO queries on trees.

[Neven’s thesis]; [Frick, Grohe & K., LICS 2003]


Two-Phase Query Evaluation

From STA:

1. Deterministic bottom-up tree automaton
   – Compute reachable states.
2. Deterministic top-down tree automaton (with selection)
   • Eliminate state-to-node assignments that do not lead to accepting run.
   • Select nodes of query result.

[Frick, Grohe & K., LICS 2003]


Representation on Disk

[Figure, built up over several slides: an unranked tree with node labels a, b and c, shown with its FirstChild and NextSibling edges and numbered 1 to 14.]

The tree is stored as one record per node, in node order. Each record holds the label and two bits (“Children?”), one for FirstChild and one for NextSibling:

Node:       1  2  3  4  5  6  7  8  9 10 11 12 13 14
Children?: 10 01 11 01 01 00 01 11 01 01 01 01 00 00
Label:      a  b  a  a  b  c  b  a  b  c  c  a  b  b


Running Automata by Sequential Disk Scans


Running Automata by Seq. Scans

• Deterministic top-down tree automaton
  – One sequential forward scan of the data.
  – Memory: Stack bounded by depth of tree.

• Deterministic bottom-up tree automaton
  – One sequential backward scan of the data.
  – Memory: Stack bounded by depth of tree.

• For unranked trees!


Bottom-up Traversal

[Animation over 19 slides: a backward scan over the records 14, 13, …, 1 of the disk representation.]

Node:       1  2  3  4  5  6  7  8  9 10 11 12 13 14
Children?: 10 01 11 01 01 00 01 11 01 01 01 01 00 00

Stack contents after each record is read:

after 14: 14
after 13: 14 13
after 12: 14 12
after 11: 14 11
after 10: 14 10
after 9:  14 9
after 8:  8
after 7:  7
after 6:  7 6
after 5:  7 5
after 4:  7 4
after 3:  3
after 2:  2
after 1:  1
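
The backward scan can be sketched in Python as follows. The reading of the two bits as (has first child, has next sibling) and the `combine` callback standing in for the automaton's transition function are assumptions for illustration; under that reading the sketch produces the same sequence of stack contents as the animation.

```python
# Records in node order 1..14: (has_first_child, has_next_sibling).
FLAGS = ["10", "01", "11", "01", "01", "00", "01",
         "11", "01", "01", "01", "01", "00", "00"]

def backward_scan(flags, combine):
    """Run a bottom-up computation in one backward pass over the records.

    combine(node, first_child_value, next_sibling_value) returns the value
    for `node`; a missing successor is passed as None.
    """
    stack, trace = [], []
    for node in range(len(flags), 0, -1):
        has_child, has_sibling = flags[node - 1]
        # The first-child subtree was scanned most recently, so it is on top.
        child = stack.pop() if has_child == "1" else None
        sibling = stack.pop() if has_sibling == "1" else None
        stack.append(combine(node, child, sibling))
        trace.append(list(stack))
    return stack.pop(), trace

# Toy "automaton" 1: the value of a node is just its own number.
_, trace = backward_scan(FLAGS, lambda node, child, sibling: node)
for snapshot in trace:
    print(snapshot)

# Toy "automaton" 2: count the nodes below and to the right of each node.
total, _ = backward_scan(
    FLAGS, lambda node, child, sibling: 1 + (child or 0) + (sibling or 0))
print(total)  # 14
```

A real bottom-up tree automaton would plug its transition function into `combine`; the scan order and the stack discipline stay the same.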


Monadic Datalog and TMNF

• Monadic datalog: datalog, all “intensional predicates” are unary.
• Over unranked, ordered, finite trees:
  – Unary: Root, hasFirstChild, hasNextSibling, Label_a, and their complements.
  – Binary: FirstChild, NextSibling

Example:

D0(x) :- Root(x).
D1(x) :- D0(x0), FirstChild(x0, x).
D0(x) :- D1(x0), FirstChild(x0, x).
D0(x) :- D0(x0), NextSibling(x0, x).
D1(x) :- D1(x0), NextSibling(x0, x).

• D0: nodes at even depth in tree.
• D1: nodes at odd depth in tree.

• TMNF (“tree-marking normal form”) - restricted syntax:
  – P(x) :- P1(x), P2(x).
  – P(x) :- P0(x0), R(x0, x).
  – P(x) :- P0(x0), R(x, x0).
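
The even/odd depth program can be run on a small tree with a naive fixpoint loop. This Python sketch only illustrates the semantics; it is not the linear-time evaluation algorithm, and the four-node tree and the function name are assumptions for illustration.

```python
# Hypothetical tree: root 1 with children 2 and 3; node 2 has child 4.
FIRST_CHILD = {(1, 2), (2, 4)}
NEXT_SIBLING = {(2, 3)}
ROOT = {1}

def even_odd_depth():
    d0, d1 = set(ROOT), set()                                   # D0(x) :- Root(x).
    changed = True
    while changed:
        new_d1 = {x for (x0, x) in FIRST_CHILD if x0 in d0}     # D1 :- D0, FirstChild
        new_d0 = {x for (x0, x) in FIRST_CHILD if x0 in d1}     # D0 :- D1, FirstChild
        new_d0 |= {x for (x0, x) in NEXT_SIBLING if x0 in d0}   # D0 :- D0, NextSibling
        new_d1 |= {x for (x0, x) in NEXT_SIBLING if x0 in d1}   # D1 :- D1, NextSibling
        changed = not (new_d0 <= d0 and new_d1 <= d1)
        d0 |= new_d0
        d1 |= new_d1
    return d0, d1

d0, d1 = even_odd_depth()
print(sorted(d0))  # [1, 4]  (depths 0 and 2)
print(sorted(d1))  # [2, 3]  (depth 1)
```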


Known Facts about Monadic Datalog

[Gottlob & K., PODS 2002]:
• Monadic datalog over trees (m.dl.o.t.) can be evaluated in time O(|Program| * |Data|).
• M.dl.o.t. captures the unary MSO queries over trees.

[Gottlob & K., LICS 2002], [Frick, Grohe, K., LICS 2003]:
• Linear-time reduction to TMNF.
• Linear-time reduction also from Core XPath to TMNF (negation!)

[Grohe and Schweikardt, CSL 2003]:
• But: monadic datalog much less succinct than MSO, monadic fixpoint logic.
  – However, no problems observed in practice yet.

TMNF Example

P1(x) :- Root(x).
P2(y) :- P1(x), FirstChild(x,y).
P3(y) :- P2(x), FirstChild(x,y).
P4(y) :- P3(x), FirstChild(y, x).
P5(y) :- P4(x), FirstChild(y, x).

[Animation over 5 slides: predicates derived on a chain of three nodes (the root, its first child, and that node's first child), one new fact per slide.]

Root      First child   Grandchild
{P1}      {}            {}
{P1}      {P2}          {}
{P1}      {P2}          {P3}
{P1}      {P2,P4}       {P3}
{P1,P5}   {P2,P4}       {P3}
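
The derivation can be replayed with a small interpreter for rules of this shape. The Python sketch below uses a naive fixpoint for clarity, not the linear-time algorithm; the rule encoding ("and", "down", "up") and the node numbering 1, 2, 3 for the chain are assumptions for illustration.

```python
FIRST_CHILD = {(1, 2), (2, 3)}          # chain: 1 -> 2 -> 3
FACTS = {("Root", 1)}

RULES = [
    ("and",  "P1", ["Root"]),           # P1(x) :- Root(x).
    ("down", "P2", "P1"),               # P2(y) :- P1(x), FirstChild(x, y).
    ("down", "P3", "P2"),               # P3(y) :- P2(x), FirstChild(x, y).
    ("up",   "P4", "P3"),               # P4(y) :- P3(x), FirstChild(y, x).
    ("up",   "P5", "P4"),               # P5(y) :- P4(x), FirstChild(y, x).
]

def evaluate(rules, facts, first_child):
    derived = set(facts)
    nodes = {n for pair in first_child for n in pair}
    while True:
        new = set()
        for kind, head, body in rules:
            if kind == "and":           # all body predicates hold at the same node
                new |= {(head, n) for n in nodes
                        if all((b, n) in derived for b in body)}
            elif kind == "down":        # propagate from a node to its first child
                new |= {(head, y) for (x, y) in first_child if (body, x) in derived}
            else:                       # "up": propagate from a first child to its parent
                new |= {(head, y) for (y, x) in first_child if (body, x) in derived}
        if new <= derived:
            return derived
        derived |= new

result = evaluate(RULES, FACTS, FIRST_CHILD)
marks = {n: sorted(p for (p, m) in result if m == n and p != "Root")
         for n in (1, 2, 3)}
print(marks)  # {1: ['P1', 'P5'], 2: ['P2', 'P4'], 3: ['P3']}
```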


Implementation

• Bottom-up phase has to deal with nondeterminism – very large sets of states possible.

• Compact representation of state sets using residual logic programs.

• Compilation of TMNF program P into
  – Deterministic bottom-up automaton
    • Sets of reachable states of STA become states of this automaton.
    • Each such state is represented as a residual logic program.
  – Deterministic top-down automaton.

• Both evaluated lazily: Transitions computed on demand and stored.


Propositional “Local” Program

P1(x) :- Root(x).
P2(y) :- P1(x), FirstChild(x,y).
P3(y) :- P2(x), FirstChild(x,y).
P4(y) :- P3(x), FirstChild(y, x).
P5(y) :- P4(x), FirstChild(y, x).

“Local” propositional version:

P1 :- Root.
P2[1] :- P1.
P3[1] :- P2.
P4 :- P3[1].
P5 :- P4[1].

[Figure: a node with its two successors labeled [1] and [2].]

Bottom-up Run

P1 :- Root.
P2[1] :- P1.
P3[1] :- P2.
P4 :- P3[1].
P5 :- P4[1].

[Animation over 6 slides: automaton A runs bottom-up over the chain of three nodes. Residual programs computed, from the leaf upward:]

• Leaf: {}
• Middle node: {P4 :- P2}. Represents 2^3 * (2^2 - 1) = 24 reachable states.
• Root: the local program + {Root; P4[1] :- P2[1]} yields {P1; P2[1]; P4[1]; P5}.

Top-down Run

P1 :- Root.
P2[1] :- P1.
P3[1] :- P2.
P4 :- P3[1].
P5 :- P4[1].

[Animation over 5 slides: automaton A runs top-down over the same chain. States, from the root downward:]

• Root: {P1; P2[1]; P4[1]; P5}
• Middle node: {P4 :- P2} + {P2; P4} yields {P2; P3[1]; P4}.
• Leaf: {} + {P3} yields {P3}.


Example: Parallel Regular Expression Matching

• Encode string as almost complete binary “infix” tree.

• Represent backward step between leaves as caterpillar (tree-walking) expression.

• Express regular expression over strings as monadic datalog program over infix tree.

[Figure: infix tree for the string “example”.]


Some further interesting work

• Structural Joins, Twig Joins
  – [Al-Khalifa et al., ICDE 2002; Bruno, Koudas, and Srivastava, SIGMOD 2002; …]
  – Exploit tree structure to compute matches of tree pattern in time O(|input| + |output|).

• Index Structures for Path Expressions
  – [Kemper and Moerkotte, 1992; Milo and Suciu, ICDT 1999]
  – Bisimulation; data guides, 1-indexes, t-indexes, …

• Optimization of XPath
  – Containment and Minimization [Miklau and Suciu, PODS 2002; Neven and Schwentick, ICDT 2003; Wood, WebDB 2001, ICDT 2003; Deutsch and Tannen, KRDB 2001]
  – Satisfiability [Hidders, DBPL 2003]
  – Axiom systems for query rewriting [Benedikt, Fan and Kuper, ICDT 2003]

• Closure Properties for XPath Fragments
  – [Benedikt, Fan and Kuper, ICDT 2003]


END