Containment and Equivalence for an XPath Fragment. By Gerome Miklau and Dan Suciu. Presented by Roy Ionas.

Dec 20, 2015

Page 1

Containment and Equivalence for an XPath Fragment

By Gerome Miklau and Dan Suciu

Presented by Roy Ionas

Page 2

SEMINAR OBJECTIVES

• PRESENTING THE PROBLEM OF NON-POLYNOMIAL COMPLEXITY FOR CONTAINMENT AND EQUIVALENCE OF XPath FRAGMENTS.

• PRESENTING TWO ALGORITHMS THAT IMPROVE THE COST OF THE XPath CONTAINMENT AND EQUIVALENCE PROBLEM.

• PRESENTING TREE PATTERNS AS AN EFFECTIVE TOOL FOR PROOFS ABOUT XPath FRAGMENTS.

Page 3

SO WHAT IS XPath?

• A simple language for navigating XML documents and selecting a set of nodes.

• With XPath we can query XML data, describe key constraints, express transformations and reference elements in remote documents.

• We can find XPath's influence in other XML query languages and features such as XQuery, XSLT, XML Schema, XLink, XPointer and more...

Page 4

DEFINITIONS

• Simple XPath fragment.

• Containment between two XPath fragments.

• Equivalence between two XPath fragments.

• Computability definitions.

• Tree patterns as a proving tool for XPath fragments.

Page 5

Simple XPath fragment

• An XPath statement.

• Contains the three most important features for navigating:

– Child and descendant axes: “/” and “//”

– Wildcards: “*”

– Qualifiers: “[]”

• We disregard attributes, conditions...

• We identify and compare nodes only by their label.

• We disregard order completely.

• Example: a//*[b//d][c]

Page 6

Simple XPath fragment

• Are these all the features we have in XPath???

NO!!!!!

• Are these all the features we need for representing navigation in XML documents?

YES!!!!!

At least these are the ones needed for the proof in this article.

Page 7

Containment

• Containment between two XPath fragments A and B means that, for every XML document, the result of applying A is contained in the result of applying B.

• The result is a set of nodes; order is not considered.

• Can we test this containment on the entire world of XML documents??

• Is there another way to determine containment between two XPath fragments???

Page 8

Equivalence

• Equivalence between two XPath fragments A and B means that, for every XML document, the result of applying A is equal to the result of applying B.

• The problem of equivalence can be reduced to the problem of containment:

– Equivalence = containment in both directions between patterns.

– Containment, in turn, can be decided by a polynomial-time procedure that calls an algorithm for equivalence.

• From now on we will mention only the problem of containment; the results will be valid for equivalence as well.
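
The two definitions can be written compactly. The notation is an assumption introduced here for illustration: $A(t)$ stands for the set of nodes that the XPath fragment $A$ selects in the XML document $t$.

```latex
A \subseteq B \;\iff\; \forall t:\ A(t) \subseteq B(t)
\qquad\qquad
A \equiv B \;\iff\; A \subseteq B \ \text{and}\ B \subseteq A
```

The quantifier ranges over all XML documents, which is why containment cannot be tested by simply running both fragments.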

Page 9

Computability Definitions

• NP stands for “Nondeterministic Polynomial”.

• P class - the class of problems that are solvable in polynomial time, i.e. for which an efficient solution has been found.

• NP class - the class of problems whose solutions can be verified in polynomial time. It contains P; for its hardest problems no efficient solution has been found (yet), and they most likely have exponential complexity.

• NP-hard problem - a problem to which every NP problem can be reduced (it may be even worse than NP...).

• NP-complete problem - a problem that belongs to the NP class and is itself NP-hard.

Page 10

Tree Patterns

• An unordered tree over the alphabet of the XPath fragment.

• XPath nodes are represented as nodes of the tree pattern.

• Child axes are represented as edges.

• Descendant axes are represented as edges drawn with double lines.

• A k-tuple of nodes, called the result tuple.

• For a tree pattern P, the arity of the result tuple is called the arity of P.

• A tree pattern P is Boolean iff its arity is 0.

Page 11

Tree Patterns

• Tree patterns are more elegant and general than XPath fragments.

• We can translate from XPath to tree patterns and vice versa quite easily.

Now we can prove properties using graph theory.

Page 12

Tree Pattern - example

• For the XPath expression a//*[b//d][c], the tree pattern is the following:

[Figure: tree pattern for a//*[b//d][c]. The root a has a descendant edge to the wildcard node *; * has child edges to b and c; b has a descendant edge to d. Legend: root, wildcard, descendant edge, child edge.]
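
A minimal sketch of how the tree pattern for a//*[b//d][c] could be held in memory. The names `PatternNode`, `add` and `edges` are illustrative choices, not part of the presentation, and the sketch ignores the result tuple (it stores only labels and edges).

```python
from dataclasses import dataclass, field


@dataclass
class PatternNode:
    """One node of a tree pattern; the label "*" stands for the wildcard."""
    label: str
    # Each entry is (axis, child), where axis is "/" (child) or "//" (descendant).
    children: list = field(default_factory=list)

    def add(self, axis, child):
        """Attach a child pattern below this node and return self for chaining."""
        self.children.append((axis, child))
        return self


def edges(node):
    """Yield the edges of the pattern as (parent, axis, child) in depth-first order."""
    for axis, child in node.children:
        yield (node.label, axis, child.label)
        yield from edges(child)


# Build the pattern for a//*[b//d][c] from the leaves upward.
b = PatternNode("b").add("//", PatternNode("d"))
star = PatternNode("*").add("/", b).add("/", PatternNode("c"))
root = PatternNode("a").add("//", star)

for parent, axis, child in edges(root):
    print(parent, axis, child)
```

Because the tree is unordered, the order in which the two qualifiers [b//d] and [c] are attached carries no meaning.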

Page 13

Usage of Tree Patterns for navigating in XML trees

• An embedding maps a tree pattern into an XML tree.

• Imagine it as a function that must:

– Preserve the root.

– Respect node labels.

– Respect edge relationships.

• After embedding, return the information from the nodes marked as return nodes and from the nodes below them.

• For Boolean patterns, return true if such an embedding exists.
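
The three conditions can be sketched as a recursive check for Boolean patterns. This is an illustration under stated assumptions, not the presenter's code: a pattern is written as `(label, [(axis, subpattern), ...])`, an XML tree as `(label, [subtrees])`, and the sample trees are hypothetical.

```python
def descendants(tree):
    """Yield every proper descendant of an XML tree node."""
    for child in tree[1]:
        yield child
        yield from descendants(child)


def embeds_at(pattern, tree):
    """True if the pattern can be embedded with its root mapped to this tree node."""
    label, branches = pattern
    # Respect node labels: a wildcard matches any label.
    if label != "*" and label != tree[0]:
        return False
    # Respect edge relationships: "/" needs a child, "//" needs a proper descendant.
    for axis, sub in branches:
        candidates = tree[1] if axis == "/" else descendants(tree)
        if not any(embeds_at(sub, node) for node in candidates):
            return False
    return True


# Pattern for a//*[b//d][c], read as a Boolean pattern.
pattern = ("a", [("//", ("*", [("/", ("b", [("//", ("d", []))])),
                               ("/", ("c", []))]))])

# Hypothetical XML trees: a/s/t with children c and b, and b has child d.
tree_yes = ("a", [("s", [("t", [("c", []), ("b", [("d", [])])])])])
tree_no = ("a", [("s", [("b", [])])])

print(embeds_at(pattern, tree_yes))
print(embeds_at(pattern, tree_no))
```

Calling `embeds_at` on the roots of both structures is what enforces root preservation. An embedding does not have to be injective, so each branch of the pattern can be checked independently.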

Page 14

Example for embedding

[Figure: an embedding of a tree pattern with nodes a, *, b, c and d into an XML tree with nodes a, s, t, b, c and d.]

Page 15

PROBLEM....

• Testing containment between two XPath fragments is a co-NP-complete problem.

• This can be proven by a reduction from a co-NP-complete problem on 3-CNF formulas to our problem.

Page 16

Do we really care about it???

When do we need to test for containment or equivalence between fragments?

• In almost all the applications we described so far.

• Inference of keys.

• Optimization of XPath queries.

Page 17

Solving the problem

• Finding an algorithm that is both efficient and complete for this problem is quite difficult (like proving P = NP).

• Finding an algorithm that is efficient but not complete.

• Finding an algorithm that is complete but not always efficient.

Page 18

First solution: Pattern homomorphism

Page 19

Pattern Homomorphisms - definition

• A homomorphism h between two tree patterns p and p' is a function h: Nodes(p) -> Nodes(p') that satisfies the following conditions:

– It preserves the root.

– For each node x in p, the label of h(x) in p' is the label of x or *.

– It preserves child and descendant relations.

• Many efficient algorithms exist for deciding whether a homomorphism between two patterns exists.

• The algorithm is sound: whenever a homomorphism exists between tree patterns p and p', containment holds between them.

• The existence of a homomorphism is always a sufficient condition for containment.

• But is it a necessary condition?
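
A minimal sketch of a homomorphism test for Boolean patterns. It fixes one direction, an assumption taken from the usual statement of this result: it looks for a mapping from the nodes of the more general pattern (`general`) into the nodes of the more specific one (`specific`), where a * node may map to any node and every other label must match exactly. Finding such a mapping is sufficient to conclude that every match of `specific` is also a match of `general`. Patterns are written as `(label, [(axis, subpattern), ...])`; the function names are illustrative.

```python
def pattern_descendants(node):
    """Yield every proper descendant of a pattern node, across both kinds of edge."""
    for _axis, child in node[1]:
        yield child
        yield from pattern_descendants(child)


def maps_to(g, s):
    """True if the subpattern rooted at g can be mapped with g sent to node s."""
    g_label, g_branches = g
    # Label condition: a wildcard maps anywhere, other labels must be equal.
    if g_label != "*" and g_label != s[0]:
        return False
    for axis, g_child in g_branches:
        if axis == "/":
            # A child edge must land on a child edge.
            candidates = [c for a, c in s[1] if a == "/"]
        else:
            # A descendant edge may land on any downward path of length >= 1.
            candidates = pattern_descendants(s)
        if not any(maps_to(g_child, c) for c in candidates):
            return False
    return True


def has_homomorphism(general, specific):
    """Root-preserving test: the root of `general` must map to the root of `specific`."""
    return maps_to(general, specific)


a_b_c = ("a", [("/", ("b", [("/", ("c", []))]))])   # a/b/c
a_desc_c = ("a", [("//", ("c", []))])               # a//c
a_star = ("a", [("/", ("*", []))])                  # a/*

print(has_homomorphism(a_desc_c, a_b_c))  # a/b/c is contained in a//c
print(has_homomorphism(a_b_c, a_desc_c))
print(has_homomorphism(a_star, a_b_c))
```

A failed test proves nothing on its own, which is exactly the completeness question raised in the last bullet.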

Page 20

Example for homomorphism

[Figure: two tree patterns over the nodes a, b, c and *, with the homomorphism h(a) = a, h(b) = *.]

Page 21

Homomorphism is not a complete solution for containment

• A homomorphism between the two tree patterns does not exist even though they are equivalent.

[Figure: two tree patterns, each over the nodes a, b and *.]

Page 22

Cases where homomorphism applies

• Fragments that contain only * and [].

• Fragments that contain only // and [].

• Fragments that contain all three but can be translated to an expression that belongs to one of the above without changing the semantics.

Page 23

Conclusion for homomorphism

• Sound.

• Efficient.

• Incomplete.

Now we look for an algorithm that is sound and complete, and may be efficient in several cases.

Page 24

ALGORITHM FOR CONTAINMENT

Page 25

Containment between regular languages

• Reducing the problem of containment between two XPath fragments to containment between two regular languages by translating from tree patterns to automata.

• The algorithm is complete: with defined rules we can translate completely from automata to tree patterns and vice versa.

Page 26

Automata for XPath fragment

• Defined on ranked trees.

• Bottom-up structure.

• Only the root is an accepting state.

• The initial states are the leaves of the tree.

• The transitions are of the form: (q1,q2,…,qn;a) -> q
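
A bottom-up run over transitions of the form (q1,q2,…,qn;a) -> q can be sketched as follows. The encoding is an assumption made for illustration: a tree is `(label, [subtrees])`, the transition table maps `((q1, ..., qn), a)` to a set of target states, and the state names `qa` and `qb` are hypothetical.

```python
from itertools import product


def reachable_states(tree, transitions):
    """Return the set of states the automaton can assign to the root of `tree`."""
    label, children = tree
    # Work bottom-up: first compute the possible states of every child.
    child_sets = [reachable_states(child, transitions) for child in children]
    result = set()
    # Try every combination of child states; a leaf yields the empty tuple ().
    for combo in product(*child_sets):
        result |= transitions.get((combo, label), set())
    return result


def accepts(tree, transitions, accepting):
    """The tree is accepted if some run reaches an accepting state at the root."""
    return bool(reachable_states(tree, transitions) & accepting)


# Hypothetical automaton: a leaf b gets state qb, and a node a whose two
# children are both in state qb gets the accepting state qa.
transitions = {
    ((), "b"): {"qb"},
    (("qb", "qb"), "a"): {"qa"},
}

print(accepts(("a", [("b", []), ("b", [])]), transitions, {"qa"}))
print(accepts(("a", [("b", [])]), transitions, {"qa"}))
```

Keeping a set of states per node lets the same function run a nondeterministic automaton; for a deterministic one every set has at most one element.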

Page 27

definitions

• FTA - finite tree automaton: an automaton that contains a set of states and transitions of the form described.

• An FTA can be deterministic - DFTA.

• Each FTA A with Q states can be translated to a DFTA B with at most 2^Q states.

• AFTA - alternating finite tree automaton: extends the definition of FTA by adding “AND transitions” of the form (q1,q2,…,qm) -> qi.

• A DFTA can also be built for an AFTA without increasing the cost of determinizing the automaton.

Page 28

The entire algorithm

• Construct the DFTA A accepting the “regular expressions of P”.

• Construct the AFTA A' accepting the “regular expressions of P'”.

• Compute the AFTA B = A x A'.

• Compute the DFTA C = Det(B).

• If lang(A) ⊆ lang(C) then return true, else return false.

Page 29

[Figure: two tree patterns rooted at r, built from nodes labeled a, b and *, with a “?” between them posing the containment question.]

Page 30

Step 1: Building FTA A from tree pattern p

• States(A) = Nodes(p).

• For each node x with children x1,…,xk we add a transition (x1,…,xk;x) -> x.

• For each descendant edge e from node x to node y we add (y;e) -> x, and we add a self-loop (y,*) -> y.

• The only terminal state is the root.

Page 31

Example for building FTA

[Figure: a tree pattern rooted at r with nodes a, * and b, shown next to the FTA built from it.]

Page 32

Step 2: Building an AFTA A' from pattern p'

• States(A') = Nodes(p') ∪ Edges(p')

• (q,a) -> e for every symbol a that has an outgoing edge e. If it is a descendant relationship then we also add a self-loop on the source node.

• (e1,e2,e3..) -> a for every a that has incoming edges.

Page 33

Example for building AFTA for pattern p'

[Figure: the pattern p' with nodes r, a, b and *, shown next to the AFTA built from it.]

Page 34

Conclusion for the containment algorithm

• Sound.

• Complete.

• Not always efficient.
