Annotated XML: Queries and Provenance Nate Foster TJ Green Val Tannen University of Pennsylvania Symposium on Database Provenance University of Edinburgh May 21, 2008

Annotated XML: Queries and Provenance

Nate Foster TJ Green Val Tannen University of Pennsylvania

Symposium on Database Provenance
University of Edinburgh

May 21, 2008


Need to Track XML Provenance

• For scientific data processing [Buneman+ 01]
  – Tree-structured data, heterogeneous sources – XML is the natural data model
  – Data annotated with source info; annotations need to be propagated during query processing

• For incomplete/probabilistic data [Sen.&Abit. 06]
  – Query output annotated with Boolean formulas
  – Annotations indicate correlations between source data and output data

• For data warehousing [Cui+ 00]
  – Even when data is relational, often have XML views


Provenance for Relational Algebra Views

source R:

A  B  C
a  b  c
d  b  e
f  g  e

view V:

A  B
a  c
a  e
d  c
d  e
f  e

V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))

???


Semiring-Annotated Relations [PODS07]

• Associate each tuple in database with an annotation from a commutative semiring (K, +, ·, 0, 1)

• Combine and propagate annotations during (positive) relational query processing
  – ⋈, ×, ∩ combine annotations using ·
  – π, ∪ combine annotations using +
  – σ multiplies annotations by 0 or 1
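
The propagation rules above can be sketched for an arbitrary commutative semiring. This is a minimal illustration rather than the authors' implementation: the `Semiring` record, the dict-based encoding of annotated relations, and all helper names are assumptions made for the example. A join can be obtained from product, selection, and projection.

```python
from collections import namedtuple

# A commutative semiring (K, +, ·, 0, 1) packaged as plain functions (hypothetical encoding).
Semiring = namedtuple("Semiring", "add mul zero one")

NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)              # bag semantics
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)   # set semantics

# A K-relation is a dict {tuple: annotation}; absent tuples are annotated with 0.

def union(K, r, s):
    """Union adds the annotations of equal tuples."""
    out = dict(r)
    for t, k in s.items():
        out[t] = K.add(out.get(t, K.zero), k)
    return out

def project(K, r, cols):
    """Projection adds the annotations of tuples that collapse to the same result."""
    out = {}
    for t, k in r.items():
        key = tuple(t[c] for c in cols)
        out[key] = K.add(out.get(key, K.zero), k)
    return out

def product(K, r, s):
    """Cartesian product multiplies the annotations of the paired tuples."""
    return {t + u: K.mul(k, l) for t, k in r.items() for u, l in s.items()}

def select(K, r, pred):
    """Selection multiplies each annotation by 1 (predicate holds) or 0 (it does not)."""
    out = {}
    for t, k in r.items():
        k2 = K.mul(k, K.one if pred(t) else K.zero)
        if k2 != K.zero:
            out[t] = k2
    return out

R = {("a", "b"): 2, ("a", "c"): 3}
S = {("a", "b"): 5}
```

With `NAT`, `union(NAT, R, S)` annotates `("a", "b")` with 2 + 5, and `project(NAT, R, [0])` annotates `("a",)` with 2 + 3, which is ordinary bag semantics.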


Annotated Relations Example

R:

A  B  C
a  b  c   p
d  b  e   r
f  g  e   s

V:

A  B
a  c   2p²
a  e   pr
d  c   pr
d  e   2r² + rs
f  e   2s² + rs

V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))
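
The annotations listed for V can be recomputed with polynomial annotations. The sketch below rests on an assumption: the query is read as π_AC((π_AB(R) ⋈ π_BC(R)) ∪ (π_AC(R) ⋈ π_BC(R))), so each output pair holds the A and C values of R. That reading is chosen because it reproduces the listed annotations. The encoding of polynomials as `Counter` objects and the helper names are also hypothetical.

```python
from collections import Counter

# Polynomials in N[X]: Counter mapping a monomial (sorted tuple of variable
# names, e.g. ("p", "p") for p^2) to its natural-number coefficient.
def var(x):
    return Counter({(x,): 1})

def pmul(f, g):
    out = Counter()
    for m1, c1 in f.items():
        for m2, c2 in g.items():
            out[tuple(sorted(m1 + m2))] += c1 * c2
    return out

# Source relation R(A, B, C), annotated with the tuple ids p, r, s.
R = {("a", "b", "c"): var("p"),
     ("d", "b", "e"): var("r"),
     ("f", "g", "e"): var("s")}

def project(rel, cols):
    """Projection: annotations of collapsing tuples are added."""
    out = {}
    for t, k in rel.items():
        key = tuple(t[c] for c in cols)
        out[key] = out.get(key, Counter()) + k
    return out

def join_abc(left, lcols, right, rcols):
    """Natural join of two binary projections of R, returned over (A, B, C)."""
    out = {}
    for t, k in left.items():
        for u, l in right.items():
            row, other = dict(zip(lcols, t)), dict(zip(rcols, u))
            if all(row[c] == other[c] for c in row if c in other):
                row.update(other)
                key = (row["A"], row["B"], row["C"])
                out[key] = out.get(key, Counter()) + pmul(k, l)  # join multiplies
    return out

def union(r1, r2):
    """Union: annotations of equal tuples are added."""
    out = dict(r1)
    for t, k in r2.items():
        out[t] = out.get(t, Counter()) + k
    return out

AB, BC, AC = project(R, [0, 1]), project(R, [1, 2]), project(R, [0, 2])
V = project(union(join_abc(AB, "AB", BC, "BC"),
                  join_abc(AC, "AC", BC, "BC")), [0, 2])
```

Under that assumed query, `V[("d", "e")]` holds the monomials r·r with coefficient 2 and r·s with coefficient 1, i.e. 2r² + rs.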


Semiring Bestiary

• (B, ∨, ∧, ⊥, ⊤) Set semantics
• (N, +, ·, 0, 1) Bag semantics
• (PosBool(B), ∨, ∧, ⊥, ⊤) Incomplete dbs
• (P(Ω), ∪, ∩, ∅, Ω) Probabilistic dbs
• (P(P(X)), ∪, ⋓, ∅, {∅}) Why-provenance, where A ⋓ B := {a ∪ b : a ∈ A, b ∈ B}
• (C, min, max, absent, public) Security clearances
• (N[X], +, ·, 0, 1) Prov. polynomials
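
As one concrete entry from this list, the why-provenance semiring can be written down directly. This is a small sketch with a hypothetical encoding (frozensets of frozensets); the names `why_add`, `why_mul`, and `w` are introduced only for illustration.

```python
from itertools import product

# Why-provenance semiring (P(P(X)), ∪, ⋓, ∅, {∅}):
# an annotation is a set of witness sets, encoded as a frozenset of frozensets.
WHY_ZERO = frozenset()                  # ∅
WHY_ONE = frozenset({frozenset()})      # {∅}

def why_add(a, b):
    """Addition is plain union."""
    return a | b

def why_mul(a, b):
    """A ⋓ B := {a ∪ b : a ∈ A, b ∈ B}"""
    return frozenset(x | y for x, y in product(a, b))

def w(*names):
    """Annotation made of a single witness set."""
    return frozenset({frozenset(names)})

p, r, s = w("p"), w("r"), w("s")
joint = why_mul(p, r)                   # {{p, r}}
either = why_add(p, r)                  # {{p}, {r}}
```

The tests accompanying this sketch check the unit and zero laws and one instance of distributivity, which is what makes annotation propagation well defined.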


Our Contribution: Annotated XML

• We show how to decorate unordered XML data with semiring annotations: K-UXML

• We propagate the annotations for K-UXQuery (based on a large fragment of positive XQuery)

• We do this by generalizing the semantics of Nested Relational Calculus (NRC) to handle annotated values and to incorporate a recursive tree type and structural recursion on trees

• We prove a commutation with homomorphisms theorem, and show that it enables applications in security and incomplete databases


K-UXML

• No attributes, no text values, no repeated children (inessential); no order (essential!)

• Each node decorated with a value k from semiring K (1 “neutral,” 0 “not present”)

• K-collection: a finite set of elements annotated with values from K

• Formally, the children of a node form a K-collection of subtrees (to annotate root, also have a top-level K-collection)
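
One possible in-memory encoding of this data model, with K = N, is sketched below. The representation (a tree as a pair of a label and a frozenset of annotated subtrees) and the choice to merge equal children by adding their annotations are assumptions made for the example, not something the slides prescribe.

```python
# K-UXML over K = N: a tree is (label, children), where children is a
# K-collection of subtrees stored as a frozenset of (subtree, annotation) pairs.

def kcoll(pairs):
    """Build a K-collection: add up annotations of equal elements, drop those annotated 0."""
    acc = {}
    for elem, k in pairs:
        acc[elem] = acc.get(elem, 0) + k
    return frozenset((e, k) for e, k in acc.items() if k != 0)

def tree(label, *children):
    """children are (subtree, annotation) pairs; their order is irrelevant."""
    return (label, kcoll(children))

leaf_c = tree("c")
# Two mentions of the same child merge (2 + 1), and a child annotated 0 is "not present".
t = tree("a", (leaf_c, 2), (leaf_c, 1), (tree("d"), 0))
```

Because children form a frozenset, `tree("a", (x, 1), (y, 1))` and `tree("a", (y, 1), (x, 1))` are equal, which reflects the absence of order.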


Example: XPath on K-UXML

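Query: element r { $T//c }

[Figure: the source tree $T (nodes labeled a, b, c, d, with annotations x1, x2, y1, y2, y3) and the answer tree rooted at r. In the answer, one c child carries the annotation x1·y3 + y1·y2 and another c child carries y1.]

Omitted annotations are 1 (and omitted subtrees have annotation 0)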


Example: For-Loops in K-UXQuery

Query: element p { for $t in $S return for $x in ($t)/* return ($x)/* }
(i.e., element p { $S/*/* })

Trees are written label^annotation[ children ].

Source, $S: a^z[ b^x1[ d^y1 ], c^x2[ d^y2, e^y3 ] ]

Answer: p[ d^(z·x1·y1 + z·x2·y2), e^(z·x2·y3) ]
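
The answer annotations can be reproduced with a short sketch in which each step /* multiplies a child's annotation by its parent's and adds up equal children. The polynomial encoding (sorted tuples of monomial and coefficient pairs) and all function names are hypothetical choices for this illustration.

```python
# Polynomials in N[X] as sorted tuples of (monomial, coefficient) pairs, so they are
# hashable; a monomial is a sorted tuple of variable names.
def poly(d):
    return tuple(sorted((m, c) for m, c in d.items() if c))

def var(x):
    return poly({(x,): 1})

ZERO = poly({})

def padd(f, g):
    d = dict(f)
    for m, c in g:
        d[m] = d.get(m, 0) + c
    return poly(d)

def pmul(f, g):
    d = {}
    for m1, c1 in f:
        for m2, c2 in g:
            m = tuple(sorted(m1 + m2))
            d[m] = d.get(m, 0) + c1 * c2
    return poly(d)

# A tree is (label, children) with children a tuple of (subtree, annotation) pairs;
# a forest is a dict {tree: annotation}.
def leaf(label):
    return (label, ())

def child_step(forest):
    """The step /*, i.e. `for $t in forest return ($t)/*`."""
    out = {}
    for (_, kids), k in forest.items():
        for child, k_child in kids:
            out[child] = padd(out.get(child, ZERO), pmul(k, k_child))
    return out

# Source $S: a^z[ b^x1[ d^y1 ], c^x2[ d^y2, e^y3 ] ]
b = ("b", ((leaf("d"), var("y1")),))
c = ("c", ((leaf("d"), var("y2")), (leaf("e"), var("y3"))))
S = {("a", ((b, var("x1")), (c, var("x2")))): var("z")}

answer = child_step(child_step(S))      # the children of p in element p { $S/*/* }
```

Here `answer[leaf("d")]` is the polynomial z·x1·y1 + z·x2·y2: the two d leaves are equal as trees, so their path products are added.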


Outline of Technical Approach

• Extend NRC with a recursive tree type
  – satisfies: tree = label × { tree }
  and an operation for structural recursion on trees (srt) [Robertson+ 07]
  – apply to each child subtree, collect results using NRC big union

• Generalize NRC + srt to handle semiring-annotated complex values ⇒ NRC_K + srt

• Define semantics of K-UXQuery by translation to NRC_K + srt


Semantics of Small Union

• Sums annotations
  ⟦e1 ∪ e2⟧_K (x) := ⟦e1⟧_K (x) + ⟦e2⟧_K (x)

• Example (trees written label^annotation[ children ]):

Query: return ($S, $T) (in NRC: $S ∪ $T)

Source: the forests ( a^x[ b^y ] ) and ( a^x[ b^y ], a^x[ b^z ] )

Answer: ( a^(2x)[ b^y ], a^x[ b^z ] )
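
A minimal sketch of the small-union rule with K = N follows. The numeric values chosen for x, y, z and the dict encoding of K-collections are assumptions for illustration; the slide itself keeps the annotations symbolic.

```python
# K = N; a K-collection is a dict {element: annotation}, absent elements mean 0.
def small_union(e1, e2):
    """[[e1 ∪ e2]](x) := [[e1]](x) + [[e2]](x)"""
    out = dict(e1)
    for elem, k in e2.items():
        out[elem] = out.get(elem, 0) + k
    return out

# Trees are (label, children) with children a frozenset of (subtree, annotation) pairs.
def tree(label, kids=()):
    return (label, frozenset(kids))

x, y, z = 3, 5, 7                        # hypothetical values for the slide's variables
a_by = tree("a", [(tree("b"), y)])       # a[ b^y ]
a_bz = tree("a", [(tree("b"), z)])       # a[ b^z ]

S = {a_by: x}
T = {a_by: x, a_bz: x}
answer = small_union(S, T)               # a[ b^y ] gets x + x, a[ b^z ] keeps x
```

The tree a[ b^y ] occurs in both inputs, so its annotation becomes 2x, while a[ b^z ] is a different tree and keeps x.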


Semantics of Big Union

• Sums and multiplies annotations

  ⟦∪(x ∈ e1) e2⟧_K (y) := Σ_{i=1}^{n} ⟦e1⟧_K (a_i) · ⟦e2⟧_K[x := a_i] (y)

  where the support (the set of elements with non-zero annotations) of ⟦e1⟧_K is {a_1, ..., a_n}


Big Union Example With K = N

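Query: return $T/*/* (in NRC: ∪(x ∈ $T) ∪(y ∈ x) { y })

[Figure: source $T and answer. In the source, b^2[ c^3 ] (b annotated 2 with a child c annotated 3) is shown as equivalent (≡) to two copies of b with three c children each, next to an unannotated b[ c ]. In the answer, c^7 is shown as equivalent (≡) to the bag c, c, c, c, c, c, c.]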
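
The count of 7 follows from the big-union rule: each inner annotation is multiplied by the annotation of the collection it sits in, and the products are added (2·3 + 1·1). The sketch below simplifies the trees to nested collections; `big_union` and the dict encoding are hypothetical names and choices for this example.

```python
def big_union(e1, body):
    """[[∪(x ∈ e1) e2]](y) := Σ_i [[e1]](a_i) · [[e2]][x := a_i](y), with K = N.

    e1 is a K-collection {element: annotation}; body(a) evaluates e2 with x bound to a.
    """
    out = {}
    for a, k in e1.items():
        if k == 0:
            continue                      # a lies outside the support of e1
        for y, l in body(a).items():
            out[y] = out.get(y, 0) + k * l
    return out

# Flattening a collection of collections, as in ∪(x ∈ T) ∪(y ∈ x) { y }.
inner_3c = frozenset({("c", 3)})          # the collection { c^3 }
inner_1c = frozenset({("c", 1)})          # the collection { c }
T = {inner_3c: 2, inner_1c: 1}            # annotated 2 and 1, mirroring b^2[ c^3 ] and b[ c ]

answer = big_union(T, lambda x: big_union(dict(x), lambda y: {y: 1}))
```

The inner call leaves each collection unchanged (it multiplies by 1), and the outer call scales { c^3 } by 2 and { c } by 1 before adding.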


XPath Descendant Operator Uses srt

• //* applied to forest $T translates to

  ∪(x ∈ $T) π1((srt(b, s) . f) x)

  where

  f := let self = Tree(b, ∪(x ∈ s) {π2(x)}) in
       let matches = ∪(x ∈ s) {π1(x)} in
       (matches ∪ {self}, self)

• //a, similar to above
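
The translation above can be mimicked with an ordinary recursive function. This sketch fixes K = N, takes `matches` to be the flattened, annotation-weighted union of the first components returned for the children, and returns the input tree itself as `self` instead of rebuilding it; these simplifications and all names are assumptions of the example.

```python
# K = N; a K-collection is a dict {tree: annotation}.
def kadd(c1, c2):
    out = dict(c1)
    for e, k in c2.items():
        out[e] = out.get(e, 0) + k
    return out

def scale(k, coll):
    return {e: k * l for e, l in coll.items()}

# A tree is (label, children) with children a frozenset of (subtree, annotation) pairs.
def tree(label, kids=()):
    return (label, frozenset(kids))

def srt_descendants(t):
    """Structural recursion on t: returns (matches, self)."""
    _, kids = t
    matches = {}
    for child, k in kids:
        child_matches, _ = srt_descendants(child)
        matches = kadd(matches, scale(k, child_matches))   # big union over the children
    return kadd(matches, {t: 1}), t                        # matches ∪ {self}

def descendants(forest):
    """//* on a forest: big union of the first components over all trees."""
    out = {}
    for t, k in forest.items():
        out = kadd(out, scale(k, srt_descendants(t)[0]))
    return out

c = tree("c")
b = tree("b", [(c, 3)])
a = tree("a", [(b, 2), (c, 1)])
result = descendants({a: 1})
```

In `result`, the leaf c is reached along two paths, so its annotation is 2·3 + 1.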


Application: Security Clearances

• Data annotated with clearance levels from total order C : P < C < S < T < 0

• Joint use of data (·) requires access to both (max of clearances); alternative use of data (+) requires access to either (min of clearances)

• (C, min, max, 0, P) is a commutative semiring

Query: element p { $S/*/* }

Trees are written label^annotation[ children ].

Source, $S: a^P[ b^C[ d^C ], c^C[ d^S, e^T ] ]

Answer: p[ d^min(max(P,C,C),max(P,C,S)), e^max(P,C,T) ] = p[ d^C, e^T ]
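
The clearance semiring is small enough to write out and check exhaustively. In this sketch the levels are plain strings ordered by a hypothetical `RANK` table; `c_add` and `c_mul` are names introduced for the example.

```python
# Total order P < C < S < T < 0, where "0" is the semiring zero and "P" the semiring one.
LEVELS = ["P", "C", "S", "T", "0"]
RANK = {lvl: i for i, lvl in enumerate(LEVELS)}
ZERO, ONE = "0", "P"

def c_add(a, b):
    """Alternative use (+): access to either suffices, so take the min."""
    return min(a, b, key=RANK.get)

def c_mul(a, b):
    """Joint use (·): access to both is needed, so take the max."""
    return max(a, b, key=RANK.get)

# Annotations of d and e in the answer, following the two paths to d and the one path to e.
d = c_add(c_mul(c_mul("P", "C"), "C"), c_mul(c_mul("P", "C"), "S"))
e = c_mul(c_mul("P", "C"), "T")
```

Evaluating the two expressions gives `d == "C"` and `e == "T"`, matching the simplified answer tree.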


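Security Condition: Non-Interference

• For any given clearance level (e.g., C), want the following diagram to commute (trees written label^annotation[ children ]):

  a^P[ b^C[ d^C ], c^C[ d^S, e^T ] ]   --query-->   p^P[ d^C, e^T ]
             |                                           |
         erase > C                                   erase > C
             v                                           v
  a^P[ b^C[ d^C ], c^C ]               --query-->   p^P[ d^C ]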
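
The commuting square can be checked mechanically on this example. The sketch below is an illustration under assumptions: forests are dicts from trees to clearance levels, the query is taken to be the two child steps of $S/*/* (the wrapping element p is left out), and `erase_above` is a hypothetical name for erasing everything above a given level.

```python
LEVELS = ["P", "C", "S", "T", "0"]                 # P < C < S < T < 0
RANK = {lvl: i for i, lvl in enumerate(LEVELS)}

def add(a, b):
    return min(a, b, key=RANK.get)

def mul(a, b):
    return max(a, b, key=RANK.get)

def tree(label, kids=()):
    return (label, frozenset(kids))

def collect(pairs):
    """Build a K-collection {tree: level}, adding up duplicates and dropping 0."""
    out = {}
    for t, k in pairs:
        out[t] = add(out.get(t, "0"), k)
    return {t: k for t, k in out.items() if k != "0"}

def child_step(forest):
    return collect((c, mul(k, kc)) for (_, kids), k in forest.items() for c, kc in kids)

def query(forest):
    """The children of p in element p { $S/*/* }."""
    return child_step(child_step(forest))

def erase_above(c, forest):
    """Map every annotation k to k if k <= c and to 0 otherwise, recursively."""
    def h(k):
        return k if RANK[k] <= RANK[c] else "0"
    def on_tree(t):
        label, kids = t
        return (label, frozenset(collect((on_tree(ch), h(k)) for ch, k in kids).items()))
    return collect((on_tree(t), h(k)) for t, k in forest.items())

# Source: a^P[ b^C[ d^C ], c^C[ d^S, e^T ] ]
b = tree("b", [(tree("d"), "C")])
c = tree("c", [(tree("d"), "S"), (tree("e"), "T")])
S = {tree("a", [(b, "C"), (c, "C")]): "P"}

top_then_erase = erase_above("C", query(S))
erase_then_query = query(erase_above("C", S))
```

Both routes around the square yield the single leaf d annotated C, so a user cleared at level C learns nothing from the query that depends on the erased S and T data.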


Application: Incomplete XML

• Data annotated with Boolean expressions; tree T represents set of possible worlds Mod(T)

[Figure: the tree T, whose nodes are labeled a, b, c, d and whose c nodes carry the annotations y1, y2, y3, together with Mod(T), drawn as a list of unannotated trees: 7 possible worlds.]


Correctness: Possible Worlds

• For every incomplete tree T, and every UXQuery query q, want this diagram to commute:

  T      --Mod-->   Mod(T)
  |                   |
  q                   q
  v                   v
  q(T)   --Mod-->   q(Mod(T)) = Mod(q(T))


Commutation with Homomorphisms

• Theorem: Let h : K1 → K2 be a semiring homomorphism. Then for any UXQuery query q, and for any K1-UXML document D, we have h(q(D)) = q(h(D)).

• Ex: security clearances
  h_c : C → C    h_c(k) := if k ≤ c then k else 0

• Ex: incomplete dbs
  ν : B → B    Eval_ν : PosBool(B) → B

• Ex: duplicate elimination
  δ : N → B    δ(k) := if k = 0 then ⊥ else ⊤
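
The theorem can be observed on a small instance using the duplicate-elimination homomorphism δ. The sketch is only a spot check on one document and one query ($D/*/*), not a proof; the `Semiring` record, the forest encoding, and the function names are assumptions made for the example.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero")
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0)
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False)

def collect(K, pairs):
    """Build a K-collection {tree: annotation}, adding up duplicates and dropping zeros."""
    out = {}
    for t, k in pairs:
        out[t] = K.add(out.get(t, K.zero), k)
    return {t: k for t, k in out.items() if k != K.zero}

def child_step(K, forest):
    return collect(K, ((c, K.mul(k, kc)) for (_, kids), k in forest.items() for c, kc in kids))

def q(K, forest):
    """The query $D/*/* evaluated over semiring K."""
    return child_step(K, child_step(K, forest))

def apply_hom(h, K2, forest):
    """Apply h to every annotation in the forest, recursively."""
    def on_tree(t):
        label, kids = t
        return (label, frozenset(collect(K2, ((on_tree(c), h(k)) for c, k in kids)).items()))
    return collect(K2, ((on_tree(t), h(k)) for t, k in forest.items()))

def delta(k):
    """Duplicate elimination δ : N → B."""
    return k != 0

def leaf(label):
    return (label, frozenset())

b1 = ("b", frozenset({(leaf("c"), 3)}))
b2 = ("b", frozenset({(leaf("c"), 1)}))
D = {("a", frozenset({(b1, 2), (b2, 1)})): 1}

lhs = apply_hom(delta, BOOL, q(NAT, D))       # h(q(D))
rhs = q(BOOL, apply_hom(delta, BOOL, D))      # q(h(D))
```

Over N the query yields c with multiplicity 7, which δ maps to true. Applying δ first merges the two b subtrees into one, and the query over B again yields c annotated true.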


Related Work

• Bag semantics for NRC [Libkin&Wong 97]

• Incomplete XML [Kanza+ 99, Abiteboul+ 06]

• Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07]

• XML provenance [Buneman+ 01]

• NRC provenance [Hidders+ 07]

• Semiring-annotated XPath [Grahne+ 07]

• Negation, expressiveness of RAK [Geerts&Poggi 08]


Conclusion

• We showed how to annotate unordered XML trees (complex values) with values from a commutative semiring K, and propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt)

• We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework, commutation with homomorphisms


Future Work

• Practical applications based on framework
  – Security clearances
  – Jointly recording provenance, security, multiplicities, uncertainty, etc. (product of semirings is also a semiring!)

• Query optimization: containment/equivalence wrt annotated semantics depends on K
  – In paper, we show K-equivalence for UXQuery is the same as B-equivalence when K is a distributive lattice
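
The remark that a product of semirings is again a semiring can be made concrete with componentwise operations. The sketch pairs multiplicities (N) with clearance levels; the `Semiring` record and the name `product_semiring` are hypothetical.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", "add mul zero one")

NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)

LEVELS = ["P", "C", "S", "T", "0"]                 # P < C < S < T < 0
RANK = {lvl: i for i, lvl in enumerate(LEVELS)}
CLEARANCE = Semiring(lambda a, b: min(a, b, key=RANK.get),
                     lambda a, b: max(a, b, key=RANK.get),
                     "0", "P")

def product_semiring(K1, K2):
    """Pairs with componentwise operations; zero and one are the pairs of zeros and ones."""
    return Semiring(
        add=lambda a, b: (K1.add(a[0], b[0]), K2.add(a[1], b[1])),
        mul=lambda a, b: (K1.mul(a[0], b[0]), K2.mul(a[1], b[1])),
        zero=(K1.zero, K2.zero),
        one=(K1.one, K2.one),
    )

K = product_semiring(NAT, CLEARANCE)
joint = K.mul((2, "C"), (3, "S"))       # multiplicities multiply, clearances take max
either = K.add((2, "C"), (3, "S"))      # multiplicities add, clearances take min
```

A single query evaluation over `K` then tracks both kinds of annotation at once.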


K-UXQuery Syntax
