2005 lav-iv 1 On the Inverse rules algorithm It is guaranteed to compute the certain answers But, what about its efficiency? As presented, it computes tuples using views that cannot contribute to the rewriting, and then discards these tuples We show examples, and then how to address the problems

Dec 21, 2015

Page 1: 2005lav-iv1  On the Inverse rules algorithm It is guaranteed to compute the certain answers But, what about its efficiency? As presented, it computes.

2005 lav-iv 1

On the Inverse rules algorithm

It is guaranteed to compute the certain answers

But, what about its efficiency?

As presented, it computes tuples using views that cannot contribute to the rewriting, and then discards these tuples

We show examples, and then how to address the problems

2005 lav-iv 2

Example :

A db: parenthood relation par(c, p)

A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren

A query: Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren

The algorithm inverts the view:

par(C, f(C, G)), par(f(C, G), G) :- v(C, G)

Given n tuples in the view, it produces 2n tuples, then joins, then discards the results that contain f(-,-)

The bucket algorithm will spend more time on rewriting, find:

Q’(X, Y) :- v(X, Y)

And then output the n results
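The cost difference can be sketched in a few lines of Python. This is only an illustration under assumptions: the view tuples are invented, Skolem terms f(C, G) are modelled as tuples tagged with "f", and the helper names (invert_view, is_skolem, grandparents) are hypothetical.

```python
def invert_view(view_tuples):
    """Apply par(C, f(C,G)), par(f(C,G), G) :- v(C,G) to every view tuple."""
    par = set()
    for c, g in view_tuples:
        skolem = ("f", c, g)       # stands for the Skolem term f(C, G)
        par.add((c, skolem))
        par.add((skolem, g))
    return par

def is_skolem(term):
    """True for the tagged tuples that represent f(-,-)."""
    return isinstance(term, tuple) and term[:1] == ("f",)

def grandparents(par):
    """Evaluate q(X, Y) :- par(X, Z), par(Z, Y); drop answers containing f(-,-)."""
    answers = set()
    for x, z in par:
        for z2, y in par:
            if z == z2 and not is_skolem(x) and not is_skolem(y):
                answers.add((x, y))
    return answers

# Invented view extension with n = 3 tuples.
v = {("ann", "carl"), ("bob", "dana"), ("eve", "fred")}
par = invert_view(v)
print(len(par))                 # 6, i.e. 2n derived facts
print(grandparents(par) == v)   # True: same answers as Q'(X, Y) :- v(X, Y)
```

The join runs over the 2n derived facts only to recover exactly the n tuples that the rewriting Q' reads directly from the view.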

2005 lav-iv 3

Example (university db) :

Views:

v1(s, c, q, t) :- registered(s, c, q), course(c, t), c>=500, q>=a98

v2(s, p, c, q) :- registered(s, c, q), teaches(p, c, q)

v3(s, c) :- registered(s, c, q), q<=a94

v4(p, c, t, q) :- registered(s, c, q), teaches(p, c, q), course(c, t), q<=a97

Query: q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95

Inverting v3: registered(s, c, f(s,c)) :- v3(s, c)

This may produce any number of facts for registered, but for this query none can be used – why?

2005 lav-iv 4

v3(s, c) :- registered(s, c, q), q<=a94

q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95

• How should the constraint on q in v3 be represented?

Could export it by f(s, c) <=a94 – then notice conflict with f(s, c) >= a95 in query (how is q in the query transformed to f(s,c)?)

But, what if the view contained no constraint?

The view must export variables constrained in the query

• The query has a join on q with teaches; teaches facts are derived only from other views, so q will be exported as a different function symbol, or as q (which of these here?)

a join will fail (cannot join f1(-,-) with f2(-,-) or a regular variable)

The view must export join variables of the query
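The failed join can be seen on a toy instance. The sketch below is an assumption-laden illustration: all tuples are invented, only v2 and v3 are inverted, the Skolem term is modelled as a tuple tagged "f3", and the comparison constraints of the query are left out so that only the join on q is shown.

```python
# Hypothetical view extensions (all tuples invented for illustration).
v2 = {("s1", "p1", "c600", "a96")}       # v2(s, p, c, q)
v3 = {("s1", "c600"), ("s2", "c610")}    # v3(s, c)

# Inverse rules. v2 exports q, so its facts carry the real quarter;
# v3 hides q, so q is replaced by the Skolem term f3(s, c).
registered_from_v2 = {(s, c, q) for (s, p, c, q) in v2}
registered_from_v3 = {(s, c, ("f3", s, c)) for (s, c) in v3}
teaches = {(p, c, q) for (s, p, c, q) in v2}

def join(registered, teaches):
    """registered(s, c, q), teaches(p, c, q), joined on c and q."""
    return {(s, p, c)
            for (s, c, q) in registered
            for (p, c2, q2) in teaches
            if c == c2 and q == q2}

print(join(registered_from_v2, teaches))  # facts from v2 join with teaches
print(join(registered_from_v3, teaches))  # empty: f3(s, c) never equals a quarter
```

Every registered fact derived from v3 is produced and then thrown away by the join, which is the wasted work the slide points at.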

2005 lav-iv 5

The factors that determine usability of a view are the same as in the bucket algorithm, but the inverse rules algorithm tries to use all views anyway

Solution: compose query with inverse rules, to obtain a new query that uses directly the views

Composition:

Consider the heads of inverse rules as a db – collection of facts

Look for valuations – mapping of query variables that map query atoms to this db

Then replace query goals by views

2005 lav-iv 6

Example : A db: parenthood relation par(c, p)

A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren

A query: Q: q(X, Y) :- par(X, Z), par(Z, Y) // find grandchildren

The algorithm inverts the view:

par(C, f(C, G)), par(f(C, G), G) :- v(C, G)

Two candidate valuation mappings:

X -> C, Z -> f(C, G), Y -> G : q(C, G) :- v(C, G), v(C, G)

X -> f(C, G), Z -> G, Y -> f(C, G) (assuming we add C=G) :

q(f(G, G), f(G, G)) :- v(G, G), v(G, G)

2nd is discarded – function symbols are not allowed in the result

Minimization of 1st gives q(C, G) :- v(C, G), same as bucket

2005 lav-iv 7

q(s, p, c) :- registered(s, c, q), teaches(p, c, q), course(c, t), c>=300, q>=a95

registered(s, c, f(s, c)), f(s, c)<=a94 :- v3(s, c)

Any valuation that uses this fact must map q -> f(s, c)

• The constraint f(s, c) <= a94 conflicts with f(s,c)>=a95,

but what if there is no constraint to export?

• The mapping q -> f(s, c) cannot be used to map teaches to any fact derived from other views

v3 cannot be used

2005 lav-iv 8

A mapping will fail to define a valuation if

• a view does not export a join variable, and does not contain the join (why?)

• The view does not export a variable that is constrained in the query (cannot ‘check’ the constraint in the ‘db’)

Thus, the results (for a CQ query, possibly with constraints) will be the same as for bucket (assuming it is correct & complete)

The amount of work invested will probably be similar

Composition can be performed also for Datalog queries, but weeding out useless mappings is more difficult

2005 lav-iv 9

The MiniCon algorithm --- the final one?

• Motivation

• Preliminaries

• The MiniCon algorithm

2005 lav-iv 10

Motivation

Previous algorithms (bucket, inverse rules) may be quite expensive to use, especially for systems with many views.

The bucket algorithm has a narrow peephole in 1st stage – each bucket is for a single atom

global constraints are treated only in 2nd stage

Many useless combinations may be examined

The inverse rules algorithm, improved by composition, seems to perform similar work

The motivation: find an algorithm that will do more work in preliminary filtering, and will scale up to hundreds of views

2005 lav-iv 11

Preliminaries

The idea

• Once a view is put in a bucket of a query atom, switch to considering join variables – and find which other atoms are necessarily covered by the view

• Along the way, find out also which view head variables need to be equated

• Given coverage by views, combine views with disjoint covers

Expected gain: • more filtering in the 1st stage,

• better representation of information

A smaller number of combinations, reduced number of containment checks in the 2nd stage

2005 lav-iv 12

Example :

A db: parenthood relation par(c, p)

A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren

A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)

Bucket : one view in each bucket

par(X, Z): {v(X,G)} par(Z, Y): {v(P, Y)}

When the two view atoms are combined, a containment check discovers that G=Y is needed for containment, and that the 2nd atom is redundant

Alternative: given par(X, Z): v(X,G), since Z (join var) occurs in 2nd atom of query, add par(Z, Y) to coverage of v(X,G), with G=Y

In 2nd stage, just use v(X, Y)

2005 lav-iv 13

Assumptions, terminology:

• CQ queries and views, for now: no constants / constraints in query/views

• View definitions use variables different from those in query or other views (disjoint sets of variables)

• b(Q) – body atoms of Q, b(V) – body atoms of view V

• A mapping from vars(Q) to vars(V) is interesting only if it maps a non-empty subset of b(Q) to b(V)

• Considered mappings always map Q head vars to V head vars – head var preservation – (hvp)

• If h maps x in vars(Q) to an existential var in some V, then all atoms of b(Q) that contain x must be mapped to same V:

join variable condition --- (jvc)
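The two conditions can be written as small checks. This is a minimal sketch under assumptions: atoms are tuples (predicate, arg1, ...), a mapping h is a dict from query variables to view variables, equalities between head variables are ignored, and the function names satisfies_hvp and satisfies_jvc are hypothetical.

```python
def satisfies_hvp(query_head, h, view_head):
    """Head variable preservation: mapped query head vars land on view head vars."""
    return all(h[x] in view_head for x in query_head if x in h)

def satisfies_jvc(query_body, h, view_head, covered):
    """Join variable condition: if x is mapped to an existential view variable,
    every query atom containing x must be among the covered atoms."""
    for x, y in h.items():
        if y in view_head:
            continue                      # y is exported, nothing to check
        for atom in query_body:
            if x in atom[1:] and atom not in covered:
                return False
    return True

# Q: q(X, Y) :- par(X, Z), par(Z, Y)   and   v(C, G) :- par(C, P), par(P, G)
q_head = ("X", "Y")
q_body = [("par", "X", "Z"), ("par", "Z", "Y")]
v_head = ("C", "G")

partial = {"X": "C", "Z": "P"}            # covers only par(X, Z)
full = {"X": "C", "Z": "P", "Y": "G"}     # covers both atoms

print(satisfies_jvc(q_body, partial, v_head, {q_body[0]}))  # False
print(satisfies_jvc(q_body, full, v_head, set(q_body)))     # True
print(satisfies_hvp(q_head, full, v_head))                  # True
```

The partial mapping fails because Z goes to the existential P while par(Z, Y) is not covered, which is exactly the situation that forces the coverage to grow.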

2005 lav-iv 14

Given Q(X), assume Q’ is a rewriting in terms of views

Q’: q(X) :- v1(X1), …, vn(Xn) (some vi, vj may be occurrences of same view v)

Exists containment mapping h from Q to exp(Q’) (satisfies hvp)

Let

• Gi be the set of atoms of b(Q) mapped to b(exp(vi))

• h/i – h restricted to vars(Gi)

Then

$G_i \cap G_j = \emptyset$ for $i \neq j$, and $\bigcup_i G_i = b(Q)$

And Gi satisfies (jvc):

if h/i maps x of vars(Gi) to existential variable of vi,

then every atom g in b(Q) that contains x is in Gi

2005 lav-iv 15

The occurrence of vi in Q’ may have some head variables equated

Example :

the original head might be vi(A, B, C)

the head in Q’ : vi(X, X, Z)

These equalities are given by a unique least set of equality constraints Ei

(v/E -- the view v, with head variables equated as specified by E)

Summary (so far): the containment mapping can be decomposed into “disjoint” components (vi, Ei, h/i , Gi)

All we need to do is find such components, then combine them

What is the condition for successful combination?

Does a combination (s.t. $G_i \cap G_j = \emptyset$ for $i \neq j$, and $\bigcup_i G_i = b(Q)$) ever fail?

2005 lav-iv 16

To find such components, we must use the given view definitions (variables different from those of Q or exp(Q’)).

Answer : a component and its mapping can be expressed as:

[Diagram: Gi --hi--> vi/E’i --h’i--> exp(vi(Xi)); the direct arrow from Gi to exp(vi(Xi)) is h/i]

Here: • hi is a mapping from Q to the given view definition for vi

• E’i – the least set of equalities that make hi a good mapping

• h’i is a variable renaming

E’i and hi depend only on Q and the definition of vi

We can find component mappings from Q to the view defs, then combine & rename, possibly equating more head vars

2005 lav-iv 17

One more step :

A component (vi, Ei, hi , Gi) may be further decomposed into smaller components (vi, Ei1, hi1 , Gi1), (vi, Ei2, hi2 , Gi2) provided

• each of Gi1, Gi2 satisfies (jvc), and they are disjoint

• Each of Ei1, Ei2 is a subset of Ei, least sets for the mappings hi1, hi2 to be ok

When these are combined, Ei1 union Ei2 is augmented with the remaining equalities of Ei

Minimal such components:

• Easier to find

• Can be re-used for different combinations.

2005 lav-iv 18

What is a minimal component?

C = (vi, Ei, hi, Gi) is minimal if

• hi satisfies (hvp) + (jvc) (assuming the equalities in Ei)

• There is no component C1 whose last three components are contained in C’s last three components (at least one is proper containment)

A component: minicon (mini containment) description -- MCD

The algorithm constructs and combines minimal MCDs

2005 lav-iv 19

The MiniCon Algorithm

Minimal MCD Construction Algorithm :

For each g in b(Q), each k in each b(vi)

Let E(g,k) be the least set of equalities s.t. a mapping h(g,k) from g to k that satisfies (hvp) exists

// E(g,k) and h(g,k), if they exist,

// are uniquely determined by g, k

If E(g,k) and h(g,k) exist find all minimal MCDs that extend them: (vi, Ei, hi, Gi) extends if Ei contains E(g,k), hi contains h(g,k), Gi contains g

For the final set of MCDs remove duplicates

2005 lav-iv 20

How do we find minimal MCDs that extend a given mapping?

I. Extension to one more query atom, one view atom

extend (vi, E, h, g, k) // E equalities on head vars of vi

// h: vars(Q) -> vars(vi), partial, hvp with E

// g in b(Q), k in b(vi)

try to extend h to map g to k, with hvp, by adding equalities to E

return fail, or the (uniquely determined) E’,h’

(The first step in alg. of previous page is this one, given empty E and h)

2005 lav-iv 21

How do we find minimal MCDs that extend a given mapping?

II. Extend repeatedly, as long as needed and successful

Given vi, g, k , E(g,k) and h(g,k) :

Let C = {(vi, E(g,k), h(g,k), {g})}, MC = {} //C – initial component, (jvc) possibly not satisfied

While C not empty – remove some c = (vi, E, h, G) from C

– if (jvc) satisfied – put in MC

– if not, exists x in vars(Q) s.t. h(x) is existential, g’ that contains x, g’ not in G

– for each k’ in b(vi)

if extend(vi, E, h, g’, k’) succeeds, put extension in C

Remove duplicates from MC
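Steps I and II can be put together in a short Python sketch for a single view. It rests on assumptions that are not in the slides: atoms are tuples (predicate, arg1, ...), E is a dict sending each equated head variable to the smallest name in its class, there are no constants, and the names extend and minimal_mcds are hypothetical.

```python
def extend(view_head, query_head, E, h, g, k):
    """Try to map query atom g onto view atom k; return (E', h') or None."""
    if g[0] != k[0] or len(g) != len(k):
        return None
    E, h = dict(E), dict(h)
    for x, y in zip(g[1:], k[1:]):
        y = E.get(y, y)                    # representative of y under E
        if x in query_head and y not in view_head:
            return None                    # (hvp) would be violated
        old = h.setdefault(x, y)
        if old == y:
            continue
        if old not in view_head or y not in view_head:
            return None                    # only head variables may be equated
        keep, drop = min(old, y), max(old, y)
        E = {a: (keep if b == drop else b) for a, b in E.items()}
        E[drop] = keep                     # record the new equality
        h = {a: (keep if b == drop else b) for a, b in h.items()}
    return E, h

def minimal_mcds(query_head, query_body, view_head, view_body):
    """Worklist construction of the MCDs (E, h, G) for one view."""
    found = []
    for g in query_body:
        for k in view_body:
            start = extend(view_head, query_head, {}, {}, g, k)
            if start is None:
                continue
            work = [(start[0], start[1], frozenset([g]))]
            while work:
                E, h, G = work.pop()
                # atoms that (jvc) forces into the coverage
                missing = [a for a in query_body if a not in G and
                           any(x in h and h[x] not in view_head for x in a[1:])]
                if not missing:
                    if (E, h, G) not in found:     # remove duplicates
                        found.append((E, h, G))
                    continue
                for k2 in view_body:
                    ext = extend(view_head, query_head, E, h, missing[0], k2)
                    if ext is not None:
                        work.append((ext[0], ext[1], G | {missing[0]}))
    return found

# Q: q(X, Y) :- par(X, Z), par(Z, Y)   and   v(C, G) :- par(C, P), par(P, G)
q_body = [("par", "X", "Z"), ("par", "Z", "Y")]
v_body = [("par", "C", "P"), ("par", "P", "G")]
for E, h, G in minimal_mcds(("X", "Y"), q_body, ("C", "G"), v_body):
    print(E, h, sorted(G))
```

On the grandparent view this yields a single MCD covering both query atoms; with the view head changed to (C, P) it yields two MCDs, one per query atom.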

2005 lav-iv 22

Example :

A db: parenthood relation par(c, p)

A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren

A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)

MCDs:

• 1st query atom, 1st view atom: h(1,1) = {X -> C, Z -> P}, E(1,1) = {}

need to extend to par(Z, Y), can only map to 2nd view atom

MCD: (v, E={}, h={X -> C, Z -> P, Y -> G}, b(Q))

• 1st query atom, 2nd view atom: no mapping

The only MCD is the above

2005 lav-iv 23

Comment :

In the paper, if (vi, Ei1, hi1, Gi1) and (vi, Ei2, hi2, Gi2) are both minimal extensions, and Gi1 is contained in Gi2, then the 2nd is thrown away (another minimization)

I do not know how to explain this optimization, or prove that with it the algorithm is still complete

2005 lav-iv 24

2nd phase: MCD combination, and variable renaming :

A set of MCDs {(vi, Ei, hi, Gi)} is a candidate if

$G_i \cap G_j = \emptyset$ for $i \neq j$, and $\bigcup_i G_i = b(Q)$

For each candidate set:

Rename variables : for each view variable y :

If hi(x) = y (y a view variable), rename y to x

else rename y to a fresh distinct variable

Note : if x in domain of both hi, hj , then hi(x), hj(x) are head variables of vi, vj (by def of MCD),

renaming makes them equal
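The second phase can be sketched as follows. The sketch assumes that each MCD is a dict with the hypothetical keys view, head, E, h and G, that each h sends at most one query variable to any view head variable, and that fresh variables may be named with a leading underscore; the sample MCDs are those of the view v(C, P) :- par(C, P), par(P, G).

```python
from itertools import combinations

def candidate_sets(mcds, query_body):
    """Yield groups of MCDs whose covered atoms partition the query body."""
    goal = set(query_body)
    for r in range(1, len(mcds) + 1):
        for combo in combinations(mcds, r):
            covered = [atom for m in combo for atom in m["G"]]
            if len(covered) == len(set(covered)) and set(covered) == goal:
                yield combo

def rewrite(combo):
    """Rename view head variables to query variables, or to fresh ones."""
    body = []
    for i, m in enumerate(combo):
        inverse = {}
        for x, y in m["h"].items():
            inverse.setdefault(m["E"].get(y, y), x)   # view var -> query var
        args = []
        for y in m["head"]:
            rep = m["E"].get(y, y)                    # equated head vars share a name
            args.append(inverse.get(rep, "_%s%d" % (rep, i)))
        body.append((m["view"],) + tuple(args))
    return body

q_body = [("par", "X", "Z"), ("par", "Z", "Y")]
a1 = {"view": "v", "head": ("C", "P"), "E": {},
      "h": {"X": "C", "Z": "P"}, "G": {q_body[0]}}
a2 = {"view": "v", "head": ("C", "P"), "E": {},
      "h": {"Z": "C", "Y": "P"}, "G": {q_body[1]}}

for combo in candidate_sets([a1, a2], q_body):
    print(rewrite(combo))   # body of the rewriting, one tuple per view atom
```

Because Z is renamed consistently in both view atoms, the join between the two occurrences of v comes out of the renaming itself.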

2005 lav-iv 25

Example (cont’d):

A db: parenthood relation par(c, p)

A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren

A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)

MCD: (v, E={}, h={X -> C, Z -> P, Y -> G}, b(Q))

Rename in v: C to X, G to Y

Rewriting: q(X, Y) :- v(X, Y)

2005 lav-iv 26

Example :

A db: parenthood relation par(c, p)

A view: v(C, G) :- par(C, P), par(P, G) // only grandchildren

A query: Q: q(X, X) :- par(X, Z), par(Z, X) // I am my own grandpa

MCDs:

• 1st query atom, 1st view atom: h(1,1) = {X -> C, Z -> P}, E(1,1) = {}

need to extend to par(Z, X), can only map to 2nd view atom

MCD: (v, {C=G}, {X -> C, Z -> P}, b(Q))

• 1st query atom, 2nd view atom: no mapping

The only MCD is the above

2005 lav-iv 27

Example :

A db: parenthood relation par(c, p)

A view: v(C, P) :- par(C, P), par(P, G) // parents where grandparents exist

A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)

MCDs:

• h(1,1) = {X -> C, Z -> P}, E(1,1) = {}

MCD A1 = ( v(C, P), {}, h(1,1), {par(X,Z)} )

• h(1,2) = {X -> P, Z -> G}, E(1,2) = {}, fails (why?)

• h(2,1) = {Z -> C, Y -> P}, E(2,1) = {}

MCD A2 = ( v(C, P), {}, h(2,1), {par(Z,Y)} )

• h(2,2) = {Z -> P, Y -> G}, fails (why?)

2005 lav-iv 28

A view: v(C, P) :- par(C, P), par(P, G)

A query: Q: q(X, Y) :- par(X, Z), par(Z, Y)

MCDs:

A1 = ( v(C, P), {}, h(1,1), {par(X,Z)} )

A2 = ( v(C, P), {}, h(2,1), {par(Z,Y)} )

Rewritings: (rename views to have distinct vars)

A1+A2: X -> C1, Z -> P1, Z -> C2, Y -> P2 : add P1 (in 1st v) = C2 (in 2nd v)

rewriting v(C1, P1), v(P1, P2)

renaming: v(X, Z), v(Z, Y) – a correct rewriting

2005 lav-iv 29

When Q or views contain constants:

MCD formation:

• A constant a of Q must be mapped to a head variable of vi, or to itself

• If x is in headvar(Q), it can be mapped to headvar(vi) or to a

• Whenever x is mapped to a, hi records this fact

MCD combination:

If A1, A2 are defined on x, then allow also

• Both map x to a

• One maps x to a, the other to head var of view

• In either case, rename x to a in rewriting

2005 lav-iv 30

When Q or views contain comparisons:

• If views contain comparisons, no change to algorithm (it finds contained rewritings anyway)

• If Q contains comparisons, then there may be no Datalog program that computes the certain answers (can express x != y)

But, we can expect that extending the algorithm for comparisons will be a good heuristic, and will find certain answers in many cases

2005 lav-iv 31

When Q or views contain comparisons:

C(Q) – constraints of Q (closed under inference)

MCD formation: (vi, Ei, hi, Gi) (extend the join variable condition)

• If hi(x) is existential of vi, and c(x, y) in C(Q), then hi(y) is defined

• C(vi) must imply all constraints in hi(C(Q)) that involve at least one existential of vi

MCD combination:

Add all constraints of C(Q) not covered by those of the views