
Dec 22, 2015


Query Planning with Limited Source Capabilities

Chen Li, Stanford University

Edward Y. Chang, University of California, Santa Barbara

Motivation

• Heterogeneous information sources on the WWW

• Information-integration systems

• Limited query capabilities

– Music stores: amazon.com, cdnow.com.
– Must specify a value of Artist or Title.

– The sources do not answer queries such as “Give me all your information about CDs.”

Example

Sources   View Schemas             Must Bind
1         v1(Song, CD)             Song
2         v2(CD, Artist, Price)    CD
3         v3(CD, Artist, Price)    Artist

Query: “Find the prices of CDs containing a song titled Friends.”

v1(Friends, CD) ⋈ v2(CD, Artist, Price)
v1(Friends, CD) ⋈ v3(CD, Artist, Price)

Source tuples

v1(Song, CD)
<Friends, Love>
<Friends, Life>

v2(CD, Artist, Price)
<Love, Lucy, $15>
<Story, Snoopy, $14>

v3(CD, Artist, Price)
<Story, Lucy, $13>
<Love, Snoopy, $10>
<Life, Charlie, $8>

Not all the tuples can be retrieved from the sources due to the restrictions.

Traditional approach: consider one join at a time.

v1 ⋈ v2: {$15}

v1 ⋈ v3: empty, no binding for Artist.

v1(Song, CD)
<Friends, Love>
<Friends, Life>

v2(CD, Artist, Price)
<Love, Lucy, $15>
<Story, Snoopy, $14>

v3(CD, Artist, Price)
<Story, Lucy, $13>
<Love, Snoopy, $10>
<Life, Charlie, $8>

Our approach: retrieve as many tuples as possible.

v1(Song, CD)
<Friends, Love>
<Friends, Life>

v2(CD, Artist, Price)
<Love, Lucy, $15>
<Story, Snoopy, $14>

v3(CD, Artist, Price)
<Story, Lucy, $13>
<Love, Snoopy, $10>
<Life, Charlie, $8>

v1 ⋈ v2: {$15}
v1 ⋈ v3: {$10}

This approach could save the user $15 - $10 = $5!

Observations

• Access views not in a join to retrieve bindings;
• Recursive process;
• Some tuples in the answer cannot be retrieved.

v1(Song, CD)
<Friends, Love>
<Friends, Life>

v2(CD, Artist, Price)
<Love, Lucy, $15>
<Happy, Snoopy, $14>

v3(CD, Artist, Price)
<Happy, Lucy, $13>
<Love, Snoopy, $10>
<Life, Charlie, $8>

Questions

• How to compute the maximal answer?
• When should we access sources not in a query?
• What sources should be accessed?

Source views

• A set of source views V with binding patterns:
– b: a value must be specified for the attribute
– f: free

• Each view schema uses a set of global attributes

View                     Binding pattern
v1(Song, CD)             b f
v2(CD, Artist, Price)    b f f
v3(CD, Artist, Price)    f b f

Hypergraph representation: (figure) the attributes Song, CD, Artist, and Price, with the views v1, v2, and v3 drawn over the attributes they use.

Queries

A query Q includes:
– Input attributes: I;
– Output attributes: O.

Input attribute: {Song}
Output attribute: {Price}

(Figure: hypergraph of v1(Song, CD), v2(CD, Artist, Price), and v3(CD, Artist, Price) over the attributes Song, CD, Artist, and Price.)

Connections

• Connection: a set of views that connect I and O in Q.

• Meaning: natural join of the views.

• Universal-relation-like assumptions, but connections can be generated in various ways.

T1 = {v1,v2}, T2 = {v1,v3}

(Figure: hypergraph of v1(Song, CD), v2(CD, Artist, Price), and v3(CD, Artist, Price) over the attributes Song, CD, Artist, and Price.)

Question 1: Computing the maximal answer

• Translate a query and source views into a Datalog program.

• Borrowed the idea from Duschka and Levy [IJCAI-97].
– We eliminate useless source accesses.

• Why Datalog programs? Recursion.

Constructing program Π(Q,V)

Connection rules:
  ans(P) :- V1(s1, C) & V2(C, A, P)
  ans(P) :- V1(s1, C) & V3(C, A, P)
Fact rule:
  song(s1) :-

v1(Song, CD):
  α-rule:       V1(S, C) :- song(S) & v1(S, C)
  Domain rule:  cd(C) :- song(S) & v1(S, C)

v2(CD, Artist, Price):
  V2(C, A, P) :- cd(C) & v2(C, A, P)
  artist(A) :- cd(C) & v2(C, A, P)
  price(P) :- cd(C) & v2(C, A, P)

v3(CD, Artist, Price):
  V3(C, A, P) :- artist(A) & v3(C, A, P)
  cd(C) :- artist(A) & v3(C, A, P)
  price(P) :- artist(A) & v3(C, A, P)
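The rules on this slide follow a regular pattern, so they can be generated mechanically from the view schemas and binding patterns. The Python sketch below is one possible way to do that, not the authors' implementation. The helper names (`view_rules`, `connection_rule`, `var`) and the convention of writing the IDB predicate for a view in upper case are assumptions made for illustration.

```python
# Illustrative sketch: emit the Datalog rules for the music example as strings.
# Each view maps to (attributes, binding pattern); "b" = must bind, "f" = free.
VIEWS = {
    "v1": (["Song", "CD"], "bf"),
    "v2": (["CD", "Artist", "Price"], "bff"),
    "v3": (["CD", "Artist", "Price"], "fbf"),
}


def var(attr):
    """Datalog variable for an attribute: Song -> S, CD -> C, and so on."""
    return attr[0]


def view_rules(name, attrs, pattern):
    """One rule for the view's IDB predicate, plus one domain rule per free attribute."""
    args = ", ".join(var(a) for a in attrs)
    bound = [a for a, p in zip(attrs, pattern) if p == "b"]
    free = [a for a, p in zip(attrs, pattern) if p == "f"]
    # The body requires a domain value for every bound attribute, then the source itself.
    body = " & ".join([f"{a.lower()}({var(a)})" for a in bound] + [f"{name}({args})"])
    rules = [f"{name.upper()}({args}) :- {body}"]
    rules += [f"{a.lower()}({var(a)}) :- {body}" for a in free]
    return rules


def connection_rule(connection, inputs, outputs):
    """ans(...) rule joining the views of one connection; inputs become constants."""
    atoms = []
    for name in connection:
        attrs, _ = VIEWS[name]
        args = ", ".join(inputs.get(a, var(a)) for a in attrs)
        atoms.append(f"{name.upper()}({args})")
    head = ", ".join(var(a) for a in outputs)
    return f"ans({head}) :- " + " & ".join(atoms)


if __name__ == "__main__":
    inputs = {"Song": "s1"}
    for conn in (["v1", "v2"], ["v1", "v3"]):
        print(connection_rule(conn, inputs, ["Price"]))
    for attr, const in inputs.items():
        print(f"{attr.lower()}({const}) :-")
    for name, (attrs, pattern) in VIEWS.items():
        print("\n".join(view_rules(name, attrs, pattern)))
```

Running the script prints the same twelve rules as the slide, in the same order.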


• Binding assumptions:

– A binding for an attribute is from the attribute’s domain;

– Do not allow the “strategy” of trying all the possible strings to “test” the source (may not terminate);

– Any binding is either obtained from the query, or from a tuple returned by a source query.

• The program Π(Q,V) computes the maximal answer.

Question 2: when to access off-query sources?

View           Binding pattern
v1(A, C)       b f
v2(A, B, C)    f f b
v3(C, D)       b f
v4(C, E)       f f
v5(E, F)       b f

Query: Input: A = a1
Output: D = ?
Connections: T1 = {v1,v3}, T2 = {v2,v3}

Not all the views need to be accessed.

• T1 = {v1(A, C), v3(C, D)}: accessing sources outside T1 is NOT necessary.

• T2 = {v2(A, B, C), v3(C, D)}: accessing sources outside T2 is necessary to get C bindings.

Independent connections

• A connection T is independent if all the views in T can be queried starting from the input attributes as the initial bindings and using only the views in T.

• T2 = {v2(A, B, C), v3(C, D)} is not independent: it needs C bindings.

• T1 = {v1(A, C), v3(C, D)} is independent.

• Theorem: off-connection source accesses are only necessary for nonindependent connections.
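The independence test can be sketched directly from the definition: keep querying any view of the connection whose bound attributes are already covered, and see whether every view is eventually reached. The Python below is an illustrative sketch under that reading. The function name `query_order` and the dictionary encoding of binding patterns are assumptions, with the patterns taken from the five-view example.

```python
# Binding patterns of the views used by T1 and T2 ("b" = must bind, "f" = free).
VIEWS = {
    "v1": {"A": "b", "C": "f"},
    "v2": {"A": "f", "B": "f", "C": "b"},
    "v3": {"C": "b", "D": "f"},
}


def must_bind(view):
    """Attributes that need a value before the view can be queried."""
    return {attr for attr, pattern in view.items() if pattern == "b"}


def query_order(connection, inputs, views=VIEWS):
    """Return an order in which the connection's views can be queried using only
    the input attributes and the connection's own views, or None if the
    connection is not independent."""
    bound = set(inputs)
    pending = sorted(connection)
    order = []
    while pending:
        ready = [name for name in pending if must_bind(views[name]) <= bound]
        if not ready:
            return None  # some view still lacks a binding: not independent
        for name in ready:
            order.append(name)
            bound |= set(views[name])  # every attribute of a queried view gets bindings
            pending.remove(name)
    return order


print(query_order({"v1", "v3"}, {"A"}))  # T1
print(query_order({"v2", "v3"}, {"A"}))  # T2
```

For T1 the function returns the order v1, then v3. For T2 it returns `None`, because neither v2 nor v3 can be queried without a C binding.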

Question 3: what sources should be accessed?

• A view v is relevant to connection T if we may miss some answers to T when v is not used.

(Figure: hypergraph of v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), and v5(E, F) over the attributes A to F.)

• The relevant views of T2 are: v2, v3, v1, v4.

• How to find all the relevant views of a nonindependent connection?

Kernel

• A kernel of a connection is a minimal set of attributes that need to be initially bound in addition to the input attributes to query the full connection.

• A connection may have multiple kernels.

• T1 = {v1(A, C), v3(C, D)} has one kernel: {}

• T2 = {v2(A, B, C), v3(C, D)} has one kernel: {C}
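For small connections, kernels can be found by brute force: try attribute sets in order of increasing size and keep those that make the whole connection queryable and contain no smaller kernel. The sketch below illustrates the definition and is not necessarily the procedure the authors use. The names `kernels` and `can_query_all` are hypothetical, and the binding patterns are those of the five-view example.

```python
from itertools import combinations

VIEWS = {
    "v1": {"A": "b", "C": "f"},
    "v2": {"A": "f", "B": "f", "C": "b"},
    "v3": {"C": "b", "D": "f"},
}


def can_query_all(connection, bound, views):
    """True if every view in the connection can be queried starting from `bound`."""
    bound, pending = set(bound), set(connection)
    progress = True
    while pending and progress:
        progress = False
        for name in list(pending):
            needs = {a for a, p in views[name].items() if p == "b"}
            if needs <= bound:
                bound |= set(views[name])
                pending.remove(name)
                progress = True
    return not pending


def kernels(connection, inputs, views=VIEWS):
    """All minimal attribute sets that, added to the inputs, make the connection queryable."""
    attrs = set().union(*(views[name] for name in connection)) - set(inputs)
    found = []
    for size in range(len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in found):
                continue  # contains a smaller kernel, so it is not minimal
            if can_query_all(connection, set(inputs) | candidate, views):
                found.append(candidate)
    return found


print(kernels(["v1", "v3"], {"A"}))  # T1
print(kernels(["v2", "v3"], {"A"}))  # T2
```

On the example this yields the empty set for T1 and {C} for T2, matching the slide. The search is exponential in the number of attributes, which is acceptable only for connections of this size.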

Algorithm FIND_REL: Finding relevant views of a connection

Find all the relevant views of connection T2 = {v2,v3}:

(Figure: hypergraph of v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), and v5(E, F) over the attributes A to F.)

(1) Compute queryable views: {v1,v2,v3,v4,v5};
(2) Find a kernel K of T2: K = {C};
(3) Compute all the views that can help produce bindings for the attributes in K: R = {v1,v2,v4};
(4) Return R ∪ T2 = {v1,v2,v3,v4}.
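The four steps can be sketched in Python as follows. This is an interpretation of the slide, not the authors' code. Step (1) is a forward reachability computation from the input attributes. Step (3) is assumed to work backward: a view helps if one of its free attributes is in the kernel, or is a bound attribute of a view that already helps. The kernel is passed in rather than computed, and all function names are hypothetical.

```python
VIEWS = {
    "v1": {"A": "b", "C": "f"},
    "v2": {"A": "f", "B": "f", "C": "b"},
    "v3": {"C": "b", "D": "f"},
    "v4": {"C": "f", "E": "f"},
    "v5": {"E": "b", "F": "f"},
}


def bound_attrs(view):
    return {a for a, p in view.items() if p == "b"}


def free_attrs(view):
    return {a for a, p in view.items() if p == "f"}


def f_closure(attrs, names, views):
    """Views among `names` that can eventually be queried starting from `attrs`."""
    bound, reached = set(attrs), set()
    changed = True
    while changed:
        changed = False
        for name in names:
            if name not in reached and bound_attrs(views[name]) <= bound:
                reached.add(name)
                bound |= set(views[name])
                changed = True
    return reached


def b_closure(targets, names, views):
    """Views among `names` that can help produce bindings for `targets` (assumed reading)."""
    needed, closure = set(targets), set()
    changed = True
    while changed:
        changed = False
        for name in names:
            if name not in closure and free_attrs(views[name]) & needed:
                closure.add(name)
                needed |= bound_attrs(views[name])  # its own bindings must be produced too
                changed = True
    return closure


def find_rel(connection, inputs, kernel, views=VIEWS):
    queryable = f_closure(inputs, views, views)    # step (1)
    helpers = b_closure(kernel, queryable, views)  # step (3), kernel from step (2)
    return helpers | set(connection)               # step (4)


print(sorted(find_rel(["v2", "v3"], {"A"}, {"C"})))
```

On the example, v5 is excluded because its only free attribute, F, is not bound by any view, and the result is {v1, v2, v3, v4} as on the slide.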

Constructing an efficient program

• Compute the relevant views for each connection;
• Take the union of all these relevant source views;
• Use these views to construct a new program;
• Remove useless rules.

Conclusions

• A query-planning framework to compute the maximal answer to a query (Duschka and Levy [IJCAI-97]).

• Techniques for telling when to access off-query views;

• Algorithms:
– finding all the relevant sources for a query;
– constructing an efficient program.

Other related work

• Rajaraman, Sagiv, and Ullman [PODS-95]:
– Shows how to find an equivalent query rewriting using views with binding restrictions;
– We give the maximal rewriting of a query.

• Optimizing conjunctive queries with binding restrictions:
– Yerneni, Li, Garcia-Molina, and Ullman [ICDT-99];
– Florescu et al. [SIGMOD-99].

• Testing connection containment:
– Li [Stanford-CS-TR 2000], using results of monadic programs to prove the problem is decidable.

Predicates

EDB predicates    IDB predicates
v1(S, C)          V1(S, C)       }
v2(C, A, P)       V2(C, A, P)    } α-predicates
v3(C, A, P)       V3(C, A, P)    }
                  cd(C)          }
                  song(S)        } domain predicates
                  artist(A)      }
                  price(P)       }
                  ans(P)

Evaluating program Π(Q,V)

• Assume the right side of an α-rule or a domain rule is:

domA1(A1), …, domAp(Ap), vi(A1,…, Am)

• Once we have bindings for domA1(A1), …, domAp(Ap), evaluate the rule and populate the domain predicates and the α-predicate.

• Repeat until no more facts can be derived.
• Compute the maximal answer to the query.
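The evaluation loop above can be illustrated on the music-store example from the earlier slides. The Python sketch below hard-codes the three sources and simulates a source access as a filter on the attribute that must be bound. It is a toy illustration under those assumptions: the variable names are invented here, prices are stored as integers, and the price domain is omitted because no view requires a Price binding.

```python
# Source contents from the example slides.
V1 = {("Friends", "Love"), ("Friends", "Life")}        # v1(Song, CD), must bind Song
V2 = {("Love", "Lucy", 15), ("Story", "Snoopy", 14)}   # v2(CD, Artist, Price), must bind CD
V3 = {("Story", "Lucy", 13), ("Love", "Snoopy", 10),
      ("Life", "Charlie", 8)}                          # v3(CD, Artist, Price), must bind Artist


def evaluate(song):
    """Fixpoint evaluation: query sources with known bindings until nothing new appears."""
    songs, cds, artists = {song}, set(), set()   # domain predicates
    got1, got2, got3 = set(), set(), set()       # tuples actually retrieved per view
    while True:
        before = len(cds) + len(artists) + len(got1) + len(got2) + len(got3)
        for t in V1:                 # access v1 with each known song
            if t[0] in songs:
                got1.add(t)
                cds.add(t[1])
        for t in V2:                 # access v2 with each known CD
            if t[0] in cds:
                got2.add(t)
                artists.add(t[1])
        for t in V3:                 # access v3 with each known artist
            if t[1] in artists:
                got3.add(t)
                cds.add(t[0])
        after = len(cds) + len(artists) + len(got1) + len(got2) + len(got3)
        if after == before:          # no new facts: fixpoint reached
            break
    # Connection rules: join the retrieved v1 tuples with retrieved v2 and v3 tuples on CD.
    answers = set()
    for _, cd in got1:
        for cd2, _, price in got2 | got3:
            if cd2 == cd:
                answers.add(price)
    return answers, got2 | got3


answers, retrieved = evaluate("Friends")
print(sorted(answers))
```

The loop reaches Lucy through v2, then the CD Story through v3, then Snoopy through v2, and finally the $10 tuple for Love through v3. The artist Charlie is never obtained, so the $8 tuple for Life stays out of reach and the answer is {$10, $15}.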

Forward-closure

Given views W ⊆ V and attributes X, the forward-closure of X given W, denoted f-closure(X,W), is the set of views in W that can eventually be queried by using the views in W, starting from the initial bindings X.

f-closure({A},{v1,v2,v3}) = {v1,v2,v3}

f-closure({D},{v1,v2,v3}) = {}

(Figure: hypergraph of v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), and v5(E, F) over the attributes A to F.)
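A minimal Python sketch of the forward-closure, reproducing the two values on this slide. The binding patterns are those listed for the five-view example earlier in the deck, and the name `f_closure` and the dictionary encoding are assumptions made for illustration.

```python
VIEWS = {
    "v1": {"A": "b", "C": "f"},
    "v2": {"A": "f", "B": "f", "C": "b"},
    "v3": {"C": "b", "D": "f"},
    "v4": {"C": "f", "E": "f"},
    "v5": {"E": "b", "F": "f"},
}


def f_closure(attrs, names, views=VIEWS):
    """Views in `names` that can eventually be queried from the initial bindings `attrs`."""
    bound = set(attrs)
    reached = set()
    changed = True
    while changed:
        changed = False
        for name in names:
            must_bind = {a for a, p in views[name].items() if p == "b"}
            if name not in reached and must_bind <= bound:
                reached.add(name)
                bound |= set(views[name])  # a queried view yields bindings for all its attributes
                changed = True
    return reached


print(sorted(f_closure({"A"}, ["v1", "v2", "v3"])))
print(sorted(f_closure({"D"}, ["v1", "v2", "v3"])))
```

Starting from {A}, v1 supplies C bindings, which unlock v2 and v3. Starting from {D}, none of the three views has its required attribute bound, so the closure is empty.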

Backward-closure

• Backward-closure of a set of attributes X: b-closure(X), is the set of views that can help retrieve bindings for X.

• Lemma: All backward-closures of a connection are the same.

b-closure(C) = {v1,v2,v4}

(Figure: hypergraph of v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), and v5(E, F) over the attributes A to F.)

BF-chain, backward-closure

• BF-chain: (figure: a chain of views linked through attributes labeled free and bound)

• Backward-closure:

(Figure: hypergraph of v1(A, C), v2(A, B, C), v3(C, D), v4(C, E), and v5(E, F) over the attributes A to F.)

b-closure(C) = {v1,v2,v4}

Other possibilities of obtaining bindings

• Cached data: For a cached tuple ti(a1,a2) for view vi(A1,A2), add the following rules to the program Π(Q, V):

vi(a1,a2) :-
domA1(a1) :-
domA2(a2) :-

• Domain knowledge:
– student(name, dept, GPA).
– dept = CS, Physics, Chemistry, etc.


Computing a partial answer

• Independent connections: complete answers are computable.

• Nonindependent connections: access some relevant views. May terminate evaluating the program after some results are computed.