1 Global-as-View and Local-as-View for Information Integration CS652 Spring 2004 Presenter: Yihong Ding

Dec 15, 2015


Davion Dear

Global-as-View and Local-as-View for Information Integration

CS652 Spring 2004

Presenter: Yihong Ding


Common Integration Architecture

• Information Integration Systems

• Global-as-view (Gav.) vs. Local-as-view (Lav.)

• Query Reformulation

• Specification of Source Description

• Adding new sources


Query Reformulation

• Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schema

Given a query Q in terms of the mediator schema relations, and descriptions of information sources

Find a query Q’ that uses only the source relations, such that
– Q’ ⊆ Q, and
– Q’ provides all possible answers to Q given the sources


Solving Queries by Views

[Figure: Mediator Relations and Source Relations]


Query Rewriting Using Views

• Query Containment: q’ ⊆ q ⟺ ∀D, q’(D) ⊆ q(D)
• Query Equivalence: q’ = q ⟺ q’ ⊆ q ∧ q ⊆ q’

Given query q and view definitions V = {v1, …, vn}

• q’ is an Equivalent Rewriting of q using V if
– q’ refers only to views in V, and
– q’ = q

• q’ is a Maximally-Contained Rewriting of q using V if
– q’ refers only to views in V, and
– q’ ⊆ q, and
– There is no rewriting q1 such that q’ ⊆ q1 ⊆ q and q1 ≠ q’


Computation Complexity

[Figure: complexity classes indexed by superscript p and subscript k]


Complexity of Query Containment

• Conjunctive Queries (CQ) (NP-Complete)
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
– Q2: p(X,Z) :- a(X,Y) & a(V,Z)

• CQ’s With Negation (Π₂ᵖ-Complete)
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z)

• CQ’s With Arithmetic Comparison (Π₂ᵖ-Complete)
– Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y

• Datalog Programs
– p(A,C) :- a(A,B) & b(B,C)
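For plain conjunctive queries, containment can be decided by searching for a containment mapping: Q1 ⊆ Q2 holds if the variables of Q2 can be mapped onto the variables of Q1 so that the head is preserved and every body atom of Q2 lands on a body atom of Q1. The sketch below is a brute-force check for the two queries on this slide. It assumes queries with variables only (no constants, negation, or comparisons), and the function and variable names are illustrative rather than taken from any system.

```python
from itertools import product

# A conjunctive query is (head_vars, body); body is a list of (predicate, args).
# All arguments are variables in this sketch.
Q1 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("X", "Z"), [("a", ("X", "Y")), ("a", ("V", "Z"))])


def variables(query):
    """Variables of a query, in order of first appearance."""
    head, body = query
    seen = list(head)
    for _, args in body:
        for v in args:
            if v not in seen:
                seen.append(v)
    return seen


def contained_in(q_sub, q_sup):
    """True if q_sub is contained in q_sup (a containment mapping exists)."""
    sub_head, sub_body = q_sub
    sup_head, sup_body = q_sup
    if len(sub_head) != len(sup_head):
        return False
    sup_vars = variables(q_sup)
    sub_vars = variables(q_sub)
    sub_atoms = set(sub_body)
    # Try every mapping from q_sup's variables to q_sub's variables.
    # This is exponential in the number of variables.
    for image in product(sub_vars, repeat=len(sup_vars)):
        h = dict(zip(sup_vars, image))
        if tuple(h[v] for v in sup_head) != tuple(sub_head):
            continue  # the head must be preserved
        if all((p, tuple(h[v] for v in args)) in sub_atoms
               for p, args in sup_body):
            return True
    return False


q1_in_q2 = contained_in(Q1, Q2)  # mapping V -> Y works
q2_in_q1 = contained_in(Q2, Q1)  # a(Y,Z) has no image in Q2's body
```

Here Q1 is contained in Q2 but not the other way around: Q2 drops the join between the two atoms, so it is the less restrictive query.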


Specification of Source Description

• Views: resources used by the integrator to help answer queries

• Gav: each mediator relation is defined as a view over source relations

• Lav: each source relation is defined as a view over mediator relations
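To make the Gav direction concrete, here is a minimal sketch of rule unfolding: a query over a mediator relation is reformulated by substituting the relation's view definition. The mediator relation `Book` and the source relations `s1_catalog` and `s2_authors` are hypothetical names invented for this example. Under Lav the definitions point the other way (sources as views over mediator relations), so this simple substitution does not apply there.

```python
# Hypothetical Gav definition:
#   Book(T, A) :- s1_catalog(T, I), s2_authors(I, A)
GAV = {
    "Book": (("T", "A"), [("s1_catalog", ("T", "I")),
                          ("s2_authors", ("I", "A"))]),
}


def unfold(query_body, definitions):
    """Replace every mediator atom by the body of its Gav definition."""
    result = []
    for n, (pred, args) in enumerate(query_body):
        if pred not in definitions:
            result.append((pred, args))  # already a source relation
            continue
        params, body = definitions[pred]
        binding = dict(zip(params, args))
        for src_pred, src_args in body:
            # Variables local to the view get a fresh name per use.
            renamed = tuple(binding.get(v, f"{v}_{n}") for v in src_args)
            result.append((src_pred, renamed))
    return result


# Query body: Book(X, "Aho")
unfolded = unfold([("Book", ("X", "Aho"))], GAV)
```

The result mentions only source relations, which is why query reformulation is considered the easy part of a Gav system.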


Information Integration Systems

• Tsimmis
– Stanford and IBM
– Global-as-View (Gav)
– Mediator relations defined as views of source relations

• Information Manifold (IM)
– AT&T
– Local-as-View (Lav)
– Description logic
– Source relations defined as views of mediator relations (a collection of global predicates)


TSIMMIS – Gav Solution

• The Stanford-IBM Manager of Multiple Information Sources (TSIMMIS)

• Offers:
– A flexible data model
– A common query language
– Other supporting tools


TSIMMIS – Components

• OEM (Object-Exchange Model)

• LOREL (Lightweight Object REpository Language)

• MSL (Mediator Specification Language)

• Wrappers


TSIMMIS – OEM

• Object Exchange Model
• The data model for TSIMMIS
• “Self-describing” (labels carry all of the information that there is about an object)
• Flexible
• First order logic


TSIMMIS – OEM

OID: label, type, value

• OID: object identifier
• label: human-understandable
• type: “set” or “string”
• value: a set or a string


TSIMMIS – OEM

library (set)
    book (set)
        author (string): Aho
        title (string): Compilers…


TSIMMIS – OEM

First order predicate logic

Object 123: author, string, “Aho”

author( T, “Aho” )

This would return the object IDs of all objects with a label “author” and value “Aho”.
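The OEM structure and the predicate lookup can be sketched in a few lines of Python. This is an illustration only, not TSIMMIS code: the class and function names are made up, OID 123 is read from the slide's author object (an assumption about the extracted text), and the other OIDs are placeholders.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    oid: int
    label: str
    type: str  # "set" or "string"
    value: Union[str, List["OEMObject"]]


# The library example; OIDs other than 123 are placeholders.
author = OEMObject(123, "author", "string", "Aho")
title = OEMObject(124, "title", "string", "Compilers…")
book = OEMObject(122, "book", "set", [author, title])
library = OEMObject(121, "library", "set", [book])


def walk(obj):
    """Yield obj and all objects nested inside it."""
    yield obj
    if obj.type == "set":
        for child in obj.value:
            yield from walk(child)


def match(root, label, value):
    """OIDs of all string objects with the given label and value."""
    return [o.oid for o in walk(root)
            if o.type == "string" and o.label == label and o.value == value]


oids = match(library, "author", "Aho")
```

Because every object carries its own label and type, no separate schema is needed to interpret the data, which is what "self-describing" refers to.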


TSIMMIS – LOREL

• Lightweight Object REpository Language
• An OQL for OEM
• The end-user language for TSIMMIS


TSIMMIS – LOREL

• Example

select library.book.title
from library
where library.book.author = “Aho”


TSIMMIS – LOREL

• Partial Match Semantics

select R.A
from R, S, T
where R.A = S.A or R.A = T.A

• This would fail to return anything in SQL if either S or T were empty.

• Because of partial match semantics this does not fail in LOREL
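The SQL half of this claim is easy to check with Python's built-in `sqlite3` module: the cross product of R, S, and T is empty as soon as T is empty, so no row survives the `where` clause. The second query is only an emulation, under the assumption that partial matching should return a row of R when either condition can be satisfied on its own. It is not how LOREL is implemented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (A INTEGER);
    CREATE TABLE S (A INTEGER);
    CREATE TABLE T (A INTEGER);
    INSERT INTO R VALUES (1), (2);
    INSERT INTO S VALUES (1);
    -- T is left empty
""")

# Plain SQL: the empty T makes the whole cross product empty.
sql_rows = conn.execute(
    "SELECT R.A FROM R, S, T WHERE R.A = S.A OR R.A = T.A"
).fetchall()

# Emulated partial match: each disjunct is tested independently.
partial_rows = conn.execute(
    "SELECT R.A FROM R "
    "WHERE EXISTS (SELECT 1 FROM S WHERE S.A = R.A) "
    "OR EXISTS (SELECT 1 FROM T WHERE T.A = R.A)"
).fetchall()

conn.close()
```

With this data `sql_rows` is empty, while `partial_rows` contains the row of R that matches S.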


TSIMMIS – MSL

• Mediator Specification Language
• Allows declarative specification of mediators
• Object oriented, logical query language
• Targeted to OEM


TSIMMIS – MSL

[Figure: a query enters the mediators, which sit above the wrappers, which sit above the sources]

<booktitle X> :- <library { <book { <title X> <author “Aho”> } > } > @s1

[Figure: OEM tree: library (set) → book (set) → author (string) “Aho”, title (string) “Compilers…”]


TSIMMIS – Wrappers

[Figure: a query enters the mediators, which sit above the wrappers, which sit above the sources]

• Wrappers are similar to database drivers

• Wrappers are written with MSL


TSIMMIS – Wrappers

• Wrappers have the form:

MSL template
// action //

• Example:

<books X> :- <library { X:<book {<title X> <author $AU>}> }>@s1
// sprintf(lookup-query, “find author %s”, $AU) //
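The template/action pairing can be imitated in Python as a rough sketch. A regular expression stands in for the MSL template here, which is a simplification (a real wrapper matches the query structure, not its text), and `TEMPLATES` and `translate` are hypothetical names. Only the format string “find author %s” comes from the slide.

```python
import re

# Each entry pairs a stand-in for an MSL template with the native-query
# format string from its action part.
TEMPLATES = [
    (re.compile(r'<author "(?P<AU>[^"]+)">'), "find author %s"),
]


def translate(msl_query):
    """Return the native query for the first matching template, else None."""
    for pattern, action in TEMPLATES:
        m = pattern.search(msl_query)
        if m:
            # Equivalent of the sprintf in the action: fill in $AU.
            return action % m.group("AU")
    return None


native = translate(
    '<books X> :- <library { X:<book {<title X> <author "Aho">}> }>@s1'
)
```

A query that matches no template returns `None`, which corresponds to a request the source cannot answer directly.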


TSIMMIS – Summary

• End users need to specify their sources w.r.t. a mediator model – OEM in TSIMMIS

• Query specification is standard – LOREL

• Query rewriting is straightforward – MSL and wrappers

• Adding a new source is not easy – it must be specified in the mediator model


Information Manifold

• Challenges for Information Integration
– Interrelated data over multiple information sources
– Large number of sources
– Limited size of data in many of the sources
– Widely varying details of interacting with each source


IM Architecture

[Figure: IM architecture with components numbered 1, 2, and 3; the Bucket algorithm is marked]


World View

Virtual Relations:

Product(Model)
Automobile(Model, Year, Category)
Motorcycle(Model, Year)
Car(Model, Year, Category)
NewCar(Model, Year, Category)
UsedCar(Model, Year, Category)
CarForSale(Model, Year, Category, Price, SellerContact)

Classes:

Product
    Automobile
        Car
            NewCar
            UsedCar
            CarForSale
        Motorcycle


Source Descriptions

For each source:

• Content Record
• Capability Record

[Figure: Web Sources for Automobile Application]


Content Records of Auto Sources


Capability Records of Auto Sources

[Figure annotations: desired input set, possible output set, capable selection set]


Query Reformulation

• Contained rather than equivalent rewritings
– Incomplete source
– Useful subset

• Utilizes Plan Generator to:
– Prune irrelevant sources
– Split query into subgoals
– Generate conjunctive query plans
– Find executable ordering of subgoals


The Bucket Algorithm

Given: user query q, source descriptions {Vi}

1. Find relevant sources (fill buckets)
For each relation g in query q:
• Find Vj that contains relation g
• Check that constraints in Vj are compatible with q

2. Combine source relations {Vj} from each bucket into a conjunctive query q’ and check for containment (q’ ⊆ q)
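The two steps can be sketched as follows. This covers bucket filling and the enumeration of candidate combinations only; the containment check on each candidate is left out. The source names `SrcA`, `SrcB`, `SrcC` and their descriptions are hypothetical, and the only constraint modeled is an interval on the year.

```python
from itertools import product

# Hypothetical source descriptions: the mediator relations each source covers
# and a (low, high) bound on Year, where None means unbounded.
SOURCES = {
    "SrcA": {"relations": {"CarForSale", "Price"}, "year": (1990, None)},
    "SrcB": {"relations": {"CarForSale", "Price"}, "year": (None, 1985)},
    "SrcC": {"relations": {"ProductReview"}, "year": (None, None)},
}


def compatible(source_range, query_range):
    """True if the two (low, high) intervals overlap."""
    (s_lo, s_hi), (q_lo, q_hi) = source_range, query_range
    lo = max(x for x in (s_lo, q_lo, float("-inf")) if x is not None)
    hi = min(x for x in (s_hi, q_hi, float("inf")) if x is not None)
    return lo <= hi


def fill_buckets(subgoals, query_year, sources):
    """Step 1: one bucket per subgoal, holding the relevant sources."""
    return {
        g: [name for name, d in sources.items()
            if g in d["relations"] and compatible(d["year"], query_year)]
        for g in subgoals
    }


def candidate_plans(buckets):
    """Step 2 (first half): pick one source from each bucket.
    Each combination still has to pass the containment check."""
    return list(product(*buckets.values()))


buckets = fill_buckets(["CarForSale", "Price", "ProductReview"],
                       (1992, None), SOURCES)
plans = candidate_plans(buckets)
```

`SrcB` is pruned because its year range cannot overlap the query's, so it never reaches the more expensive containment step.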


The Bucket Algorithm: Example

q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)


1. Filling the Buckets

q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

Buckets, one per subgoal:

CarForSale(c): V1(c1), V2(c2), V3(c3)
Category(c,t), t = sportscar: V1(c1,t1), V2(c2,t2), V3(c3,t3)
Year(c,y), y ≥ 1992: V1(c1,y1), V2(c2,y2), V3(c3,y3)
Model(c,m): V1(c1,m1), V2(c2,m2), V3(c3,m3)
Price(c,p): V1(c1,p1), V2(c2,p2), V3(c3,p3)
ProductReview(m,y,r): V5(m5,y5,r5)


2. Checking Containment

User Query
q(m,p,r) ← CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

Result Query
q’(m,p,r) ← V1(c)({Category(c):sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c)=sportscar}), V5(m,y,r)({m:Model(c), y:Year(c)}, {r}, {}).

q’ ⊆ q ?

Expanded Query
q’(m,p,r) ← CarForSale(c), UsedCar(c), Category(c,t), t=sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992


Finding an Executable Ordering

CarForSale(c), Category(c,t), Year(c,y), Model(c,m), Price(c,p), ProductReview(m,y,r), y ≥ 1992, t = sportscar

V1(c) V1(c,t) V1(c,y) V1(c,m) V1(c,p) V5(m,y,r)

BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}

BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}

BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}
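One simple way to find an executable ordering is a greedy loop: keep a set of available bindings, pick any subgoal whose required inputs are already available, and add its outputs to the set. The sketch below assumes each source access is summarized as a pair (required inputs, outputs). The concrete sets are a simplified reading of the V1 and V5 capability records and should be treated as illustrative.

```python
def executable_ordering(subgoals, initially_bound):
    """Greedy ordering of subgoals by binding availability.

    subgoals maps a name to (required_inputs, outputs), both sets.
    Returns a list of names, or None if no executable ordering exists.
    """
    bound = set(initially_bound)
    remaining = dict(subgoals)
    order = []
    while remaining:
        ready = [n for n, (needs, _) in remaining.items() if needs <= bound]
        if not ready:
            return None  # every remaining subgoal is missing an input
        name = ready[0]
        bound |= remaining.pop(name)[1]  # its outputs become available
        order.append(name)
    return order


# V5 needs a model and a year before it can return reviews;
# V1 only needs the category, which the query supplies as a constant.
subgoals = {
    "V5": ({"m", "y"}, {"r"}),
    "V1": ({"category"}, {"c", "m", "y", "p"}),
}
plan = executable_ordering(subgoals, {"category"})
```

Because the set of available bindings only grows, picking any ready subgoal never blocks a later one, so the greedy choice finds an ordering whenever one exists.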


Advantages and Disadvantages

• Gav: Tsimmis
– Advantage
    • Query reformulation: rule unfolding
– Disadvantage
    • Mediation description
    • Adding, removing, and modifying source descriptions
– Better for static, centralized systems

• Lav: Information Manifold
– Advantage: adding new sources
    • Mediator (global predicates, source descriptions)
    • Query processing
– Disadvantage
    • Query reformulation (Bucket algorithm)
– Better for dynamic, distributed systems