Lyublena Antova, Christoph Koch, and Dan Olteanu Saarland University Database Group ...kanza/dbseminar/2012/CompleteTo... · 2013. 2. 27. · Lyublena Antova, Christoph Koch, and

1

Lyublena Antova, Christoph Koch, and Dan Olteanu Saarland University Database Group

Saarbrücken, Germany 2007

Presented By: Rana Daud

2

• Introduction

• Application Scenarios

• I-SQL

• World-Set Algebra

• Algebraic Equivalences

• Conclusion & Future work

INTRODUCTION

3

SID CID GradeA GradeB

123456789 236363 NULL NULL

987654321 234114 NULL 83

001122337 236363 77 NULL

4

There is no agreement in the literature on the semantics of null values in relational databases. One reason it is difficult to agree on a semantics is that a null value can be interpreted as unknown, inapplicable, etc.

Since each occurrence of a null value can be substituted by a non-null value, a relation containing nulls can be seen as shorthand for a set of relations, each obtained by a different substitution. This will be our basic semantic assumption:

An incomplete relation represents a set of (complete) relations.

5

Incomplete information arises naturally in numerous data management applications like data integration, data cleaning, and data exchange.

Recently, the research community has shown a vivid interest in efficiently managing incomplete information viewed as a set of possible worlds.

A significant amount of research has attempted to find the right balance between the succinctness of world-set representations and the efficiency of query evaluation on top of them. However, there is a lack of expressive query languages that are well tailored to sets of possible worlds.

6

A query language for incomplete information should meet at least the following demands:

Generic

Expressive

Conservative

Efficient evaluation

SQL lacks explicit constructs for dealing with uncertainty, though there are queries on incomplete information that can be expressed as SQL queries on relational representations of incomplete databases with complicated nesting and aggregations. Extensions of RA or SQL with limited constructs (such as certain or top-k) are not expressive enough, as they do not allow for the convenient construction of new worlds.

7

Up to the publication of this article, no query language proposed for incomplete information satisfied all of these demands.

APPLICATION SCENARIOS

8

Example 1: Business decision support

9

Company_Emp

EID CID

e1 ACME

e2 ACME

e3 HAL

e4 HAL

e5 HAL

Emp_Skills

Skill EID

Web e1

Web e2

Java e3

Web e3

SQL e4

Java e5

10

SELECT * FROM Company_Emp choice of CID;

EID CID

e1 ACME

e2 ACME

EID CID

e3 HAL

e4 HAL

e5 HAL

Result: world-set U with two worlds, U1 (ACME) and U2 (HAL).

11

SELECT R1.CID, R1.EID FROM Company_Emp R1, (select * from U choice of EID) R2 WHERE R1.CID = R2.CID and R1.EID != R2.EID;

12

CID EID

ACME e1

CID EID

ACME e2

CID EID

HAL e3

HAL e4

CID EID

HAL e3

HAL e5

CID EID

HAL e4

HAL e5

Result: world-set V with five worlds, V1.1 and V1.2 (ACME) and V2.1, V2.2 and V2.3 (HAL).

13

SELECT certain CID, Skill FROM V, Emp_Skills WHERE V.EID = Emp_Skills.EID group worlds by (SELECT CID FROM V);

CID Skill

ACME Web

CID Skill

HAL Java

Result: world-set W with worlds W1.* (ACME) and W2.* (HAL).

14

SELECT possible CID FROM W WHERE Skill = 'Web';

CID

ACME
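The four queries of Example 1 can be replayed on the slide data with plain Python sets. This is only a sketch under stated assumptions: the helper names `choice_of` and `skills_in` are hypothetical, worlds are modelled as Python sets rather than with any real I-SQL engine, and W is kept as one relation per group of worlds instead of one copy per world.

```python
# Data copied from the Company_Emp and Emp_Skills slides.
COMPANY_EMP = {("e1", "ACME"), ("e2", "ACME"),
               ("e3", "HAL"), ("e4", "HAL"), ("e5", "HAL")}      # (EID, CID)
EMP_SKILLS = {("Web", "e1"), ("Web", "e2"), ("Java", "e3"),
              ("Web", "e3"), ("SQL", "e4"), ("Java", "e5")}       # (Skill, EID)


def choice_of(relation, key):
    """Hypothetical helper: one sub-relation per distinct value of key(t)."""
    values = sorted({key(t) for t in relation})
    return [{t for t in relation if key(t) == v} for v in values]


# Step 1 (U): one world per company that might be bought.
U = choice_of(COMPANY_EMP, key=lambda t: t[1])

# Step 2 (V): in each world one employee of that company leaves;
# the world keeps the (CID, EID) pairs of the colleagues who stay.
V = []
for world in U:
    for leaving in choice_of(world, key=lambda t: t[0]):
        (gone_eid, gone_cid), = leaving
        V.append({(cid, eid) for (eid, cid) in COMPANY_EMP
                  if cid == gone_cid and eid != gone_eid})


def skills_in(world):
    """Join a world of V with Emp_Skills and project on (CID, Skill)."""
    return {(cid, skill) for (cid, eid) in world
            for (skill, e) in EMP_SKILLS if e == eid}


# Step 3 (W): group the worlds by company and keep the (CID, Skill)
# pairs that are certain, i.e. present in every world of the group.
groups = {}
for world in V:
    group_key = frozenset(cid for cid, _ in world)
    groups.setdefault(group_key, []).append(skills_in(world))
W = [set.intersection(*members) for members in groups.values()]

# Step 4: companies that possibly keep the skill 'Web'.
targets = {cid for world in W for (cid, skill) in world if skill == "Web"}
print(targets)  # {'ACME'}
```

The five relations collected in V correspond to the five result tables of the second query, and the final set matches the single answer ACME.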

Example 2: Trip Planning

Flights(Fid, Dep, Arr, Dtime, Atime)

Hometowns(City)

Flights

Dep Arr

FRA BCN

FRA ATL

PAR ATL

PAR BCN

PHL ATL

15

Hometowns

City

FRA

PAR

PHL

...

create view HFlights as

select * from Flights where Dep in Hometowns;

select certain Arr from HFlights choice of Dep;

Assuming the existence of a division operator in SQL:

select Arr

from (select Arr, Dep from HFlights) as F1

divide by

(select Dep from HFlights) as F2

on F1.Dep = F2.Dep;

16

REMINDER- DIVISION:

17

R =

A B C D
1 1 1 1
2 1 1 2
2 2 2 2
2 3 3 2

S =

B C
1 1
2 2

R ÷ S =

A D
2 2
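The division reminder can be checked with a few lines of Python. The sketch below is illustrative only: the function name `divide` and the encoding of relations as sets of tuples with explicit attribute lists are assumptions, not part of the slides.

```python
def divide(r, r_attrs, s, s_attrs):
    """Relational division r / s on sets of tuples.

    Returns the tuples over the attributes of r that are not in s
    which occur in r together with every tuple of s.
    """
    keep_pos = [i for i, a in enumerate(r_attrs) if a not in s_attrs]
    s_pos = [r_attrs.index(a) for a in s_attrs]
    # Split every tuple of r into its "kept" part and its "s" part.
    pairs = {(tuple(t[i] for i in keep_pos), tuple(t[i] for i in s_pos))
             for t in r}
    candidates = {kept for kept, _ in pairs}
    return {c for c in candidates if all((c, u) in pairs for u in s)}


R = {(1, 1, 1, 1), (2, 1, 1, 2), (2, 2, 2, 2), (2, 3, 3, 2)}
S = {(1, 1), (2, 2)}

print(divide(R, ["A", "B", "C", "D"], S, ["B", "C"]))  # {(2, 2)}
```

The candidate (1, 1) is rejected because R contains no tuple that combines it with the S tuple (2, 2).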

Note:

Division can be simulated in SQL using a nested sub-query with two not-exists constructs:

select Arr from HFlights F1
where not exists
  (select * from HFlights F2
   where not exists
     (select * from HFlights F3
      where F3.Dep = F2.Dep and F3.Arr = F1.Arr));
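The double not-exists formulation can be tried out with Python's built-in sqlite3 module. This sketch assumes that HFlights contains exactly the five Dep/Arr pairs of the Flights slide (all three departure cities are hometowns) and adds DISTINCT, which the slide does not use, so that each arrival is listed once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HFlights (Dep TEXT, Arr TEXT)")
conn.executemany(
    "INSERT INTO HFlights VALUES (?, ?)",
    [("FRA", "BCN"), ("FRA", "ATL"), ("PAR", "ATL"),
     ("PAR", "BCN"), ("PHL", "ATL")],
)

# An arrival qualifies if there is no departure city F2.Dep
# that lacks a flight to it.
query = """
SELECT DISTINCT Arr FROM HFlights F1
WHERE NOT EXISTS (
    SELECT * FROM HFlights F2
    WHERE NOT EXISTS (
        SELECT * FROM HFlights F3
        WHERE F3.Dep = F2.Dep AND F3.Arr = F1.Arr))
"""
result = [row[0] for row in conn.execute(query)]
print(result)  # ['ATL']
```

BCN is missing from the answer because there is no flight from PHL to BCN.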

This shows that, at least in certain cases, I-SQL allows decision support queries to be phrased more concisely than plain SQL.

18

• We will treat I-SQL informally, mostly in examples.

• The structure of an I-SQL query:

19

20

The main motivation is to find a natural extension of RA and SQL to the context of incomplete information. We next detail the syntax and semantics of the constructs, divided into four groups.

Standard SQL constructs

Merging worlds

Splitting up worlds

Data manipulation

BACK TO FLIGHTS

21

Flights

Dep Arr

FRA BCN

FRA ATL

PAR ATL

PAR BCN

PHL ATL

Standard SQL constructs: a query is evaluated in each world independently and the result is added as a new relation to that world.

Example:

22

SELECT * FROM Flights WHERE Arr = 'BCN'

Merging worlds: constructs that go across world borders to collect information that appears in other worlds as well.

Possible and certain: compute the tuples that appear in some, respectively all, worlds. The result is then added to each world of the input world-set.

Group-worlds-by: used in combination with ‘possible’ and ‘certain’; it specifies a condition on which the worlds are grouped. The condition is given in the form of an SQL query; worlds that produce the same result for that query are put into the same group. Then ‘possible’ or ‘certain’, respectively, is computed within each of the created groups.

23

When the query is a projection on a set of attributes, we write the set of attributes directly, as is done in the group-by clause of SQL.

24

Example:

SELECT certain Arr FROM Flights

Result:

World A

Flights^A
Dep Arr
FRA BCN
FRA ATL

F^A
Arr
ATL

World B

Flights^B
Dep Arr
PAR ATL
PAR BCN

F^B
Arr
ATL

World C

Flights^C
Dep Arr
PHL ATL

F^C
Arr
ATL

Note:

Even though we used the closing construct ‘certain’, the result is again the set of three input worlds, where each of them is extended with a new relation F. Only if the input is a single world, or if one is interested only in the result of the operation and not in the input relations, will a ‘possible’ or ‘certain’ construct produce a single world.
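A small Python sketch of how ‘certain’ and ‘possible’ merge information across the three Flights worlds A, B and C. The dictionary layout and the helper name `arrivals` are assumptions made for illustration.

```python
# The three input worlds, each holding its own Flights relation (Dep, Arr).
worlds = {
    "A": {("FRA", "BCN"), ("FRA", "ATL")},
    "B": {("PAR", "ATL"), ("PAR", "BCN")},
    "C": {("PHL", "ATL")},
}


def arrivals(flights):
    """Projection of a Flights relation on Arr."""
    return {arr for _, arr in flights}


per_world = [arrivals(flights) for flights in worlds.values()]
certain = set.intersection(*per_world)   # tuples present in all worlds
possible = set.union(*per_world)         # tuples present in some world

# The result is added as a new relation F to every input world,
# so three worlds go in and three worlds come out.
extended = {name: {"Flights": flights, "F": certain}
            for name, flights in worlds.items()}

print(certain)           # {'ATL'}
print(sorted(possible))  # ['ATL', 'BCN']
```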

25

Splitting up worlds:

creation of new worlds using the operations:

choice-of: freezes the values of the given set of attributes and creates a separate world for every combination of values.

repair-by-key:

Generates the possible repairs of a relation that violates a uniqueness constraint on the values of a given set of attributes.

Generates possible configurations of items, where each configuration contains only one item of each type.

Naturally fits data cleaning scenarios (for example, de-duplication based on key constraints).

26

27

Example:

SELECT * FROM Flights choice of Dep;

Flights

Dep Arr
FRA BCN
FRA ATL
PAR ATL
PAR BCN
PHL ATL

Result:

Flights^A
Dep Arr
FRA BCN
FRA ATL

Flights^B
Dep Arr
PAR ATL
PAR BCN

Flights^C
Dep Arr
PHL ATL
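The effect of choice-of on the Flights relation can be sketched in Python as a partition by the chosen attributes. The function name `choice_of` and the list-of-tuples encoding are assumptions; the handling of the empty relation follows the rule that choice-of on an empty relation again yields an empty relation.

```python
FLIGHTS = [("FRA", "BCN"), ("FRA", "ATL"), ("PAR", "ATL"),
           ("PAR", "BCN"), ("PHL", "ATL")]          # (Dep, Arr)


def choice_of(relation, positions):
    """Return one world (list of tuples) per combination of values
    found at the given attribute positions."""
    if not relation:
        return [[]]          # a single world with an empty relation
    worlds = {}
    for t in relation:
        key = tuple(t[i] for i in positions)
        worlds.setdefault(key, []).append(t)
    return list(worlds.values())


for world in choice_of(FLIGHTS, positions=[0]):     # choice of Dep
    print(world)
# [('FRA', 'BCN'), ('FRA', 'ATL')]
# [('PAR', 'ATL'), ('PAR', 'BCN')]
# [('PHL', 'ATL')]
```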

REPAIR-BY-KEY EXAMPLE:

28

Census(SSN, Name, POB, POW)

SSN: social security number
POB: place of birth
POW: place of work

Functional dependency: $SSN \rightarrow Name, POB, POW$

29

SELECT * FROM Census repair by key SSN

Result: all possible relations that are consistent with regard to the functional dependency and are contained in the relation Census.

Note:

This query can produce exponentially many worlds, and is thus not expressible in SQL (or RA). In fact, NP-hard problems can be expressed as queries with repair-by-key.
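A sketch of repair-by-key in Python, to make the growth in the number of worlds concrete. The census rows below are invented for illustration (the slides do not list any Census data), and `repair_by_key` is a hypothetical helper that keeps exactly one tuple per key value.

```python
from itertools import product


def repair_by_key(relation, key_position=0):
    """Return every repair: one tuple is chosen for each key value."""
    groups = {}
    for t in relation:
        groups.setdefault(t[key_position], []).append(t)
    return [set(choice)
            for choice in product(*(sorted(g) for g in groups.values()))]


# Hypothetical Census(SSN, Name, POB, POW) rows with duplicate SSNs.
census = {
    (185, "Smith", "NYC", "LA"),
    (185, "Smyth", "NYC", "LA"),
    (785, "Brown", "SF", "SF"),
    (785, "Brown", "SF", "LA"),
    (186, "Jones", "LA", "LA"),
}

repairs = repair_by_key(census)
print(len(repairs))  # 4 worlds: 2 choices * 2 choices * 1 choice
```

With n key values that each have two conflicting tuples, the same call would return 2^n worlds, which is the exponential blow-up mentioned in the note.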

Data Manipulation:

insert

update

delete

The statement is executed in each world of the world-set independently. If inserting or updating a tuple violates a constraint in some world, the update is discarded in all worlds.

30

Example:

DELETE FROM Flights WHERE Arr = 'ATL'

Result:

Flights^A
Dep Arr
FRA BCN

Flights^B
Dep Arr
PAR BCN

Flights^C
Dep Arr
(empty)
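The per-world behaviour of data manipulation, including the all-or-nothing rule for constraint violations, can be sketched as follows. The helper names and the constraint used here (Dep must be unique within a world) are assumptions chosen only to trigger a violation; the slides do not state which constraints hold on Flights.

```python
def delete_where(worlds, predicate):
    """Run a delete independently in every world."""
    return [{t for t in world if not predicate(t)} for world in worlds]


def insert_everywhere(worlds, new_tuple, violates):
    """Insert into every world; discard the insert in all worlds
    if it violates the constraint in at least one of them."""
    updated = [world | {new_tuple} for world in worlds]
    if any(violates(world) for world in updated):
        return worlds
    return updated


def dep_not_unique(world):
    """Hypothetical constraint check: Dep acts as a key of Flights."""
    deps = [dep for dep, _ in world]
    return len(deps) != len(set(deps))


worlds = [
    {("FRA", "BCN"), ("FRA", "ATL")},   # world A
    {("PAR", "ATL"), ("PAR", "BCN")},   # world B
    {("PHL", "ATL")},                   # world C
]

after_delete = delete_where(worlds, lambda t: t[1] == "ATL")
print(after_delete)  # [{('FRA', 'BCN')}, {('PAR', 'BCN')}, set()]

accepted = insert_everywhere(after_delete, ("PHL", "BCN"), dep_not_unique)
rejected = insert_everywhere(after_delete, ("FRA", "ATL"), dep_not_unique)
print(rejected == after_delete)  # True: the conflict in world A cancels it everywhere
```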

Order of evaluation:

(1) Compute the product of the relations produced by the sub-queries in the from-clause.

(2) Apply the conditions of the where-clause on top.

(3) If any of the new operators ‘choice-of’, ‘repair-by-key’ and ‘group-worlds-by’ are specified, they are applied in the order given by the structure of an I-SQL query:

(3.1) choice-of creates a world for each combination of values for the specified attributes.

(3.2) repair-by-key is applied in each of the created worlds.

(3.3) group-worlds-by is applied on the world-set created after the repair-by-key.

(4) Project on the attributes given in the select list; if ‘possible’ or ‘certain’ is present, union, respectively intersect, the tuples in that projection.

31

WORLD-SET ALGEBRA

We now turn to the formal treatment: world-set algebra.

It covers the fragment of I-SQL without SQL grouping and aggregation constructs.

World-set algebra is an extension of RA with new constructs.

It is generic: the semantics of a query is independent of the world-set representation.

This is a fundamental property.

32

Syntax and Semantics:

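Base operators:

Selection $\sigma$
Projection $\pi$
Cartesian product $\times$
Union $\cup$
Difference $-$
Renaming $\delta$

Derived operators:

Intersection $\cap$
Division $\div$: $r \div s = \pi_{R \setminus S}(r) - \pi_{R \setminus S}\big((\pi_{R \setminus S}(r) \times s) - r\big)$

33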

New constructs:

poss
cert
choice-of $\chi_U$
possible group-worlds-by $p\gamma^{V}_{U}$
certain group-worlds-by $c\gamma^{V}_{U}$

34

35

World-set A contains worlds over the schema $(R_1, R_2, \ldots, R_k)$.

Applying a query q yields a world-set over the schema $(R_1, \ldots, R_k, R_{k+1})$, where $R_{k+1}$ is the relation that represents the answer to q in each world.

SEMANTICS OF THE OPERATORS:

The semantics of world-set algebra is defined as a function mapping between world-sets.

If q is the identity on a relation (i.e., of the form $R_i$), we add a copy of that relation to each world.

Unary operators $f \in \{\sigma, \pi, \delta\}$: evaluate q in each world; $f$ is then evaluated on $R_{k+1}$ and the answer replaces $R_{k+1}$.

36

Binary operators ($\times$, $\cup$, $-$): evaluate the two operands, obtaining two world-sets $A'$ and $A''$.

Perform the binary operation on those combinations of one world from $A'$ and one world from $A''$ that agree on the relations $R_1, \ldots, R_k$.

This forbids operations between relations that occur in different worlds of the original world-set.

37

38

choice-of $\chi_U$ creates a new world for each choice of the values in the projection of $R_{k+1}$ on $U$ in each world.

The relation $R_{k+1}$ is then replaced in each of the new worlds by the subset of $R_{k+1}$ consisting of those tuples that agree on the values of U. Thus no two new worlds created from the same world have the same values of U.

When applied to the empty relation, choice-of produces an empty relation.

Each newly created world also contains the relations $R_1, \ldots, R_k$ of the world from which it was derived. This ensures compositionality.

39

Auxiliary definitions:

condition

group-worlds-by: $p\gamma^{V}_{U}$ & $c\gamma^{V}_{U}$

The group-worlds-by operators $p\gamma^{V}_{U}$ and $c\gamma^{V}_{U}$ group the worlds in a world-set such that all worlds in a group agree on $\pi_U(R_{k+1})$.

We then replace $R_{k+1}$ by $\pi_V(R_{k+1})$ in each world.

In the case of $p\gamma^{V}_{U}$, in each world B, $R_{k+1}$ is replaced by the union of the relations $R_{k+1}$ from the group of worlds associated with B.

Analogously, in the case of $c\gamma^{V}_{U}$, the new relation $R_{k+1}$ in a world B becomes the intersection of the relations $R_{k+1}$ from the group of worlds associated with B.

40
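The group-worlds-by semantics can be sketched in Python on the business example: each world holds the (CID, Skill) pairs of the employees who stay, worlds are grouped by their projection on CID, and the relations of a group are united or intersected. The function names and the reduction of a world to a single relation are simplifying assumptions.

```python
def project(relation, positions):
    """Projection of a set of tuples on the given attribute positions."""
    return frozenset(tuple(t[i] for i in positions) for t in relation)


def group_worlds_by(worlds, group_pos, proj_pos, certain):
    """Group worlds that agree on the projection at group_pos, then give
    every world the union (possible) or the intersection (certain) of the
    projections at proj_pos within its group."""
    groups = {}
    for world in worlds:
        groups.setdefault(project(world, group_pos), []).append(
            project(world, proj_pos))
    combine = frozenset.intersection if certain else frozenset.union
    return [combine(*groups[project(world, group_pos)]) for world in worlds]


# (CID, Skill) pairs in the five worlds of the business example.
worlds = [
    {("ACME", "Web")},
    {("ACME", "Web")},
    {("HAL", "Java"), ("HAL", "Web"), ("HAL", "SQL")},
    {("HAL", "Java"), ("HAL", "Web")},
    {("HAL", "SQL"), ("HAL", "Java")},
]

cert_worlds = group_worlds_by(worlds, [0], [0, 1], certain=True)
poss_worlds = group_worlds_by(worlds, [0], [0, 1], certain=False)
print(sorted(cert_worlds[2]))  # [('HAL', 'Java')]
```

The number of worlds is unchanged: every world of a group ends up with the same merged relation.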

41

poss:

$R_{k+1}$ is replaced by the union of all its instances across all worlds.

cert:

$R_{k+1}$ is replaced by the intersection of all its instances across all worlds.

42

43

The first query, asking for possible acquisition targets, can be expressed in world-set algebra as:

$$poss(\pi_{CID}(\sigma_{Skill='Web'}(c\gamma^{CID,Skill}_{CID}(\pi_{1.CID,1.EID}(\sigma_{1.CID=2.CID \wedge 1.EID \neq 2.EID}(Company\_Emp \times \chi_{EID}(\chi_{CID}(Company\_Emp)))) \bowtie Emp\_Skills))))$$

GENERICITY

Genericity is a fundamental property of query languages. It guarantees that query results are independent of the representation of the data and of the interpretation of domain values.

RA and SQL are generic.

World-set algebra is generic: its semantics does not depend on the world-set representation.

44

FROM WORLD-SET ALGEBRA TO RA

Any world-set algebra query can be efficiently translated to an equivalent relational algebra query over a complete representation of the input world-set.

The paper proposes the inlined representation, in which the tuples of a relation over all worlds are stored in one table that has special attributes denoting the identifier of the world each tuple belongs to.
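A minimal Python sketch of the idea behind an inlined table, assuming a single world-identifier column (called `wid` here) and the world labels A, B and C; both are illustrative and simpler than the representation used in the paper. Once the world identifier is an ordinary attribute, poss becomes a projection and cert becomes a division by the set of world identifiers.

```python
# One table for Flights over all worlds: (wid, Dep, Arr).
inlined_flights = [
    ("A", "FRA", "BCN"), ("A", "FRA", "ATL"),
    ("B", "PAR", "ATL"), ("B", "PAR", "BCN"),
    ("C", "PHL", "ATL"),
]
# World identifiers are kept separately, so that a world whose
# relation is empty would still be counted.
world_ids = {"A", "B", "C"}

# poss(pi_Arr(Flights)): project away the world identifier.
poss_arr = {arr for _, _, arr in inlined_flights}

# cert(pi_Arr(Flights)): arrivals paired with every world identifier.
cert_arr = {
    arr for arr in poss_arr
    if {wid for wid, _, a in inlined_flights if a == arr} == world_ids
}

print(sorted(poss_arr))  # ['ATL', 'BCN']
print(cert_arr)          # {'ATL'}
```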

45

Main contributions of this section:

World-set algebra is conservative over RA. This means that any world-set algebra query that maps from a complete database to a complete database (a “complete-to-complete” query) is equivalent to an RA query.

An efficient algorithm for effecting this translation. It follows that complete-to-complete world-set algebra queries have the same low data complexity as RA.

46

ALGEBRAIC EQUIVALENCES

The goal of the equivalences is query optimization.

The authors define two classes of equivalences:

Commute rules: cover pairs of operators that commute.

Reduce rules: cover simplifications of operator compositions.

47

Commute Rules:

Pushing the new operators poss and cert down, even across projection and selection where this is possible. This usually bears even greater potential for optimization.

Some pairs of operators do not commute, for example:

Selection & choice-of

Product & poss

48

49

Commute rules

Reduce rules-examples:

Equivalence (11): the operator poss eliminates the choice-of operator, because choice-of distributes tuples into a set of disjoint worlds, which are later flattened by the operator poss.

Equivalence (15): poss can undo world grouping.

Equivalences (20) + (21): in the presence of choice-of operators, the group-worlds-by operators are reduced to simple projections when the choice attributes occur as both grouping and projecting attributes.

Equivalences (22) + (23): redundant poss or cert operations can be dropped.
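Two of these reduce rules can be checked on the Flights data with a toy model in Python. The model is an assumption made for illustration: a world-set is a set of frozensets and only the relation $R_{k+1}$ is tracked, so worlds that end up with the same relation collapse into one. The exact statement of the numbered equivalences is in the paper; this sketch only tests one plausible reading of (11) and of the redundancy rules.

```python
def choice_of(world_set, position):
    """Split every world by the value found at the given position."""
    result = set()
    for relation in world_set:
        if not relation:
            result.add(relation)       # empty relation stays as it is
            continue
        for value in {t[position] for t in relation}:
            result.add(frozenset(t for t in relation if t[position] == value))
    return result


def poss(world_set):
    """Replace the relation in every world by the union over all worlds."""
    union = frozenset().union(*world_set)
    return {union for _ in world_set}  # identical worlds collapse to one


FLIGHTS = frozenset({("FRA", "BCN"), ("FRA", "ATL"), ("PAR", "ATL"),
                     ("PAR", "BCN"), ("PHL", "ATL")})
start = {FLIGHTS}                      # a single complete world

split = choice_of(start, position=0)   # three worlds, one per Dep
print(len(split))                      # 3

# poss after choice-of gives the same world-set as poss alone.
print(poss(split) == poss(start))      # True
# A second poss on top of poss changes nothing.
print(poss(poss(split)) == poss(split))  # True
```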

50

51

Reduce rules

EXAMPLE:

52

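Consider a possibly incomplete version of our HFlights database from Example 2, where additionally we have information on Hotels.

[Formula for query $q_1$: a cert query over HFlights and Hotels involving the attributes City, Dep and Arr; the operator structure is not recoverable from the transcript.]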

HFlights

Dep Arr

… …

Hotels

Name City Price

… … …

53

[Derivation: $q_1$ is rewritten in two steps, first with equivalence (20) and then with equivalence (8); the intermediate formulas are not recoverable from the transcript.]

54

CONCLUSION & FUTURE WORK

• Two application scenarios to motivate I-SQL.

• I-SQL, an analog to SQL for the case of incomplete information.

• World-set algebra: genericity, conservativity over RA, expressiveness.

• A set of equivalences in world-set algebra, which produce more efficient queries (efficient evaluation).

55

Future work:

generalization to bag semantics

implementation of I-SQL on top of a relational engine.

implementation of I-SQL on top of an existing representation system for finite world-sets, such as databases with lineage and uncertainty.

56

57

Thank you &

Good luck
