Lyublena Antova, Christoph Koch, and Dan Olteanu
Saarland University Database Group
Saarbrücken, Germany 2007
Presented By: Rana Daud
• Introduction
• Application Scenarios
• I-SQL
• World-Set Algebra
• Algebraic Equivalences
• Conclusion & Future work
INTRODUCTION
SID CID GradeA GradeB
123456789 236363 NULL NULL
987654321 234114 NULL 83
001122337 236363 77 NULL
There is no agreement in the literature on the semantics of null values in relational databases. One reason it is difficult to agree on a semantics is that a null value can be interpreted as unknown, inapplicable, etc.
Since each occurrence of a null value can be substituted by a non-null value, a relation containing nulls can be seen as shorthand for a set of relations, each obtained by a different substitution. This will be our basic semantic assumption:
An incomplete relation represents a set of (complete) relations.
Incomplete information arises naturally in numerous data management applications such as data integration, data cleaning, and data exchange.
Recently, the research community has shown keen interest in efficiently managing incomplete information viewed as a set of possible worlds.
A significant amount of research has attempted to find the right balance between the succinctness of world-set representations and the efficiency of query evaluation on top of them. However, there is a lack of expressive query languages that are well tailored to sets of possible worlds.
A query language for incomplete information should meet at least the following requirements:
• Generic
• Expressive
• Conservative
• Efficient evaluation
SQL lacks explicit constructs for dealing with uncertainty, though some queries on incomplete information can be expressed as SQL queries on relational representations of incomplete databases, using complicated nesting and aggregation. Extensions of RA or SQL with limited constructs (such as certain or top-k) are not expressive enough, as they do not allow for the convenient construction of new worlds.
As of the publication of this article, no query language for incomplete information had been proposed that satisfies all of these requirements.
APPLICATION SCENARIOS
Example 1: Business decision support
Company_Emp
EID CID
e1 ACME
e2 ACME
e3 HAL
e4 HAL
e5 HAL
Emp_Skills
Skill EID
Web e1
Web e2
Java e3
Web e3
SQL e4
Java e5
SELECT * FROM Company_Emp choice of CID;

U^1
EID CID
e1 ACME
e2 ACME

U^2
EID CID
e3 HAL
e4 HAL
e5 HAL
SELECT R1.CID, R1.EID
FROM Company_Emp R1, (SELECT * FROM U choice of EID) R2
WHERE R1.CID = R2.CID AND R1.EID != R2.EID;
V^{1.1}
CID EID
ACME e1

V^{2.1}
CID EID
ACME e2

V^{1.2}
CID EID
HAL e3
HAL e4

V^{2.2}
CID EID
HAL e3
HAL e5

V^{3.2}
CID EID
HAL e4
HAL e5
SELECT certain CID, Skill
FROM V, Emp_Skills
WHERE V.EID = Emp_Skills.EID
group worlds by (SELECT CID FROM V);

W^{*.1}
CID Skill
ACME Web

W^{*.2}
CID Skill
HAL Java
SELECT possible CID FROM W WHERE Skill = 'Web';

CID
ACME
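The four queries of Example 1 can be traced with plain Python sets. This is only a sketch under stated assumptions, not the authors' implementation: worlds are modelled as plain sets of tuples, Company_Emp is stored as (CID, EID) pairs, and the names `split_by`, `u_worlds`, `v_worlds`, `w_by_group` and `targets` are hypothetical.

```python
# Relations from Example 1, as sets of tuples.
company_emp = {("ACME", "e1"), ("ACME", "e2"),
               ("HAL", "e3"), ("HAL", "e4"), ("HAL", "e5")}   # (CID, EID)
emp_skills = {("e1", "Web"), ("e2", "Web"), ("e3", "Java"),
              ("e3", "Web"), ("e4", "SQL"), ("e5", "Java")}   # (EID, Skill)

def split_by(relation, position):
    """choice-of on one attribute: one sub-relation per distinct value."""
    groups = {}
    for row in relation:
        groups.setdefault(row[position], set()).add(row)
    return list(groups.values())

# U: choice of CID, one world per company.
u_worlds = split_by(company_emp, 0)

# V: inside each U world, choice of EID picks one employee (R2);
# V keeps the other employees of the same company (R1).
v_worlds = []
for u in u_worlds:
    for r2 in split_by(u, 1):
        ((cid, chosen),) = r2
        v_worlds.append({(c, e) for (c, e) in company_emp
                         if c == cid and e != chosen})

# W: join V with Emp_Skills, group worlds by the CID values in V,
# and keep the (CID, Skill) pairs present in every world of a group.
groups = {}
for v in v_worlds:
    joined = {(c, s) for (c, e) in v for (e2, s) in emp_skills if e == e2}
    groups.setdefault(frozenset(c for (c, _) in v), []).append(joined)
w_by_group = {key: set.intersection(*rels) for key, rels in groups.items()}

# Final query: possible CID from W where Skill = 'Web'.
targets = {c for w in w_by_group.values() for (c, s) in w if s == "Web"}
print(targets)  # {'ACME'}
```

The sketch yields two U worlds, five V worlds and two W groups, matching the tables above.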
Example 2: Trip Planning
Flights(Fid, Dep, Arr, Dtime, Atime)
Hometowns(City)

Flights
Dep Arr
FRA BCN
FRA ATL
PAR ATL
PAR BCN
PHL ATL

Hometowns
City
FRA
PAR
PHL
...
create view HFlights as
select * from Flights where Dep in Hometowns;
select certain Arr from HFlights choice of Dep;
Assuming the existence of a division operator in SQL:
select Arr
from (select Arr, Dep from HFlights) as F1
divide by
(select Dep from HFlights) as F2
on F1.Dep = F2.Dep;
REMINDER - DIVISION:

R =
D C B A
1 1 1 1
2 1 1 2
2 2 2 2
2 3 3 2

S =
C B
1 1
2 2

R ÷ S =
D A
2 2
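The division reminder can be checked with a few lines of Python. This is a minimal sketch that assumes relations are sets of tuples with a separate list of attribute names; the helper `divide` is a hypothetical name, not part of I-SQL.

```python
def divide(r, r_attrs, s, s_attrs):
    """Relational division r ÷ s on sets of tuples.

    Keeps the attributes of r that are not in s, and returns the value
    combinations that occur in r together with every tuple of s.
    """
    keep = [i for i, a in enumerate(r_attrs) if a not in s_attrs]
    match = [r_attrs.index(a) for a in s_attrs]
    pairs = {(tuple(t[i] for i in keep), tuple(t[i] for i in match)) for t in r}
    candidates = {left for (left, _) in pairs}
    quotient = {c for c in candidates if all((c, row) in pairs for row in s)}
    return [r_attrs[i] for i in keep], quotient

R = {(1, 1, 1, 1), (2, 1, 1, 2), (2, 2, 2, 2), (2, 3, 3, 2)}
S = {(1, 1), (2, 2)}
attrs, quotient = divide(R, ["D", "C", "B", "A"], S, ["C", "B"])
print(attrs, quotient)  # ['D', 'A'] {(2, 2)}
```

The candidate (D=1, A=1) is dropped because R has no tuple pairing it with (C=2, B=2).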
Note:
Division can be simulated in SQL using a nested sub-query with two not-exists constructs:
select Arr from HFlights F1
where not exists
  (select * from HFlights F2
   where not exists
     (select * from HFlights F3
      where F3.Dep = F2.Dep and F3.Arr = F1.Arr));
This shows that, at least in certain cases, I-SQL allows decision support queries to be phrased more concisely than plain SQL.
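The double not-exists query can be run as plain SQL, for instance with Python's built-in sqlite3 module. The sketch below assumes that Hometowns contains exactly FRA, PAR and PHL (the slide elides further rows) and spells out the view's `in` test as a sub-query; the table contents are taken from Example 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Flights (Dep TEXT, Arr TEXT);
    CREATE TABLE Hometowns (City TEXT);
    INSERT INTO Flights VALUES
        ('FRA', 'BCN'), ('FRA', 'ATL'), ('PAR', 'ATL'),
        ('PAR', 'BCN'), ('PHL', 'ATL');
    -- Assumption: these three cities are the complete Hometowns relation.
    INSERT INTO Hometowns VALUES ('FRA'), ('PAR'), ('PHL');
    CREATE VIEW HFlights AS
        SELECT * FROM Flights WHERE Dep IN (SELECT City FROM Hometowns);
""")

# Division simulated with two nested NOT EXISTS sub-queries.
rows = conn.execute("""
    SELECT Arr FROM HFlights F1
    WHERE NOT EXISTS (
        SELECT * FROM HFlights F2
        WHERE NOT EXISTS (
            SELECT * FROM HFlights F3
            WHERE F3.Dep = F2.Dep AND F3.Arr = F1.Arr))
""").fetchall()

destinations = {arr for (arr,) in rows}
print(destinations)  # {'ATL'}
```

ATL is the only destination reachable from every departure city, which is also what the I-SQL query with certain and choice of Dep asks for.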
I-SQL
• We will treat I-SQL informally, mostly through examples.
• The structure of an I-SQL query: (figure: structure of an I-SQL query)
The main motivation is to find a natural extension of RA and SQL to the context of incomplete information. We next detail the syntax and semantics of the constructs, separated into four groups:
• Standard SQL constructs
• Merging worlds
• Splitting up worlds
• Data manipulation
BACK TO FLIGHTS
Flights
Dep Arr
FRA BCN
FRA ATL
PAR ATL
PAR BCN
PHL ATL
Standard SQL constructs: a query is evaluated in each world independently and the result is added as a new relation to that world.
Example:
SELECT * FROM Flights WHERE Arr = 'BCN';
Merging worlds: constructs that go across world borders to collect information that appears in other worlds as well.
Possible and certain: compute the tuples that appear in some, respectively all, worlds. The result is then added to each world of the input world-set.
Group-worlds-by: used in combination with 'possible' and 'certain'; it specifies a condition on which the worlds are grouped. The condition is given in the form of an SQL query; worlds that produce the same result for that query are put into the same group. Then 'possible' or 'certain', respectively, is computed within each of the created groups.
When the query is a projection on a set of attributes, we write the set of attributes directly, as is done in the group-by clause in SQL.
Example:
SELECT certain Arr FROM Flights;

Result:
World A
Flights^A
Dep Arr
FRA BCN
FRA ATL
F^A
Arr
ATL

World B
Flights^B
Dep Arr
PAR ATL
PAR BCN
F^B
Arr
ATL

World C
Flights^C
Dep Arr
PHL ATL
F^C
Arr
ATL
Note:
Even though we used the closing construct 'certain', the result is again the set of three input worlds, where each of them is extended with a new relation F. Only if the input is a single world, or if one is interested only in the result of the operation and not in the input relations, will a 'possible' or 'certain' construct produce a single world.
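The behaviour described in the note can be sketched in Python. The dictionary layout of worlds and the names `project_arr` and `select_closing` are assumptions made for illustration; the data is the three-world Flights example.

```python
# Three worlds A, B, C, each holding its own instance of Flights.
worlds = {
    "A": {"Flights": {("FRA", "BCN"), ("FRA", "ATL")}},
    "B": {"Flights": {("PAR", "ATL"), ("PAR", "BCN")}},
    "C": {"Flights": {("PHL", "ATL")}},
}

def project_arr(flights):
    """Projection of Flights(Dep, Arr) on Arr."""
    return {arr for (_dep, arr) in flights}

def select_closing(worlds, mode):
    """Evaluate 'select <mode> Arr from Flights' for mode in {certain, possible}.

    The per-world answers are intersected (certain) or unioned (possible),
    and the combined answer is added to every world as a new relation F.
    """
    answers = [project_arr(w["Flights"]) for w in worlds.values()]
    if mode == "certain":
        combined = set.intersection(*answers)
    else:
        combined = set.union(*answers)
    return {name: {**w, "F": set(combined)} for name, w in worlds.items()}

certain_worlds = select_closing(worlds, "certain")
possible_worlds = select_closing(worlds, "possible")
print(certain_worlds["A"]["F"])  # {'ATL'}
```

Both calls return three worlds again; only the added relation F is the same in all of them.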
Splitting up worlds:
creation of new worlds using the operations:
choice-of: freezes the values of the given set of attributes and creates a separate world for every combination of values.
repair-by-key:
Generates the possible repairs of a relation that violates a uniqueness constraint on the values of a given set of attributes.
Generates possible configurations of items, where each configuration contains only one item of each type.
Naturally fits data cleaning scenarios (for example, de-duplication based on key constraints).
Example:
SELECT * FROM Flights choice of Dep;

Flights
Dep Arr
FRA BCN
FRA ATL
PAR ATL
PAR BCN
PHL ATL

Result:
Flights^A
Dep Arr
FRA BCN
FRA ATL

Flights^B
Dep Arr
PAR ATL
PAR BCN

Flights^C
Dep Arr
PHL ATL
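A minimal Python sketch of choice-of on the Flights example follows. It assumes a relation is a set of tuples with a separate schema tuple; the function name `choice_of` and the keyed dictionary of worlds are illustrative choices, not I-SQL syntax.

```python
FLIGHTS_SCHEMA = ("Dep", "Arr")
flights = {("FRA", "BCN"), ("FRA", "ATL"), ("PAR", "ATL"),
           ("PAR", "BCN"), ("PHL", "ATL")}

def choice_of(relation, schema, attrs):
    """Create one world per combination of values of `attrs`.

    Each world keeps exactly the tuples that agree on those values,
    so no two worlds share the same combination.
    """
    positions = [schema.index(a) for a in attrs]
    worlds = {}
    for row in relation:
        key = tuple(row[i] for i in positions)
        worlds.setdefault(key, set()).add(row)
    return worlds

flight_worlds = choice_of(flights, FLIGHTS_SCHEMA, ["Dep"])
for key in sorted(flight_worlds):
    print(key, sorted(flight_worlds[key]))
```

With `["Dep"]` as the choice attributes the sketch produces the three worlds shown in the result above.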
REPAIR-BY-KEY EXAMPLE:
Census(SSN, Name, POB, POW)
• SSN: social security number
• POB: place of birth
• POW: place of work
Functional dependency: SSN → Name, POB, POW
SELECT * FROM Census repair by key SSN;
The result contains all possible relations that are consistent with regard to the functional dependency and are contained in the relation Census.
Note:
This query can produce exponentially many
worlds, and is thus not expressible in SQL
(or RA). In fact, NP-hard problems can be
expressed as queries with repair-by-key.
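The growth in the number of worlds can be seen in a small Python sketch. The Census rows below are invented for illustration (the slides give no data), and the sketch assumes that each repair keeps exactly one tuple per SSN value; `repair_by_key` is a hypothetical helper name.

```python
from itertools import product

# Hypothetical Census(SSN, Name, POB, POW) rows; two SSNs are duplicated.
census = [
    (185, "Smith", "NY", "LA"),
    (185, "Smith", "NY", "SF"),
    (785, "Brown", "TX", "TX"),
    (785, "Braun", "TX", "TX"),
    (186, "Jones", "CA", "CA"),
]

def repair_by_key(relation, key_position):
    """Enumerate the repairs of `relation` under a key constraint.

    Tuples are grouped by key value and each repair picks one tuple
    per group, so every repair satisfies the key.
    """
    groups = {}
    for row in relation:
        groups.setdefault(row[key_position], []).append(row)
    return [set(choice) for choice in product(*groups.values())]

repairs = repair_by_key(census, 0)
print(len(repairs))  # 2 * 2 * 1 = 4
```

Every additional key value with two conflicting tuples doubles the number of repairs, which is where the exponential number of worlds comes from.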
Data Manipulation:
insert
update
delete
The query is executed in each world of the world-set independently. If inserting or updating the tuple violates a constraint in some worlds, the update is discarded in all worlds.
Example:
DELETE FROM Flights WHERE Arr = 'ATL';

Result:
Flights^A
Dep Arr
FRA BCN

Flights^B
Dep Arr
PAR BCN

Flights^C
Dep Arr
(empty)
Order of evaluation:
(1) Compute the product of the relations produced by the sub-queries in the from-clause.
(2) Apply the conditions of the where-clause on top.
(3) If any of the new operators 'choice-of', 'repair-by-key' and 'group-worlds-by' are specified, apply them in the order given by the structure of an I-SQL query:
(3.1) choice-of creates a world for each combination of values for the specified attributes.
(3.2) repair-by-key is applied in each of the created worlds.
(3.3) group-worlds-by is applied on the world-set created after the repair-by-key.
(4) Project on the attributes given in the select list; if 'possible' or 'certain' is present, union or intersect, respectively, the tuples in that projection.
WORLD-SET ALGEBRA
We now turn to the formal treatment: world-set algebra.
It covers the fragment of I-SQL without SQL grouping and aggregation constructs.
World-set algebra is an extension of RA with new constructs.
It is generic: the semantics of a query is independent of the world-set representation.
This is a fundamental property.
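Syntax and Semantics:
Base operators:
• Selection $\sigma$
• Projection $\pi$
• Cartesian product $\times$
• Union $\cup$
• Difference $-$
• Renaming $\delta$
Further operators, expressible through the base operators:
• Intersect $\cap$
• Division $\div$: $r \div s = \pi_{R \setminus S}(r) \setminus \pi_{R \setminus S}((\pi_{R \setminus S}(r) \times s) \setminus r)$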
New constructs:
• poss
• cert
• choice-of: $\chi_U$
• possible group-worlds-by: $p\gamma^{V}_{U}$
• certain group-worlds-by: $c\gamma^{V}_{U}$
A world-set $\mathcal{A}$ contains worlds over schema $(R_1, R_2, \ldots, R_k)$.
Applying a query $q$ yields worlds over schema $(R_1, \ldots, R_k, R_{k+1})$, where $R_{k+1}$ is the relation that represents the answer to $q$ in each world.
SEMANTICS OF THE OPERATORS:
The semantics of world-set algebra is defined as a function mapping between world-sets.
If $q$ is the identity on a relation (i.e., of the form $R_i$), we add a copy of that relation to each world.
Unary operators: evaluate $q$ in each world to obtain worlds over schema $(R_1, \ldots, R_{k+1})$. The operator is evaluated on $R_{k+1}$ and the answer replaces $R_{k+1}$.
Binary operators ($\times$, $\cup$, $-$): evaluate the operands to obtain two world-sets $\mathcal{A}'$ and $\mathcal{A}''$. Perform the binary operation in those combinations of one world from $\mathcal{A}'$ and one world from $\mathcal{A}''$ that agree on the relations $R_1, \ldots, R_k$. This forbids operations between relations that occur in different worlds of the original world-set.
choice-of ($\chi_U$) creates a new world for each choice of the values in the projection on $U$ of $R_{k+1}$ in each world.
The relation $R_{k+1}$ is then replaced in each of the new worlds by the subset of $R_{k+1}$ consisting of those tuples that agree on the values of $U$. Thus no two new worlds created from the same world have the same values of $U$.
When applied to the empty relation, choice-of produces an empty relation.
Each newly created world also contains the relations $R_1, \ldots, R_k$ of the world from which it was derived. This ensures compositionality.
Auxiliary definitions:
condition
group-worlds-by: $p\gamma^{V}_{U}$ & $c\gamma^{V}_{U}$
The group-worlds-by operators $p\gamma^{V}_{U}$ and $c\gamma^{V}_{U}$ group the worlds in a world-set such that all worlds in a group agree on $\pi_V(R_{k+1})$.
We then replace $R_{k+1}$ by $\pi_U(R_{k+1})$ in each world.
In the case of $p\gamma^{V}_{U}$, in each world B, $R_{k+1}$ is replaced by the union of the relations $R_{k+1}$ from the group of worlds associated with B.
Analogously, in the case of $c\gamma^{V}_{U}$, the new relation $R_{k+1}$ in a world B becomes the intersection of the relations $R_{k+1}$ from the group of worlds associated with B.
poss:
$R_{k+1}$ is replaced by the union of all its instances across all worlds.
cert:
$R_{k+1}$ is replaced by the intersection of all its instances across all worlds.
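The first query, asking for possible acquisition targets, can be expressed in world-set algebra as:
$\mathrm{poss}(\pi_{CID}(\sigma_{Skill='Web'}(c\gamma^{CID}_{CID,Skill}(\pi_{1.CID,1.EID}(\sigma_{1.CID=2.CID \wedge 1.EID \neq 2.EID}(Company\_Emp \times \chi_{EID}(\chi_{CID}(Company\_Emp)))) \bowtie Emp\_Skills))))$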
GENERICITY
Genericity is a fundamental property of query
languages. It guarantees that query results are
independent from the representation of the data
and interpretation of domain values.
RA and SQL are generic.
World-set algebra is generic: its semantics does
not depend on the world-set representation.
FROM WORLD-SET ALGEBRA TO RA
Any world-set algebra query can be efficiently translated to an equivalent relational algebra query over a complete representation of the input world-set.
The authors propose the inlined representation, in which the tuples of a relation over all worlds are stored in one table that has special attributes denoting the identifier of the world each tuple belongs to.
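The idea of an inlined representation can be sketched with Python's sqlite3 module. The table layout below (a `wid` column on the data table plus a separate `Worlds` table) and all table names are assumptions made for illustration, not the paper's exact translation scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Assumed layout: one row per tuple and world, tagged with a world id.
    CREATE TABLE Worlds (wid TEXT);
    CREATE TABLE Flights_inl (wid TEXT, Dep TEXT, Arr TEXT);
    INSERT INTO Worlds VALUES ('A'), ('B'), ('C');
    INSERT INTO Flights_inl VALUES
        ('A', 'FRA', 'BCN'), ('A', 'FRA', 'ATL'),
        ('B', 'PAR', 'ATL'), ('B', 'PAR', 'BCN'),
        ('C', 'PHL', 'ATL');
""")

# poss: a tuple is possible if it occurs in at least one world.
possible = {row[0] for row in conn.execute(
    "SELECT DISTINCT Arr FROM Flights_inl")}

# cert: a tuple is certain if it occurs in every world.
certain = {row[0] for row in conn.execute("""
    SELECT Arr FROM Flights_inl
    GROUP BY Arr
    HAVING COUNT(DISTINCT wid) = (SELECT COUNT(*) FROM Worlds)
""")}

print(possible, certain)
```

Both closing operations become ordinary relational queries over the single table, which is the point of translating world-set algebra to RA.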
Main contributions of this section:
World-set algebra is conservative over RA. This means that any world-set algebra query that maps from a complete database to a complete database (a "complete-to-complete" query) is equivalent to an RA query.
An efficient algorithm for effecting this translation. It follows that complete-to-complete world-set algebra queries have the same low data complexity as RA.
ALGEBRAIC EQUIVALENCES
The goal of the equivalences is query optimization.
The authors define two classes of equivalences:
Commute rules: cover pairs of operators that commute.
Reduce rules: cover simplifications of operator compositions.
Commute Rules:
The new operators poss and cert can be pushed down, even across projection and selection, where this is possible. This usually bears even greater potential for optimization.
Some pairs of operators do not commute, for example:
• Selection & choice-of
• Product & poss
Reduce rules, examples:
Equivalence (11): the operator poss eliminates the choice-of operator, because choice-of distributes tuples into a set of disjoint worlds, which are later flattened by poss.
Equivalence (15): poss can undo world grouping.
Equivalences (20) and (21): in the presence of choice-of operators, the group-worlds-by operators reduce to simple projections if the choice attributes occur as both grouping and projection attributes.
Equivalences (22) and (23): redundant poss or cert operations can be removed.
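The intuition behind equivalence (11) can be checked on the Flights relation. The sketch below is not the formal rule from the paper: it assumes a single complete input world, represents the world-set only by the instances of the split relation, and uses the hypothetical helpers `choice_of` and `poss`.

```python
def choice_of(relation, positions):
    """Split a relation into disjoint instances, one per value combination."""
    groups = {}
    for row in relation:
        groups.setdefault(tuple(row[i] for i in positions), set()).add(row)
    return list(groups.values())

def poss(instances):
    """Union of all instances of a relation across worlds."""
    return set().union(*instances)

flights = {("FRA", "BCN"), ("FRA", "ATL"), ("PAR", "ATL"),
           ("PAR", "BCN"), ("PHL", "ATL")}

by_dep = choice_of(flights, [0])   # choice of Dep: three worlds
by_arr = choice_of(flights, [1])   # choice of Arr: two worlds

# Flattening the disjoint worlds with poss gives back the input relation,
# whichever attributes were used for the split.
print(poss(by_dep) == flights, poss(by_arr) == flights)  # True True
```

Because the instances are disjoint and together cover the input, the union performed by poss loses nothing, so the preceding choice-of can be dropped.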
EXAMPLE:
Consider a possibly incomplete version of our HFlights database from Example 2, where we additionally have information on hotels.
HFlights
Dep Arr
… …
Hotels
Name City Price
… … …
$q_1 = \mathrm{cert}(\pi_{City}(\chi_{Dep}(\pi_{Dep,City}(\sigma_{Arr=City}(HFlights \times Hotels)))))$
The query $q_1$ is then rewritten step by step using equivalences (20) and (8).
CONCLUSION & FUTURE WORK
• Two application scenarios to motivate I-SQL.
• I-SQL, an analog to SQL for the case of incomplete information.
• World-set algebra: generic, conservative over RA, expressive.
• A set of equivalences in world-set algebra, which produce more efficient queries (efficient evaluation).
Future work:
• Generalization to bag semantics.
• Implementation of I-SQL on top of a relational engine.
• Implementation of I-SQL on top of an existing representation system for finite world-sets, such as databases with lineage and uncertainty.
Thank you &
Good luck