Dec 31, 2015
Review
Theoretical Query Languages
Relational Algebra
1. SELECT ( σ )
2. PROJECT ( π )
3. UNION ( ∪ )
4. SET DIFFERENCE ( − )
5. CARTESIAN PRODUCT ( × )
6. RENAME ( ρ )
• RA: gives semantics to practical query languages
• Above set: minimal relational algebra
• Today: a look at some redundant (but useful!) operators
Review
Express the following query in the RA:
Find the names of customers who have both accounts and loans
T1 ← ρT1(cname2, lno) (borrower)
T2 ← depositor × T1
T3 ← σcname = cname2 (T2)
Result ← πcname (T3)
The above sequence of operators (ρ, ×, σ) is very common.
This motivates additional (redundant) RA operators.
Relational Algebra: Redundant Operators
1. Natural Join ( ⋈ )
2. Division ( ÷ )
3. Generalized Projection ( π )
4. Outer Joins ( ⟕, ⟖, ⟗ )
5. Update ( ← ) (we’ve already been using)
• Redundant: each of the above can be expressed in terms of minimal RA
  e.g. depositor ⋈ borrower = π…(σ…(depositor × ρ…(borrower)))
• Added as a convenience
Natural Join
Notation: Relation1 ⋈ Relation2
Idea: combines ρ, ×, σ

r =
A  B  C  D
1  α  +  10
2  α  -  10
2  α  -  20
3  β  +  10

s =
E    B  D
‘a’  α  10
‘a’  α  20
‘b’  β  10
‘c’  β  10

r ⋈ s =
A  B  C  D   E
1  α  +  10  ‘a’
2  α  -  10  ‘a’
2  α  -  20  ‘a’
3  β  +  10  ‘b’
3  β  +  10  ‘c’

depositor ⋈ borrower
≡
πcname,acct_no,lno (σcname=cname2 (depositor × ρt(cname2,lno) (borrower)))
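As a rough illustration of how the natural join combines product, selection, and projection, the sketch below joins the r and s relations from the example. Relations are modelled as lists of dicts, and `natural_join` is a hypothetical helper written for this note, not part of any library.

```python
# Relations as lists of dicts (attribute name -> value); an assumed encoding.
r = [
    {"A": 1, "B": "α", "C": "+", "D": 10},
    {"A": 2, "B": "α", "C": "-", "D": 10},
    {"A": 2, "B": "α", "C": "-", "D": 20},
    {"A": 3, "B": "β", "C": "+", "D": 10},
]
s = [
    {"E": "a", "B": "α", "D": 10},
    {"E": "a", "B": "α", "D": 20},
    {"E": "b", "B": "β", "D": 10},
    {"E": "c", "B": "β", "D": 10},
]


def natural_join(left, right):
    """Keep pairs of tuples that agree on every shared attribute."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    result = []
    for lt in left:                      # Cartesian product ...
        for rt in right:
            if all(lt[a] == rt[a] for a in common):   # ... then selection ...
                result.append({**lt, **rt})           # ... shared columns kept once
    return result


joined = natural_join(r, s)
for row in joined:
    print(row)
```

Merging the two dicts keeps each shared attribute once, which plays the role of the final projection in the equivalent minimal-RA expression.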
Division
Notation: Relation1 ÷ Relation2
Idea: expresses “for all” queries
Query: Find values for A in r which have corresponding B values for all B values in s

r =
A  B
α  1
α  2
α  3
β  1
γ  1
γ  3
γ  4
γ  6
δ  1
δ  2

s =
B
1
2

r ÷ s =
A
α
δ
Division
Another way to look at it (same r, s, and result t = r ÷ s as in the previous example):
Integer division: 17 ÷ 3 = 5
  The largest value of i such that: i × 3 ≤ 17
Relational division: r ÷ s = t
  The largest value of t such that: ( t × s ⊆ r )
Division
A More Complex Example

r =
A  B  C  D  E
α  a  α  a  1
α  a  γ  a  1
α  a  γ  b  1
β  a  γ  a  1
β  a  γ  b  3
γ  a  γ  a  1
γ  a  γ  b  1
γ  a  β  b  1

s =
D  E
a  1
b  1

r ÷ s = t ?

t =
A  B  C
α  a  γ
γ  a  γ
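The “largest t such that t × s ⊆ r” reading of division can be sketched directly in Python. The helper `divide` below is a hypothetical illustration (its name and the list-of-dicts encoding are assumptions of this note); it is applied to the first division example.

```python
def divide(r, s, s_attrs):
    """r ÷ s for relations given as lists of dicts; s_attrs lists the attributes of s."""
    keep = [a for a in r[0] if a not in s_attrs]      # attributes of the result

    def project(t, attrs):
        return tuple(t[a] for a in attrs)

    r_pairs = {(project(t, keep), project(t, s_attrs)) for t in r}
    s_rows = {project(t, s_attrs) for t in s}
    candidates = {left for left, _ in r_pairs}
    # A candidate survives only if it is paired with every tuple of s in r.
    return sorted(c for c in candidates
                  if all((c, srow) in r_pairs for srow in s_rows))


r = [{"A": a, "B": b} for a, b in [
    ("α", 1), ("α", 2), ("α", 3), ("β", 1), ("γ", 1),
    ("γ", 3), ("γ", 4), ("γ", 6), ("δ", 1), ("δ", 2),
]]
s = [{"B": 1}, {"B": 2}]

quotient = divide(r, s, ["B"])
print(quotient)
```

The same helper should handle the more complex example by passing `["D", "E"]` as `s_attrs`.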
Generalized Projection
Notation: πe1,…,en (Relation)
e1,…,en can include arithmetic expressions, not just attributes

Example
credit =
cname   limit  balance
Jones   5000   2000
Turner  3000   2500

Then…
πcname, limit - balance (credit) =
cname   limit-balance
Jones   3000
Turner  500
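A generalized projection behaves much like a comprehension that computes each output column from the input tuple. The sketch below reproduces the credit example; the dict encoding and the name `available` are assumptions made for illustration.

```python
credit = [
    {"cname": "Jones", "limit": 5000, "balance": 2000},
    {"cname": "Turner", "limit": 3000, "balance": 2500},
]

# π cname, limit - balance (credit): one output attribute is an arithmetic expression.
available = [
    {"cname": t["cname"], "limit-balance": t["limit"] - t["balance"]}
    for t in credit
]

for row in available:
    print(row)
```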
Outer Joins
Motivation:

loan =
bname     lno    amt
Downtown  L-170  3000
Redwood   L-230  4000
Perry     L-260  1700

borrower =
cname  lno
Jones  L-170
Smith  L-230
Hayes  L-155

loan ⋈ borrower =
bname     lno    amt   cname
Downtown  L-170  3000  Jones
Redwood   L-230  4000  Smith

Join result loses…
• any record of Perry
• any record of Hayes
Outer Joins
1. Left Outer Join ( ⟕ )
• preserves all tuples in left relation
(loan and borrower as in the motivation example)

loan ⟕ borrower =
bname     lno    amt   cname
Downtown  L-170  3000  Jones
Redwood   L-230  4000  Smith
Perry     L-260  1700  ┴

┴ = NULL
Outer Joins
2. Right Outer Join ( ⟖ )
• preserves all tuples in right relation
(loan and borrower as in the motivation example)

loan ⟖ borrower =
bname     lno    amt   cname
Downtown  L-170  3000  Jones
Redwood   L-230  4000  Smith
┴         L-155  ┴     Hayes

┴ = NULL
Outer Joins
3. Full Outer Join ( ⟗ )
• preserves all tuples in both relations
(loan and borrower as in the motivation example)

loan ⟗ borrower =
bname     lno    amt   cname
Downtown  L-170  3000  Jones
Redwood   L-230  4000  Smith
Perry     L-260  1700  ┴
┴         L-155  ┴     Hayes

┴ = NULL
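The three outer joins differ only in which unmatched tuples they pad with NULL. The following sketch, with `None` standing in for ┴ and a hypothetical helper `full_outer_join`, reproduces the full outer join of the small loan and borrower relations; dropping either padding step would give the left or right variant.

```python
loan = [("Downtown", "L-170", 3000), ("Redwood", "L-230", 4000), ("Perry", "L-260", 1700)]
borrower = [("Jones", "L-170"), ("Smith", "L-230"), ("Hayes", "L-155")]


def full_outer_join(loan, borrower):
    """Join on lno; pad unmatched tuples from either side with None (NULL)."""
    rows = []
    matched_lnos = set()
    for bname, lno, amt in loan:
        partners = [cname for cname, b_lno in borrower if b_lno == lno]
        if partners:
            matched_lnos.add(lno)
            rows.extend((bname, lno, amt, cname) for cname in partners)
        else:
            rows.append((bname, lno, amt, None))      # kept by the left outer join
    for cname, lno in borrower:
        if lno not in matched_lnos:
            rows.append((None, lno, None, cname))     # kept by the right outer join
    return rows


for row in full_outer_join(loan, borrower):
    print(row)
```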
Update
Notation: Identifier ← Query
Common Uses:
1. Deletion: r ← r − s
   e.g., account ← account − σbname=Perry (account)
   (deletes all Perry accounts)
2. Insertion: r ← r ∪ s
   e.g., branch ← branch ∪ {(Waltham, Boston, 7M)}
   (inserts new branch with bname = Waltham, bcity = Boston, assets = 7M)
   e.g., depositor ← depositor ∪ (ρtemp(cname,acct_no) (borrower))
   (adds all borrowers to depositors, treating lno’s as acct_no’s)
3. Update: r ← πe1,…,en (r)
   e.g., account ← πbname,acct_no,bal*1.05 (account)
   (adds 5% interest to account balances)
Another Theoretical Query Language
Relational Calculus
Two flavors:
• Tuple relational calculus (TRC)
• Domain relational calculus (DRC)
Logic-based query language
({x | … }, ∈, ∃, ∀, ∧, ∨, ¬, …)
More declarative than RA
RA: πlno (σamt > 1000 (loan))
Procedural
1. Select loan tuples with amt > 1000
2. Project the result of 1 on lno
TRC: {t | ∃s ∈ loan (t[lno] = s[lno] ∧ s[amt] > 1000)}
Non-procedural
• No order of evaluation implied
• Basis for SQL
Bank Database

Account
bname     acct_no  balance
Downtown  A-101    500
Mianus    A-215    700
Perry     A-102    400
R.H.      A-305    350
Brighton  A-201    900
Redwood   A-222    700
Brighton  A-217    750

Depositor
cname    acct_no
Johnson  A-101
Smith    A-215
Hayes    A-102
Turner   A-305
Johnson  A-201
Jones    A-217
Lindsay  A-222

Customer
cname     cstreet    ccity
Jones     Main       Harrison
Smith     North      Rye
Hayes     Main       Harrison
Curry     North      Rye
Lindsay   Park       Pittsfield
Turner    Putnam     Stanford
Williams  Nassau     Princeton
Adams     Spring     Pittsfield
Johnson   Alma       Palo Alto
Glenn     Sand Hill  Woodside
Brooks    Senator    Brooklyn
Green     Walnut     Stanford

Branch
bname     bcity       assets
Downtown  Brooklyn    9M
Redwood   Palo Alto   2.1M
Perry     Horseneck   1.7M
Mianus    Horseneck   0.4M
R.H.      Horseneck   8M
Pownel    Bennington  0.3M
N. Town   Rye         3.7M
Brighton  Brooklyn    7.1M

Borrower
cname     lno
Jones     L-17
Smith     L-23
Hayes     L-15
Jackson   L-14
Curry     L-93
Smith     L-11
Williams  L-17
Adams     L-16

Loan
bname     lno   amt
Downtown  L-17  1000
Redwood   L-23  2000
Perry     L-15  1500
Downtown  L-14  1500
Mianus    L-93  500
R.H.      L-11  900
Perry     L-16  1300
Tuple Relational Calculus: Some Queries
1. Find loans for amounts > $1200
   {t | t ∈ loan ∧ t[amt] > 1200}
Basic Form: {x | P(x)}
• set comprehension: “the set of all x such that P(x) is true”
• x: tuple variable
• logic contained in predicate (P)
  1. t ∈ loan
  2. t[amt] > 1200
  3. (1) ∧ (2) (equivalent to σ(2) (loan))
Result
bname     lno   amt
Redwood   L-23  2000
Perry     L-15  1500
Downtown  L-14  1500
Perry     L-16  1300
Tuple Relational Calculus: Predicates
Given {x | P(x)}, what can P(x) be?
1. Simple predicate (∈, =, ≠, <, >, ≤, ≥)
   e.g., t ∈ loan
   e.g., t[amt] > 1200
2. Compound predicate (∧, ∨, ¬, ⇒)
   (∨ OR, ∧ AND, ¬ NOT)
   e.g., (t ∈ loan) ∧ t[amt] > 1200
   e.g., ¬(t[bname] = “Downtown”)
   e.g., (t[bname] = “Downtown”) ∨ t[amt] > 1200
Tuple Relational Calculus: Predicates
Given {x | P(x)}, what can P(x) be?
3. Quantified Predicates (∃, ∀)
(a) Existential Quantification (∃)
    ∃t ∈ r (Q(t))
    • true if there exists some tuple t in r such that Q(t) is true
    • e.g., ∃s ∈ loan (s[lno] = “L-17”)
(b) Universal Quantification (∀)
    ∀t ∈ r (Q(t))
    • true if for all tuples t in r, Q(t) is true
    • e.g., ∀s ∈ loan (s[amt] > 100)
Tuple Relational Calculus: More Queries
2. {t | t ∈ loan ∧ ∃s ∈ loan (s[amt] > t[amt])}
A. Returns everything in loan except for (Redwood, L-23, 2000)
3. {t | t ∈ loan ∧ ∀s ∈ loan (s[amt] ≥ t[amt])}
A. Returns
bname   lno   amt
Mianus  L-93  500
(The comparison must be ≥ rather than >: s also ranges over t itself, so a strict > can never hold for all s and the result would be empty.)
Q. Express a TRC query to find the largest loan
A. {t | t ∈ loan ∧ ∀s ∈ loan (s[amt] ≤ t[amt])} OR
   {t | t ∈ loan ∧ ¬(∃s ∈ loan (s[amt] > t[amt]))}
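Over a finite relation, Python's `any` and `all` behave like ∃ and ∀, so the quantified queries can be tried out as comprehensions. This is only a sketch: the tuple encoding of loan and the variable names are assumptions of this note. It also shows that a strict comparison under ∀ is never satisfied, because s ranges over t itself.

```python
loan = [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000), ("Perry", "L-15", 1500),
    ("Downtown", "L-14", 1500), ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
]
AMT = 2  # position of the amt attribute in each tuple

# ∃s ∈ loan (s[amt] > t[amt]): every loan that is not the largest.
not_largest = [t for t in loan if any(s[AMT] > t[AMT] for s in loan)]

# ∀s ∈ loan (s[amt] > t[amt]): empty, since s = t never satisfies a strict >.
strict_all = [t for t in loan if all(s[AMT] > t[AMT] for s in loan)]

# ∀s ∈ loan (s[amt] ≥ t[amt]): the smallest loan.
smallest = [t for t in loan if all(s[AMT] >= t[AMT] for s in loan)]

# ¬∃s ∈ loan (s[amt] > t[amt]): the largest loan.
largest = [t for t in loan if not any(s[AMT] > t[AMT] for s in loan)]

print(len(not_largest), strict_all, smallest, largest)
```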
Tuple Relational Calculus: Projection Queries
σ: Find loans for amts > 1200
{t | t ∈ loan ∧ (t[amt] > 1200)}
“t | t ∈ loan” indicates that t has the same structure as tuples in loan
π: Find loan numbers for all loans for amts > 1200
{t | ∃s ∈ loan (t[lno] = s[lno] ∧ s[amt] > 1200)}
No predicate of the form: t ∈ relation
t consists of the attributes used with t in the set comprehension (i.e., lno)
Result =
lno
L-23
L-15
L-14
L-16
Tuple Relational Calculus: Projection Queries
Q. Find names and cities of branches with assets > $3M.
A. {t | ∃s ∈ branch (t[bname] = s[bname] ∧ t[bcity] = s[bcity] ∧ s[assets] > 3M)}
Result =
bname     bcity
Downtown  Brooklyn
R.H.      Horseneck
N.Town    Rye
Brighton  Brooklyn
Tuple Relational Calculus: Join Queries
Find the names of customers w/ loans at the Perry branch.
Answer has form {t | P(t)}.
Strategy for determining P(t):
1. What tables are involved?
   borrower (s), loan (u)
2. What are the conditions?
   (a) Projection: t[cname] = s[cname]
   (b) Join: s[lno] = u[lno]
   (c) Selection: u[bname] = “Perry”
Tuple Relational Calculus: Join Queries
Find the names of customers w/ loans at the Perry branch.
A. {t | ∃s ∈ borrower (P(t,s))} such that:
   P(t,s) ≡ t[cname] = s[cname] ∧ ∃u ∈ loan (Q(t,s,u))
   Q(t,s,u) ≡ s[lno] = u[lno] ∧ u[bname] = “Perry”
OR, in the unfolded version (either is ok):
   {t | ∃s ∈ borrower (t[cname] = s[cname] ∧ ∃u ∈ loan (s[lno] = u[lno] ∧ u[bname] = “Perry”))}
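The unfolded join query maps almost symbol for symbol onto a set comprehension with a nested `any`. The sketch below uses the Borrower and Loan relations of the Bank Database encoded as tuples, which is an assumption of this note rather than anything prescribed by the slides.

```python
borrower = [
    ("Jones", "L-17"), ("Smith", "L-23"), ("Hayes", "L-15"), ("Jackson", "L-14"),
    ("Curry", "L-93"), ("Smith", "L-11"), ("Williams", "L-17"), ("Adams", "L-16"),
]
loan = [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000), ("Perry", "L-15", 1500),
    ("Downtown", "L-14", 1500), ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
]

# {t | ∃s ∈ borrower (t[cname] = s[cname] ∧ ∃u ∈ loan (s[lno] = u[lno] ∧ u[bname] = "Perry"))}
perry_customers = {
    s_cname                                            # projection: t[cname] = s[cname]
    for (s_cname, s_lno) in borrower                   # ∃s ∈ borrower
    if any(u_lno == s_lno and u_bname == "Perry"       # join and selection
           for (u_bname, u_lno, _amt) in loan)         # ∃u ∈ loan
}

print(sorted(perry_customers))
```

The comprehension fixes one evaluation order, whereas the TRC expression itself implies none.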
Tuple Relational Calculus: Join Queries
Q. Find loan numbers of loans held at branches in Brooklyn.
1. Tables involved
   loan (s), branch (u)
2. Conditions
   (a) Projection: t[lno] = s[lno]
   (b) Join: s[bname] = u[bname]
   (c) Selection: u[bcity] = “Brooklyn”
A. {t | ∃s ∈ loan (P(t,s))} such that:
   P(t,s) ≡ t[lno] = s[lno] ∧ ∃u ∈ branch (Q(t,s,u))
   Q(t,s,u) ≡ s[bname] = u[bname] ∧ u[bcity] = “Brooklyn”
Tuple Relational Calculus: Join Queries
Q. Find the names and cities of customers having a loan from the Perry branch
1. Tables involved
   borrower (s), customer (u), loan (v)
2. Conditions
   (a) Projection: t[cname] = s[cname], t[ccity] = u[ccity]
   (b) Join: s[cname] = u[cname], s[lno] = v[lno]
   (c) Selection: v[bname] = “Perry”
A. {t | ∃s ∈ borrower (P(t,s))}
   P(t,s) ≡ t[cname] = s[cname] ∧ ∃u ∈ customer (Q(t,s,u))
   Q(t,s,u) ≡ t[ccity] = u[ccity] ∧ s[cname] = u[cname] ∧ ∃v ∈ loan (R(t,s,u,v))
   R(t,s,u,v) ≡ s[lno] = v[lno] ∧ v[bname] = “Perry”
Tuple Relational Calculus: Implication (⇒)
p ⇒ q: true if p being true always means q is also true
p ⇒ q ≡ ¬p ∨ q
Resembles if … then
Example
{t | t ∈ loan ∧ (P(t) ⇒ Q(t))}
P(t) ≡ t[bname] = “Perry”
Q(t) ≡ t[amt] > 1000
Result =
bname     lno   amt
Downtown  L-17  1000
Redwood   L-23  2000
Downtown  L-14  1500
Mianus    L-93  500
R.H.      L-11  900
Perry     L-16  1300
Perry     L-15  1500
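Because p ⇒ q is equivalent to ¬p ∨ q, the implication example can be checked with a one-line helper. In this sketch `implies` is a hypothetical function and loan is the Bank Database relation encoded as tuples; tuples whose branch is not Perry pass vacuously, and Perry tuples pass only if the amount exceeds 1000.

```python
loan = [
    ("Downtown", "L-17", 1000), ("Redwood", "L-23", 2000), ("Perry", "L-15", 1500),
    ("Downtown", "L-14", 1500), ("Mianus", "L-93", 500), ("R.H.", "L-11", 900),
    ("Perry", "L-16", 1300),
]


def implies(p, q):
    """p ⇒ q, written as ¬p ∨ q."""
    return (not p) or q


# {t | t ∈ loan ∧ (t[bname] = "Perry" ⇒ t[amt] > 1000)}
result = [t for t in loan if implies(t[0] == "Perry", t[2] > 1000)]

# Raising the threshold to 1400 (a variation, not from the slides) drops one Perry loan.
stricter = [t for t in loan if implies(t[0] == "Perry", t[2] > 1400)]

print(len(result), len(stricter))
```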
Tuple Relational Calculus: Implication (⇒)
Often ⇒ is used with ∀ to express “for all” queries
e.g., Find names of customers who have an account at all branches located in Brooklyn
Connection of “all” to “implies”
Rewording of example:
Find names of customers for whom the following property holds:
For every branch, if the branch is located in Brooklyn, this implies that the customer has an account at that branch.
Tuple Relational Calculus: Implication (cont.)
Q. Find names of customers for whom the following property holds: For every branch, if the branch is located in Brooklyn, this implies that the customer has an account at that branch.
A. {t | ∀s ∈ branch (s[bcity] = “Brooklyn” ⇒ P(t,s))}
What is P(t,s)?
1. Tables involved
   branch (s), depositor (u), account (v)
2. Conditions
   (a) Implication: s[bcity] = “Brooklyn” ⇒ …
   (b) Projection: t[cname] = u[cname]
   (c) Join: s[bname] = v[bname], u[acct_no] = v[acct_no]
   (d) Selection: -
So:
P(t,s) ≡ ∃u ∈ depositor (Q(t,s,u))
Q(t,s,u) ≡ t[cname] = u[cname] ∧ ∃v ∈ account (R(t,s,u,v))
R(t,s,u,v) ≡ s[bname] = v[bname] ∧ u[acct_no] = v[acct_no]
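The “for all … implies” pattern can be evaluated over the Bank Database with `all` and a nested `any`. This is a sketch under stated assumptions: relations are encoded as tuples, branch assets are omitted, `has_account_at` is a hypothetical helper, and the candidate names are restricted to those appearing in depositor so that the loop is finite.

```python
branch = [
    ("Downtown", "Brooklyn"), ("Redwood", "Palo Alto"), ("Perry", "Horseneck"),
    ("Mianus", "Horseneck"), ("R.H.", "Horseneck"), ("Pownel", "Bennington"),
    ("N. Town", "Rye"), ("Brighton", "Brooklyn"),
]
account = [
    ("Downtown", "A-101", 500), ("Mianus", "A-215", 700), ("Perry", "A-102", 400),
    ("R.H.", "A-305", 350), ("Brighton", "A-201", 900), ("Redwood", "A-222", 700),
    ("Brighton", "A-217", 750),
]
depositor = [
    ("Johnson", "A-101"), ("Smith", "A-215"), ("Hayes", "A-102"), ("Turner", "A-305"),
    ("Johnson", "A-201"), ("Jones", "A-217"), ("Lindsay", "A-222"),
]


def has_account_at(cname, bname):
    """P(t,s): ∃u ∈ depositor, ∃v ∈ account linking the customer to the branch."""
    return any(
        u_cname == cname and any(
            v_bname == bname and v_acct == u_acct
            for (v_bname, v_acct, _bal) in account)
        for (u_cname, u_acct) in depositor)


candidates = {cname for cname, _ in depositor}   # assumed finite domain for t
result = {
    t for t in candidates
    # ∀s ∈ branch (s[bcity] = "Brooklyn" ⇒ P(t,s)), with ⇒ written as ¬p ∨ q
    if all(bcity != "Brooklyn" or has_account_at(t, bname)
           for (bname, bcity) in branch)
}

print(result)
```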
Domain Relational Calculus: Atoms & Formulas
Let
  Di be a domain variable
  c be a domain constant
  θ be a comparison operator
Atoms
• r(D1, D2, …, Dn)
• Di θ Dj
• Di θ c
Formulas
Let F, F1 and F2 be formulas
• ( F )
• not F
• F1 and F2
• F1 or F2
Let D be free* in F(D)
• (exists D) F(D)
• (forall D) F(D)
* a variable is free in a formula if it is not quantified by exists or forall

Domain Relational Calculus: Valid Expression
{ D1, …, Dn | F(D1, …, Dn) }
is a valid DRC expression if it has only the variables appearing to the left of the vertical bar | free in F.
Any other variable appearing in F must be bound.
free vs. bound variables
• free (global): variable is not explicitly quantified
• bound (local): variable is declared explicitly through quantification and its scope is the quantified formula
Domain Relational Calculus: Relational Completeness
σcondition (r):
{ R1, …, Rn | r(R1, …, Rn) and condition }
πai,…,aj (r):
{ Ri, …, Rj | r(R1, …, Ri, …, Rj, …, Rn) }
(the variables not listed to the left of the bar are understood to be existentially quantified)
r ∪ s:
{ D1, …, Dn | r(D1, …, Dn) or s(D1, …, Dn) }
r − s:
{ D1, …, Dn | r(D1, …, Dn) and not s(D1, …, Dn) }
q × r:
{ Q1, …, Qm, R1, …, Rn | q(Q1, …, Qm) and r(R1, …, Rn) }
Tuple Relational Calculus: Syntax Summary
{ T1, …, Tn | F(T1, …, Tn) }
• F describes the properties of the data to be retrieved.
• The output schema of F is given by the tuple variables T1, …, Tn that act as global variables in F.
Tuple Relational Calculus: Atoms & Formulas
Let
  T and Ti be tuple variables
  aj be an attribute
  c be a domain constant
  θ be a comparison operator
Atoms
• r(T)
• Ti.am θ Tj.an
• T.ai θ c
Formulas
Let F, F1 and F2 be formulas
• ( F )
• not F
• F1 and F2
• F1 or F2
Let T be free* in F(T)
• (exists T) F(T)
• (forall T) F(T)
* a variable is free in a formula if it is not quantified by exists or forall

Tuple Relational Calculus: Valid Expression
{ T1, …, Tn | F(T1, …, Tn) }
is a valid TRC expression if it has only the variables appearing to the left of the vertical bar | free in F.
Any other variable appearing in F must be bound.
free vs. bound variables
• free (global): variable is not explicitly quantified
• bound (local): variable is declared explicitly through quantification and its scope is the quantified formula
Tuple Relational Calculus: Relational Completeness
σcondition (r):
{ R | r(R) and condition }
πai,…,aj (r):
{ R.ai, …, R.aj | r(R) }
r ∪ s:
{ T | r(T) or s(T) }
r − s:
{ T | r(T) and not s(T) }
q × r:
{ Q, R | q(Q) and r(R) }
Introduction
- Relational algebra is procedural: it specifies the procedure to be followed in order to get the answer to the query.
- Relational calculus is declarative: it describes (declares) the answer to the query without specifying how to get it.
- Relational calculus strongly resembles First Order Predicate Logic, or simply first order logic.
- There are two variants of relational calculus:
- Tuple relational calculus (TRC)
- Domain relational calculus (DRC)
TUPLE RELATIONAL CALCULUS
- A query statement in TRC is a set declaration having the form:
{ P | first-order logic formula }
- This is to be read as ‘the set of all tuples P for which the specified first-order logic formula is true’.
- Thus a TRC query is a request (to the DBMS) to produce a set of tuples corresponding to the tuples of the relational answer in SQL.
- Example
Given the following query:
(Q11) Find all sailors with a rating above 7.
The TRC statement of this query is
{S | S ∈ Sailors ⋀ S.rating > 7}.
SYNTAX AND SEMANTICS OF TRC
• The syntax and semantics of TRC are those of first-order logic. They are stated quite precisely in the text and there is no need to repeat them here. Instead we shall examine a few query applications.
QUERY Q12
(Q12) Find the names and ages of sailors with a rating above 7.
{P | ∃S ∈ Sailors (S.rating > 7 ⋀ P.name = S.name ⋀ P.age = S.age)}
Remarks
1. The fact that the tuple variable P occurs with two attributes (using the dot notation) means that only these two attributes are required in the answer relation.
2. The symbols used are the usual first-order logic symbols:
∀ : for all   ∃ : there exists   ⋀ : and   ⋁ : or   ¬ : not   ⇒ : implies
QUERIES 1,2,7,9,14
The TRC statements for these queries are pretty well self explanatory, especially with the added English statements of how to read them.
DOMAIN RELATIONAL CALCULUS (1)
- The form of a DRC query is as follows:
{<X1, X2, … , Xn> | logical DRC formula}
signifying that the system must construct (and output) a set of all the tuples which satisfy the stated logical DRC formula in terms of the n attributes X1, X2, … ,Xn. Thus, the answer is a relational instance with attributes X1, X2, … , Xn, these attributes corresponding to those of some of the relations in the database.
- Again, the approach used by the system is left unspecified.
- The Syntax and the semantics of the DRC are explicitly and precisely described in the text.
DOMAIN RELATIONAL CALCULUS (2)
Example:
(Q11) Find all sailors with a rating above 7.
{ <I, N, T, A> | <I, N, T, A> ∈ Sailors ⋀ T > 7 }
Other queries are illustrated and described in the text with all necessary explanation.
EXPRESSIVE POWER OF ALGEBRA AND CALCULUS (1)
Safety
- Certain queries stated in the relational calculus may lead to answers which contain an infinite number of tuples (or at least as many as the system can handle).
Example:
Consider the TRC query {S | ¬(S ∈ Sailors)}. Since there is a quasi-infinite number of tuples that can be created with the attributes of Sailors, the answer is (quasi-)infinite.
- A query which yields a (quasi)-infinite answer is said to be unsafe, and, of course, should not be allowed by the system.
- It is possible to define a safe formula in TRC (see text, section 4.4).
Lab Session 2: Tuple Relational Calculus
• A query in the tuple relational calculus is expressed as
  {t | P(t)}
  (set of all tuples t such that predicate P is true for t)
• Query with constructs “or”, “and”, “there exists”
  ∃t ∈ r (Q(t))
  (there exists a tuple t in relation r such that predicate Q(t) is true)
• Query with the construct “implies”
  P ⇒ Q
  (if P is true, then Q must be true)
• Query with the construct “for all”
  ∀t ∈ r (Q(t))
  (Q is true for all tuples t in relation r)

Tuple Relational Calculus
• Relations
  • V(d, k): visits(drinker, kroeg), where kroeg = pub
  • S(k, b): serves(kroeg, bier), where bier = beer
  • L(d, b): likes(drinker, bier)
• Example (one relation)
  – We want the drinkers and the pubs for visitors of the pub ‘Café’:
    {t | t ∈ V ∧ t[k] = ‘Café’}
Tuple Relational Calculus
Relations: V(d, k), S(k, b), L(d, b)
• Example (one relation)
  – If we want only the drinker attribute rather than all the attributes of the V relation:
    {t | ∃s ∈ V (t[d] = s[d] ∧ s[k] = ‘Café’)}
• Example (two relations)
  – Find the names of drinkers that like Duvel:
    {t | ∃s ∈ V (t[d] = s[d] ∧ ∃u ∈ L (s[d] = u[d] ∧ u[b] = ‘Duvel’))}
Tuple Relational Calculus
Relations: V(d, k), S(k, b), L(d, b)
• Example (union, S ∪ L on beers)
  – If we want all beers (served or liked):
    {t | ∃s ∈ S (t[b] = s[b]) ∨ ∃u ∈ L (t[b] = u[b])}
• Example (intersection, S ∩ L on beers)
  – If we want the beers that are both served and liked:
    {t | ∃s ∈ S (t[b] = s[b]) ∧ ∃u ∈ L (t[b] = u[b])}
Functional Dependency
• Canonical Cover
  – Definition
    A canonical cover Fc for F is a set of dependencies such that F logically implies all dependencies in Fc, and Fc logically implies all dependencies in F.
  – Fc is a minimal set of functional dependencies that has the same closure as a given set of functional dependencies
  – Application: reducing the effort spent in checking for constraint violations

Functional Dependency
• Canonical Cover
  – Computing Canonical Cover
    Fc = F
    Repeat
      • Use the union rule to replace any dependencies in Fc of the form X → Y and X → Z with X → YZ
      • Find a functional dependency X → Y in Fc with an extraneous attribute either in X or in Y
      • If an extraneous attribute is found: delete it from X → Y
    Until Fc does not change
  Extraneous attribute
    Attribute of a functional dependency that can be removed without changing the closure of the set of functional dependencies
  Note
    The test for extraneous attributes is done using Fc, not F
Functional Dependency
• Computing Canonical Cover
  – Exercise
    F = {BCD → A, BC → E, A → F, F → G, C → D, A → G}
    • Fc = F
    • Union rule: A → F, A → G then A → FG
    • Fc = {BCD → A, BC → E, F → G, C → D, A → FG}
    • D is an extraneous attribute in BCD → A
      – To prove: F ⊢ (F − {BCD → A}) ∪ {BC → A}
      – Proof:
        C → D (given)
        C → CD (augmentation)
        BC → BCD (augmentation)
        BCD → A (given)
        BC → A (transitivity)
    • Fc = {BC → A, BC → E, F → G, C → D, A → FG}
    • G is an extraneous attribute in A → FG
      – To prove: (F − {A → FG}) ∪ {A → F} ⊢ F
      – Proof:
        F → G (given)
        F → FG (augmentation)
        A → F (given in F)
        A → FG (transitivity)
    • Fc = {BC → A, BC → E, F → G, C → D, A → F}
    • Union rule: BC → A, BC → E then BC → AE
    • Fc = {BC → AE, F → G, C → D, A → F}
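One way to double-check the exercise is with attribute closures: an attribute on the left of X → Y is extraneous if the closure of the reduced left side still contains Y, and two sets of dependencies are equivalent if each implies the other. The sketch below assumes single-letter attributes and uses hypothetical helpers `closure` and `equivalent`; it swaps the Armstrong-axiom proofs above for the closure test and is not the procedure from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def equivalent(f1, f2):
    """True if f1 and f2 logically imply each other."""
    def covers(f, g):
        return all(set(rhs) <= closure(lhs, f) for lhs, rhs in g)
    return covers(f1, f2) and covers(f2, f1)


F = [("BCD", "A"), ("BC", "E"), ("A", "F"), ("F", "G"), ("C", "D"), ("A", "G")]
Fc = [("BC", "AE"), ("F", "G"), ("C", "D"), ("A", "F")]

# D is extraneous in BCD → A because BC alone already determines A.
print("A" in closure("BC", F))
# The final canonical cover has the same closure as F.
print(equivalent(F, Fc))
```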