Database System Concepts, 6th Ed.
©Silberschatz, Korth and Sudarshan
See www.db-book.com for conditions on re-use
Chapter 8: Top-down Relational
Database Design: NORMALIZATION
What can happen when we combine
relations/tables?
Suppose we combine the tables
instructor and department
This creates redundancy (repetition of
information):
What can happen when we combine
relations/tables?
Another problem: UPDATE
When any of the redundant info is changed, the changes
have to be applied to multiple tuples!
What can happen when we combine
relations/tables?
Another problem: INSERTION
When a new department is created, there are no
instructors associated with it yet → need to use NULL
values!
Is there any reason to combine
relations/tables?
Any query that involves a natural join
between department and instructor will
execute faster on the combined table!
This is generally preferred in data mining.
The “top-down” approach
In this chapter, we look at the problem in the
opposite direction:
Starting with a large table that contains many
columns and much redundant information, how can
we split (decompose) it into tables with fewer
columns and less redundancy?
Suppose we start with the table inst_dept. How would we
get the idea to decompose it into instructor and
department?
Naïve approach: spot redundancies in data … but it
doesn’t work for two reasons!
Top-down: Decomposition
Problem 1: It’s costly
▪ Real-life DBs can hold large amounts of data (hundreds
of columns, hundreds of millions of rows)
▪ Spotting redundancies requires considering
combinations of elements from a set that is already
large
▪ → Combinatorial explosion (N-squared and worse)
Spotting redundancies in data
Problem 2: From data alone, it’s impossible to decide
whether a pattern discovered is coincidence or not
Is it the case that departments always reside in one
building and have a unique budget?
Spotting redundancies in data
Solution:
Examine not the data itself (a.k.a. syntax), but the meaning
of the data, a.k.a. the semantics!
The designer must be allowed to specify rules of the
enterprise, a.k.a. functional dependencies, e.g.
dept_name → building, budget
dept_name → building, budget
What does it mean?
“If several rows have the same value for dept_name,
then they also have the same values for building and
budget.”
or
“If there were a schema (dept_name, building, budget),
then dept_name would be a candidate key.”
Since in our table inst_dept dept_name is not a candidate key,
the building and budget of a department may have to be repeated
along with dept_name.
▪ This indicates the need to decompose inst_dept.
dept_name → building, budget
“If there were a schema (dept_name, building, budget),
then dept_name would be a candidate key.”
This example also shows how functional dependencies (FDs) differ
from keys: an FD captures a rule that is in general more
granular than a key.
A key is an FD, but an FD is not always a key!
dept_name → building, budget
“If there were a schema (dept_name, building, budget),
then dept_name would be a candidate key.”
Not all decompositions are good!
Suppose we decompose
employee(ID, name, street, city, salary)
into
employee1 (ID, name)
employee2 (name, street, city, salary)
Problem: we cannot reconstruct the original employee relation!
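A minimal sketch of why this decomposition loses information. The two sample tuples are hypothetical (two different employees who happen to share a name); the join of the projections then contains tuples that were never in employee.

```python
# Hypothetical sample data: two different employees share the name "Kim".
employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

# Projections onto employee1(ID, name) and employee2(name, street, city, salary).
employee1 = {(i, n) for (i, n, _, _, _) in employee}
employee2 = {(n, s, c, sal) for (_, n, s, c, sal) in employee}

# Natural join of the two projections on the shared attribute `name`.
rejoined = {
    (i, n1, s, c, sal)
    for (i, n1) in employee1
    for (n2, s, c, sal) in employee2
    if n1 == n2
}

# Tuples produced by the join that were not in the original relation.
spurious = rejoined - employee
print(len(employee), len(rejoined), len(spurious))
```

The join returns every original tuple plus spurious ones, so we can no longer tell which ID goes with which address and salary.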
A lossy decomposition
But there are also lossless decompositions!
▪ Technically it’s called a lossless-join decomposition
▪ Decomposition of R = (A, B, C) into
R1 = (A, B) and R2 = (B, C)
[Figure: an instance r of R = (A, B, C), its projections Π_{A,B}(r) and Π_{B,C}(r), and the natural join of the two projections]
How to avoid lossy decompositions?
Goal: Devise a theory for the following
▪ Decide whether a particular relation R is in “good” form.
▪ When the relation R is not in “good” form, decompose
it into a set of relations {R1, R2, ..., Rn} such that
• each relation is in good form
• the decomposition is a lossless-join decomposition
▪ Our theory is based on dependencies:
▪ functional dependencies
▪ multivalued dependencies
▪ The process outlined above is called NORMALIZATION
8.2 First Normal Form (1NF)
A domain is atomic if its elements are treated by the
DBMS as indivisible units.
Examples of non-atomic domains:
Names with first + middle + last
IDs that can be broken up into parts (e.g. CS401)
Phone numbers
Any composite attributes!
A relational schema R is in first normal form (1NF) if
the domains of all attributes of R are atomic.
For now, we assume all relations to be in 1NF (but see
Ch.22: Object-Based Databases)
1NF
Atomicity is actually a property of how the elements of the
domain are used!
Example: Students are given roll numbers which are strings
of the form CS0012 or EE1127
▪ Strings would normally be considered indivisible …
▪ … but if the first two characters are extracted to find the
dept., the domain of roll numbers is not atomic.
▪ Doing so is a bad idea: leads to encoding of information
in the app. program rather than in the DB.
Why is this bad?
What should the DB designer do in this case?
8.3 Functional Dependencies (FD)
▪ FDs are constraints on the set of legal relations.
▪ Require that the value for a certain set of attributes
determine uniquely the value for another set of
attributes.
▪ A FD is a generalization of the concept of key: A key
requires that the value for a certain set of attributes
determine uniquely the value for all remaining
attributes.
Functional Dependencies
Let R be a relation schema, and α, β two sets of attributes:
α ⊆ R and β ⊆ R
The functional dependency
α → β
holds on R if and only if, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,
t1[α] = t2[α] ⇒ t1[β] = t2[β]
Example: Consider r(A, B) with the following instance:
[Table: instance of r with columns A and B]
On this instance, A → B does NOT hold,
but B → A does hold.
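The definition can be checked mechanically on a given instance. The sketch below is only an illustration: the function name fd_holds and the three sample tuples are assumptions, chosen so that A → B fails while B → A holds. Passing this test on one instance does not prove that the FD holds on the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the instance `rows` (a list of dicts) satisfies lhs -> rhs."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True

# Hypothetical instance of r(A, B).
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]
print(fd_holds(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(fd_holds(r, ["B"], ["A"]))  # True: every B value appears with one A value
```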
QUIZ: Functional Dependencies
Decide if the following FDs hold or not:
a) A → B
b) B → A
c) {A, C} → D
d) {A, B, C} → D
FD vs. key
▪ K is a superkey for relation schema R if and only if K → R
▪ K is a candidate key for R if and only if
• K → R, and
• for no α ⊂ K, α → R (minimal!)
▪ FDs allow us to express constraints that cannot be expressed
using (super)keys. Consider the schema:
inst_dept (ID, name, salary, dept_name, building, budget )
We expect these FDs to hold:
dept_name → building, ID → building
but would not expect the following FD to hold:
dept_name → salary
QUIZ: FD vs. key
Decide if the following are super/candidate keys:
a) A
b) B
c) {A, C}
d) {A, B, C}
e) {A, B, C, D}
f) D
Uses for FDs
▪ Test relations to see if they are legal under a given set of
FDs.
▪ If a relation r is legal under a set F of FDs, we say that r
satisfies F.
▪ Specify constraints on the set of legal relations
▪ We say that F holds on R if all legal relations on R satisfy
the set of FDs F.
Note: A specific instance of a relation schema may satisfy a FD
even if the FD does not hold on all legal instances.
• Example: a specific instance of instructor may, by chance,
satisfy
name → ID.
Trivial FD
A functional dependency is trivial if it is satisfied
by all instances of a relation
Example:
ID, name → ID
name → name
In general, α → β is trivial if β ⊆ α
QUIZ: Trivial FDs
Give 4 examples of trivial FDs in this relation.
solution
Trivial FDs in this relation:
• A → A
• B → B
• C → C
• D → D
• AB → A
• AB → AB
• BC → C
. . . . .
• ABCD → ABCD
The Holy Grail: The closure of a set of FDs
▪ Given a set F of FDs, there are certain other FDs
that are logically implied by F.
Example: If A → B and B → C, then we can infer that A → C
▪ The set of all FDs logically implied by F is the
closure of F.
▪ We denote the closure of F by F+.
▪ F+ is a superset of F.
8.3.2 Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to a set F of
FDs if, for all FDs in F+ of the form
α → β
(where α ⊆ R and β ⊆ R), at least one of the following is
true:
▪ α → β is trivial (i.e., β ⊆ α)
▪ α is a superkey for R
QUIZ: BCNF
BCNF: for every FD α → β in F+, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Is this schema in BCNF?
instr_dept (ID, name, salary, dept_name, building, budget )
solution
BCNF: for every FD α → β in F+, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Is this schema in BCNF?
instr_dept (ID, name, salary, dept_name, building, budget )
No, because
dept_name → building, budget
holds, but dept_name is not a superkey (Why?)
Extra-credit QUIZ
EOL1/3
Quiz:
What does the acronym FD stand for?
What is the difference between keys and FDs?
Quiz:
What does the acronym FD stand for?
• Functional Dependency
What is the difference between keys and FDs?
• A key is a FD, but a FD is not always a key!
• In a key, the LHS implies all attributes, but in a FD it
implies only some attributes.
Quiz:
What is meant by saying that an attribute (or set of
attributes) implies another attribute (or set of
attributes) ?
Quiz:
What does the acronym BCNF stand for?
What is the definition of a table/relation being in
BCNF?
Quiz:
What does the acronym BCNF stand for?
• Boyce-Codd Normal Form
What is the definition of a table/relation being in
BCNF?
QUIZ
SOLUTION
No, because the FD B → C violates
the BCNF definition: the FD is not
trivial, nor is B a (super)key.
The fact that B is not in general a
superkey can be proved with an
example of an instance:
QUIZ
What if the FD B → A is
added to the set F?
Is (R, F) now in BCNF?
QUIZ
What if the FD B → A is
added to the set F?
Is (R, F) now in BCNF?
A: Yes, b/c B is now a superkey.
BCNF Decomposition
Suppose we have a schema R and a non-trivial dependency α → β that causes a violation of BCNF.
We decompose R into:
• ( α ∪ β )
• ( R − ( β − α ) )
Example BCNF decomposition
• ( α ∪ β )
• ( R − ( β − α ) )
In our example:
α = dept_name
β = building, budget
so we decompose into:
( α ∪ β ) = ( dept_name, building, budget )
( R − ( β − α ) ) = ( ID, name, salary, dept_name )
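The decomposition step is plain set arithmetic. A minimal sketch with Python sets follows; the function name bcnf_split is an assumption made for this illustration.

```python
def bcnf_split(R, alpha, beta):
    """Split schema R on a BCNF-violating FD alpha -> beta.

    Returns the two schemas (alpha ∪ beta) and (R − (beta − alpha)).
    """
    R, alpha, beta = set(R), set(alpha), set(beta)
    return alpha | beta, R - (beta - alpha)

inst_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
r1, r2 = bcnf_split(inst_dept, {"dept_name"}, {"building", "budget"})
print(sorted(r1))  # ['budget', 'building', 'dept_name']
print(sorted(r2))  # ['ID', 'dept_name', 'name', 'salary']
```

Note that α (here dept_name) stays in both schemas, so the two tables can be joined back on it.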
QUIZ 1: BCNF
We decompose R into:
• ( α ∪ β )
• ( R − ( β − α ) )
Take α = {A, B, C, D}, β = {C, D, E, F}, and the entire relation is R = {A, B, C, D, E, F, G, H}
What is the decomposition?
QUIZ 2: BCNF
We decompose R into:
• ( α ∪ β )
• ( R − ( β − α ) )
Take α = {A, B}, β = {E, F}, and the entire relation is R = {A, B, C, D, E, F, G, H}
What is the decomposition?
QUIZ 3: BCNF
Is this relation in BCNF?
Hint: Rename the attributes A, B, C, ….
QUIZ 3: BCNF
A: Not BCNF, b/c both FDs are violations!
Decompose it to BCNF!
QUIZ 3: BCNF
Solution:
Dependency Preservation
Constraints, including FDs, are costly to check in
practice unless they pertain to only one relation.
If it is sufficient to test only those dependencies on each
individual relation of a decomposition in order to
ensure that all functional dependencies hold, then
that decomposition is dependency preserving.
BCNF and Dependency Preservation
ER model of a bank: A
customer can have
more than 1 personal
banker, but at most
one at any given
branch.
A ternary relationship-
set is needed:
BCNF and Dependency Preservation
Implementation:
R = cust_banker_branch = (customer_id, employee_id,
branch_name, type)
FDs: FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)
Is cust_banker_branch in BCNF?
BCNF and Dependency Preservation
Implementation:
R = cust_banker_branch = (customer_id, employee_id,
branch_name, type)
FDs: FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)
Apply the decomposition algorithm!
BCNF and Dependency Preservation
Implementation:
R = cust_banker_branch = (customer_id, employee_id,
branch_name, type)
FDs: FD1: employee_id → branch_name
FD2: (customer_id, branch_name) → (employee_id, type)
Decomposition:
R1 = (employee_id, branch_name)
R2 = (customer_id, employee_id, type)
Problem: FD2 is now “spread” across
two relations!
BCNF and Dependency Preservation
Conclusion:
BCNF is not dependency preserving (in
general)
Because it is not always possible to achieve both
BCNF and dependency preservation, we consider a
weaker normal form …
Third Normal Form = 3NF
A relation schema R is in third normal form (3NF) if for all:
α → β in F+
at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Each attribute A in β − α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
If a relation is in BCNF it is in 3NF (since in BCNF one of the first two
conditions above must hold).
Third condition is a minimal relaxation of BCNF to ensure dependency
preservation.
SKIP all other 3NF theory!
The only facts about 3NF we cover are
those on the previous slide!
Whatever happened with 2NF?
In a nutshell, it forbids attributes to depend on parts of keys.
It is not of practical use anymore.
See the Wikipedia article on second normal form
for more details.
Review of Normal Forms
Updated list of Normalization Goals
Let R be a relation scheme with a set F of FDs:
▪ Decide whether R is in “good” form.
▪ If R is not in “good” form, decompose it into a set of relation
schemes {R1, R2, ..., Rn} such that :
• each relation scheme is in good form
• the decomposition is a lossless-join decomposition
• Preferably, the decomposition should be dependency
preserving.
EOL 2/3
QUIZ: BCNF and 3NF
Consider the following relation:
What non-trivial FDs exist? Hint: Rename the attributes A, B, C.
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
QUIZ continued
F1: Person, Shop Type → Nearest Shop
F2: Nearest Shop → Shop Type
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
Is the relation in BCNF?
Simplified notation:
AB → C
C → B.
A B C
No, b/c C → B is a violation: C is not superkey.
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
Is the relation in 3NF?
QUIZ continued
Simplified notation:
AB → C
C → B.
No, b/c C → B is a violation: C is not superkey.
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
Remember: 3NF has the following condition in addition to BCNF:
Each attribute A in β − α is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key)
QUIZ continued
Simplified notation:
AB → C
C → B.
Do the BCNF decomposition
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
B is part of the candidate key AB.
This shows that C → B is not a 3NF violation, so the relation is
in 3NF!
QUIZ continued
Simplified notation:
AB → C
C → B.
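The BCNF and 3NF answers for this quiz can be double-checked with a short script. This is a sketch under stated assumptions: all function names are made up for this illustration, candidate keys are found by brute force (fine for three attributes only), and only the FDs listed in F are tested rather than all of F+.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(R, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            # Smaller keys are found first, so any superset of one is not minimal.
            if closure(s, fds) == set(R) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def violates_bcnf(lhs, rhs, R, fds):
    return not rhs <= lhs and closure(lhs, fds) != set(R)

def violates_3nf(lhs, rhs, R, fds):
    prime = set().union(*candidate_keys(R, fds))  # attributes in some candidate key
    return violates_bcnf(lhs, rhs, R, fds) and not (rhs - lhs) <= prime

R = set("ABC")
F = [(set("AB"), set("C")), (set("C"), set("B"))]
in_bcnf = not any(violates_bcnf(l, r, R, F) for l, r in F)
in_3nf = not any(violates_3nf(l, r, R, F) for l, r in F)
print(candidate_keys(R, F), in_bcnf, in_3nf)
```

For this schema the script finds the candidate keys AB and AC, reports a BCNF violation for C → B, and reports no 3NF violation.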
Is this decomposition dependency-preserving?
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
R1 = {B, C} R2 = {A, C}
QUIZ continued
Simplified notation:
AB → C
C → B.
Is this decomposition dependency-preserving?
No, b/c AB → C is “spread” across the two relations.
Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form
R1 = {B, C} R2 = {A, C}
QUIZ continued
Simplified notation:
AB → C
C → B.
8.4 Functional-Dependency Theory
This is the formal theory that tells us which FDs are
implied logically by a given set of FDs.
▪ Given a set F of FDs, there are certain other FDs that
are logically implied by F.
E.g. transitivity: If A → B and B → C, then also A → C
▪ The set of all functional dependencies logically
implied by F is the closure of F, denoted F +.
Armstrong’s
Axioms
We can find F+, the closure of F, by repeatedly applying
Armstrong’s Axioms:
if β ⊆ α, then α → β (reflexivity)
if α → β, then γα → γβ (augmentation)
if α → β and β → γ, then α → γ (transitivity)
These rules are
sound (They generate only FDs that actually hold)
complete (They generate all FDs that hold).
William Ward Armstrong is a Canadian
mathematician and computer scientist.
He presented what became known as his
axioms in a 1974 paper.
Examples of use of A’s Axioms
Given the following relation: R = (A, B, C, G, H, I)
and the set of FDs F = { A → B
A → C
CG → H
CG → I
B → H }
Some other members of the closure F+ are:
A → H
by transitivity from A → B and B → H
AG → I
by augmenting A → C with G, to get AG → CG
and then transitivity with CG → I
CG → HI
by augmenting CG → I to infer CG → CGI,
and augmenting CG → H to infer CGI → HI,
and then transitivity
Your turn!
• if β ⊆ α, then α → β (reflexivity)
• if α → β, then γα → γβ (augmentation)
• if α → β and β → γ, then α → γ (transitivity)
Prove that
if and only if
Double implication:
L-to-R and R-to-L!
Solution
• if β ⊆ α, then α → β (reflexivity)
• if α → β, then γα → γβ (augmentation)
• if α → β and β → γ, then α → γ (transitivity)
Prove that
(use augmentation)
(use reflexivity and
transitivity)
QUIZ: Armstrong’s Axioms
Write Armstrong’s Axioms:
• (reflexivity)
• (augmentation)
• (transitivity)
Solution
Write Armstrong’s Axioms:
• if β ⊆ α, then α → β (reflexivity)
• if α → β, then γα → γβ (augmentation)
• if α → β and β → γ, then α → γ (transitivity)
More FD theorems, a.k.a. rules or results
Exercise 8.26
Exercise 8.4
Exercise 8.5
Naïve Algorithm for Computing F+
Repeatedly apply the axioms and theorems to derive new FDs!
Can you find 3 more FDs in this manner?
Do you see a problem with this approach?
Extra-credit QUIZ
Algorithm for Computing F+
To compute the closure F+ of a set of FDs F:
Assign F+ = F
repeat
    for each FD f in F+
        apply reflexivity and augmentation rules on f
        add the resulting FDs to F+
    for each pair of FDs f1 and f2 in F+
        if f1 and f2 can be combined using transitivity,
            add the resulting FD to F+
until F+ does not change any further
Note: apply reflexivity only to the RHS of f
QUIZ: Apply the algorithm to
R = {A, B, C, D, E, F},
with F = {AB → C, B →D, CD →F}
Assign F+ = F
repeat
    for each FD f in F+
        apply reflexivity and augmentation rules on f
        add the resulting FDs to F+
    for each pair of FDs f1 and f2 in F+
        if f1 and f2 can be combined using transitivity,
            add the resulting FD to F+
until F+ does not change any further
Note: apply reflexivity only to the RHS of f
Closure of Attribute Sets
Since computing the entire closure F+ is in general a
formidable task, we set ourselves first a more modest
goal:
▪ Given a set of attributes α, define the closure of α
under F, denoted by α+: it is the set of attributes
that are functionally determined by α under F
Closure of Attribute Sets
This is the algorithm to compute α+, the closure of α
under F:
result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
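The pseudocode above translates almost line by line into Python. In this sketch, representing each FD as a pair (lhs, rhs) of Python sets is an assumption made for the illustration.

```python
def closure(attrs, fds):
    """Compute the closure of the attribute set `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:                      # while (changes to result) do
        changed = False
        for lhs, rhs in fds:            # for each β → γ in F do
            if lhs <= result and not rhs <= result:
                result |= rhs           # result := result ∪ γ
                changed = True
    return result

# The transitivity example from earlier: A → B and B → C.
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(sorted(closure({"A"}, F)))  # ['A', 'B', 'C']
print(sorted(closure({"B"}, F)))  # ['B', 'C']
```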
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Stop b/c there are no changes!
Is AG a candidate key?
1. Is AG a superkey?
   Does AG → R? Yes, b/c (AG)+ = R.
2. Is any subset of AG a superkey?
   Does A → R? No, b/c (A)+ ≠ R
   Does G → R? No, b/c (G)+ ≠ R
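The same two-step test can be scripted. This is a sketch: the helper names are assumptions, and attribute names are single letters so that set("CG") yields {C, G}.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, R, fds):
    return closure(attrs, fds) == set(R)

def is_candidate_key(attrs, R, fds):
    attrs = set(attrs)
    if not is_superkey(attrs, R, fds):
        return False
    # Minimality: dropping any single attribute must break the superkey property.
    return not any(is_superkey(attrs - {a}, R, fds) for a in attrs)

R = set("ABCGHI")
F = [(set("A"), set("B")), (set("A"), set("C")),
     (set("CG"), set("H")), (set("CG"), set("I")),
     (set("B"), set("H"))]
print(sorted(closure("AG", F)))     # ['A', 'B', 'C', 'G', 'H', 'I']
print(is_candidate_key("AG", R, F)) # True
```

Testing only the subsets with one attribute removed is enough, because any subset of a non-superkey is also not a superkey.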
EOL 3
QUIZ: 3NF
A relation schema R is in third normal form (3NF) if for all:
α → β in F+
at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R
Each attribute A in β − α is contained in a candidate key for R.
(NOTE: each attribute may be in a different candidate key)
books (Book-Name, Editor, A-Name, A-SSN, Nr-pag)
FD: A-Name → A-SSN
Is it in BCNF? 3NF?
Uses of Attribute Closure Algorithm
Testing for superkey:
To test if α is a superkey, we compute α+, and check
if α+ contains all attributes of R
Testing if a certain FD holds:
To check if α → β holds (is in F+), just check if β ⊆ α+
Another algorithm for computing the closure F+ of F:
For each γ ⊆ R, find the closure γ+
for each S ⊆ γ+, we output the FD γ → S
Still very expensive, but at least
we have a more systematic way
of doing it!
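A sketch of this closure-based enumeration of F+. The function names and the tiny three-attribute schema are assumptions chosen to keep the output small; the number of FDs grows exponentially with the number of attributes, which is why the method is still expensive.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def subsets(attrs):
    """Yield every subset of `attrs` (including the empty set) as a frozenset."""
    items = sorted(attrs)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def fd_closure(R, fds):
    """F+ as a set of (γ, S) pairs: for each γ ⊆ R and each S ⊆ γ+, output γ → S."""
    return {(gamma, s)
            for gamma in subsets(R)
            for s in subsets(closure(gamma, fds))}

R = set("ABC")
F = [(set("A"), set("B")), (set("B"), set("C"))]
f_plus = fd_closure(R, F)
print((frozenset("A"), frozenset("C")) in f_plus)  # True: A → C is implied
print((frozenset("C"), frozenset("A")) in f_plus)  # False: C → A is not
```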
QUIZ: Uses of Attribute Closure Alg.
Testing for superkey:
To test if α is a superkey, we compute α+, and
check if α+ contains all attributes of R
R = (A, B, C, D)
F = {A → BC, B → C, A → B, AB → C, BC → D}
Is AD a superkey?
Is AD a candidate key?
QUIZ: Uses of Attribute Closure Alg.
Testing if a certain FD holds:
To check if α → β holds (is in F+), just check if
β ⊆ α+
R = (A, B, C, D)
F = {A → BC, B → C, A → B, AB → C, BC → D}
Does AC → D hold ?
SKIP:
-- the remainder of Section 8.4
-- 8.5
8.6 Multivalued dependencies
First let’s go back and cover the example from 8.3.5
8.3.5 The need for normal forms
beyond BCNF
There are database schemas in BCNF that still do not seem to
be sufficiently normalized!
Example: Consider the relation
inst_info (ID, child_name, phone)
where an instructor can have multiple phone nrs. and multiple
children:
ID     child_name  phone
99999  David       512-555-1234
99999  David       512-555-4321
99999  William     512-555-1234
99999  William     512-555-4321
There are no non-trivial functional dependencies and therefore
the relation is in BCNF. But we have:
▪ Redundancy
▪ Insertion anomalies: if we add a phone 981-992-3443 to
99999, we need to add two tuples:
(99999, David, 981-992-3443)
(99999, William, 981-992-3443)
Beyond BCNF?
It is better to decompose inst_info into:
inst_child
ID     child_name
99999  David
99999  David
99999  William
99999  William
inst_phone
ID     phone
99999  512-555-1234
99999  512-555-4321
99999  512-555-1234
99999  512-555-4321
This suggests the need for higher normal forms, such as Fourth Normal Form
(4NF), later.
Beyond BCNF?
Back to 8.6 Multivalued dependencies
Read the similar example
inst (ID, dept name, name, street, city)
8.6.1 Multivalued dependencies
FDs rule out certain tuples from being in the relation:
if A → B, we cannot have tuples with the same A value but
different B values.
▪ For this reason, FDs are also referred to as equality-
generating dependencies
Multivalued dependencies (MVDs) do not rule out - instead,
they require that other tuples be present in the relation.
▪ MVDs are also referred to as tuple-generating
dependencies.
Formal definition of MVDs
Example on next slide
OK, this is not so hard after all:
It simply says that the “cross” tuples must also be present!
Our text has another intuitive interpretation:
The relationship between α and β is independent of the
relationship between α and R − β.
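The "cross tuples" reading can be tested on an instance. The sketch below assumes the usual textbook definition of α →→ β: for any two tuples t1, t2 that agree on α, the tuple that takes its β values from t1 and all remaining values from t2 must also be in the relation. The function name mvd_holds is an assumption; the data is the inst_info instance from Section 8.3.5.

```python
def mvd_holds(rows, attrs, alpha, beta):
    """Check whether the instance `rows` (list of dicts) satisfies alpha ->> beta."""
    rest = [a for a in attrs if a not in alpha and a not in beta]
    present = {tuple(t[a] for a in attrs) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # Cross tuple: alpha and beta values from t1, the rest from t2.
                cross = {a: t1[a] for a in list(alpha) + list(beta)}
                cross.update({a: t2[a] for a in rest})
                if tuple(cross[a] for a in attrs) not in present:
                    return False
    return True

attrs = ["ID", "child_name", "phone"]
inst_info = [
    {"ID": 99999, "child_name": "David", "phone": "512-555-1234"},
    {"ID": 99999, "child_name": "David", "phone": "512-555-4321"},
    {"ID": 99999, "child_name": "William", "phone": "512-555-1234"},
    {"ID": 99999, "child_name": "William", "phone": "512-555-4321"},
]
print(mvd_holds(inst_info, attrs, ["ID"], ["child_name"]))       # True
print(mvd_holds(inst_info[:3], attrs, ["ID"], ["child_name"]))   # False
```

Dropping any one of the four tuples breaks the MVD, because a required cross tuple is then missing.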
MVD example
An instructor can be associated with multiple departments and a
department may have several instructors, so:
• ID → dept_name does not hold
• dept_name → ID does not hold
Therefore, r2 has no non-trivial FDs, so it is in BCNF.
MVD example
Despite r2 being in BCNF, there is redundancy: we repeat the
address information of each residence of an instructor once for each
department with which the instructor is associated.
What MVD can you spot?
solution
Actually, there are two MVDs:
It can be easily proved that
Extra-credit QUIZ: MVD
SKIP the remainder of Section 8.6:
-- 8.6.2 Fourth Normal Form
-- 8.6.3 4NF Decomposition
8.8 Overall DB Design Process
We have assumed that the schema R is given, but how does R
appear in practice?
▪ R can be generated when converting E-R diagram to a set
of tables.
▪ R can be a single relation containing all attributes that are of
interest (called universal relation). Normalization then
breaks R into smaller relations.
or
▪ R can be the result of some ad hoc design of relations,
which we then test/convert to a normal form.
ER Model and Normalization
When an E-R diagram is carefully designed, identifying all entities
correctly, the tables generated from the E-R diagram should not need
further normalization.
However, in a real, imperfect design, there can be:
▪ FDs from non-key attributes of an entity set to other attributes of
the same entity set, e.g.:
• employee entity with attributes including department_name
and building, and the FD department_name → building
• Good design would have made department a separate entity
▪ FDs from non-key attributes of a relationship set to other attributes.
▪ This is possible, but rare, since most relationships are binary.
Naming of Attributes and Relationships
▪ Unique-role: each attribute name has a unique meaning
in the DB
▪ Bad names: number, id, name
▪ Although technically the order of attribute names in a
schema does not matter, it is convention to list primary-
key attributes first.
▪ Naming tables that reduce ER relationship sets – two
alternatives:
1. concatenate the names of the related entity sets, with
a hyphen or underscore, e.g. student_sec.
Naming of Attributes and Relationships
▪ Naming tables that reduce ER relationship sets – two
alternatives:
2. In some cases concatenation is impossible or doesn’t
make sense → choose a new, descriptive name:
Relationship between two roles of the same entity
(e.g. manager is better than employee_employee)
Multiple relationships between the same pair of
entities.
De-normalization for performance
In certain query-intensive applications (e.g. data
mining), we may want to use a non/de-normalized
schema to improve the performance of our queries
Example:
▪ if we want the course ID and title of a course,
along with the IDs of its prerequisites, we need to
join the tables course and prereq.
▪ if we run this query often, it may be more efficient
to trade off space for time
▪ Alternative 1: Create a de-normalized relation/table
containing only the attributes we need: course_id,
title, prereq_id.
• Plus: faster lookup
• Minus: extra space and execution time for updates
• Minus: extra coding work and possibility of errors in
the added code
▪ Alternative 2: use a materialized view defined as
course ⋈ prereq
• Same as above, except we don’t have the second
minus.
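A minimal sketch of Alternative 1 and its update cost, using Python's built-in sqlite3 module. The sample rows and the table name course_prereq are assumptions made for illustration. SQLite has no materialized views, so only the hand-maintained table is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE prereq (course_id TEXT, prereq_id TEXT,
                         PRIMARY KEY (course_id, prereq_id));
    INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science'),
                              ('CS-190', 'Game Design'),
                              ('CS-315', 'Robotics');
    INSERT INTO prereq VALUES ('CS-190', 'CS-101'), ('CS-315', 'CS-101');

    -- Alternative 1: store the join result in its own table (assumed name).
    CREATE TABLE course_prereq AS
        SELECT course_id, title, prereq_id FROM course NATURAL JOIN prereq;
""")

JOIN_QUERY = ("SELECT course_id, title, prereq_id "
              "FROM course NATURAL JOIN prereq ORDER BY course_id")
LOOKUP_QUERY = ("SELECT course_id, title, prereq_id "
                "FROM course_prereq ORDER BY course_id")

# Right after creation, the de-normalized table answers the query without a join.
fresh_copy_matches = (conn.execute(JOIN_QUERY).fetchall()
                      == conn.execute(LOOKUP_QUERY).fetchall())

# An update to the base table is not propagated: extra code would be needed.
conn.execute("UPDATE course SET title = 'Game Development' "
             "WHERE course_id = 'CS-190'")
stale_copy_matches = (conn.execute(JOIN_QUERY).fetchall()
                      == conn.execute(LOOKUP_QUERY).fetchall())

print(fresh_copy_matches, stale_copy_matches)
```

The second comparison fails because course_prereq still holds the old title. Keeping it in sync is the extra update work and extra code listed as minuses above. A materialized view leaves that maintenance to the DBMS.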
“Normalize until it hurts,
then de-normalize until it works!”
☺
Other Design Issues
Some aspects of DB design are not caught by normalization.
Examples of bad DB design, to be avoided: Instead of
earnings (company_id, year, amount ), someone has
created:
Separate tables: earnings_2004, earnings_2005,
earnings_2006, etc. All these tables are in BCNF, but:
querying across years is difficult
a new table needs to be created each year
One table, but with a separate column for each year:
company_year (company_id, earnings_2004,
earnings_2005, earnings_2006)
It’s also in BCNF, but it also makes querying across years
difficult and requires a new column each year.
This is an example of a crosstab, where values of one
attribute become column names.
Crosstabs are used in spreadsheets and other data analysis tools.
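A small sketch of how a crosstab can be turned back into the earnings (company_id, year, amount) form. The function name unpivot and the sample values are hypothetical, chosen only for illustration.

```python
import re

def unpivot(rows):
    """Turn crosstab rows into (company_id, year, amount) tuples."""
    result = []
    for row in rows:
        for column, value in row.items():
            # The year is hidden in the column name, e.g. earnings_2004.
            match = re.fullmatch(r"earnings_(\d{4})", column)
            if match and value is not None:
                result.append((row["company_id"], int(match.group(1)), value))
    return result

# Made-up sample data in the company_year layout.
company_year = [
    {"company_id": "C1", "earnings_2004": 10, "earnings_2005": 12, "earnings_2006": 15},
    {"company_id": "C2", "earnings_2004": 7, "earnings_2005": None, "earnings_2006": 9},
]

earnings = unpivot(company_year)

# Querying across years is now a plain aggregation over one column.
total_by_company = {}
for company_id, year, amount in earnings:
    total_by_company[company_id] = total_by_company.get(company_id, 0) + amount
print(total_by_company)
```

In the un-pivoted form, a new year is just a new row, so no schema change is needed.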
SKIP
8.9 Modeling Temporal Data
Chapter 8 - what sections we covered:
8.1 Features of Good Relational Design
8.2 Atomic Domains and First Normal Form
8.3 Decomposition Using Functional Dependencies
8.4 Functional Dependency Theory
  8.4.1 Closure of a Set of FDs
  8.4.2 Closure of Attribute Sets
  8.4.3 Canonical cover
  8.4.4 Lossless decomposition
  8.4.5 Dependency preservation
8.5 Algorithms for decomposition
8.6 Decomposition Using Multivalued Dependencies
  8.6.1 Multivalued Dependencies
  8.6.2 Fourth normal form
  8.6.3 4NF Decomposition
8.7 More Normal Forms
8.8 Database Design Process
8.9 Modeling Temporal Data
Homework for Ch. 8
▪ 8.4, 8.5
▪ 8.6 → Derive only 7 new FDs, using the Algorithm for
Computing F+ (Fig. 8.7 in text)
▪ 8.26
▪ 8.29 → only (a) (b), then
• Use the attribute closure algorithm for each of the
LHSs of the four FDs
• (e) Take the first FD that violates BCNF and perform
BCNF decomposition
• Is the BCNF decomposition found above dependency
preserving?
• Skip (c), (d), and (f)!
▪ 8.33 (Hint: Use the “cross-tuple” interpretation!)
EOL 4
The next slides are a collection of the
algorithms we need to know from this chapter
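3NF and BCNF
A schema R is in 3NF if for all FDs α → β in F+
at least one of the following holds:
▪ α → β is trivial (i.e., β ⊆ α)
▪ α is a superkey for R
▪ Each attribute A in β – α is contained in
a candidate key for R. (NOTE: each
attribute may be in a different candidate key!)
R is in BCNF if, for all FDs in F+, one of the first two conditions holds.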
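The conditions above can be sketched as a brute-force test in Python. All function names are hypothetical, and the example schemas and FDs are assumed for illustration. The sketch checks only the FDs in F rather than all of F+, which is enough when testing R itself, and it finds candidate keys by trying every attribute subset, so it is only suitable for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying subsets in order of size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            is_superkey = closure(attrs, fds) >= schema
            if is_superkey and not any(key <= attrs for key in keys):
                keys.append(attrs)
    return keys

def normal_form(schema, fds):
    """Return 'BCNF', '3NF', or 'neither' for schema under fds."""
    prime = set().union(*candidate_keys(schema, fds))
    in_bcnf = in_3nf = True
    for lhs, rhs in fds:
        if rhs <= lhs or closure(lhs, fds) >= schema:
            continue  # trivial FD, or lhs is a superkey
        in_bcnf = False
        if not (rhs - lhs) <= prime:
            in_3nf = False  # some attribute of rhs - lhs is in no candidate key
    return "BCNF" if in_bcnf else "3NF" if in_3nf else "neither"

inst_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
inst_dept_fds = [({"ID"}, {"name", "salary", "dept_name"}),
                 ({"dept_name"}, {"building", "budget"})]

dept_advisor = {"s_ID", "i_ID", "dept_name"}
dept_advisor_fds = [({"i_ID"}, {"dept_name"}),
                    ({"s_ID", "dept_name"}, {"i_ID"})]

print(normal_form(inst_dept, inst_dept_fds))
print(normal_form(dept_advisor, dept_advisor_fds))
```

In the second example, i_ID is not a superkey, but dept_name belongs to the candidate key (s_ID, dept_name), so the third condition applies.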
©Silberschatz, Korth and Sudarshan8.118Database System Concepts - 6th Edition
Decomposing a Schema into BCNF
Suppose we have a schema R and a non-trivial dependency α → β that causes a violation of BCNF.
We decompose R into:
• R1 = (α ∪ β), F1 = (…)
• R2 = (R – (β – α)), F2 = (…)
Usually we also want to know if the decomposition is dependency-preserving!
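A minimal Python sketch of one decomposition step, under the assumption that FDs are given as (lhs, rhs) pairs of attribute sets. The function names and the inst_dept FDs are illustrative. The sketch does not compute F1 and F2 and does not test dependency preservation.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_step(schema, fds):
    """Split schema on the first FD in fds that violates BCNF.

    Returns (R1, R2), or None if no FD in fds violates BCNF.
    """
    for lhs, rhs in fds:
        trivial = rhs <= lhs
        superkey = closure(lhs, fds) >= schema
        if not trivial and not superkey:
            r1 = lhs | rhs             # R1 = alpha UNION beta
            r2 = schema - (rhs - lhs)  # R2 = R - (beta - alpha)
            return r1, r2
    return None

inst_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
fds = [({"ID"}, {"name", "salary", "dept_name"}),
       ({"dept_name"}, {"building", "budget"})]

r1, r2 = bcnf_step(inst_dept, fds)
print(sorted(r1))
print(sorted(r2))
```

With these assumed FDs, the step recovers the department and instructor schemas. In general, R1 and R2 may themselves violate BCNF and need further steps.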
©Silberschatz, Korth and Sudarshan8.119Database System Concepts - 6th Edition
Algorithm for F+
Assign F+ = F
repeat
  for each FD f in F+
    apply reflexivity and augmentation rules on f
    add the resulting FDs to F+
  for each pair of FDs f1 and f2 in F+
    if f1 and f2 can be combined using transitivity,
      add the resulting FD to F+
until F+ does not change any further
Reflexivity only to the RHS of f
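For very small schemas, F+ can also be enumerated in Python. Note that this sketch swaps in a different technique from the rule-based loop above: it computes the attribute closure of every non-empty subset α of R and emits α → β for every non-empty β ⊆ α+. The function names and the example FDs are assumptions for illustration, and the output grows exponentially with the number of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds (a list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def nonempty_subsets(attrs):
    """Yield every non-empty subset of attrs as a frozenset."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def fd_closure(schema, fds):
    """All FDs alpha -> beta over schema implied by fds, trivial ones included."""
    f_plus = set()
    for alpha in nonempty_subsets(schema):
        for beta in nonempty_subsets(closure(alpha, fds)):
            f_plus.add((alpha, beta))
    return f_plus

schema = {"A", "B", "C"}
fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
f_plus = fd_closure(schema, fds)

# A -> C follows by transitivity; C -> A is not implied.
print((frozenset("A"), frozenset("C")) in f_plus)
print((frozenset("C"), frozenset("A")) in f_plus)
```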
©Silberschatz, Korth and Sudarshan8.120Database System Concepts - 6th Edition
Algorithm for α+
result := α;
while (changes to result) do
  for each β → γ in F do
  begin
    if β ⊆ result then result := result ∪ γ
  end
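The attribute closure loop can be written in Python roughly as follows. This is a sketch that assumes FDs are represented as (β, γ) pairs of attribute sets. The function name attribute_closure is hypothetical, and the example schema R = (A, B, C, G, H, I) and its FDs are assumed for illustration.

```python
def attribute_closure(alpha, fds):
    """Return alpha+ under fds, a list of (beta, gamma) pairs of sets."""
    result = set(alpha)
    changed = True
    while changed:  # "while (changes to result)"
        changed = False
        for beta, gamma in fds:
            # If beta is already in result, gamma can be added as well.
            if beta <= result and not gamma <= result:
                result |= gamma
                changed = True
    return result

fds = [({"A"}, {"B"}), ({"A"}, {"C"}),
       ({"C", "G"}, {"H"}), ({"C", "G"}, {"I"}),
       ({"B"}, {"H"})]

ag_plus = attribute_closure({"A", "G"}, fds)
a_plus = attribute_closure({"A"}, fds)
print(sorted(ag_plus))
print(sorted(a_plus))
```

Since (AG)+ contains every attribute of R, AG is a superkey. A+ does not, so A alone is not.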