Normal Forms
First Normal Form:
• all table cells must contain atomic values
− no sets, arrays, lists, or other collection types
− no structured objects
• all relational databases satisfy first normal form by definition
− the definition restricts attribute values to singletons from specified domains
• indeed, relations that do not satisfy first normal form are specifically called unnormalized relations
Definitions:
• $R$ denotes a relation; $r(R)$ is the corresponding body
• $X \to Y$ is a functional dependency (FD)
• The universal relation $R$ contains all application attributes
− $r(R)$ contains the join of all database tables
• $r(R)$ is valid if it satisfies all FDs implied by the application constraints
• The goal of functional dependency analysis is to decompose the universal relation into components that cannot become inconsistent with respect to any functional dependency constraint
Some requirements:
• Suppose $R = R_1 + R_2$, where $+$ indicates union of attributes
• For any valid $r(R)$, must have
− $r(R_1) = \pi_{R_1}(r(R))$
− $r(R_2) = \pi_{R_2}(r(R))$
− $r(R) = r(R_1) \bowtie r(R_2)$ (lossless join)
• Note: $[\pi_{R_1}(r(R))] \bowtie [\pi_{R_2}(r(R))] \supseteq r(R)$
− perhaps the join contains too much: spurious tuples
Lossy join:
$r(R)$:
A B C
1 2 3
5 2 4
$\pi_{A,B}(r(R))$:
A B
1 2
5 2
$\pi_{B,C}(r(R))$:
B C
2 3
2 4
Join of the two projections:
A B C
1 2 3
1 2 4 (spurious tuple)
5 2 3 (spurious tuple)
5 2 4
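The lossy-join example can be replayed in a few lines of Python. This is only a sketch: the helper names `project` and `natural_join` are hypothetical, and relations are modeled as plain sets of tuples.

```python
# Relations are sets of tuples; a schema is a tuple of attribute names.
def project(rows, schema, attrs):
    """Projection of rows onto attrs (duplicates vanish because of the set)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(r1, s1, r2, s2):
    """Natural join on the attributes the two schemas share."""
    common = [a for a in s1 if a in s2]
    extra = [a for a in s2 if a not in s1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[s1.index(a)] == t2[s2.index(a)] for a in common):
                out.add(t1 + tuple(t2[s2.index(a)] for a in extra))
    return out, tuple(s1) + tuple(extra)

schema = ("A", "B", "C")
r = {(1, 2, 3), (5, 2, 4)}                      # the body from the slide
ab = project(r, schema, ("A", "B"))
bc = project(r, schema, ("B", "C"))
joined, joined_schema = natural_join(ab, ("A", "B"), bc, ("B", "C"))
spurious = joined - r                           # tuples the join invented
print(sorted(spurious))
```

The join always contains the original body; the decomposition is lossy exactly when `spurious` is non-empty.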
More definitions:
• $F = \{X \to Y \mid (X \to Y) \text{ is imposed by application}\}$
• $F^+ = \{X \to Y \mid \text{application FDs imply } X \to Y\}$
• $R = R_1 + R_2 + \cdots + R_m$ is a decomposition
• $(X \to Y)$ applies to $R_i$ if $XY \subseteq R_i$
• $R = R_1 + R_2 + \cdots + R_m$ is a valid decomposition if
(1) it is lossless
(2) if $r(R)$ satisfies $F$ and $(X \to Y) \in F^+$ applies to $R_i$, then $r(R_i)$ satisfies $X \to Y$
Whence functional dependencies:
• Analysis of application yields $F$
• Closure algorithm: $F$ yields $F^+$
• Minimal cover algorithm: $F^+$ yields $F'$
• Trick: compute minimal cover $F'$ directly from application FDs $F$ without calculating closure
Yet more definitions:
• Let $R$ be the universal relation
• $F$ is a set of functional dependencies
• We say that $F$ implies $(X \to Y)$ if the following condition holds:
− $r(R)$ satisfies $F$ $\implies$ $r(R)$ satisfies $(X \to Y)$
• $F^+ = \{X \to Y \mid F \text{ implies } (X \to Y)\}$ is the closure of $F$
• Let $F_1$, $F_2$ be sets of functional dependencies
• $F_1$, $F_2$ are equivalent, written $F_1 \equiv F_2$, if $F_1^+ = F_2^+$
• $F_2$ is a minimal cover of $F_1$ if $F_2 \equiv F_1$ and
− The right side of every FD in $F_2$ contains a single attribute
− If $(X \to A) \in F_2$, then $[F_2 - \{X \to A\}] \not\equiv F_2$. That is, no FD in $F_2$ is redundant
− If $(XA \to B) \in F_2$, then $[(F_2 - \{XA \to B\}) \cup \{X \to B\}] \not\equiv F_2$. That is, no FD in $F_2$ has a redundant attribute on its left side
• $F$ is minimal if it is a minimal cover of itself
Yet more definitions (continued): Armstrong's Axioms
1. Reflexivity: If $Y \subseteq X \subseteq R$, then $(X \to Y) \in F^+$ (regardless of $F$)
2. Augmentation: If $(X \to Y) \in F^+$ and $Z \subseteq R$, then $(XZ \to YZ) \in F^+$
3. Transitivity: If $(X \to Y) \in F^+$ and $(Y \to Z) \in F^+$, then $(X \to Z) \in F^+$
4. Composition: If $(X \to Y) \in F^+$ and $(X \to Z) \in F^+$, then $(X \to YZ) \in F^+$
5. Decomposition: If $(X \to YZ) \in F^+$, then $(X \to Y) \in F^+$ and $(X \to Z) \in F^+$
6. Pseudotransitivity: If $(X \to Y) \in F^+$ and $(WY \to Z) \in F^+$, then $(WX \to Z) \in F^+$
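As a short worked check of how the rules interact, pseudotransitivity (rule 6) can be derived from augmentation and transitivity alone. The derivation is a sketch that uses only the symbols of the rule list:

```latex
\begin{align*}
(X \to Y) \in F^+
  &\implies (WX \to WY) \in F^+
  && \text{augmentation with } W \\
(WX \to WY) \in F^+ \text{ and } (WY \to Z) \in F^+
  &\implies (WX \to Z) \in F^+
  && \text{transitivity}
\end{align*}
```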
Theorem: Armstrong's Axioms are both sound and complete.
That is, if you apply them repeatedly to a given set of FDs, then
(1) every new FD generated belongs to the closure (sound)
(2) every FD in the closure will eventually be generated (complete)
Definition:
$(X; F) = \{A \mid (X \to A) \text{ follows from } F \text{ via Armstrong's Axioms}\}$
Because inferences via Armstrong's Axioms yield precisely the FD set closure, this definition is the same as:
$(X; F) = \{A \mid (X \to A) \in F^+\}$
Generating the closure is computationally expensive because of its exponential size
But, checking a given FD for inclusion in the closure is easy:
$(X \to Y) \in F^+$ iff $Y \subseteq (X; F)$
Algorithm uses only the application-driven FDs, not the closure
attributeSet maxRight (attributeSet X, FDSet F) {
    /* returns (X; F), the set of all attributes determined by X */
    attributeSet Z = X;
    boolean changed = true;
    while (changed)
        if [∃ (U → V) ∈ F with U ⊆ Z and V ⊈ Z]
            Z = Z ∪ V;
        else
            changed = false;
    return Z;
}
boolean FDinClosure (attributeSet X, attributeSet Y, FDSet F) {
    /* returns true if (X → Y) ∈ F+ */
    return [Y ⊆ maxRight(X, F)];
}
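The two procedures translate almost line for line into Python. The sketch below assumes single-letter attribute names, so an attribute set can be written as a string such as `"EF"`; the names `max_right` and `fd_in_closure` simply mirror the pseudocode, and the FD list is the one from the BCNF decomposition example in these slides.

```python
def max_right(x, fds):
    """Return (X; F): every attribute determined by x under the FDs in fds."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # An FD fires when its left side is covered and it adds something new.
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def fd_in_closure(x, y, fds):
    """True iff (x -> y) is in the closure of fds."""
    return set(y) <= max_right(x, fds)

# Each FD is a (left side, right side) pair of attribute strings.
F = [("A", "D"), ("A", "C"), ("C", "B"), ("EF", "H"),
     ("EF", "J"), ("EF", "L"), ("FL", "G"), ("HJ", "K")]

print(sorted(max_right("A", F)))      # attributes determined by A
print(fd_in_closure("EF", "K", F))    # EF -> K is implied via HJ -> K
```

Each pass either adds at least one attribute or stops, so the loop runs at most once per attribute of the universal relation.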
Other uses of the algorithm: key determination
• $X$ is a superkey for $R$ iff $(X \to A) \in F^+$ for every $A \in R$
• $X$ is a key for $R$ iff $X$ is a superkey and, for all $B \in X$, $(X - B)$ is not a superkey
• Algorithm: For each subset $X \subseteq R$, use the maxRight procedure to construct $(X; F)$. If $(X; F) = R$, then $X$ is a superkey; otherwise, it is not. A superkey that does not properly contain another superkey is a key.
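A brute-force sketch of this key search in Python follows. It assumes single-letter attributes written as strings, and the helper names (`max_right`, `is_superkey`, `keys`) are illustrative only. Subsets are visited in order of increasing size, so any superkey that contains no previously found key is itself a key.

```python
from itertools import combinations

def max_right(x, fds):
    """Return (X; F), the attributes determined by x."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def is_superkey(x, r, fds):
    return max_right(x, fds) >= set(r)

def keys(r, fds):
    """All keys of r, found by testing subsets from smallest to largest."""
    found = []
    attrs = sorted(r)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            if any(k <= x for k in found):
                continue          # x properly contains a key, so it is not minimal
            if is_superkey(x, r, fds):
                found.append(x)
    return found

# R = (A, B, C) with AB -> C and C -> A, the example used later in the slides.
print([sorted(k) for k in keys("ABC", [("AB", "C"), ("C", "A")])])
```

The search is exponential in the number of attributes, which is acceptable for slide-sized examples but not for wide schemas.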
Other uses of the algorithm: equivalence of two sets of FDs
• $F_1 \equiv F_2$ iff $F_1 \subseteq F_2^+$ and $F_2 \subseteq F_1^+$
• Algorithm: for each $(X \to Y) \in F_1$, use maxRight to determine if $(X \to Y) \in F_2^+$. Similarly, for FDs in $F_2$.
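The equivalence test is a pair of containment checks, sketched below in Python under the same assumptions as before (single-letter attributes as strings; `max_right`, `covers`, and `equivalent` are illustrative names).

```python
def max_right(x, fds):
    """Return (X; F), the attributes determined by x."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def covers(f_big, f_small):
    """True iff every FD of f_small lies in the closure of f_big."""
    return all(set(y) <= max_right(x, f_big) for x, y in f_small)

def equivalent(f1, f2):
    return covers(f1, f2) and covers(f2, f1)

F1 = [("A", "BC")]
F2 = [("A", "B"), ("A", "C")]
F3 = [("A", "B")]
print(equivalent(F1, F2), equivalent(F1, F3))
```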
Other uses of the algorithm: minimal cover computation:
mutate application-driven F into an equivalent FD set F1 that satisfies the minimality conditions.
Algorithm:
FDSet minCover (FDSet F) {
    FDSet F1 = ∅;
    boolean change2 = true, change3 = true;
    /* split right sides into single attributes */
    for each (X → Y) ∈ F
        for each A ∈ Y
            F1 = F1 ∪ {X → A};
    while (change2 || change3) {
        /* remove redundant FDs */
        change2 = false;
        while [∃ (X → A) ∈ F1 with F1 − {X → A} ≡ F1] {
            change2 = true;
            F1 = F1 − {X → A};
        }
        /* remove redundant left-side attributes */
        change3 = false;
        while [∃ (XB → A) ∈ F1 with (F1 − {XB → A}) ∪ {X → A} ≡ F1] {
            change3 = true;
            F1 = (F1 − {XB → A}) ∪ {X → A};
        }
    }
    return F1;
}
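A compact Python sketch of the same three steps is shown below. It relies on two shortcuts that are assumptions of this sketch rather than part of the slide: an FD X → A is redundant exactly when A is in (X; rest), and the attribute B in XB → A is redundant exactly when A is in (X; F1). Attribute sets are strings of single letters, and the function names are illustrative.

```python
def max_right(x, fds):
    """Return (X; F), the attributes determined by x."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def min_cover(fds):
    # Step 1: split every right side into single attributes.
    f1 = []
    for lhs, rhs in fds:
        for a in sorted(set(rhs)):
            fd = (frozenset(lhs), a)
            if fd not in f1:
                f1.append(fd)
    changed = True
    while changed:
        changed = False
        # Step 2: drop an FD if the remaining FDs still imply it.
        for fd in list(f1):
            rest = [g for g in f1 if g != fd]
            if fd[1] in max_right(fd[0], rest):
                f1 = rest
                changed = True
        # Step 3: drop a left-side attribute if the shorter FD is already implied.
        for fd in list(f1):
            lhs, a = fd
            for b in sorted(lhs):
                smaller = lhs - {b}
                if a in max_right(smaller, f1):
                    f1 = [g for g in f1 if g != (lhs, a)]
                    if (smaller, a) not in f1:
                        f1.append((smaller, a))
                    lhs = smaller
                    changed = True
    return f1

cover = min_cover([("A", "BC"), ("B", "C"), ("AB", "C")])
print([("".join(sorted(l)), a) for l, a in cover])
```

A minimal cover is not unique in general; the result depends on the order in which FDs and attributes are examined.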
Boyce-Codd Normal Form (BCNF):
• An FD $(X \to Y)$ is called trivial if $Y \subseteq X$
• Let $R = R_1 + R_2 + \cdots + R_n$ be a lossless decomposition with respect to $F$
• $R_i$ satisfies BCNF if, for every nontrivial FD $(X \to A) \in F^+$ that applies to $R_i$, $X$ is a superkey of $R_i$
• The decomposition satisfies BCNF if every component $R_i$ satisfies BCNF
• There exists an algorithm to decompose any universal relation $R$ and an associated application-driven set of FDs $F$ (or an equivalent minimal cover) into BCNF
• The algorithmic decomposition is lossless, but unfortunately, it is not always dependency-preserving
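Dependency preservation:
• Let $R = R_1 + R_2 + \cdots + R_n$ be a lossless decomposition with respect to $F$
• Define $\pi_i(F) = \{X \to Y \mid (X \to Y) \in F^+ \text{ and } (X \to Y) \text{ applies to } R_i\}$
• Decomposition is dependency-preserving if $\left(\bigcup_{i=1}^{n} \pi_i(F)\right)^+ = F^+$
• Certainly, $\left(\bigcup_{i=1}^{n} \pi_i(F)\right)^+ \subseteq F^+$, but $F^+$ is potentially larger (stronger)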
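Example: BCNF decomposition process
$R$: A B C D E F G H J K L
$F$:
• $A \to D$
• $A \to C$
• $C \to B$
• $EF \to H$
• $EF \to J$
• $EF \to L$
• $FL \to G$
• $HJ \to K$
Compute $(A; F)$: $A \Rightarrow AD \Rightarrow ADC \Rightarrow ADCB \neq R$
• $A$ is not a key $\implies$ $(A \to C)$, $(A \to D)$, $(A \to B)$ are all BCNF violators
• Decompose: $R_1 = (A; F)$ and $R_2 = [R - (A; F)] \cup \{A\}$
• $A$ is a key of $(A; F)$ $\implies$ join is lossless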
Example: BCNF decomposition process (continued; same $F$ and $R$)
• $R_1$: A B C D
• $R_2$: A E F G H J K L
• $(A \to D)$, $(A \to C)$, $(A \to B)$ are no longer BCNF violators
− $A$ is a superkey of the first relation
− the FDs no longer apply to the second
• But, still have a violator in $R_1$: $(C \to B)$
Example: BCNF decomposition process (continued; same $F$ and $R$)
• $R_1$: A B C D, decomposed into
− $R_{11}$: B C
− $R_{12}$: A C D
• $R_2$: A E F G H J K L
− $(EF; F) = EFHJLGK \neq R_2$ $\implies$ the FDs with $EF$ on the left side are BCNF violators
Example: BCNF decomposition process (continued; same $F$ and $R$)
• $R_1$: A B C D, decomposed into
− $R_{11}$: B C
− $R_{12}$: A C D
• $R_2$: A E F G H J K L, decomposed into
− $R_{21}$: A E F
− $R_{22}$: E F G H J K L
• $(FL; F) = FLG \neq R_{22}$ $\implies$ BCNF violator; decompose $R_{22}$ into
− $R_{221}$: F G L
− $R_{222}$: E F H J K L
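Example: BCNF decomposition process (continued; same $F$ and $R$)
• $R_1$: A B C D, decomposed into
− $R_{11}$: B C
− $R_{12}$: A C D
• $R_2$: A E F G H J K L, decomposed into
− $R_{21}$: A E F
− $R_{22}$: E F G H J K L, decomposed into $R_{221}$: F G L and $R_{222}$: E F H J K L
• $(HJ; F) = HJK \neq R_{222}$ $\implies$ BCNF violator; decompose $R_{222}$ into
− $R_{2221}$: H J K
− $R_{2222}$: E F H J L
• Resulting components: $R_{11}$, $R_{12}$, $R_{21}$, $R_{221}$, $R_{2221}$, $R_{2222}$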
Aquarium database --- minimal cover
• sno → sname
• sno → sfood
• tno → tname
• tno → tvolume
• fno → fname
• fno → fcolor
• fno → fweight
• eno → edate
• eno → enote
• tvolume → tcolor
• eno → fno
• fno → tno
• fno → sno
Only superkeys: supersets of eno ==> all FDs are BCNF violators except those with eno on left side
BCNF:
• (sno, sname, sfood)
• (tno, tname, tvolume)
• (tvolume, tcolor)
• (fno, fname, fcolor, fweight, sno, tno)
• (eno, edate, enote, fno)
BCNF lossless decomposition algorithm:
/* start with universal relation; decompose BCNF violating components */
while [∃ (X → Y) ∈ F that applies to a component Ri and (X; F) ⊉ Ri]
    decompose Ri into (X; F) and [Ri − (X; F)] ∪ {X}
Note: it suffices to look for BCNF violators in minimal cover of original application-driven FDs
• In applying Armstrong's axioms to generate FD closure, only augmentation and pseudotransitivity generate new left hand sides; none generate any new right-side attributes
• Augmentation: If $(X \to Y) \in F^+$ and $Z \subseteq R$, then $(XZ \to YZ) \in F^+$
− But, if $(XZ; F) \not\supseteq R_i$, then because $(X; F) \subseteq (XZ; F)$, we have $(X; F) \not\supseteq R_i$
− That is, if $XZ \to YZ$ is a BCNF violator, so is $X \to Y$
• Pseudotransitivity: If $(X \to Y) \in F^+$ and $(WY \to Z) \in F^+$, then $(WX \to Z) \in F^+$
− But, if $(WX; F) \not\supseteq R_i$, then because $(X; F) \subseteq (WX; F)$, we have $(X; F) \not\supseteq R_i$
− That is, if $WX \to Z$ is a BCNF violator, so is $X \to Y$
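The decomposition loop can be sketched recursively in Python. The sketch scans only the given FDs for violators, as the note on this slide suggests, and it intersects (X; F) with the current component before splitting; that intersection is an assumption of this sketch (it makes no difference in the worked example, where each closure already lies inside its component). Function names are illustrative, and attributes are single letters.

```python
def max_right(x, fds):
    """Return (X; F), the attributes determined by x."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def bcnf_decompose(r, fds):
    """Split r on the first violating FD found, then recurse on both parts."""
    r = frozenset(r)
    for lhs, rhs in fds:
        x, y = frozenset(lhs), frozenset(rhs)
        applies = (x | y) <= r
        nontrivial = not y <= x
        if applies and nontrivial and not max_right(x, fds) >= r:
            first = max_right(x, fds) & r      # (X; F), kept inside r
            second = (r - first) | x           # [R_i - (X; F)] + X
            return bcnf_decompose(first, fds) + bcnf_decompose(second, fds)
    return [r]                                 # no violator among the given FDs

F = [("A", "D"), ("A", "C"), ("C", "B"), ("EF", "H"),
     ("EF", "J"), ("EF", "L"), ("FL", "G"), ("HJ", "K")]
parts = bcnf_decompose("ABCDEFGHJKL", F)
print(sorted("".join(sorted(p)) for p in parts))
```

With the FDs listed in this order, the sketch reproduces the six components of the worked example; a different FD order can yield a different, equally valid decomposition.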
BCNF decomposition is lossless, but not always dependency-preserving
• Let $R = R_1 + R_2 + \cdots + R_n$ be a decomposition with respect to $F$
• Define $\pi_i(F) = \{X \to Y \mid (X \to Y) \in F^+ \text{ and } (X \to Y) \text{ applies to } R_i\}$
• Decomposition is dependency-preserving if $\left(\bigcup_{i=1}^{n} \pi_i(F)\right)^+ = F^+$
Example:
• Let $R = (A, B, C)$, $F = (AB \to C, \; C \to A)$
• $(AB; F) = ABC = R$ $\implies$ $(AB \to C)$ is not a BCNF violator
• $(C; F) = CA \neq R$ $\implies$ $(C \to A)$ is BCNF violator $\implies$ decompose into $R_1 = (A, C)$ and $R_2 = (B, C)$
• No further BCNF violators $\implies$ have BCNF decomposition
Example (continued):
• Only possible non-trivial entries from $F^+$ in $\pi_1(F)$ are $(A \to C)$ or $(C \to A)$
− But, $(A; F) = \{A\}$, so $C \notin (A; F)$ $\implies$ $(A \to C) \notin F^+$
− So, $\pi_1(F) \equiv \{C \to A\}$ plus trivial FDs
• Only possible non-trivial entries from $F^+$ in $\pi_2(F)$ are $(B \to C)$ or $(C \to B)$
− But, $(B; F) = \{B\}$, and $(C; F) = \{A, C\}$ $\implies$ $(B \to C) \notin F^+$ and $(C \to B) \notin F^+$
− So, $\pi_2(F)$ contains only trivial FDs
• Consequently, $(AB \to C) \in F^+$, but $(AB \to C) \notin (\pi_1(F) \cup \pi_2(F))^+$
• So, decomposition is not dependency-preserving
In general:
• if decomposition does NOT split any FD from the minimal cover, then decomposition IS dependency-preserving
• if decomposition does split an FD from the minimal cover, then it may or may not be dependency-preserving (text presents an algorithm for deciding)
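One common way to decide dependency preservation without materializing any projection is sketched below. It is the standard textbook test, which may or may not be the algorithm the text refers to: for each FD X → Y, grow Z from X by repeatedly adding, for every component, the attributes of that component determined by the part of Z inside it; the FD is preserved iff Y ends up inside Z. Names are illustrative, and attributes are single letters.

```python
def max_right(x, fds):
    """Return (X; F), the attributes determined by x."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def preserves_dependencies(fds, decomposition):
    """True iff every FD in fds follows from the FDs projected onto the components."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # Attributes of ri that the projected FDs let us derive from z.
                gained = (max_right(z & ri, fds) & ri) - z
                if gained:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False
    return True

F = [("AB", "C"), ("C", "A")]
print(preserves_dependencies(F, ["AC", "BC"]))   # the BCNF split from the example
print(preserves_dependencies(F, ["ABC"]))        # the undecomposed relation
```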
Second Normal Form:
• Let $R = R_1 + R_2 + \cdots + R_n$ be a decomposition
• Let $F$ be a set of application-driven constraints (or minimal cover)
• $A \in R_i$ is prime if $A$ is contained in a key for $R_i$
• $R_i$ satisfies Second Normal Form (2NF) if
(1) $R_i$ satisfies 1NF, and
(2) $A$ non-prime and $X$ a key of $R_i$ (i.e., $R_i \subseteq (X; F)$) $\implies$ $(Y \to A) \notin F^+$ for every $Y \subseteq X$, $Y \neq X$
• A 2NF violator has components: non-prime $A$, key $X$, $Y \subseteq X$ such that $Y \neq X$ and $(Y \to A) \in F^+$
• The decomposition satisfies 2NF if each component satisfies 2NF
Note: BCNF implies 2NF
• Suppose have 2NF violator in BCNF decomposition
• That is, have non-prime $A$, key $X$, $Y \subseteq X$ such that $Y \neq X$ and $(Y \to A) \in F^+$
• By BCNF criterion, $Y$ is a superkey $\implies$ some key $Z \subseteq Y \subset X$ $\implies$ $X$ is not a key, a contradiction.
Example:
• table: (sno, sname, sfood, tno, tname, tcolor, tvolume)
• only key is (sno, tno), but (sno → sname) and sname is non-prime
• have several 2NF violators
• signals mixing of application entities in a single table
• decompose into: (sno, sname, sfood) and (tno, tname, tcolor, tvolume)
Note:
(1) tables with single-attribute keys cannot violate 2NF
(2) tvolume → tcolor is not a 2NF violator because tvolume is not part of a key
FDs (aquarium minimal cover):
• sno → sname
• sno → sfood
• tno → tname
• tno → tvolume
• tvolume → tcolor
• eno → fno
• fno → tno
• fno → sno
• fno → fname
• fno → fcolor
• fno → fweight
• eno → edate
• eno → enote
Storage anomalies (irregularities):
• classify negative consequences of 2NF violators
• insertion anomaly
− relation with a 2NF violator mixes two or more application entities
− can't insert an instance of just one of them
− e.g., in previous example, want to insert a dolphin species that is as yet unassociated with any tank ...
sno  sname    sfood       tno   tname      tcolor  tvolume
12   shark    everything  25    lagoon     blue    5000
12   shark    everything  27    deep dive  green   50000
...
17   dolphin  herring     ----  ------     -----   ------
• insertion forces nulls into inappropriate attributes, such as part of the key (sno, tno)
Storage anomalies (irregularities):
• deletion anomaly
− relation with a 2NF violator mixes two or more application entities
− deleting last tank associated with shark species removes all information about the species
• update anomaly
− change sfood attribute of a shark => update several tuples
− invites inconsistency
sno  sname  sfood       tno  tname      tcolor  tvolume
12   shark  everything  25   lagoon     blue    5000
12   shark  everything  27   deep dive  green   50000
...
• 2NF decomposition removes most of the anomalies:
sno  sname    sfood
12   shark    everything
17   dolphin  herring
tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000
Storage anomalies (irregularities):
• there remains the negative effect of tvolume → tcolor
− if change all 5000 volume tanks to purple, must update several tuples
− still invites inconsistency
• in general, anomalies arise when the contents of some cells can predict the content of others
sno  sname    sfood
12   shark    everything
17   dolphin  herring
tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000
84   puddle     blue    5000    (tcolor is a predictable entry)
Third Normal Form:
• Let $R = R_1 + R_2 + \cdots + R_n$ be a decomposition
• Let $F$ be a set of application-driven constraints (or minimal cover)
• $A \in R_i$ is prime if $A$ is contained in a key for $R_i$
• $R_i$ satisfies Third Normal Form (3NF) if
(1) $R_i$ satisfies 1NF, and
(2) non-trivial $(X \to A) \in F^+$ that applies to $R_i$ implies either (a) $X$ is a superkey for $R_i$ or (b) $A$ is prime
• A 3NF violator has components:
− non-prime $A$, non-trivial $(X \to A) \in F^+$,
− $(X; F) \not\supseteq R_i$, that is, $X$ is not a superkey for $R_i$
• The decomposition satisfies 3NF if each component satisfies 3NF
• BCNF => 3NF => 2NF
• 3NF decomposition forces further decomposition in previous example:
sno  sname    sfood
12   shark    everything
17   dolphin  herring
tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000
84   puddle     blue    5000
• because of tvolume → tcolor, the tank table is split into:
tno  tname      tvolume
25   lagoon     5000
27   deep dive  50000
84   puddle     5000
tcolor  tvolume
blue    5000
green   50000
• In practice, 3NF is usually BCNF
• only exception occurs when a non-superkey determines a prime attribute
• recall earlier example:
− Let $R = (A, B, C)$, $F = (AB \to C, \; C \to A)$
− $R$ has no 3NF violators [$C \to A$ escapes because $A$ is prime]
− $(AB; F) = ABC = R$ $\implies$ $(AB \to C)$ is not a BCNF violator
− $(C; F) = CA \neq R$ $\implies$ $(C \to A)$ is BCNF violator $\implies$ decompose into $R_1 = (A, C)$ and $R_2 = (B, C)$
− No further BCNF violators $\implies$ have BCNF decomposition
• Conclusion: original $R$ is 3NF but not BCNF
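The conclusion can be checked mechanically for a single relation with its given FDs. The sketch below assumes single-letter attributes; `keys` is a brute-force search and `violators` is an illustrative helper that reports the BCNF violators and, separately, those that are also 3NF violators.

```python
from itertools import combinations

def max_right(x, fds):
    """Return (X; F), the attributes determined by x."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def keys(r, fds):
    """All keys of r, smallest subsets first."""
    found = []
    for size in range(1, len(r) + 1):
        for combo in combinations(sorted(r), size):
            x = frozenset(combo)
            if not any(k <= x for k in found) and max_right(x, fds) >= set(r):
                found.append(x)
    return found

def violators(r, fds):
    """Return (BCNF violators, 3NF violators) among the given FDs."""
    prime = set().union(*keys(r, fds))
    bcnf, third = [], []
    for lhs, rhs in fds:
        for a in sorted(set(rhs) - set(lhs)):        # skip trivial parts
            if not max_right(lhs, fds) >= set(r):    # left side is not a superkey
                bcnf.append((lhs, a))
                if a not in prime:                   # and the target is non-prime
                    third.append((lhs, a))
    return bcnf, third

bcnf_bad, third_bad = violators("ABC", [("AB", "C"), ("C", "A")])
print(bcnf_bad, third_bad)
```

An empty second list together with a non-empty first list is exactly the "3NF but not BCNF" situation of the slide.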
• There always exists a lossless, dependency-preserving decomposition into 3NF
• Recall that the BCNF decomposition may not be dependency-preserving
3NF algorithm:
decomposition 3NFdecomp (attributeSet R, FDSet F) {
    partition F into subsets F1, F2, ..., Fm
        by grouping FDs that have the same left side;
    for 1 ≤ i ≤ m, let Ri be the attributes mentioned in Fi;
    let R0 be an arbitrarily chosen key for R;
    return (R0, R1, R2, ..., Rm);
}
• For proof of algorithm's validity, see text page 788.
• Cannot always achieve dependency-preserving BCNF; may have to settle for 3NF and some redundancy
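The 3NF algorithm can be sketched in Python as follows. Two points are assumptions of this sketch: the input FD set is taken to be a minimal cover already, and the "arbitrarily chosen key" is found by shrinking the full attribute set one attribute at a time in alphabetical order. Names are illustrative, and attributes are single letters.

```python
def max_right(x, fds):
    """Return (X; F), the attributes determined by x."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def find_key(r, fds):
    """Shrink r to one key by dropping attributes that are not needed."""
    x = set(r)
    for a in sorted(r):
        if max_right(x - {a}, fds) >= set(r):
            x.discard(a)
    return frozenset(x)

def three_nf_decomp(r, fds):
    """Return [R0, R1, ..., Rm]: one key plus one component per distinct left side."""
    groups = {}
    for lhs, rhs in fds:
        # All FDs with the same left side contribute to the same component.
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    return [find_key(r, fds)] + [frozenset(g) for g in groups.values()]

result = three_nf_decomp("ABC", [("AB", "C"), ("C", "A")])
print(["".join(sorted(c)) for c in result])
```

Like the slide's version, this sketch keeps every component, even one that is contained in another; dropping such subsumed components is a common refinement.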
Some limitations of functional dependency analysis:
• excessive decomposition is possible, e.g., decompose species into two tables: (sno, sname) and (sno, sfood)
− general idea is that each table represents one distinct application entity, plus additional decomposition necessary to accommodate non-relationship constraints, such as tvolume → tcolor
• there remain redundancies that persist through BCNF, e.g., fish weight in a given tank must sum to 1000 pounds
− can predict weight of last fish in the tank ==> redundancy
− constraint is not an FD ==> redundancy is not removed by decomposition
• FD analysis misses multivalued dependencies and join dependencies
e.g.,
1 1 1 2 1 1
2 2 2 1 2 2
A B C
a b c a b ca b c a b c