Normalization
• Normalization is the process of efficiently organizing data in a database with two goals in mind
• First goal: eliminate redundant data
– for example, storing the same data in more than one table
• Second goal: ensure data dependencies make sense
– for example, only storing related data in a table
Benefits of Normalization
• Less storage space
• Quicker updates
• Less data inconsistency
• Clearer data relationships
• Easier to add data
• Flexible structure
Bad database designs result in:
• redundancy: inefficient storage
• anomalies: data inconsistency, difficulties in maintenance
Example
Relational schema: Product(Name, Price, Category, Manufacturer)
Instance:
Name         Price    Category     Manufacturer
gizmo        $19.99   gadgets      GizmoWorks
Power gizmo  $29.99   gadgets      GizmoWorks
SingleTouch  $149.99  photography  Canon
MultiTouch   $203.99  household    Hitachi
First Normal Form (1NF)
• A database schema is in First Normal Form if all tables are flat
Student (not flat: Courses holds a set of values)
Name   GPA  Courses
Alice  3.8  Math, DB, OS
Bob    3.7  DB, OS
Carol  3.9  Math, OS
Flat version:
Student
Name   GPA
Alice  3.8
Bob    3.7
Carol  3.9
Course
Course
Math
DB
OS
Takes
Student  Course
Alice    Math
Carol    Math
Alice    DB
Bob      DB
Alice    OS
Carol    OS
• May need to add keys
Functional Dependencies
• A form of constraint – hence, part of the schema
• Finding them is part of the database design
• Also used in normalizing the relations
• Warning: this is the most abstract, and “hardest” part of the database design.
• We then check if Supplier Address is fully functionally dependent upon the ENTIRE primary key
• If I know just Item, can I find out Supplier Address?
– No. We can have more than one supplier for the same product.
• If I know just Supplier, can I find out Supplier Address?
– Yes. The supplier’s address does not depend on the Item.
• So, Supplier Address is NOT fully functionally dependent upon the ENTIRE primary key, and the relation is NOT in 2NF
So putting things together:
Inventory(Description, Supplier, Cost, Supplier Address)
is decomposed into
Inventory(Description, Supplier, Cost)
Supplier(Name, Supplier Address)
The above relations are now in 2NF, since no non-key attribute depends on only part of a primary key.
Transitive Dependence
Given a relation R(EmpNo, EName, Salary, Address),
assume the following FDs hold:
EmpNo → EName, EName → Address, hence EmpNo → Address
Note: Both EName and Address are non-key attributes in R. Since Address depends on the non-prime attribute EName, which in turn depends on the primary key (EmpNo), a transitive dependency exists.
R: EmpNo EName Salary Address
Decomposition that removes the transitive dependency EName → Address:
R1: EmpNo EName Salary
R2: EName Address
• Boyce-Codd Normal Form (BCNF)
– A relation is in Boyce-Codd normal form (BCNF) if every determinant in the table is a candidate key.
(A determinant is any attribute whose value determines other values within a row.)
– If a table contains only one candidate key, 3NF and BCNF are equivalent.
– BCNF is a stricter special case of 3NF: every relation in BCNF is also in 3NF, but not vice versa.
Database Normalization
Figure 5.7: A Table That Is In 3NF But Not In BCNF
Figure 5.8: The Decomposition of a Table Structure to Meet BCNF Requirements
Figure: Sample Data for a BCNF Conversion
Figure: Decomposition into BCNF
BCNF
• Based on FDs that take into account all candidate keys of a relation
• For a relation with only 1 CK, 3NF & BCNF are equivalent
• A relation is said to be in BCNF if every determinant is a CK
• Is PLOTS in BCNF?
• NO
Problem 1
• Consider the relation R(A,B,C) with functional dependencies AB → C and C → B.
• Is R in 2NF?
• Is R in 3NF?
• Is R in BCNF?
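A minimal Python sketch for checking answers to Problem 1 mechanically. It assumes single-letter attribute names and FDs written as (lhs, rhs) strings, so ("AB", "C") stands for AB → C. Candidate keys are found by brute force, which is only practical for small schemas, and BCNF/3NF are tested against the FDs in F only, in the spirit of the simplified tests described later in these notes. All function names are illustrative, not from any library.

```python
from itertools import combinations


def attribute_closure(alpha, fds):
    """Return alpha+, the set of attributes functionally determined by alpha."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if set(relation) <= attribute_closure(cand, fds):
                keys.append(cand)
    return keys


def normal_forms(relation, fds):
    """Report the candidate keys and whether R is in 2NF, 3NF and BCNF."""
    everything = set(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)  # attributes that appear in some candidate key

    def trivial(lhs, rhs):
        return set(rhs) <= set(lhs)

    def superkey(lhs):
        return everything <= attribute_closure(lhs, fds)

    # BCNF: every nontrivial FD in F has a superkey on its left side.
    bcnf = all(trivial(lhs, rhs) or superkey(lhs) for lhs, rhs in fds)
    # 3NF: like BCNF, but an FD is also allowed if every attribute it adds is prime.
    nf3 = all(trivial(lhs, rhs) or superkey(lhs) or (set(rhs) - set(lhs)) <= prime
              for lhs, rhs in fds)
    # 2NF: no non-prime attribute is determined by a proper subset of a candidate key.
    nf2 = not any(attribute_closure(part, fds) - prime
                  for key in keys
                  for size in range(1, len(key))
                  for part in combinations(sorted(key), size))
    return {"keys": sorted("".join(sorted(k)) for k in keys),
            "2NF": nf2, "3NF": nf3, "BCNF": bcnf}


print(normal_forms("ABC", [("AB", "C"), ("C", "B")]))
# {'keys': ['AB', 'AC'], '2NF': True, '3NF': True, 'BCNF': False}
```

Because both AB and AC are candidate keys, every attribute is prime, which is why the 2NF and 3NF checks pass while C → B still breaks BCNF.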
Closure of a set of FDs
• Given a set of FDs F on a relation R, it may be possible that several other FDs must also hold for R
• For example, if R = (A,B,C) and the FDs A → B & B → C hold in R, then the FD A → C also holds on R
• For a given value of A, there can be only one corresponding value of B, & for that value of B, there can be only one corresponding value of C
• The closure of F is the set of all FDs that can be inferred from F, & is denoted by F+
Closure of a set of FDs
• It is not sufficient to consider just the given set of FDs
• We need to consider all FDs that hold
• Given F, more FDs can be inferred
• Such FDs are said to be logically implied by F
• F+ is the set of all FDs logically implied by F
• We can compute F+ using the formal definition of FD
• If F were large, this process would be lengthy & cumbersome
• Axioms or Rules of Inference provide a simpler technique
• Armstrong's Axioms
Inference Rules for FDs
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X → Y
IR2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X → Y and Y → Z, then X → Z
IR1, IR2, IR3 form a sound & complete set of inference rules
– Sound: never generates any wrong FD
– Complete: generates all FDs that hold
Some additional inference rules that are useful:
Decomposition: If X → YZ, then X → Y & X → Z
Union: If X → Y & X → Z, then X → YZ
Pseudotransitivity: If X → Y & WY → Z, then WX → Z
• The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
Example
• R = (A, B, C, G, H, I)
F = {A → B, A → C, CG → H, CG → I, B → H}
• Some members of F+
– A → H
• by transitivity from A → B and B → H
– AG → I
• by augmenting A → C with G, to get AG → CG, and then transitivity with CG → I
– CG → HI
• by the union rule applied to CG → H and CG → I
Closure of Attribute Sets
• Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F
• Algorithm to compute α+, the closure of α under F
  result := α;
  while (changes to result) do
    for each β → γ in F do
    begin
      if β ⊆ result then result := result ∪ γ
    end
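The attribute-closure algorithm above translates almost line for line into Python. This is a minimal sketch, assuming single-letter attribute names and FDs given as (lhs, rhs) string pairs; `attribute_closure` and `implies` are illustrative names, not a library API. The usage lines confirm the members of F+ from the earlier example with the test β ⊆ α+.

```python
def attribute_closure(alpha, fds):
    """Compute alpha+ under fds, following the result := alpha; while-changes loop."""
    result = set(alpha)                   # result := alpha
    changed = True
    while changed:                        # while (changes to result) do
        changed = False
        for beta, gamma in fds:           # for each beta -> gamma in F
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)      # result := result ∪ gamma
                changed = True
    return result


def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs ⊆ lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(implies(F, "A", "H"))    # True  (A -> B, B -> H)
print(implies(F, "AG", "I"))   # True  (AG -> CG, CG -> I)
print(implies(F, "CG", "HI"))  # True  (union of CG -> H and CG -> I)
print(implies(F, "B", "C"))    # False (B+ = BH)
```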
Example of Attribute Set Closure
• R = (A, B, C, G, H, I)
• F = {A → B, A → C, CG → H, CG → I, B → H}
• (AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
• Is AG a candidate key?
1. Is AG a super key?
   1. Does AG → R? == Is (AG)+ ⊇ R
2. Is any subset of AG a superkey?
   1. Does A → R? == Is (A)+ ⊇ R
   2. Does G → R? == Is (G)+ ⊇ R
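The candidate-key questions above can be answered mechanically. A sketch under the same assumptions (single-letter attributes, FDs as (lhs, rhs) strings, hypothetical helper names): the superkey test checks whether α+ ⊇ R, and the candidate-key test then tries every subset one attribute smaller.

```python
from itertools import combinations


def attribute_closure(alpha, fds):
    """alpha+ under fds; FDs are (lhs, rhs) pairs of single-letter attribute strings."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(alpha, relation, fds):
    # Does alpha -> R hold?  Equivalently, is (alpha)+ ⊇ R?
    return set(relation) <= attribute_closure(alpha, fds)


def is_candidate_key(alpha, relation, fds):
    # A superkey with no proper subset that is also a superkey.  Adding attributes
    # to a superkey keeps it a superkey, so subsets one attribute smaller suffice.
    if not is_superkey(alpha, relation, fds):
        return False
    return not any(is_superkey(sub, relation, fds)
                   for sub in combinations(alpha, len(alpha) - 1))


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print("".join(sorted(attribute_closure("AG", F))))  # ABCGHI
print(is_candidate_key("AG", R, F))                 # True: A+ = ABCH, G+ = G
```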
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
• Testing for superkey:
– To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R.
• Testing functional dependencies
– To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
– That is, we compute α+ by using attribute closure, and then check if it contains β.
– It is a simple and cheap test, and very useful
• Computing closure of F
– For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
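A brute-force sketch of the "computing closure of F" use, assuming single-letter attributes and FDs as (lhs, rhs) string pairs; `fd_closure` is an illustrative name. It skips the empty set on both sides, includes trivial FDs, and grows exponentially with the number of attributes, so it is only meant for toy schemas such as R = (A, B, C) with F = {A → B, B → C}.

```python
from itertools import combinations


def attribute_closure(alpha, fds):
    """alpha+ under fds (FDs are (lhs, rhs) strings of single-letter attributes)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def fd_closure(relation, fds):
    """Enumerate F+ as (gamma, S) string pairs: for each nonempty gamma ⊆ R and
    each nonempty S ⊆ gamma+, output gamma -> S."""
    fplus = set()
    attrs = sorted(relation)
    for k in range(1, len(attrs) + 1):
        for gamma in combinations(attrs, k):
            plus = sorted(attribute_closure(gamma, fds))
            for m in range(1, len(plus) + 1):
                for s in combinations(plus, m):
                    fplus.add(("".join(gamma), "".join(s)))
    return fplus


F = [("A", "B"), ("B", "C")]
Fplus = fd_closure("ABC", F)
print(("A", "C") in Fplus)  # True, by transitivity
print(("C", "A") in Fplus)  # False
print(len(Fplus))           # 35, trivial FDs included
```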
Canonical Cover
• Sets of functional dependencies may have redundant dependencies that can be inferred from the others
– For example: A → C is redundant in: {A → B, B → C, A → C}
– Parts of a functional dependency may be redundant
• E.g.: on RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}
• E.g.: on LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}
• Intuitively, a canonical cover of F is a “minimal” set of functional dependencies equivalent to F, having no redundant dependencies or redundant parts of dependencies
Equivalence of Sets of FDs
• Two sets of FDs F and G are equivalent if:
– every FD in F can be inferred from G, &
– every FD in G can be inferred from F
• Hence, F and G are equivalent if F+ = G+
• Definition: F covers G if every FD in G can be inferred from F (i.e., if G+ ⊆ F+)
• F and G are equivalent if F covers G and G covers F
• There is an algorithm for checking equivalence of sets of FDs
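One straightforward way to check equivalence, sketched in Python under the same assumptions as the other examples here (single-letter attributes, FDs as (lhs, rhs) strings, illustrative function names): F covers G when every FD in G passes the attribute-closure test under F. The example reuses the redundant A → C from the canonical-cover slide.

```python
def attribute_closure(alpha, fds):
    """alpha+ under fds (FDs are (lhs, rhs) strings of single-letter attributes)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, g):
    """F covers G if every FD lhs -> rhs in G satisfies rhs ⊆ lhs+ computed under F."""
    return all(set(rhs) <= attribute_closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """F and G are equivalent when each covers the other (F+ = G+)."""
    return covers(f, g) and covers(g, f)


F = [("A", "B"), ("B", "C"), ("A", "C")]
G = [("A", "B"), ("B", "C")]
H = [("A", "B"), ("A", "C")]
print(equivalent(F, G))  # True: A -> C is redundant in F
print(equivalent(F, H))  # False: H cannot infer B -> C
print(covers(F, H))      # True: F infers everything in H
```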
Extraneous Attributes
• Consider a set F of functional dependencies and the functional dependency α → β in F.
– Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
– Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – A)} logically implies F.
• Note: implication in the opposite direction is trivial in each of the cases above, since a “stronger” functional dependency always implies a weaker one
• Example: Given F = {A → C, AB → C}
– B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e. the result of dropping B from AB → C).
• Example: Given F = {A → C, AB → CD}
– C is extraneous in AB → CD since AB → C can be inferred even after deleting C
Testing if an Attribute is Extraneous
• Consider a set F of functional dependencies and the functional dependency α → β in F.
• To test if attribute A ∈ α is extraneous in α
1. compute ({α} – A)+ using the dependencies in F
2. check that ({α} – A)+ contains β; if it does, A is extraneous
• To test if attribute A ∈ β is extraneous in β
1. compute α+ using only the dependencies in
F’ = (F – {α → β}) ∪ {α → (β – A)},
2. check that α+ contains A; if it does, A is extraneous
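Both tests can be sketched directly on top of attribute closure. Assumptions: single-letter attributes, FDs as (lhs, rhs) strings, and the hypothetical helper names `extraneous_in_lhs` and `extraneous_in_rhs`. The examples are the two from the Extraneous Attributes slide.

```python
def attribute_closure(alpha, fds):
    """alpha+ under fds (FDs are (lhs, rhs) strings of single-letter attributes)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(attr, fd, fds):
    """A in alpha is extraneous if ({alpha} - A)+, computed with all of F, contains beta."""
    lhs, rhs = fd
    return set(rhs) <= attribute_closure(set(lhs) - {attr}, fds)


def extraneous_in_rhs(attr, fd, fds):
    """A in beta is extraneous if alpha+, computed with
    F' = (F - {alpha -> beta}) ∪ {alpha -> (beta - A)}, contains A."""
    lhs, rhs = fd
    f_prime = [d for d in fds if d != fd]
    f_prime.append((lhs, rhs.replace(attr, "")))
    return attr in attribute_closure(lhs, f_prime)


F1 = [("A", "C"), ("AB", "C")]
F2 = [("A", "C"), ("AB", "CD")]
print(extraneous_in_lhs("B", ("AB", "C"), F1))   # True
print(extraneous_in_rhs("C", ("AB", "CD"), F2))  # True
print(extraneous_in_rhs("D", ("AB", "CD"), F2))  # False
```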
Canonical Cover
• A canonical cover for F is a set of dependencies Fc such that
– F logically implies all dependencies in Fc, and
– Fc logically implies all dependencies in F, and
– No functional dependency in Fc contains an extraneous attribute, and
– Each left side of functional dependency in Fc is unique.
• To compute a canonical cover for F:
  repeat
    Use the union rule to replace any dependencies in F of the form α1 → β1 and α1 → β2 with α1 → β1 β2
    Find a functional dependency α → β with an extraneous attribute either in α or in β
    If an extraneous attribute is found, delete it from α → β
  until F does not change
• Note: Union rule may become applicable after some extraneous attributes have been deleted, so it has to be re-applied
Computing Canonical Cover
• R = (A, B, C)
F = {A → BC, B → C, A → B, AB → C}
• Combine A → BC and A → B into A → BC
– Set is now {A → BC, B → C, AB → C}
• A is extraneous in AB → C
– Check if the result of deleting A from AB → C is implied by the other dependencies
• Yes: in fact, B → C is already present!
– Set is now {A → BC, B → C}
• C is extraneous in A → BC
– Check if A → C is logically implied by A → B and the other dependencies
• Yes: using transitivity on A → B and B → C.
– Can use attribute closure of A in more complex cases
• The canonical cover is: A → B, B → C
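A compact Python sketch of the canonical-cover loop, assuming single-letter attributes and FDs as (lhs, rhs) strings; the helper names are illustrative. It deletes one extraneous attribute at a time and re-applies the union rule after each deletion, as the note on the previous slide requires. It may remove attributes in a different order than the hand computation above (canonical covers are not unique in general), but on this example it reaches the same cover.

```python
def attribute_closure(alpha, fds):
    """alpha+ under fds (FDs are (lhs, rhs) strings of single-letter attributes)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def union_rule(fds):
    """Merge alpha -> beta1 and alpha -> beta2 into alpha -> beta1 beta2."""
    merged = {}
    for lhs, rhs in fds:
        key = "".join(sorted(set(lhs)))
        merged[key] = merged.get(key, set()) | set(rhs)
    return [(lhs, "".join(sorted(rhs))) for lhs, rhs in merged.items()]


def reduce_once(fc):
    """Delete one extraneous attribute from some FD; return None if there is none."""
    for i, (lhs, rhs) in enumerate(fc):
        others = fc[:i] + fc[i + 1:]
        # Left side: A is extraneous if (alpha - A)+ under the current set contains beta.
        if len(lhs) > 1:
            for a in lhs:
                if set(rhs) <= attribute_closure(set(lhs) - {a}, fc):
                    return others + [(lhs.replace(a, ""), rhs)]
        # Right side: A is extraneous if alpha+ under F' still contains A.
        for a in rhs:
            smaller = rhs.replace(a, "")
            if a in attribute_closure(lhs, others + [(lhs, smaller)]):
                return others + ([(lhs, smaller)] if smaller else [])
    return None


def canonical_cover(fds):
    fc = union_rule(fds)
    while True:
        reduced = reduce_once(fc)
        if reduced is None:
            return sorted(fc)
        fc = union_rule(reduced)  # the union rule may apply again after a deletion


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
print(canonical_cover(F))  # [('A', 'B'), ('B', 'C')]
```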
1. Lossless joins
Want to be able to reconstruct a big (e.g. universal) relation by joining smaller ones (using natural joins)
(i.e. R1 ⋈ R2 = R)
2. Dependency preservation
Want to minimize the cost of global integrity constraints based on FDs
(i.e. avoid big joins in assertions)
3. Redundancy avoidance
Avoid unnecessary data duplication (the motivation for decomposition)
Why important?
LJ: information loss
DP: efficiency (time)
RA: efficiency (space), update anomalies
Lossy Decomposition
R:
A  B  C
1  2  3
4  5  6
7  2  8
R1 = projection of R on (A, B):
A  B
1  2
4  5
7  2
R2 = projection of R on (B, C):
B  C
2  3
5  6
2  8
R1 JOIN R2:
A  B  C
1  2  3
4  5  6
7  2  8
1  2  8   (spurious tuple)
7  2  3   (spurious tuple)
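The spurious tuples can be reproduced with a small Python sketch that projects the three-row relation onto (A, B) and (B, C) and joins the projections back together on B. Relations are modeled as lists of dicts purely for illustration; `project` and `natural_join` are hypothetical helpers, not a library API.

```python
def project(rows, attrs):
    """pi_attrs(r): keep only the named attributes; a set drops duplicate tuples."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Join two projections on every attribute name they share."""
    out = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out


r = [{"A": 1, "B": 2, "C": 3}, {"A": 4, "B": 5, "C": 6}, {"A": 7, "B": 2, "C": 8}]
original = {tuple(sorted(row.items())) for row in r}
joined = natural_join(project(r, "AB"), project(r, "BC"))
spurious = sorted(tuple(v for _, v in t) for t in joined - original)
print(len(joined), spurious)  # 5 [(1, 2, 8), (7, 2, 3)]
```

Every original tuple survives the round trip, but B = 2 is shared by two different (A, C) pairings, so the join cannot tell which pairing was real.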
Decomposition Goal #1: lossless joins
A bad decomposition:
bname     bcity  assets  cname
Downtown  Bkln   9M      Jones
Downtown  Bkln   9M      Johnson
Mianus    Horse  1.7M    Jones
Downtown  Bkln   9M      Hayes
⋈
cname    lno   amt
Jones    L-17  1000
Johnson  L-23  2000
Jones    L-93  500
Hayes    L-17  1000
=
bname     bcity  assets  cname    lno   amt
Downtown  Bkln   9M      Jones    L-17  1000
Downtown  Bkln   9M      Jones    L-93  500
Downtown  Bkln   9M      Johnson  L-23  2000
Mianus    Horse  1.7M    Jones    L-17  1000
Mianus    Horse  1.7M    Jones    L-93  500
Downtown  Bkln   9M      Hayes    L-17  1000
Problem: join adds meaningless tuples
“Lossy join”: by adding noise, we have lost meaningful information as a result of the decomposition
Decomposition Goal #1: lossless joins
Is the following decomposition lossless or lossy?
bname     assets  cname    lno
Downtown  9M      Jones    L-17
Downtown  9M      Johnson  L-23
Mianus    1.7M    Jones    L-93
Downtown  9M      Hayes    L-17
lno   bcity  amt
L-17  Bkln   1000
L-23  Bkln   2000
L-93  Horse  500
Ans: Lossless: R = R1 ⋈ R2, and the join has exactly 4 tuples (no spurious tuples)
Ensuring Lossless Joins
A decomposition of R: R = R1 ∪ R2
is lossless iff
R1 ∩ R2 → R1, or
R1 ∩ R2 → R2
(i.e., the intersecting attributes must form a superkey for one of the resulting smaller relations)
Lossless Decomposition
Theorem
A decomposition of R into R1 and R2 is lossless join wrt FDs F, if and only if at least one of the following dependencies is in F+:
• R1 ∩ R2 → R1
• R1 ∩ R2 → R2
In other words, R1 ∩ R2 forms a superkey of either R1 or R2
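The binary test can be sketched with attribute closure. This sketch assumes the bank schema's FDs bname → bcity assets and lno → amt bname (the ones listed with the dependency-preservation example) and writes each FD as a pair of attribute-name lists; the function names are illustrative. It classifies the two bank decompositions shown earlier.

```python
def attribute_closure(alpha, fds):
    """alpha+ under fds, where each FD is a (lhs, rhs) pair of attribute-name lists."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary test: lossless iff R1 ∩ R2 -> R1 or R1 ∩ R2 -> R2 is in F+."""
    common_plus = attribute_closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


F = [(["bname"], ["bcity", "assets"]), (["lno"], ["amt", "bname"])]
bad_r1 = ["bname", "bcity", "assets", "cname"]
bad_r2 = ["cname", "lno", "amt"]
good_r1 = ["bname", "assets", "cname", "lno"]
good_r2 = ["lno", "bcity", "amt"]
print(is_lossless(bad_r1, bad_r2, F))    # False: cname determines neither side
print(is_lossless(good_r1, good_r2, F))  # True: lno -> lno bcity amt
```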
Lossy Decomposition
S:
S#  Status  City
S3  30      Paris
S5  30      Athens
First decomposition of S:
S#  Status
S3  30
S5  30
S#  City
S3  Paris
S5  Athens
Second decomposition of S:
S#  Status
S3  30
S5  30
Status  City
30      Paris
30      Athens
Joining the second pair on Status also produces (S3, 30, Athens) and (S5, 30, Paris), which are not in S, so the second decomposition is lossy.
Lossless Decomposition
• Observe that S satisfies the FDs:
– S# → Status & S# → City
• It cannot be a coincidence that S is equal to the join of its projections on {S#, Status} & {S#, City}
• Heath's Theorem:
Let R{A,B,C} be a relation, where A, B, & C are sets of attributes. If R satisfies A → B & A → C, then R is equal to the join of its projections on {A,B} & {A,C}
• Observe that in the second decomposition of S, the FD S# → City is lost
Lossless Decomposition
• The decomposition of R into R1, R2, …, Rn is lossless if for any instance r of R
r = πR1(r) ⋈ πR2(r) ⋈ … ⋈ πRn(r)
• We can replace R by R1 & R2, knowing that the instance of R can be recovered from the instances of R1 & R2
• We can use FDs to show that decompositions are lossless
Decomposition Goal #2: Dependency preservation
Goal: efficient integrity checks of FDs
An example w/ no DP:
R = (bname, bcity, assets, cname, lno, amt)
bname → bcity assets
lno → amt bname
Decomposition: R = R1 ∪ R2
R1 = (bname, assets, cname, lno)
R2 = (lno, bcity, amt)
Lossless but not DP. Why?
Ans: bname → bcity assets crosses 2 tables
Decomposition Goal #2: Dependency preservation
To ensure the best possible efficiency of FD checks, ensure that only a SINGLE table is needed in order to check each FD,
i.e. ensure that: A1 A2 ... An → B1 B2 ... Bm
can be checked by examining Ri = (..., A1, A2, ..., An, ..., B1, ..., Bm, ...)
To test if the decomposition R = R1 ∪ R2 ∪ ... ∪ Rn is DP:
(1) see which FDs of R are covered by R1, R2, ..., Rn
(2) compare the closure of (1) with the closure of the FDs of R
Decomposition Goal #2: Dependency preservation
Example: Given F = {A → B, AB → D, C → D}
consider R = R1 ∪ R2 s.t.
R1 = (A, B, D), R2 = (C, D)
(1) FDs of F covered by the decomposition: A → B, AB → D (by R1) and C → D (by R2)
(2) G = {A → B, AB → D, C → D, ...}+
(3) F+ = G+
note: G+ cannot introduce new FDs not in F+
Decomposition is DP
Dependency Preservation
• Let Fi be the set of dependencies in F+ that include only attributes in Ri.
• A decomposition is dependency preserving if
(F1 ∪ F2 ∪ … ∪ Fn)+ = F+
• If it is not, then checking updates for violation of functional dependencies may require computing joins, which is expensive.
Testing for Dependency Preservation
• To check if a dependency α → β is preserved in a decomposition of R into R1, R2, …, Rn we apply the following test (with attribute closure done with respect to F)
– result = α
  while (changes to result) do
    for each Ri in the decomposition
      t = (result ∩ Ri)+ ∩ Ri
      result = result ∪ t
– If result contains all attributes in β, then the functional dependency α → β is preserved.
• We apply the test on all dependencies in F to check if a decomposition is dependency preserving
• This procedure takes polynomial time, instead of the exponential time required to compute F+ and (F1 ∪ F2 ∪ … ∪ Fn)+
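A Python sketch of the polynomial-time test above, assuming single-letter attributes and FDs as (lhs, rhs) strings; the names are illustrative. It checks both decompositions of R = (A, B, C) from the example that follows, plus the earlier F = {A → B, AB → D, C → D} example.

```python
def attribute_closure(alpha, fds):
    """alpha+ under fds (FDs are (lhs, rhs) strings of single-letter attributes)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """Grow result with t = (result ∩ Ri)+ ∩ Ri until nothing changes; closures use F."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in map(set, decomposition):
            t = attribute_closure(result & ri, fds) & ri
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


def is_dependency_preserving(decomposition, fds):
    return all(is_preserved(fd, decomposition, fds) for fd in fds)


F = [("A", "B"), ("B", "C")]
print(is_dependency_preserving(["AB", "BC"], F))    # True
print(is_dependency_preserving(["AB", "AC"], F))    # False: B -> C is not preserved
F2 = [("A", "B"), ("AB", "D"), ("C", "D")]
print(is_dependency_preserving(["ABD", "CD"], F2))  # True
```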
Example
• R = (A, B, C)
F = {A → B, B → C}
– Can be decomposed in two different ways
• R1 = (A, B), R2 = (B, C)
– Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
– Dependency preserving
• R1 = (A, B), R2 = (A, C)
– Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
– Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)
Decomposition Goal #3: Redundancy Avoidance
Example:
A  B  C
a  x  1
e  x  1
g  y  2
h  y  2
m  y  2
n  z  1
p  z  1
Redundancy for B = x, y and z
(1) An FD that exists in the above relation is: B → C
(2) A superkey in the above relation is A (or any set containing A)
When do you have redundancy?
Ans: when there is some FD X → Y covered by a relation and X is not a superkey
Problems with Decompositions
There are three potential problems to consider:
– Some queries become more expensive
• e.g., What is the price of prop# 1?
– Given instances of the decomposed relations, we
may not be able to reconstruct the corresponding
instance of the original relation!
• Fortunately, not in the PLOTS example
– Checking some dependencies may require joining the
instances of the decomposed relations.
• Fortunately, not in the PLOTS example
Tradeoff: Must consider these issues vs. redundancy
Example
• R = (A, B, C)
F = {A → B, B → C}
Key = {A}
• R is not in BCNF (B → C but B is not a superkey)
• Decomposition R1 = (A, B), R2 = (B, C)
– R1 and R2 in BCNF
– Lossless-join decomposition
– Dependency preserving
Testing for BCNF
• To check if a non-trivial dependency α → β causes a violation of BCNF
1. compute α+ (the attribute closure of α), and
2. verify that it includes all attributes of R, that is, it is a superkey of R.
• Simplified test: To check if a relation schema R is in BCNF, it suffices to check only the dependencies in the given set F for violation of BCNF, rather than checking all dependencies in F+.
– If none of the dependencies in F causes a violation of BCNF, then none of the dependencies in F+ will cause a violation of BCNF either.
• However, the simplified test using only F is incorrect when testing a relation in a decomposition of R
– Consider R = (A, B, C, D, E), with F = {A → B, BC → D}
• Decompose R into R1 = (A, B) and R2 = (A, C, D, E)
• Neither of the dependencies in F contains only attributes from (A, C, D, E), so we might be misled into thinking R2 satisfies BCNF.
• In fact, dependency AC → D in F+ shows R2 is not in BCNF.
BCNF and Dependency Preservation
• R = (J, K, L)
F = {JK → L, L → K}
Two candidate keys = JK and JL
• R is not in BCNF
• Any decomposition of R will fail to preserve JK → L
This implies that testing for JK → L requires a join
It is not always possible to get a BCNF decomposition that is dependency preserving
Third Normal Form: Motivation
• There are some situations where
– BCNF is not dependency preserving, and
– efficient checking for FD violation on updates is
important
• Solution: define a weaker normal form, called Third
Normal Form (3NF)
– Allows some redundancy (with resultant problems; we
will see examples later)
– But functional dependencies can be checked on
individual relations without computing a join.
– There is always a lossless-join, dependency-
preserving decomposition into 3NF.
Redundancy in 3NF
• There is some redundancy in this schema
• Example of problems due to redundancy in 3NF
– R = (J, K, L)
F = {JK → L, L → K}
J     L   K
j1    l1  k1
j2    l1  k1
j3    l1  k1
null  l2  k2
• repetition of information (e.g., the relationship l1, k1)
– (i_ID, dept_name)
• need to use null values (e.g., to represent the relationship l2, k2 where there is no corresponding value for J).
– (i_ID, dept_name) if there is no separate relation mapping instructors to departments
Testing for 3NF
• Optimization: Need to check only FDs in F, need not check all FDs in F+.
• Use attribute closure to check for each dependency α → β, if α is a superkey.
• If α is not a superkey, we have to verify if each attribute in β is contained in a candidate key of R
– this test is rather more expensive, since it involves finding candidate keys
– testing for 3NF has been shown to be NP-hard
– Interestingly, decomposition into third normal form (described shortly) can be done in polynomial time
3NF Decomposition Algorithm
Let Fc be a canonical cover for F;
i := 0;
for each functional dependency α → β in Fc do
  if none of the schemas Rj, 1 ≤ j ≤ i contains α β
  then begin
    i := i + 1;
    Ri := α β
  end
if none of the schemas Rj, 1 ≤ j ≤ i contains a candidate key for R
then begin
  i := i + 1;
  Ri := any candidate key for R;
end
/* Optionally, remove redundant relations */
repeat
  if any schema Rj is contained in another schema Rk
  then /* delete Rj */
    Rj := Ri;
    i := i - 1;
until no more Rj's can be deleted
return (R1, R2, ..., Ri)
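The synthesis algorithm can be sketched in Python as follows. Assumptions: the input Fc is already a canonical cover (computing it is a separate step), attributes are single letters, candidate keys are found by brute force, and a schema contained in another is dropped at the end, as in the optional clean-up. Function names are hypothetical.

```python
from itertools import combinations


def attribute_closure(alpha, fds):
    """alpha+ under fds (FDs are (lhs, rhs) strings of single-letter attributes)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            cand = set(combo)
            if not any(key <= cand for key in keys) and \
                    set(relation) <= attribute_closure(cand, fds):
                keys.append(cand)
    return keys


def synthesize_3nf(relation, fc):
    """3NF decomposition; fc must already be a canonical cover of F."""
    schemas = []
    for lhs, rhs in fc:
        ab = set(lhs) | set(rhs)              # Ri := alpha beta
        if not any(ab <= s for s in schemas):
            schemas.append(ab)
    keys = candidate_keys(relation, fc)
    if not any(k <= s for k in keys for s in schemas):
        schemas.append(keys[0])               # Ri := any candidate key for R
    # Optional clean-up: drop a schema that is contained in another one.
    kept = [s for i, s in enumerate(schemas)
            if not any(i != j and s <= t for j, t in enumerate(schemas))]
    return ["".join(sorted(s)) for s in kept]


print(synthesize_3nf("ABCD", [("A", "B"), ("C", "D")]))  # ['AB', 'CD', 'AC']
print(synthesize_3nf("JKL", [("JK", "L"), ("L", "K")]))  # ['JKL']
```

The first call needs the extra key schema AC because neither AB nor CD contains a candidate key; the second reproduces the J, K, L example, where 3NF keeps the single relation.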
Testing Decomposition for BCNF
• To check if a relation Ri in a decomposition of R is in BCNF,
– Either test Ri for BCNF with respect to the restriction of F to Ri (that is, all FDs in F+ that contain only attributes from Ri)
– or use the original set of dependencies F that hold on R, but with the following test:
– for every set of attributes α ⊆ Ri, check that α+ (the attribute closure of α) either includes no attribute of Ri − α, or includes all attributes of Ri.
• If the condition is violated by some α → β in F, the dependency
α → (α+ − α) ∩ Ri
can be shown to hold on Ri, and Ri violates BCNF.
• We use the above dependency to decompose Ri
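The second test (using the original F) is sketched below, assuming single-letter attributes and FDs as (lhs, rhs) strings; `bcnf_violation` is an illustrative name. It returns the witnessing dependency α → (α+ − α) ∩ Ri, and on the R = (A, B, C, D, E) example from the Testing for BCNF slide it finds AC → D in R2.

```python
from itertools import combinations


def attribute_closure(alpha, fds):
    """alpha+ under fds (FDs are (lhs, rhs) strings of single-letter attributes)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violation(ri, fds):
    """For every alpha ⊆ Ri, alpha+ must include no attribute of Ri - alpha or all of Ri.
    Returns the violating dependency alpha -> (alpha+ - alpha) ∩ Ri, or None."""
    ri = set(ri)
    for k in range(1, len(ri)):               # alpha = Ri itself can never violate
        for alpha in combinations(sorted(ri), k):
            plus = attribute_closure(alpha, fds)
            gained = (plus - set(alpha)) & ri
            if gained and not ri <= plus:
                return "".join(alpha), "".join(sorted(gained))
    return None


F = [("A", "B"), ("BC", "D")]
print(bcnf_violation("AB", F))    # None
print(bcnf_violation("ACDE", F))  # ('AC', 'D')
```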
BCNF Decomposition Algorithm
result := {R};
done := false;
compute F+;
while (not done) do
  if (there is a schema Ri in result that is not in BCNF)
  then begin
    let α → β be a nontrivial functional dependency that holds on Ri such that α → Ri is not in F+, and α ∩ β = ∅;
    result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
  end
  else done := true;
Note: each Ri is in BCNF, and decomposition is lossless-join.
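A Python sketch of the BCNF decomposition loop, assuming single-letter attributes and FDs as (lhs, rhs) strings; helper names are hypothetical. As a deliberate swap, it does not compute F+ up front: it finds a violating α → β with the α+ test from Testing Decomposition for BCNF, which avoids the exponential F+ computation. The second example shows JK → L being lost, as discussed under BCNF and Dependency Preservation.

```python
from itertools import combinations


def attribute_closure(alpha, fds):
    """alpha+ under fds (FDs are (lhs, rhs) strings of single-letter attributes)."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) with alpha -> beta holding on Ri, alpha ∩ beta = ∅,
    and alpha not a superkey of Ri; None if Ri is in BCNF."""
    for k in range(1, len(ri)):
        for combo in combinations(sorted(ri), k):
            alpha = set(combo)
            plus = attribute_closure(alpha, fds)
            beta = (plus - alpha) & ri
            if beta and not ri <= plus:
                return alpha, beta
    return None


def bcnf_decompose(relation, fds):
    result = [set(relation)]                  # result := {R}
    while True:
        for i, ri in enumerate(result):
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                # result := (result - Ri) ∪ (Ri - beta) ∪ (alpha, beta)
                result = result[:i] + result[i + 1:] + [ri - beta, alpha | beta]
                break
        else:                                 # no schema violates BCNF: done
            return sorted("".join(sorted(r)) for r in result)


print(bcnf_decompose("ABC", [("A", "B"), ("B", "C")]))   # ['AB', 'BC']
print(bcnf_decompose("JKL", [("JK", "L"), ("L", "K")]))  # ['JL', 'KL']
```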
• In relational databases, repeating groups are not
allowed
Course Teacher Texts
DBS N Goyal
J P Misra
Yash
Garcia
Korth
Elmasiri
Raghu
Networks S Mohan
Rahul
J P Misra
Tannenbaum
Keshav
Petterson
4 NF
• 1NF Version
COURSE TEACHER TEXTS
DBS N GOYAL GARCIA
DBS N GOYAL KORTH
DBS N GOYAL ELMASIRI
DBS N GOYAL RAGHU R
DBS J P MISRA GARCIA
DBS J P MISRA KORTH
DBS J P MISRA ELMASIRI
DBS J P MISRA RAGHU R
NETWORKS S MOHAN TANNENBAUM
NETWORKS S MOHAN KESHAV
NETWORKS S MOHAN KUROSE
NETWORKS RAHUL TANNENBAUM
NETWORKS RAHUL KESHAV
NETWORKS RAHUL KUROSE
CTX
4 NF
• ANY REDUNDANCY? ANY ANOMALIES?
COURSE TEACHER TEXTS
DBS N GOYAL GARCIA
DBS N GOYAL KORTH
DBS N GOYAL ELMASIRI
DBS N GOYAL RAGHU R
DBS J P MISRA GARCIA
DBS J P MISRA KORTH
DBS J P MISRA ELMASIRI
DBS J P MISRA RAGHU R
NETWORKS S MOHAN TANNENBAUM
NETWORKS S MOHAN KESHAV
NETWORKS S MOHAN PETTERSON
NETWORKS RAHUL TANNENBAUM
NETWORKS RAHUL KESHAV
NETWORKS RAHUL PETTERSON
CTX
4 NF
• Redundancy is due to the constraint that the texts
for a course are independent of the instructors
• This constraint cannot be expressed in terms of
FDs
• Example of MVD
• Is CTX in BCNF?
• New Teacher for DBS
• New Text for Networks
• Teacher teaching DBS leaves
4 NF
• Decompose CTX into CT & TX
COURSE TEACHER
DBS N GOYAL
DBS J P MISRA
DBS S JAGADISH
NETWORKS S MOHAN
NETWORKS RAHUL
NETWORKS J P MISRA
COURSE TEXT
DBS GARCIA
DBS KORTH
DBS ELMASIRI
DBS RAGHU R
NETWORKS TANNENBAUM
NETWORKS KESHAV
NETWORKS PETTERSON
CT, TX
4 NF
• The decomposition of CTX into CT & TX is not done on the basis of FDs
• The decomposition of CTX into CT & TX is done on the basis of MVDs
• MVDs
Represent a dependency between attributes A, B and C of a relation, such that for every value of A, there is a set of values of B & a set of values of C. The sets of values for B & C are independent of each other
course ↠ teacher
course ↠ text
Multi-Valued Dependencies
• A multi-valued dependency occurs when a
determinant determines more than one
dependent, and the dependents are
independent of each other
• Example: course ↠ teacher; course ↠ text, where teacher and text are independent
• A relation with course, instructor and text is all
key, and exhibits redundancy, but is in 3NF
• Updates can exhibit anomalies
4 NF
• An MVD A ↠ B is trivial if
(a) B ⊆ A or
(b) A ∪ B = R
• A relation that is in BCNF & contains no non-trivial MVDs is said to be in 4NF
• CTX is not in 4NF because course ↠ teacher is a non-trivial MVD
Fourth Normal Form
• Relation R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial multi-valued dependency A ↠ B (A multi-determines B) is satisfied, then all attributes of R are also functionally dependent on A
• In the previous example, decompose (course, instructor, text) into two relations: (course, instructor) and (course, text)
Multi-Valued Dependencies
• An MVD is an assertion that 2 attributes or sets of attributes are independent of each other
• Generalization of the concept of FD in the sense that every FD implies a corresponding MVD
• Independence of attribute sets cannot be explained using FDs
• So what causes MVDs?
• Role of MVDs in database schema design
Multi-Valued Dependencies
• Most common source of redundancy in BCNF schemas is to put 2 or more M:M relationships in a single relation
• Note that in CTX, there are no non-trivial FDs
• If you fix the values for one set of attributes, then the values in certain other attributes are independent of all the other attributes in the relation
Multivalued Dependencies (MVDs)
• Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued dependency
α ↠ β
holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:
t1[α] = t2[α] = t3[α] = t4[α]
t3[β] = t1[β]
t3[R – β] = t2[R – β]
t4[β] = t2[β]
t4[R – β] = t1[R – β]
MVD (Cont.)
• Tabular representation of α ↠ β
Formal Definition of MVD
• The MVD
A1A2…An ↠ B1B2…Bm
holds for a relation R if for each pair of tuples t & u that agree on the As, we can find a tuple v that agrees
1. with t & u on the As
2. with t on the Bs
3. with u on all attributes of R that are not among the As & Bs
MVD (figure: tuples t, u and v over the columns A's, B's and Others)
• 4NF
• 5NF
• 6NF
• DKNF
• Fourth Normal Form (4NF)
– Eliminates data redundancy caused by multi-valued dependencies (MVDs).
– A given relation in 4NF may not contain more than one multi-valued dependency.
• MVD?
Multi-valued dependencies (X ↠ Y) hold in a relation R if whenever we have two tuples of R that agree on all the attributes of X, then we can swap their Y components and get two tuples that are also in R.
• Example
• In relation R(A,B,C), how can we find out whether A ↠ B holds?
• If the relation has two tuples
A  B  C
1  7  4
1  3  2
then the table should also contain the two other tuples in which the B values are swapped:
A  B  C
1  3  4
1  7  2
Do this for all tuples that have the same A values.
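The swap test can be sketched as a check on a single relation instance, assuming rows are Python dicts and X and Y are lists of attribute names; `mvd_holds` is an illustrative name. Passing on one instance only shows that the instance is consistent with X ↠ Y; the MVD itself is a constraint on every legal instance. The CTX check uses the DBS rows from the 1NF table above.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on one instance: whenever tuples t and u agree on X, the tuple v
    that takes X and Y from t and every other attribute from u must also be present."""
    attrs = list(rows[0])
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t in rows:
        for u in rows:
            if all(t[a] == u[a] for a in x):
                v = tuple(t[a] if a in x or a in y else u[a] for a in attrs)
                if v not in present:
                    return False
    return True


two = [{"A": 1, "B": 7, "C": 4}, {"A": 1, "B": 3, "C": 2}]
four = two + [{"A": 1, "B": 3, "C": 4}, {"A": 1, "B": 7, "C": 2}]
print(mvd_holds(two, ["A"], ["B"]))   # False: (1, 7, 2) and (1, 3, 4) are missing
print(mvd_holds(four, ["A"], ["B"]))  # True

ctx = [{"course": "DBS", "teacher": t, "text": b}
       for t in ("N GOYAL", "J P MISRA")
       for b in ("GARCIA", "KORTH", "ELMASIRI", "RAGHU R")]
print(mvd_holds(ctx, ["course"], ["teacher"]))       # True
print(mvd_holds(ctx[:-1], ["course"], ["teacher"]))  # False once one row is deleted
```

The last line illustrates the deletion anomaly: removing a single CTX row breaks the MVD, so consistent updates must touch several rows at once.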
• What is so bad about having a table with multiple multi-valued dependencies?
• Example: Consider R(Departments, Jobs, Resources Used)
The table has the following MVDs:
department ↠ parts
department ↠ jobs
• Department d1 works on jobs j1 and j2 with parts p1 and p2
• Department d2 works on jobs j3, j4, and j5 with parts p2 and p4
• Department d3 works on job j2 only with parts p5 and p6.
Department  Job  Part#
• If you want to add a part to a department, you must create more than one new row.
• Likewise, removing a part or a job from a row can destroy information.
• Updating a part or job name will also require multiple rows to be changed.
• The solution is to split this table into two tables, one with (department, jobs) in it and one with (department, parts) in it.
• The only desirable MVDs are those whose determinant is a superkey of R.
Special Case: Assume R has the following two multi-valued dependencies:
A ↠ B and B ↠ C
In this case R will be in the fourth normal form iff B and C are dependent on each other.
A relation R is in 5NF if for all join dependencies *(R1, R2, ..., Rn) at least one of the following holds:
(a) *(R1, R2, ..., Rn) is a trivial join dependency.
(b) Every Ri is a superkey for R.
• A table is said to be in 5NF iff it is in 4NF and every join dependency in it is implied by the candidate keys.
• Sometimes it is impossible to break the table into 2 tables without losing information; that is when you can use the rules of 5NF to normalize.
• In most cases a table in 4NF is also in 5NF, but sometimes real-world constraints will cause the relation not to comply with 5NF.
• Join Dependencies: they are basically a generalization of MVDs.
• A join dependency is a condition where the natural join of all its projections results in the reconstruction of R.
• If such a condition is present then that relation should be replaced with the tables that consist of its projections.
The psychiatrist is able
to offer reimbursable
treatment to patients who
suffer from the given
condition and who are
insured by the given
insurer. Psychiatrist-to-
Insurer-to-Condition is
necessary in order to
model the situation
correctly.
• Suppose, however, that the following rule
applies: When a psychiatrist is authorized
to offer reimbursable treatment to
patients insured by Insurer P, and the
psychiatrist is able to treat condition C,
then – in the event that the Insurer P
covers condition C – it must be true that
the psychiatrist is able to provide
treatment to patients who suffer from
condition C and are insured by Insurer P.
These are all the possible projections of the previous table. If (R1 ⋈ R2) or (R2 ⋈ R3) or (R1 ⋈ R3) results in R, then there are MVDs (4th NF); if the natural join of {R1, R2, R3} results in R, then a JD exists and the original table is not in 5th NF
• Only in rare situations does a 4NF table
not conform to 5NF. These are situations
in which a complex real-world constraint
governing the valid combinations of
attribute values in the 4NF table is not
implicit in the structure of that table.
Fifth Normal Form
• A relation R is in 5NF – also called
projection-join normal form, if and only if
every nontrivial join dependency that is
satisfied by R is implied by the candidate
key(s) of R
• It is the most general form possible for
projection-based normalization
• DKNF offers a complete solution to the problem of avoiding modification anomalies
• Domain/key normal form (DKNF): a relation is in DKNF if every constraint on it is a logical consequence of its domain constraints and key constraints. A key uniquely identifies each row in a table.
• By enforcing key and domain restrictions, the database is assured of being freed from any modification inconsistency.
• Ronald Fagin (1981) proved that if a relation is in DKNF then it is free from any anomalies (redundancies), including the ones caused by FDs, MVDs and JDs.
• DKNF seems simple enough, so why all the hoopla about 1NF, 2NF, 3NF, BCNF, 4NF, 5NF?
DKNF is not always achievable, and there is no general procedure to verify whether a relation schema is in DKNF
In short, sets of single-theme tables will most likely be in DKNF.