Module 6 Relational Database Design
Jan 03, 2016
Module 6 2042023
Topics to be covered
Pitfalls in relational database design
Functional dependencies
Armstrong axioms
Decomposition
Desirable properties of decomposition
Boyce-Codd normal form
3rd and 4th normal form
Mention of other normal forms
Evaluating relation schemas
There are two levels of relation schemas:
The logical or conceptual view: how users interpret the relation schemas and the meaning of their attributes.
The implementation or storage view: how the tuples in the base relations are stored and updated.
Informal Design Guidelines for Relational Databases
Four informal measures of quality for relation schema design are:
1 Imparting clear semantics to attributes in Relations
2 Reducing the redundant values in tuples
3 Reducing the null values in tuples
4 Disallowing the possibility of generating spurious tuples
1 Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (This applies to individual relations and their attributes.)
Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
Only foreign keys should be used to refer to other entities.
Entity and relationship attributes should be kept apart as much as possible.
Bottom line: design a schema that can be explained easily, relation by relation. The semantics of the attributes should be easy to interpret.
A Simplified COMPANY relational database schema
Two relation schemas suffering from update anomalies
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
Two relation schemas suffering from update anomalies
Although there is nothing logically wrong with these two relations, they are considered poor designs because they violate Guideline 1 by mixing attributes from distinct real-world entities.
EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees, projects, and the WORKS_ON relationship.
They may be used as views, but they cause problems when used as base relations.
2 Redundant Information in Tuples and Update Anomalies
One goal of schema design is to minimize the storage space used by the base relations. When information is stored redundantly, it:
wastes storage, and
causes update anomalies: insertion anomalies, deletion anomalies, and modification anomalies.
EXAMPLE OF AN INSERT ANOMALY
Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).
Insert anomaly: we cannot insert a project unless an employee is assigned to it. Conversely, we cannot insert an employee unless he/she is assigned to a project.
EXAMPLE OF A DELETE ANOMALY
Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).
Delete anomaly: deleting a project also deletes the tuples of all the employees who work on that project. Conversely, if an employee is the only employee on a project, deleting that employee also deletes the corresponding project.
EXAMPLE OF AN UPDATE ANOMALY
Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).
Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" requires the update to be made in the tuples of all 100 employees working on project P1.
Guideline for Redundant Information in Tuples and Update Anomalies
GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, or update anomalies.
If any anomalies are present, note them so that applications can be written to take them into account.
In general, it is advisable to use anomaly-free base relations and to specify views that include the joins that bring together the attributes frequently referenced in important queries.
Problems with Nulls
If many attributes are grouped together in a "fat" relation, many tuples end up with nulls. This:
wastes storage,
makes the meaning of the attributes harder to understand, and
complicates the use of aggregate operators such as COUNT or SUM.
3 Null Values in Tuples
Interpretations of nulls:
the attribute does not apply to this tuple (not applicable or invalid);
the attribute value is unknown (it may exist);
the value is known to exist but is unavailable.
GUIDELINE 3: Relations should be designed so that their tuples have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (together with the primary key).
Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the EMPLOYEE relation. Instead, create a new relation EMP_OFFICES(ESSN, OFFICE_NUMBER).
Example of Spurious Tuples
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.
The problem is that a natural join of these two relations produces more tuples than the original set of tuples in EMP_PROJ.
The additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious, or wrong, information that is not valid.
This happens because PLOCATION, the attribute used for the join, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
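The join problem can be reproduced on a toy state. Below is a minimal Python sketch built on stated assumptions. The two rows, their values, and the helper names `project` and `natural_join` are all hypothetical. The schemas are also trimmed to (SSN, PNUMBER, PLOCATION) and (ENAME, PLOCATION), so that, as with EMP_PROJ1 and EMP_LOCS, the only shared attribute is PLOCATION.

```python
# Relations are modeled as Python sets of tuples of (attribute, value) pairs.

def project(rows, attrs):
    """Projection with duplicate elimination (set semantics)."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join: combine every pair of tuples that agree on all shared attributes."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            shared = d1.keys() & d2.keys()
            if all(d1[a] == d2[a] for a in shared):
                # Sort by attribute name so that equal tuples compare equal.
                result.add(tuple(sorted({**d1, **d2}.items())))
    return result


# Hypothetical state: two employees on two different projects, both located in Houston.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "ENAME": "Smith", "PLOCATION": "Houston"},
    {"SSN": "222", "PNUMBER": 2, "ENAME": "Wong", "PLOCATION": "Houston"},
]

emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "PLOCATION"])
emp_locs = project(emp_proj, ["ENAME", "PLOCATION"])

joined = natural_join(emp_proj1, emp_locs)
original = {tuple(sorted(row.items())) for row in emp_proj}
spurious = joined - original

print(len(original), len(joined), len(spurious))  # 2 4 2
```

PLOCATION is not a key of either projection, so every (SSN, PNUMBER) row pairs with every name at the same location. The two extra tuples attach Wong's name to SSN 111 and Smith's name to SSN 222.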
Example of Spurious Tuples contd
4 Spurious Tuples
Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) the non-additive, or lossless, property of the corresponding join;
(b) preservation of the functional dependencies.
Property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies that cause redundant work during insertion, modification, and deletion.
Waste of storage space due to nulls, and difficulty performing aggregation operations and joins because of null values.
Generation of invalid and spurious data during joins on improperly related base relations.
Functional dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the "goodness" of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they also have the same value for Y. Formally, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). The values of the Y component of a tuple in r depend on, or are determined by, the values of the X component. Put the other way round, the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of an FD is to describe R further by specifying constraints on its attributes that must hold at all times.
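Lakes of the world
Name | Continent | Area | Length
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420
Continent -> Name does not hold in this state (NA is paired with Superior, Huron, and Michigan), while Name -> Length holds.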
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME.
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}.
Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS.
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since no two distinct tuples can have t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies a given functional dependency A -> B.
How it works:
1 Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2 Check that tuples with equal values under attributes A also have equal values under B.
3 If condition 2 is met, the algorithm outputs true; otherwise it outputs false.
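The three steps translate almost directly into Python. The sketch below assumes a relation state is a list of dicts and that A and B are lists of attribute names. The name `satisfies` comes from the slide title, and `itertools.groupby` is one convenient way to walk the runs of equal A-values after sorting. The example data is the TEACH state shown on the next slide.

```python
from itertools import groupby


def satisfies(rows, A, B):
    """Return True if the relation state `rows` satisfies the FD A -> B."""
    key_a = lambda row: tuple(row[a] for a in A)
    # Step 1: sort on A so that tuples with equal A-values are adjacent.
    ordered = sorted(rows, key=key_a)
    # Step 2: within each run of equal A-values, all B-values must agree.
    for _, group in groupby(ordered, key=key_a):
        b_values = {tuple(row[b] for b in B) for row in group}
        if len(b_values) > 1:
            return False  # two tuples agree on A but differ on B
    # Step 3: no violating run was found.
    return True


teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "ooad", "TEXT": "Horowitz"},
]

print(satisfies(teach, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(teach, ["TEXT"], ["COURSE"]))     # True
```

A `True` result only means this particular state does not violate the FD. Whether the FD holds on every state of R is still a design decision about the semantics of the attributes.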
Relation state of TEACH
TEACHER | COURSE | TEXT
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | ooad | Horowitz
TEACHER -> COURSE does not hold in this state (Smith teaches two courses), while TEXT -> COURSE is satisfied by it.
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming.
So inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred, or deduced, from the FDs in F.
Example of Closure
If a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer new dependencies from a given set of dependencies systematically, we need a set of inference rules. The statement that X -> Y can be inferred from F is written F |= X -> Y.
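In practice, F |= X -> Y is usually tested without listing F+ at all. Instead, compute the attribute closure X+ (every attribute that X determines) and check whether Y ⊆ X+. This technique is not covered on these slides. The sketch below is a minimal Python version: FDs are written as (lhs, rhs) pairs of attribute sets, the helper names `attribute_closure` and `implies` are hypothetical, and it is applied to the F of the example above.

```python
def attribute_closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by `attrs` under `fds`.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # X -> closure holds and lhs is inside it, so X -> lhs; X -> rhs follows by transitivity.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is a subset of lhs+ under F."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(attribute_closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```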
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): if Y ⊆ X, then X -> Y.
IR2 (Augmentation): if X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency inferred from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F. In other words, if the axioms are applied correctly, they cannot derive false dependencies.
By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: if X -> YZ, then X -> Y and X -> Z.
Union (additive): if X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z.
These three rules, like any other valid inference rule, can be deduced from IR1, IR2, and IR3 (completeness property).
Examples
1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2 Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
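The garbled sets are read here as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D}; that reading is a reconstruction of the lost arrows. Under it, one derivation of each result using only IR1 to IR3 is sketched below. Other orders of rule application work equally well.

```latex
\begin{align*}
&\textbf{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\} \\
&(1)\ A \to B   && \text{given} \\
&(2)\ AC \to BC && \text{IR2 on (1), augmenting with } C \\
&(3)\ C \to X   && \text{given} \\
&(4)\ BC \to BX && \text{IR2 on (3), augmenting with } B \\
&(5)\ AC \to BX && \text{IR3 on (2), (4)} \\
&(6)\ BX \to Z  && \text{given} \\
&(7)\ AC \to Z  && \text{IR3 on (5), (6)} \\[6pt]
&\textbf{Example 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B \\
&(1)\ A \to B   && \text{given} \\
&(2)\ B \to C   && \text{IR1, since } C \subseteq B \\
&(3)\ A \to C   && \text{IR3 on (1), (2)} \\
&(4)\ C \to D   && \text{given} \\
&(5)\ A \to D   && \text{IR3 on (3), (4)}
\end{align*}
```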
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from F.
Eliminating redundant FDs allows us to minimize the set of FDs.
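The definition gives a direct test: X -> Y is redundant in F exactly when Y ⊆ X+, with X+ computed under F - {X -> Y}. The Python sketch below relies on the attribute-closure technique (not covered on these slides). It uses a hypothetical F = {A -> B, B -> C, A -> C}, and the helper names are illustrative.

```python
def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_redundant(fds, fd):
    """X -> Y is redundant in F iff Y is a subset of X+ computed under F - {X -> Y}."""
    lhs, rhs = fd
    rest = [f for f in fds if f != fd]
    return rhs <= attribute_closure(lhs, rest)


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(is_redundant(F, ({"A"}, {"C"})))  # True: A -> B and B -> C already give A -> C
print(is_redundant(F, ({"A"}, {"B"})))  # False
```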
Equivalence of Sets of Functional Dependencies
A set of FDs F covers another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Equivalently, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E. So E is equivalent to F exactly when E covers F and F covers E.
For a given set F of FDs, F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Such sets are equivalent to F.
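Covering can be tested one FD at a time: F covers E if, for every X -> Y in E, Y ⊆ X+ computed under F. The Python sketch below again relies on the attribute-closure technique. The sets E, F, and G are hypothetical, and F is the smaller of the two equivalent sets.

```python
def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def covers(F, E):
    """F covers E iff every FD X -> Y in E has Y contained in X+ under F."""
    return all(rhs <= attribute_closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(E, F) and covers(F, E)


E = [({"A"}, {"B", "C"}), ({"B"}, {"C"})]
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B"})]

print(equivalent(E, F))  # True
print(equivalent(E, G))  # False: G cannot derive A -> C or B -> C
```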
Minimal Functional Dependencies
A minimal cover (also called an irreducible set of F) is useful for eliminating unnecessary functional dependencies. F is transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs with only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) every dependency in F has a single attribute on its RHS;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency can be replaced by one whose left-hand side is a subset of the original left-hand side).
Extraneous Attributes
The FDs in F can be reduced further by removing extraneous left-hand-side or right-hand-side attributes with respect to F.
Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
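The reduced FD A2 -> B1B2 always implies the original A1A2 -> B1B2 (by augmentation). The equivalence above therefore comes down to a single check: A1 is extraneous exactly when A2 -> B1B2 can be inferred from F, i.e. when B1B2 ⊆ (A2)+ under F. The sketch below relies on the attribute-closure technique. It uses a hypothetical F = {X -> Z, XZ -> R} and an illustrative helper name.

```python
def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_extraneous_lhs(fds, fd, attr):
    """attr is extraneous on the LHS of fd iff (LHS - {attr}) -> RHS is implied by F."""
    lhs, rhs = fd
    if attr not in lhs or len(lhs) < 2:
        return False  # only consider dropping one attribute from a multi-attribute LHS
    return rhs <= attribute_closure(lhs - {attr}, fds)


F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
print(is_extraneous_lhs(F, ({"X", "Z"}, {"R"}), "Z"))  # True: X -> Z, so X alone gives R
print(is_extraneous_lhs(F, ({"X", "Z"}, {"R"}), "X"))  # False
```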
CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has one attribute.
2 FC is left-reduced.
3 FC is nonredundant.
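Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1 FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R} (every RHS split into single attributes)
2 FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R} (XY -> Z is redundant because X -> Z; the duplicate XY -> W is dropped)
3 FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R} (Z is extraneous in XZ -> R, because X -> Z gives X -> XZ and hence X -> R)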
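The hand computation can be cross-checked with a short Python sketch. It follows one common order of steps, the one used by the usual minimal-cover algorithm: split right-hand sides, then left-reduce, then drop redundant FDs. The function names are hypothetical, and a different order of steps can produce a different but equivalent cover.

```python
def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def canonical_cover(fds):
    # Step 1: one attribute per right-hand side (decomposition rule), no duplicates.
    G = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            f = (frozenset(lhs), frozenset({a}))
            if f not in G:
                G.append(f)
    # Step 2: left-reduce: drop LHS attribute b when (X - {b}) -> A is already implied.
    for i in range(len(G)):
        lhs, rhs = G[i]
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= attribute_closure(lhs - {b}, G):
                lhs = lhs - {b}
                G[i] = (lhs, rhs)
    G = list(dict.fromkeys(G))  # left-reduction can create duplicates
    # Step 3: drop every FD that the remaining ones already imply.
    for f in list(G):
        rest = [g for g in G if g != f]
        if f[1] <= attribute_closure(f[0], rest):
            G = rest
    return G


def show(fds):
    """Render FDs as sorted strings such as 'XY -> W'."""
    return sorted("".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in fds)


F = [
    ({"X"}, {"Z"}),
    ({"X", "Y"}, {"W", "P"}),
    ({"X", "Y"}, {"Z", "W", "Q"}),
    ({"X", "Z"}, {"R"}),
]
print(show(canonical_cover(F)))
# ['X -> R', 'X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W']
```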
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF, and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Normalization, proposed by Codd, is the process of analysing the given relation schemas based on their FDs and primary keys to minimize redundancy and minimize anomalies.
It provides the database designer with a formal framework for analysing relation schemas based on their keys and FDs, and with a series of normal form tests.
The normal form of a relation is the highest normal form condition that it meets, and so indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or non-additive join property: guarantees that the spurious tuple generation problem does not occur for the relation schemas created by decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removing any attribute from K leaves a set that is no longer a superkey.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is one that is not prime, that is, it is not a member of any candidate key.
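For small schemas, candidate keys and prime attributes can be found by brute force. Test attribute subsets in increasing size and keep those whose closure is all of R, skipping any subset that contains a key already found (it would not be minimal). The Python sketch below relies on the attribute-closure technique and uses hypothetical helper names. It is exponential in the number of attributes, so it only suits classroom-sized examples. The FDs are the EMP_PROJ constraints listed earlier and the TEACH dependencies used in the BCNF section.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(R, fds):
    """All minimal superkeys of R, found by brute force over attribute subsets."""
    R = set(R)
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if attribute_closure(s, fds) >= R:
                keys.append(s)
    return keys


def prime_attributes(R, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(R, fds))


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F_emp_proj = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F_teach = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

print([sorted(k) for k in candidate_keys(EMP_PROJ, F_emp_proj)])  # [['PNUMBER', 'SSN']]
print([sorted(k) for k in candidate_keys(TEACH, F_teach)])
# [['COURSE', 'STUDENT'], ['INSTRUCTOR', 'STUDENT']]
```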
First Normal Form
1NF disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
There are three techniques for achieving 1NF:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2 Expand the key so that the original relation has a separate tuple for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3 If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and is completely general, placing no limit on the maximum number of values.
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: Pg 178, 5.1
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z such that removing any attribute from Y means the FD no longer holds.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
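Whether an FD is full can be checked mechanically. The FD must hold, and it must stop holding when any single attribute is removed from its left-hand side. Checking single removals is enough, because closure is monotone: if some smaller subset of Y determined Z, then some Y - {a} would too. The Python sketch below relies on the attribute-closure technique and uses the FDs from the earlier SSN/PNUMBER examples; `is_full_fd` is a hypothetical name.

```python
def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_full_fd(fds, Y, Z):
    """True if Y -> Z holds under fds and fails once any single attribute is removed from Y."""
    if not Z <= attribute_closure(Y, fds):
        return False  # Y -> Z does not hold at all
    return not any(Z <= attribute_closure(Y - {a}, fds) for a in Y)


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

print(is_full_fd(F, {"SSN", "PNUMBER"}, {"HOURS"}))  # True
print(is_full_fd(F, {"SSN", "PNUMBER"}, {"ENAME"}))  # False: partial, since SSN -> ENAME
```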
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Normalizing into 2NF
Conversion to 2NF: R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).
Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form?
2 Transform the relation into the next higher normal form.
Third Normal Form
Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X such that SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
With X -> Y and Y -> Z, and X as the primary key, this is a problem only if Y is not a candidate key. When Y is a candidate key, the transitive dependency causes no problem. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
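The Python sketch below checks 3NF with the general formulation commonly given in textbooks: for every FD X -> A with A not in X, either X is a superkey or A is prime. That condition also captures the candidate-key exception in the note. Assumptions: candidate keys are found by brute force, only the FDs listed in F are inspected, and the EMP dependencies are hypothetical, with Emp assumed to determine SSN so that it is a candidate key.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(R, fds):
    """Minimal superkeys of R by brute force (small schemas only)."""
    R, keys = set(R), []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if not any(k <= s for k in keys) and attribute_closure(s, fds) >= R:
                keys.append(s)
    return keys


def violations_3nf(R, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    R = set(R)
    prime = set().union(*candidate_keys(R, fds))
    bad = set()
    for lhs, rhs in fds:
        is_superkey = attribute_closure(lhs, fds) >= R
        for a in rhs - lhs:
            if not is_superkey and a not in prime:
                bad.add((frozenset(lhs), a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_emp_dept = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
EMP = {"SSN", "Emp", "Salary"}
F_emp = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]

print(sorted(a for _, a in violations_3nf(EMP_DEPT, F_emp_dept)))  # ['DMGRSSN', 'DNAME']
print(violations_3nf(EMP, F_emp))  # set()
```

EMP_DEPT fails because DNUMBER is not a superkey and DNAME, DMGRSSN are nonprime. EMP passes because the middle attribute of SSN -> Emp -> Salary is itself a candidate key.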
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows the dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a candidate key.
BCNF allows it only if X is a superkey, whatever A is.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
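The two tests differ only in their escape clause: BCNF requires X to be a superkey, while 3NF also accepts a prime A. The Python sketch below applies both to the TEACH dependencies from the next slide ({student, course} -> instructor and instructor -> course). Like the earlier sketches, it relies on attribute closure and brute-force candidate keys, inspects only the FDs listed in F, and uses illustrative function names.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(R, fds):
    """Minimal superkeys of R by brute force (small schemas only)."""
    R, keys = set(R), []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if not any(k <= s for k in keys) and attribute_closure(s, fds) >= R:
                keys.append(s)
    return keys


def bcnf_violations(R, fds):
    """Nontrivial FDs whose LHS is not a superkey."""
    R = set(R)
    return [(sorted(lhs), sorted(rhs - lhs)) for lhs, rhs in fds
            if rhs - lhs and not attribute_closure(lhs, fds) >= R]


def nf3_violations(R, fds):
    """Like BCNF, but a prime attribute on the RHS is allowed."""
    R = set(R)
    prime = set().union(*candidate_keys(R, fds))
    return [(sorted(lhs), sorted(rhs - lhs - prime)) for lhs, rhs in fds
            if rhs - lhs - prime and not attribute_closure(lhs, fds) >= R]


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F_teach = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

print(bcnf_violations(TEACH, F_teach))  # [(['INSTRUCTOR'], ['COURSE'])]
print(nf3_violations(TEACH, F_teach))   # []
```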
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey. It does not violate 3NF, because course is a prime attribute. So this relation is in 3NF but not in BCNF.
A relation that is not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, possibly giving up the preservation of some functional dependencies in the decomposed relations.
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additive property of the decomposition.
Of the three, only the third decomposition will not generate spurious tuples after a join, and hence only it has the non-additive property.
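For a decomposition into exactly two relations there is a standard shortcut test, not stated on these slides. {R1, R2} is lossless with respect to F if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The Python sketch below applies this test, via attribute closure, to the three decompositions of TEACH. The helper names are illustrative.

```python
def attribute_closure(attrs, fds):
    """Set of attributes determined by `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def lossless_binary(R1, R2, fds):
    """{R1, R2} is lossless iff the shared attributes determine R1 - R2 or R2 - R1."""
    determined = attribute_closure(R1 & R2, fds)
    return (R1 - R2) <= determined or (R2 - R1) <= determined


F_teach = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
decompositions = [
    ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
]

results = [lossless_binary(R1, R2, F_teach) for R1, R2 in decompositions]
print(results)  # [False, False, True]
```

Only the third decomposition passes, because its shared attribute, instructor, determines course.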
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that the original relation can be recovered from the new relations that replaced it.
If the original relation can be recovered, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
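One widely taught general test, which works for decompositions into any number of relations, is the matrix (tableau) algorithm. Its presentation may differ from the algorithm in the cited example. Build one row per Ri, with an 'a' symbol in the columns of Ri and a distinct 'b' symbol elsewhere. Then repeatedly use each FD X -> Y to equate the Y symbols of rows that agree on X, preferring an 'a' symbol. The join is lossless if some row becomes all 'a'. The Python sketch below is one straightforward implementation, assuming FDs as (lhs, rhs) attribute sets. It renames a symbol throughout its column (the common chase variant) rather than only in the matched rows. The examples decompose EMP_PROJ into EMP/PROJ/WORKS_ON-style schemas and into EMP_LOCS and EMP_PROJ1.

```python
def lossless_chase(R, decomposition, fds):
    """Matrix (tableau) test for a lossless join decomposition of R under fds."""
    cols = sorted(R)
    idx = {c: j for j, c in enumerate(cols)}
    # ('a', col) marks a column that belongs to R_i; ('b', i, col) is a unique placeholder.
    rows = [[("a", c) if c in Ri else ("b", i, c) for c in cols]
            for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lhs_cols = [idx[c] for c in lhs]
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[j] != r2[j] for j in lhs_cols):
                        continue  # only rows that agree on the whole LHS are equated
                    for c in rhs:
                        j = idx[c]
                        if r1[j] != r2[j]:
                            # ('a', ...) sorts before ('b', ...), so min keeps an 'a' symbol if present.
                            new, old = min(r1[j], r2[j]), max(r1[j], r2[j])
                            for row in rows:
                                if row[j] == old:
                                    row[j] = new
                            changed = True
    return any(all(cell[0] == "a" for cell in row) for row in rows)


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]

print(lossless_chase(R, good, F))  # True
print(lossless_chase(R, bad, F))   # False
```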
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): a dependency between attributes (for example A, B, and C) in a relation such that each value of A has a set of values for B and a set of values for C, and the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B ⊆ A or A ∪ B = R.
An MVD is nontrivial if neither of these two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
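Whether a given state satisfies an MVD X ->> Y can be checked from the formal definition, which these slides do not spell out. Whenever two tuples t1 and t2 agree on X, the tuple that takes its X and Y values from t1 and its remaining values from t2 must also be present. The Python sketch below also includes the trivial-MVD test defined earlier. The EMP tuples and the function names are hypothetical.

```python
def is_trivial_mvd(X, Y, R):
    """X ->> Y is trivial if Y is a subset of X or X together with Y covers all of R."""
    return set(Y) <= set(X) or set(X) | set(Y) == set(R)


def satisfies_mvd(rows, R, X, Y):
    """Check X ->> Y on a relation state (list of dicts); R, X, Y are lists of attribute names."""
    Z = [a for a in R if a not in X and a not in Y]
    present = {tuple(row[a] for a in R) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in X):
                # Take X and Y from t1 and the rest from t2; that tuple must already exist.
                t3 = {a: t1[a] for a in X + Y}
                t3.update({a: t2[a] for a in Z})
                if tuple(t3[a] for a in R) not in present:
                    return False
    return True


R = ["ENAME", "PNAME", "DNAME"]
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

print(satisfies_mvd(emp, R, ["ENAME"], ["PNAME"]))      # True
print(satisfies_mvd(emp[:3], R, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing
print(is_trivial_mvd(["ENAME"], ["PNAME"], R))          # False
```

Every combination of Smith's projects and dependents must appear, which is exactly the redundancy that splitting EMP into EMP_PROJECTS and EMP_DEPENDENTS removes.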
Fifth Normal Form (5NF)
Join dependency: a further type of dependency. Consider a relation R and subsets of its attributes denoted A, B, ..., Z. R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when the relations are recombined through a natural join operation.
Fifth Normal Form (5NF)
Definition: a relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency with a component that is not a superkey.
Relation SUPPLY with a join dependency, and conversion to fifth normal form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
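A relation state satisfies JD(R1, ..., Rn) exactly when joining its projections on R1, ..., Rn gives back the state itself, as in the definition above. The Python sketch below tests that directly. Several things are hypothetical: the SUPPLY attribute names (SNAME, PART_NAME, PROJ_NAME), the tuples (chosen so that the JD holds), and the helper names. Dropping one tuple shows the same projections then producing a spurious tuple.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; tuples become frozensets of (attr, value)."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Natural join of two relations whose tuples are frozensets of (attr, value) pairs."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def satisfies_jd(rows, components):
    """True iff joining the projections on every component reproduces the state exactly."""
    result = project(rows, components[0])
    for comp in components[1:]:
        result = natural_join(result, project(rows, comp))
    return result == {frozenset(row.items()) for row in rows}


supply = [
    {"SNAME": "Smith", "PART_NAME": "Bolt", "PROJ_NAME": "ProjY"},
    {"SNAME": "Smith", "PART_NAME": "Nut", "PROJ_NAME": "ProjX"},
    {"SNAME": "Adamsky", "PART_NAME": "Bolt", "PROJ_NAME": "ProjX"},
    {"SNAME": "Smith", "PART_NAME": "Bolt", "PROJ_NAME": "ProjX"},
]
jd = [["SNAME", "PART_NAME"], ["SNAME", "PROJ_NAME"], ["PART_NAME", "PROJ_NAME"]]

print(satisfies_jd(supply, jd))      # True
print(satisfies_jd(supply[:3], jd))  # False: the join adds (Smith, Bolt, ProjX)
```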
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
Module 6 2042023
Topics to be covered
Pitfalls in relational database design Functional dependencies Armstrong Axioms Decomposition Desirable properties of decomposition Boyce-code normal form 3rd and 4th normal form Mention of other normal forms
Module 6 3042023
Evaluating relation schemas
Two levels of relation schemasThe logical or conceptual view
How users interpret the relation schemas and the meaning of their attributes
Implementation or storage view How the tuples in the base relation are stored
and updated
Module 6 4042023
Informal Design Guidelines for Relational Databases Four informal measures of quality for
relation schema design are
1 Imparting clear semantics to attributes in Relations
2 Reducing the redundant values in tuples
3 Reducing the null values in tuples
4 Disallowing the possibility of generating spurious tuples
Module 6 5042023
1Semantics of the Relation Attributes GUIDELINE 1 Informally each tuple in a relation
should represent one entity or relationship instance (Applies to individual relations and their attributes) Attributes of different entities (EMPLOYEEs
DEPARTMENTs PROJECTs) should not be mixed in the same relation
Only foreign keys should be used to refer to other entities
Entity and relationship attributes should be kept apart as much as possible
Bottom Line Design a schema that can be explained easily relation by relation The semantics of attributes should be easy to interpret
Module 6 6042023
A Simplified COMPANY relational database schema
Module 6 7042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_PROJ
EMP_DEPT
Module 6 8042023
Two relation schemas suffering from update anomalies Although there is nothing wrong logically with
these 2 relations they are considered poor designs because they violate guideline 1 by mixing attributes from distinct real world entities
EMP_DEPT mixes attributes of employee and department and EMP_PROJ mixes attributes of employees amp projects and the WORKS_ON relationship
They may be used as views but they cause problems when used as base relations
Module 6 9042023
2Redundant Information in Tuples and Update Anomalies Goal of schema design is to minimize the
storage space used by the base relations Information is stored redundantly Wastes storage
Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies
Module 6 10042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_DEPT
EMP_PROJ
Module 6 11042023
EXAMPLE OF AN INSERT ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Insert Anomaly Cannot insert a project unless an employee is
assigned to it Conversely
Cannot insert an employee unless an heshe is assigned to a project
Module 6 12042023
EXAMPLE OF AN DELETE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Delete Anomaly When a project is deleted it will result in deleting
all the employees who work on that project Alternately if an employee is the sole employee
on a project deleting that employee would result in deleting the corresponding project
Module 6 13042023
EXAMPLE OF AN UPDATE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Update AnomalyChanging the name of project number P1
from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1
Module 6 14042023
Module 6 15042023
Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2
Design a schema that does not suffer from the insertion deletion and update anomalies
If there are any anomalies present then note them so that applications can be made to take them into account
In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries
Module 6 16042023
Problems with Nulls If many attributes are grouped together
as a fat relation it gives rise to many nulls in the tuples
Waste storage Problems in understanding the
meaning of the attributes Difficult while using Nulls in aggregate
operators like count or sum
Module 6 17042023
3 Null Values in Tuples Interpretations of nulls
Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable
GUIDELINE 3 Relations should be designed such that their
tuples will have as few NULL values as possible Attributes that are NULL frequently could be
placed in separate relations (with the primary key) Example-
if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation
Better create a new relation emp_offices(essn office_number)
Module 6 18042023
Example of Spurious Tuples
Module 6 19042023
Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as
the base relations of EMP_PROJ is not a good schema design
Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid
This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1
Module 6 20042023
Example of Spurious Tuples contd
Module 6 21042023
4 Spurious Tuples Bad designs for a relational database may result
in erroneous results for certain JOIN operations The lossless join property is used to
guarantee meaningful results for join operations
GUIDELINE 4 Design relation schemas so that they can be
joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated
Module 6 22042023
Spurious Tuples
There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies
Note that Property (a) is extremely important and cannot be
sacrificed Property (b) is less stringent and may be sacrificed
Module 6 23042023
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies that cause redundant work to be done during insertion, modification and deletion
Waste of storage space due to nulls, and difficulty in performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Functional Dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X] then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component. In other words, the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name | Continent | Area (sq mi) | Length (mi)
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420
Continent -> Name does not hold (NA appears with Superior, Huron and Michigan); Name -> Length holds.
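The FD definition can be checked directly against a relation state by comparing every pair of tuples. Below is a minimal Python sketch that uses the Lakes of the world rows. `holds` is a hypothetical helper name. Note that a single state can only refute an FD; it cannot prove that the FD holds for the schema.

```python
from itertools import combinations

# (Name, Continent, Area, Length), as listed on the slide
LAKES = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]
COLUMNS = ("Name", "Continent", "Area", "Length")


def holds(rows, columns, X, Y):
    """True if X -> Y holds in this state: t1[X] = t2[X] implies t1[Y] = t2[Y]."""
    xi = [columns.index(a) for a in X]
    yi = [columns.index(a) for a in Y]
    for t1, t2 in combinations(rows, 2):
        same_x = all(t1[i] == t2[i] for i in xi)
        same_y = all(t1[i] == t2[i] for i in yi)
        if same_x and not same_y:
            return False   # a counterexample pair refutes the FD
    return True


print(holds(LAKES, COLUMNS, ["Name"], ["Length"]))      # True
print(holds(LAKES, COLUMNS, ["Continent"], ["Name"]))   # False: NA maps to several lakes
```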
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints (contd)
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation state r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1 Sort the tuples of r on the A attributes so that tuples with equal values under A are next to each other.
2 Check that tuples with equal values under A also have equal values under B.
3 If condition 2 is met, the output of the algorithm is true; otherwise it is false.
Relation state of TEACH
TEACHER | COURSE | TEXT
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | OOAD | Horowitz
TEACHER -> COURSE does not hold (Smith teaches two different courses).
TEXT -> COURSE holds in this state.
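Below is a minimal Python sketch of the Satisfies algorithm applied to this TEACH state. It assumes tuples are Python tuples and that attributes are located by a column list. `satisfies` is a hypothetical name.

```python
def satisfies(rows, columns, A, B):
    """Satisfies algorithm: does this relation state satisfy A -> B?"""
    ai = [columns.index(a) for a in A]
    bi = [columns.index(b) for b in B]

    def a_value(t):
        return tuple(t[i] for i in ai)

    # Step 1: sort on A so tuples with equal A-values are adjacent.
    ordered = sorted(rows, key=a_value)
    # Step 2: neighbours with equal A-values must have equal B-values.
    for prev, cur in zip(ordered, ordered[1:]):
        if a_value(prev) == a_value(cur) and any(prev[i] != cur[i] for i in bi):
            return False   # Step 3: condition violated
    return True            # Step 3: condition met


COLUMNS = ("TEACHER", "COURSE", "TEXT")
TEACH = [
    ("Smith", "Data Structures", "Bartram"),
    ("Smith", "Data Management", "Martin"),
    ("Hall", "Compilers", "Hoffmann"),
    ("Brown", "OOAD", "Horowitz"),
]

print(satisfies(TEACH, COLUMNS, ["TEACHER"], ["COURSE"]))   # False
print(satisfies(TEACH, COLUMNS, ["TEXT"], ["COURSE"]))      # True
```

Comparing only adjacent tuples is enough. After sorting, each group of equal A-values is contiguous, and equality within a group follows by transitivity. Sorting also brings the check down to O(n log n), compared with O(n^2) for comparing every pair.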
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time consuming.
So inference rules (axioms) are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
A department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE). These two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F, as well as all dependencies that can be inferred from F, is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
The inferred functional dependencies include:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. That X -> Y can be inferred from F is denoted F |= X -> Y.
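The slides do not give a procedure for deciding F |= X -> Y. One standard technique, swapped in here, is to compute the attribute closure X+ (everything X determines under F) and check whether Y ⊆ X+. Below is a minimal Python sketch that assumes FDs are (lhs, rhs) pairs of attribute sets. `closure` and `implies` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure X+: all attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the LHS is already determined, so is the RHS.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs  iff  rhs is contained in lhs+ under F."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))   # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))          # False
```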
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): if Y ⊆ X, then X -> Y
IR2 (Augmentation): if X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z
IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we can infer from F using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs (contd)
Some additional inference rules that are useful:
Decomposition: if X -> YZ, then X -> Y and X -> Z
Union (additive): if X -> Y and X -> Z, then X -> YZ
Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z
These three rules, as well as any other sound inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
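How the three additional rules follow from IR1 to IR3 can be written out as short derivations. This is a standard textbook argument, sketched here:

```latex
\begin{align*}
&\textbf{Decomposition: } X \to YZ \text{ (given)},\; YZ \to Y \text{ (IR1)}
  \;\Rightarrow\; X \to Y \text{ (IR3)};\ \text{likewise } X \to Z.\\
&\textbf{Union: } X \to Y \;\Rightarrow\; X \to XY \text{ (IR2, augment with } X);\;
  X \to Z \;\Rightarrow\; XY \to YZ \text{ (IR2, augment with } Y);\\
&\qquad X \to XY \text{ and } XY \to YZ \;\Rightarrow\; X \to YZ \text{ (IR3)}.\\
&\textbf{Pseudotransitivity: } X \to Y \;\Rightarrow\; WX \to WY \text{ (IR2, augment with } W);\;
  WY \to Z \text{ (given)} \;\Rightarrow\; WX \to Z \text{ (IR3)}.
\end{align*}
```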
Examples
1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2 Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
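Below are worked solutions to the two exercises, sketched using only IR1 to IR3:

```latex
\begin{align*}
&\textbf{1. } F = \{A \to B,\; C \to X,\; BX \to Z\} \models AC \to Z:\\
&\quad A \to B \;\Rightarrow\; AC \to BC \quad \text{(IR2, augment with } C)\\
&\quad C \to X \;\Rightarrow\; BC \to BX \quad \text{(IR2, augment with } B)\\
&\quad AC \to BX \quad \text{(IR3)}\\
&\quad AC \to Z \quad \text{(IR3 with } BX \to Z)\\[4pt]
&\textbf{2. } F = \{A \to B,\; C \to D\},\ C \subseteq B,\ \text{show } F \models A \to D:\\
&\quad B \to C \quad \text{(IR1, since } C \subseteq B)\\
&\quad A \to C \quad \text{(IR3 with } A \to B)\\
&\quad A \to D \quad \text{(IR3 with } C \to D)
\end{align*}
```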
Redundant Functional Dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from F.
Eliminating redundant FDs allows us to minimize the set of FDs.
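The redundancy test can be sketched in Python by computing the closure of A under F - {A -> B}. This assumes FDs are (lhs, rhs) pairs of sets. The helper names and the example F are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fd, fds):
    """A -> B is redundant iff B is contained in A+ computed under F - {A -> B}."""
    lhs, rhs = fd
    rest = list(fds)
    rest.remove(fd)   # drop one occurrence only, so exact duplicates count as redundant
    return rhs <= closure(lhs, rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time; which ones survive can depend on the order."""
    result = list(fds)
    for fd in list(result):
        if is_redundant(fd, result):
            result.remove(fd)
    return result


# Hypothetical example: A -> C is implied by A -> B and B -> C.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(is_redundant(({"A"}, {"C"}), F))   # True
print(remove_redundant(F))
```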
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, F+ may contain a large number of FDs. It is desirable to find a set that contains fewer FDs than F and still generates all the FDs of F+, that is, a smaller set that is equivalent to F.
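The cover and equivalence tests reduce to attribute closures. F covers E when, for each X -> Y in E, Y is contained in X+ computed under F. Below is a minimal Python sketch. The FD sets are hypothetical, and `covers` and `equivalent` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(F, E):
    """F covers E: every FD X -> Y in E has Y contained in X+ computed under F."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E+ = F+ exactly when E covers F and F covers E."""
    return covers(E, F) and covers(F, E)


# Hypothetical FD sets over attributes A, C, D, H, K.
F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"K"}, {"A", "D"}), ({"K"}, {"H"})]
E = [({"A"}, {"C", "D"}), ({"K"}, {"A", "H"})]

print(equivalent(E, F))       # True
print(equivalent(E, F[:1]))   # False: A -> C alone cannot give A -> D
```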
Minimal Functional Dependencies
A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies.
F is transformed so that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) the RHS of every dependency in F is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing extraneous left-hand-side or right-hand-side attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
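The extraneous-attribute condition can be checked without comparing whole FD sets. Replacing A1A2 -> B1B2 by A2 -> B1B2 never loses consequences, because A2 -> B1B2 implies A1A2 -> B1B2 by IR1 and IR3. The two sets are therefore equivalent exactly when F already implies A2 -> B1B2, that is, when B1B2 ⊆ A2+. Below is a minimal Python sketch of that test, using a hypothetical F = {X -> Z, XZ -> R}. The helper names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_extraneous_left(attr, fd, fds):
    """attr is extraneous in lhs -> rhs iff rhs is contained in (lhs - {attr})+ under F."""
    lhs, rhs = fd
    return rhs <= closure(lhs - {attr}, fds)


F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
fd = ({"X", "Z"}, {"R"})
print(is_extraneous_left("Z", fd, F))   # True: X -> Z, so X alone reaches R
print(is_extraneous_left("X", fd, F))   # False: Z+ = {Z}
```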
CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has exactly one attribute.
2 FC is left-reduced.
3 FC is nonredundant.
Problem
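Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1 Make every RHS a single attribute: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2 Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3 Left-reduce: because X -> Z, Z is extraneous in XZ -> R, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}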
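Below is a minimal Python sketch of the three CANONICAL COVER requirements (simple RHS, left-reduced, nonredundant) applied to this problem's F. FDs are (lhs, rhs) pairs of frozensets, and the helper names are hypothetical. Canonical covers are not unique in general, so the order in which FDs are examined can change which equally valid cover is produced. The sketch also left-reduces XZ -> R to X -> R, since X -> Z makes Z extraneous there.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    # 1. Simple FDs: one attribute on every RHS, duplicates dropped.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in cover:
                cover.append(fd)

    # 2. Left-reduce: drop an LHS attribute if the smaller LHS still implies the RHS.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, cover):
                lhs = lhs - {attr}
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))   # left-reduction can create duplicates

    # 3. Nonredundant: drop any FD implied by the remaining ones.
    for fd in list(cover):
        rest = [f for f in cover if f != fd]
        if fd[1] <= closure(fd[0], rest):
            cover.remove(fd)
    return cover


def show(fds):
    return ", ".join("".join(sorted(lhs)) + " -> " + "".join(sorted(rhs))
                     for lhs, rhs in fds)


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]

print(show(canonical_cover(F)))   # X -> Z, XY -> P, XY -> W, XY -> Q, X -> R
```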
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes Participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF and BCNF: based on keys and FDs of a relation schema (relational design by analysis)
4NF: based on keys and multivalued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and with a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
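Candidate keys, and from them the prime attributes, can be found by testing attribute sets from smallest to largest for the superkey property via attribute closure. Below is a minimal Python sketch using EMP_PROJ and the FDs from the earlier examples. The enumeration is exponential, so it only suits small schemas, and the helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # Smaller sets are tried first, so a superkey containing no
            # previously found key is minimal, i.e. a candidate key.
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

keys = candidate_keys(EMP_PROJ, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])   # [['PNUMBER', 'SSN']]
print(sorted(EMP_PROJ - prime))    # ['ENAME', 'HOURS', 'PLOCATION', 'PNAME']
```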
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
1NF is considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
There are three techniques to achieve 1NF:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2 Expand the key so that there is a separate tuple in the original relation for each value of the violating attribute. This has the disadvantage of introducing redundancy.
3 If the maximum number of values is known for an attribute, replace that attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best, because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
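Technique 1 can be sketched in Python on a hypothetical DEPARTMENT state whose DLOCATIONS attribute holds a list per tuple. The new relation's key becomes {DNUMBER, DLOCATION}. The function name `move_multivalued` and the data are illustrative assumptions.

```python
# Hypothetical DEPARTMENT state that is not in 1NF:
# DLOCATIONS holds a list of values for one tuple.
DEPARTMENT = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 5, "DNAME": "Research",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]


def move_multivalued(rows, key, multivalued, value_name):
    """Technique 1: place the multivalued attribute in a separate relation
    together with the primary key, one tuple per value."""
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    extracted = [
        {key: row[key], value_name: value}
        for row in rows
        for value in row[multivalued]
    ]
    return base, extracted


dept, dept_locations = move_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(len(dept), len(dept_locations))   # 3 5
for t in dept_locations:
    print(t)
```

Neither output relation repeats DNAME and neither needs NULLs, which matches the slide's reason for preferring technique 1.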
Normalizing nested relations into 1NF
Additional problems: Schaum's series, pg 178, 5.1
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
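A 2NF check looks for nonprime attributes that are already determined by a proper subset of the primary key, and the simplest fix gives each such subset its own relation. Below is a minimal Python sketch on EMP_PROJ, with nonprime taken relative to the primary key as in the definitions above. The helper names are hypothetical, and the sketch does not also remove transitive dependencies.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """Map each nonprime attribute that depends on a proper subset of the
    primary key to the first such subset found."""
    nonprime = schema - key
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in closure(set(subset), fds) & nonprime:
                found.setdefault(attr, set(subset))
    return found


def decompose_2nf(schema, key, fds):
    """The key with its fully dependent attributes, plus one relation per key subset."""
    partial = partial_dependencies(schema, key, fds)
    groups = {}
    for attr, lhs in partial.items():
        groups.setdefault(frozenset(lhs), set(lhs)).add(attr)
    return [schema - set(partial)] + list(groups.values())


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
KEY = {"SSN", "PNUMBER"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(sorted(partial_dependencies(EMP_PROJ, KEY, F)))   # ['ENAME', 'PLOCATION', 'PNAME']
for relation in decompose_2nf(EMP_PROJ, KEY, F):
    print(sorted(relation))
```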
Normalizing into 2NF
Conversion to 2NF (figure): R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D)
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form of Prog_task?
2 Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp#, Salary): here SSN -> Emp# -> Salary, and Emp# is a candidate key.
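A 3NF check can be sketched with the common general formulation of 3NF, swapped in here: every nontrivial X -> A in F must have X a superkey or A prime. This is assumed to match the primary-key definition for these examples. For EMP_DEPT it flags DNUMBER -> DNAME and DNUMBER -> DMGRSSN, which are the transitive dependencies through DNUMBER. For the NOTE's EMP it flags nothing, assuming Emp# -> SSN so that Emp# is a candidate key. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is nonprime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue                       # X is a superkey: allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((set(lhs), a))  # nonprime A depends on a non-key X
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_DEPT = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(violations_3nf(EMP_DEPT, F_DEPT))
# [({'DNUMBER'}, 'DMGRSSN'), ({'DNUMBER'}, 'DNAME')]

# The NOTE's EMP(SSN, Emp#, Salary), assuming Emp# -> SSN (Emp# is a candidate key).
EMP = {"SSN", "Emp#", "Salary"}
F_EMP = [({"SSN"}, {"Emp#"}), ({"Emp#"}, {"SSN", "Salary"})]
print(violations_3nf(EMP, F_EMP))   # []
```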
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.
BCNF does not allow this: in a BCNF relation, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. In fd2, instructor is not a superkey but course is a prime attribute, so this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
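Below is a minimal Python sketch contrasting the 3NF and BCNF tests on this TEACH relation. It assumes, as is standard when testing a single schema, that checking the FDs in the given F (rather than all of F+) is enough. The helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= schema


def in_bcnf(schema, fds):
    """Every nontrivial X -> A in F has a superkey X."""
    return all(is_superkey(lhs, schema, fds)
               for lhs, rhs in fds if not rhs <= lhs)


def in_3nf(schema, fds):
    """Every nontrivial X -> A in F has a superkey X or a prime A."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(is_superkey(lhs, schema, fds) or (rhs - lhs) <= prime
               for lhs, rhs in fds if not rhs <= lhs)


TEACH = {"student", "course", "instructor"}
F = [
    ({"student", "course"}, {"instructor"}),   # fd1
    ({"instructor"}, {"course"}),              # fd2
]

print([sorted(k) for k in candidate_keys(TEACH, F)])
# [['course', 'student'], ['instructor', 'student']]
print(in_3nf(TEACH, F), in_bcnf(TEACH, F))   # True False
```

Besides {student, course}, the sketch also finds {student, instructor} as a candidate key. fd2 fails the BCNF test because instructor alone is not a superkey.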
Achieving BCNF by Decomposition (contd)
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
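The slide names a lossless join algorithm without reproducing it. Below is a minimal Python sketch of the standard matrix (tableau, or "chase") test, assuming that this is the algorithm meant. There is one row per Ri, with a distinguished symbol for each attribute in Ri and a row-specific symbol elsewhere. The FDs are applied until nothing changes, and the decomposition is lossless if some row becomes all distinguished symbols. `is_lossless` is a hypothetical name.

```python
def is_lossless(attributes, decomposition, fds):
    """Matrix (chase) test for a lossless-join decomposition.

    Row i stands for R_i: it holds the distinguished symbol ('a', A) for each
    attribute A of R_i, and a row-specific symbol ('b', i, A) otherwise.
    """
    table = [
        {a: ("a", a) if a in ri else ("b", i, a) for a in attributes}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for ti in table:
                for tj in table:
                    if ti is tj or any(ti[x] != tj[x] for x in lhs):
                        continue
                    for y in rhs:
                        if ti[y] == tj[y]:
                            continue
                        # Equate the two symbols, preferring the 'a' symbol.
                        if ti[y][0] == "a":
                            keep, drop = ti[y], tj[y]
                        else:
                            keep, drop = tj[y], ti[y]
                        for row in table:
                            if row[y] == drop:
                                row[y] = keep
                        changed = True
    # Lossless iff some row became all distinguished symbols.
    return any(all(row[a][0] == "a" for a in attributes) for row in table)


ATTRS = ["student", "course", "instructor"]
F = [({"student", "course"}, {"instructor"}),   # fd1
     ({"instructor"}, {"course"})]              # fd2

decompositions = [
    [{"student", "instructor"}, {"student", "course"}],
    [{"course", "instructor"}, {"course", "student"}],
    [{"instructor", "course"}, {"instructor", "student"}],
]
print([is_lossless(ATTRS, d, F) for d in decompositions])   # [False, False, True]
```

The result agrees with the slide's claim that only the third TEACH decomposition is non-additive. Each equating step reduces the number of distinct symbols in a column, so the loop terminates.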
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example A, B and C) in a relation such that, for each value of A, there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of these two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
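An MVD X ->> Y can be checked in a relation state using the tuple-swapping condition. For any t1 and t2 that agree on X, the tuple that takes X and Y from t1 and the remaining attributes from t2 must also be present. Below is a minimal Python sketch on a hypothetical EMP(ENAME, PNAME, DNAME) state in the style of the slide's figure, followed by the 4NF decomposition and a rejoin. `mvd_holds` is an illustrative name.

```python
# Hypothetical EMP(ENAME, PNAME, DNAME) state.
COLUMNS = ("ENAME", "PNAME", "DNAME")
EMP = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]


def mvd_holds(rows, columns, X, Y):
    """X ->> Y holds in this state if, for any t1, t2 that agree on X, the
    tuple taking X and Y from t1 and all other attributes from t2 exists."""
    xi = {columns.index(a) for a in X}
    yi = {columns.index(a) for a in Y}
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in xi):
                t3 = tuple(t1[i] if i in xi | yi else t2[i]
                           for i in range(len(columns)))
                if t3 not in present:
                    return False
    return True


print(mvd_holds(EMP, COLUMNS, ["ENAME"], ["PNAME"]))   # True
print(mvd_holds(EMP, COLUMNS, ["ENAME"], ["DNAME"]))   # True

# 4NF decomposition into EMP_PROJECTS and EMP_DEPENDENTS, then rejoin on ENAME.
emp_projects = {(e, p) for e, p, _ in EMP}
emp_dependents = {(e, d) for e, _, d in EMP}
rejoined = {(e, p, d) for e, p in emp_projects
            for e2, d in emp_dependents if e == e2}
print(len(emp_projects), len(emp_dependents), rejoined == set(EMP))   # 2 2 True
```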
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency in which some component Ri is not a superkey.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
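Whether a state satisfies a join dependency can be checked by joining its projections and comparing the result with the original rows. Below is a minimal Python sketch on a hypothetical SUPPLY(SNAME, PARTNAME, PROJNAME) state constructed so that the three-way JD holds. The projections on {SNAME, PARTNAME}, {PARTNAME, PROJNAME} and {PROJNAME, SNAME} are assumed to play the roles of R1, R2 and R3. The helper names are illustrative.

```python
from functools import reduce


def project(rows, attrs):
    """Projection with duplicate elimination."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Natural join on all common attributes."""
    common = [a for a in r[0] if a in s[0]]
    result = []
    for t1 in r:
        for t2 in s:
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in result:
                    result.append(merged)
    return result


def as_set(rows):
    """Compare relations as sets of tuples, ignoring attribute order."""
    return {frozenset(t.items()) for t in rows}


def satisfies_jd(rows, components):
    """JD(R1, ..., Rn) holds if joining the projections gives back exactly rows."""
    projections = [project(rows, c) for c in components]
    return as_set(reduce(natural_join, projections)) == as_set(rows)


supply = [
    {"SNAME": "s1", "PARTNAME": "p1", "PROJNAME": "j2"},
    {"SNAME": "s1", "PARTNAME": "p2", "PROJNAME": "j1"},
    {"SNAME": "s2", "PARTNAME": "p1", "PROJNAME": "j1"},
    {"SNAME": "s1", "PARTNAME": "p1", "PROJNAME": "j1"},
]
SP, PJ, JS = ["SNAME", "PARTNAME"], ["PARTNAME", "PROJNAME"], ["PROJNAME", "SNAME"]

print(satisfies_jd(supply, [SP, PJ, JS]))   # True: the three-way join is lossless
print(satisfies_jd(supply, [SP, PJ]))       # False: the binary join adds (s2, p1, j2)
```

In this instance none of the three binary decompositions is lossless, while the three-way one is. This illustrates how a relation can be free of MVDs yet still not be in 5NF.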
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
Module 6 3042023
Evaluating relation schemas
Two levels of relation schemasThe logical or conceptual view
How users interpret the relation schemas and the meaning of their attributes
Implementation or storage view How the tuples in the base relation are stored
and updated
Module 6 4042023
Informal Design Guidelines for Relational Databases Four informal measures of quality for
relation schema design are
1 Imparting clear semantics to attributes in Relations
2 Reducing the redundant values in tuples
3 Reducing the null values in tuples
4 Disallowing the possibility of generating spurious tuples
Module 6 5042023
1Semantics of the Relation Attributes GUIDELINE 1 Informally each tuple in a relation
should represent one entity or relationship instance (Applies to individual relations and their attributes) Attributes of different entities (EMPLOYEEs
DEPARTMENTs PROJECTs) should not be mixed in the same relation
Only foreign keys should be used to refer to other entities
Entity and relationship attributes should be kept apart as much as possible
Bottom Line Design a schema that can be explained easily relation by relation The semantics of attributes should be easy to interpret
Module 6 6042023
A Simplified COMPANY relational database schema
Module 6 7042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_PROJ
EMP_DEPT
Module 6 8042023
Two relation schemas suffering from update anomalies Although there is nothing wrong logically with
these 2 relations they are considered poor designs because they violate guideline 1 by mixing attributes from distinct real world entities
EMP_DEPT mixes attributes of employee and department and EMP_PROJ mixes attributes of employees amp projects and the WORKS_ON relationship
They may be used as views but they cause problems when used as base relations
Module 6 9042023
2Redundant Information in Tuples and Update Anomalies Goal of schema design is to minimize the
storage space used by the base relations Information is stored redundantly Wastes storage
Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies
Module 6 10042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_DEPT
EMP_PROJ
Module 6 11042023
EXAMPLE OF AN INSERT ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Insert Anomaly Cannot insert a project unless an employee is
assigned to it Conversely
Cannot insert an employee unless an heshe is assigned to a project
Module 6 12042023
EXAMPLE OF AN DELETE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Delete Anomaly When a project is deleted it will result in deleting
all the employees who work on that project Alternately if an employee is the sole employee
on a project deleting that employee would result in deleting the corresponding project
Module 6 13042023
EXAMPLE OF AN UPDATE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Update AnomalyChanging the name of project number P1
from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1
Module 6 14042023
Module 6 15042023
Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2
Design a schema that does not suffer from the insertion deletion and update anomalies
If there are any anomalies present then note them so that applications can be made to take them into account
In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries
Module 6 16042023
Problems with Nulls If many attributes are grouped together
as a fat relation it gives rise to many nulls in the tuples
Waste storage Problems in understanding the
meaning of the attributes Difficult while using Nulls in aggregate
operators like count or sum
Module 6 17042023
3 Null Values in Tuples Interpretations of nulls
Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable
GUIDELINE 3 Relations should be designed such that their
tuples will have as few NULL values as possible Attributes that are NULL frequently could be
placed in separate relations (with the primary key) Example-
if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation
Better create a new relation emp_offices(essn office_number)
Module 6 18042023
Example of Spurious Tuples
Module 6 19042023
Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as
the base relations of EMP_PROJ is not a good schema design
Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid
This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1
Module 6 20042023
Example of Spurious Tuples contd
Module 6 21042023
4 Spurious Tuples Bad designs for a relational database may result
in erroneous results for certain JOIN operations The lossless join property is used to
guarantee meaningful results for join operations
GUIDELINE 4 Design relation schemas so that they can be
joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated
Module 6 22042023
Spurious Tuples
There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies
Note that Property (a) is extremely important and cannot be
sacrificed Property (b) is less stringent and may be sacrificed
Module 6 23042023
Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done
during Insertion Modification Deletion
Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Module 6 24042023
Functional dependencies Functional dependencies (FDs)
Is a constraint between two sets of attributes from the database
Assumption The entire database is a single universal
relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes
Module 6 25042023
Definition
FDs are used to specify formal measures of the
goodness of relational designs keys that are used to define normal forms for
relations constraints that are derived from the meaning and
interrelationships of the data attributes A set of attributes X functionally determines
a set of attributes Y if the value of X determines a unique value for Y
Module 6 26042023
Functional Dependencies
A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If
t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r
depend on or are determined by the values of the X component
The values of the X component functionally determines the values of Y component
FDs are derived from the real-world constraints on the attributes
The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times
Module 6 27042023
Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760
Superior NA 31700 350
Victoria Africa 26828 250
Aral Sea Asia 24904 280
Huron NA 23000 206
Michigan NA 22300 307
Tanganyika Africa 12700 420
Continent -gtName Name -gtLength
Module 6 28042023
Graphical representation of Functional Dependencies
Module 6 29042023
Examples of FD constraints Social security number uniquely determines
employee name SSN -gt ENAME
Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION
Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Module 6 31042023
Satisfies algorithm
Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B
How it works Sort the tuples of the relation r on the A attributes so
that tuples with equal values under A are next to each other
Check that tuples with equal values under attributes A also have equal values under B
If it meets the condition 2 then the output of the algorithm is true else it is false
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Minimal Functional Dependencies
A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. F is transformed so that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that each have only one attribute on the RHS
Minimal cover: a set of FDs F is minimal if
(a) the RHS of every dependency is a single attribute
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies)
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side)
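The three conditions suggest the usual procedure for computing a minimal cover: split right-hand sides (a), drop extraneous left-hand attributes (c), then drop redundant FDs (b). The Python sketch below is one straightforward rendering of that procedure with helper names of our own choosing. A set can have more than one minimal cover, and which one this returns depends on the order of the input FDs.

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def minimal_cover(fds):
    # (a) one attribute per RHS, dropping exact duplicates but keeping order
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            dep = (lhs, frozenset([a]))
            if dep not in g:
                g.append(dep)
    # (c) remove B from X -> A whenever A is still in (X - {B})+
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction may have produced duplicates
    # (b) remove every FD that the remaining ones already imply
    i = 0
    while i < len(g):
        lhs, rhs = g[i]
        rest = g[:i] + g[i + 1:]
        if rhs <= closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g


E = [fd("B", "A"), fd("D", "A"), fd("AB", "D")]
print(minimal_cover(E) == [fd("D", "A"), fd("B", "D")])  # True
```

In the example, A is extraneous in AB -> D (B -> A already gives A), and B -> A then becomes redundant because B -> D -> A.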
Extraneous Attributes
The size of the FDs of F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) U {A2 -> B1B2}
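The equivalence test in the definition reduces to one closure: A1 is extraneous in A1A2 -> B1B2 exactly when B1B2 ⊆ (A2)+ under F, because the new FD A2 -> B1B2 already implies the old one, so the only thing to check is that F implies the new one. The Python sketch below (helper names are our own, single-letter attributes) applies that test, plus the analogous test for a right-hand attribute, to small illustrative sets.

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def extraneous_left(fds, dep, attr):
    """attr is extraneous in the LHS of dep iff the RHS is in (LHS - {attr})+ under fds."""
    lhs, rhs = dep
    return attr in lhs and rhs <= closure(lhs - {attr}, fds)


def extraneous_right(fds, dep, attr):
    """attr is extraneous in the RHS of dep iff it is still derivable once dropped from dep."""
    lhs, rhs = dep
    if attr not in rhs:
        return False
    reduced = list(fds)
    reduced.remove(dep)
    reduced.append((lhs, rhs - {attr}))
    return attr in closure(lhs, reduced)


F = [fd("X", "Z"), fd("XZ", "R")]
print(extraneous_left(F, fd("XZ", "R"), "Z"))   # True: X -> Z, so X alone reaches R
print(extraneous_left(F, fd("XZ", "R"), "X"))   # False
G = [fd("A", "BC"), fd("B", "C")]
print(extraneous_right(G, fd("A", "BC"), "C"))  # True: A -> B -> C
```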
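CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has one attribute
2 FC is left-reduced: no LHS contains an extraneous attribute
3 FC is nonredundant
Problem
Given a set F of FDs, find a canonical cover for F
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1 Make every RHS simple: {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2 Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3 Left-reduce: in XZ -> R the attribute Z is extraneous, since X -> Z and XZ -> R give X -> R, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}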
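A hand-computed canonical cover can be checked mechanically: every RHS is a single attribute, no LHS attribute is extraneous, no FD is redundant, and FC is equivalent to F. The Python sketch below is a hypothetical checker written for this problem, not a library routine. It reports that the set reached after step 2 still has an extraneous attribute in XZ -> R (because X -> Z), while {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q} passes every check.

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_canonical_cover(fc, f):
    """Check the three FC conditions and the equivalence of FC and F."""
    simple = all(len(rhs) == 1 for _, rhs in fc)
    # no attribute b can be dropped from a multi-attribute LHS
    left_reduced = all(
        not rhs <= closure(lhs - {b}, fc)
        for lhs, rhs in fc if len(lhs) > 1
        for b in lhs
    )
    # no FD follows from the others
    nonredundant = all(
        not fc[i][1] <= closure(fc[i][0], fc[:i] + fc[i + 1:])
        for i in range(len(fc))
    )
    equivalent = (all(rhs <= closure(lhs, f) for lhs, rhs in fc)
                  and all(rhs <= closure(lhs, fc) for lhs, rhs in f))
    return simple and left_reduced and nonredundant and equivalent


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
step2 = [fd("X", "Z"), fd("XY", "W"), fd("XY", "P"), fd("XY", "Q"), fd("XZ", "R")]
fc = [fd("X", "Z"), fd("X", "R"), fd("XY", "W"), fd("XY", "P"), fd("XY", "Q")]
print(is_canonical_cover(step2, F))  # False: Z is extraneous in XZ -> R
print(is_canonical_cover(fc, F))     # True
```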
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)
4NF: based on keys and multivalued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation
Normalization of Relations
Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies
Provides the database designer with a formal framework for analyzing relation schemas based on keys and FDs, and a series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized
Normalization of Relations
Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition
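Dependency preservation can be tested without computing F+ or the projected FD sets explicitly. A standard closure-based test (a technique not described in these notes) grows a set Z from X, each time adding only what the closure of Z ∩ Ri contributes inside Ri, and then checks whether Y ended up inside Z. A minimal Python sketch with hypothetical helper names; the first example is the TEACH decomposition {instructor, course}, {instructor, student} that appears in the BCNF section, which loses {student, course} -> instructor.

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def preserves_dependencies(fds, schemas):
    """True if every FD in fds is implied by the union of its projections onto schemas."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                # attributes of ri that z determines, reasoning only inside ri
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False
    return True


TEACH_F = [fd(["student", "course"], ["instructor"]), fd(["instructor"], ["course"])]
split = [frozenset({"instructor", "course"}), frozenset({"instructor", "student"})]
print(preserves_dependencies(TEACH_F, split))  # False: fd1 is lost

F2 = [fd("A", "B"), fd("B", "C")]
print(preserves_dependencies(F2, [frozenset("AB"), frozenset("BC")]))  # True
print(preserves_dependencies(F2, [frozenset("AB"), frozenset("AC")]))  # False
```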
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF)
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey anymore
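When the constraints are given as a set F of FDs, S is a superkey exactly when its closure S+ contains every attribute of R. Because any superset of a superkey is again a superkey, K is a key as soon as no single attribute can be dropped from it. A minimal Python sketch under those assumptions, with hypothetical helper names; the EMP_PROJ attributes and FDs are the COMPANY examples used elsewhere in these notes ({SSN, PNUMBER} -> HOURS, SSN -> ENAME, PNUMBER -> PNAME, PLOCATION).

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_superkey(s, schema, fds):
    """S is a superkey iff S+ contains every attribute of the schema."""
    return closure(s, fds) >= frozenset(schema)


def is_key(k, schema, fds):
    """K is a key iff it is a superkey and no single attribute can be removed."""
    k = frozenset(k)
    return is_superkey(k, schema, fds) and not any(
        is_superkey(k - {a}, schema, fds) for a in k
    )


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    fd(["SSN", "PNUMBER"], ["HOURS"]),
    fd(["SSN"], ["ENAME"]),
    fd(["PNUMBER"], ["PNAME", "PLOCATION"]),
]
print(is_key({"SSN", "PNUMBER"}, EMP_PROJ, F))                # True
print(is_superkey({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, F))  # True
print(is_key({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, F))       # False: HOURS can go
```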
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys
A prime attribute is an attribute that is a member of some candidate key
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key
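For small schemas, all candidate keys, and hence the prime attributes, can be found by brute force: try attribute sets in order of increasing size and keep each superkey that contains no key found earlier. The Python sketch below is exponential in the number of attributes and is meant only for classroom-sized examples; the names are our own. For TEACH with {student, course} -> instructor and instructor -> course (the FDs used in the BCNF section) it finds two candidate keys, {student, course} and {student, instructor}, so every attribute of TEACH is prime.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying attribute sets smallest first."""
    schema = frozenset(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Union of all candidate keys."""
    return frozenset().union(*candidate_keys(schema, fds))


TEACH = frozenset({"student", "course", "instructor"})
F = [fd(["student", "course"], ["instructor"]), fd(["instructor"], ["course"])]
print(candidate_keys(TEACH, F))
print(TEACH - prime_attributes(TEACH, F))  # frozenset(): no nonprime attributes
```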
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations, or relations as attribute values within tuples
Considered to be part of the definition of a relation
Normalization into 1NF
To achieve 1NF there are three techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key
2 Expand the key so that there will be a separate tuple in the original relation for each value of the offending attribute. This has the disadvantage of introducing redundancy
3 If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values
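Technique 1 amounts to moving the multivalued attribute into its own relation, paired with the primary key, one tuple per value. A minimal Python sketch using hypothetical department rows (the figure for this slide is not reproduced here); the attribute names dnumber, dname and dlocations and the helper split_multivalued are illustrative assumptions.

```python
# Hypothetical non-1NF rows: dlocations holds a list of values per department
departments = [
    {"dnumber": 5, "dname": "Research", "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Stafford"]},
]


def split_multivalued(rows, key, attr):
    """Technique 1: remove attr from rows and return one (key, value) row per value of it."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    detail = [{key: row[key], attr: value} for row in rows for value in row[attr]]
    return base, detail


department, dept_locations = split_multivalued(departments, "dnumber", "dlocations")
print(department)
print(dept_locations)
```

The new relation's key is the combination of the original key and the former multivalued attribute, here (dnumber, dlocations), and no value is repeated in the base relation.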
Normalizing nested relations into 1NF
Additional problems from Schaum's series: Pg 178 51
Second Normal Form
Uses the concepts of FDs and primary key
Definitions
Prime attribute: an attribute that is a member of the primary key K
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold anymore
Examples
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
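Under the primary-key-based definition above, 2NF can be tested by asking which nonprime attributes some proper subset of the primary key already determines. Since closure is monotone, it is enough to check the subsets obtained by dropping one key attribute. A minimal Python sketch (helper names are our own) using EMP_PROJ and the COMPANY FDs used elsewhere in these notes ({SSN, PNUMBER} -> HOURS, SSN -> ENAME, PNUMBER -> PNAME, PLOCATION). Every attribute it reports is partially dependent on {SSN, PNUMBER}, so EMP_PROJ is not in 2NF.

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def partial_dependencies(schema, fds, pk):
    """Map each partially dependent nonprime attribute to a proper subset of pk that determines it."""
    pk = frozenset(pk)
    nonprime = frozenset(schema) - pk
    found = {}
    for b in sorted(pk):
        subset = pk - {b}  # a maximal proper subset of the key
        for a in sorted(closure(subset, fds) & nonprime):
            found.setdefault(a, subset)
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    fd(["SSN", "PNUMBER"], ["HOURS"]),
    fd(["SSN"], ["ENAME"]),
    fd(["PNUMBER"], ["PNAME", "PLOCATION"]),
]
print(partial_dependencies(EMP_PROJ, F, {"SSN", "PNUMBER"}))
```

Grouping the reported attributes by the key subset that determines them gives the usual 2NF decomposition: {SSN, ENAME}, {PNUMBER, PNAME, PLOCATION} and {SSN, PNUMBER, HOURS}.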
Normalizing into 2NF
Conversion to 2NF (figure): {A, B, C, D} is decomposed into {A, B, C} and {A, D}
Additional problem on 2nd normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form?
2 Transform the relation into the next higher normal form
Third Normal Form
Definition
Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z
Examples
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key
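The NOTE is exactly what the general definition of 3NF (stated with respect to all candidate keys rather than only the primary key) captures: for each nontrivial FD X -> A, either X is a superkey or A is prime. When testing the schema on which F is given, checking the FDs of F themselves is sufficient. A minimal Python sketch with our own helper names: for EMP_DEPT it flags DNUMBER -> DMGRSSN and DNUMBER -> DNAME, while EMP(SSN, Emp, Salary) from the NOTE passes, assuming Emp -> SSN (which is what makes Emp a candidate key).

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema, fds):
    """Brute-force minimal superkeys (fine for small schemas)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """Nontrivial X -> A in fds where X is not a superkey and A is not prime."""
    schema = frozenset(schema)
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey
        for a in sorted(rhs - lhs):  # skip the trivial part of the RHS
            if a not in prime:
                bad.append((lhs, a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_DEPT = [fd(["SSN"], ["ENAME", "BDATE", "ADDRESS", "DNUMBER"]),
          fd(["DNUMBER"], ["DNAME", "DMGRSSN"])]
print(violations_3nf(EMP_DEPT, F_DEPT))

EMP = {"SSN", "Emp", "Salary"}
F_EMP = [fd(["SSN"], ["Emp"]), fd(["Emp"], ["Salary"]), fd(["Emp"], ["SSN"])]
print(violations_3nf(EMP, F_EMP))  # []
```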
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
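For the schema on which F is defined, BCNF can be tested directly on the given FDs: R violates BCNF iff some nontrivial X -> Y in F has a left-hand side that is not a superkey. (For a fragment produced by decomposition, the projected FDs would be needed instead.) A minimal Python sketch with our own helper names: with the TEACH FDs {student, course} -> instructor and instructor -> course it reports instructor -> course, while EMP(SSN, Emp, Salary), assuming Emp -> SSN, has no violation.

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema, fds):
    """Nontrivial FDs of fds whose LHS is not a superkey of schema."""
    schema = frozenset(schema)
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


TEACH = {"student", "course", "instructor"}
F_TEACH = [fd(["student", "course"], ["instructor"]), fd(["instructor"], ["course"])]
print(bcnf_violations(TEACH, F_TEACH))  # only instructor -> course

EMP = {"SSN", "Emp", "Salary"}
F_EMP = [fd(["SSN"], ["Emp"]), fd(["Emp"], ["Salary"]), fd(["Emp"], ["SSN"])]
print(bcnf_violations(EMP, F_EMP))  # []
```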
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey
BCNF does not allow this: for the relation to be in BCNF, 'X' must be a superkey
So a relation in BCNF will definitely be in 3NF, but not the other way around
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. In fd2 the LHS instructor is not a superkey, so the relation is not in BCNF; it is still in 3NF because course is a prime attribute
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations
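One simple way to carry out a lossless BCNF decomposition: look for a set X inside a fragment Ri whose closure, restricted to Ri, is larger than X but not all of Ri. Then X -> (X+ ∩ Ri) violates BCNF in Ri, and Ri is split into X+ ∩ Ri and X ∪ (Ri - X+). The split is lossless because the two parts share exactly X, which determines the first part. The Python sketch below uses our own naming and enumerates subsets, so it is only practical for small schemas; on TEACH it yields {instructor, course} and {instructor, student}.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def find_violation(ri, fds):
    """Return (X, X+ restricted to ri) for some X that violates BCNF in ri, or None."""
    attrs = sorted(ri)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            x_plus = closure(x, fds) & ri
            if x_plus != x and x_plus != ri:
                return x, x_plus
    return None


def bcnf_decompose(schema, fds):
    """Repeatedly split any fragment that is not in BCNF."""
    result = [frozenset(schema)]
    i = 0
    while i < len(result):
        violation = find_violation(result[i], fds)
        if violation is None:
            i += 1  # this fragment is in BCNF; move on
            continue
        x, x_plus = violation
        ri = result[i]
        # lossless split: the shared attributes X determine all of x_plus
        result[i:i + 1] = [x_plus, x | (ri - x_plus)]
    return result


TEACH = {"student", "course", "instructor"}
F = [fd(["student", "course"], ["instructor"]), fd(["instructor"], ["course"])]
print(bcnf_decompose(TEACH, F))
```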
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition
Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property): its common attribute instructor determines course, so it is a key of {instructor, course}
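For a decomposition into two relations there is a one-line test: {R1, R2} has the non-additive join property with respect to F iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A minimal Python sketch with our own helper names that checks the three candidate decompositions of TEACH listed above; only the third passes.

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def lossless_binary(r1, r2, fds):
    """True if the common attributes determine R1 - R2 or R2 - R1."""
    r1, r2 = frozenset(r1), frozenset(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [fd(["student", "course"], ["instructor"]), fd(["instructor"], ["course"])]
candidates = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([lossless_binary(a, b, F) for a, b in candidates])  # [False, False, True]
```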
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy
Example 511 pg 162
Testing for lossless joins
Lossless join algorithm: Example 512
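The general lossless join test for a decomposition into any number of relations is the matrix (chase) algorithm: build one row per Ri with a distinguished symbol in the columns of Ri and a row-specific symbol elsewhere, repeatedly equate the symbols of rows that agree on the left-hand side of some FD, and declare the join lossless if some row becomes entirely distinguished. The Python sketch below is our own rendering (when two symbols are equated it renames one throughout its column) and is not necessarily the exact formulation used in the referenced example; the EMP/PROJ/WORKS_ON split of the COMPANY attributes serves as an illustrative input.

```python
def fd(lhs, rhs):
    """Build the FD lhs -> rhs from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def lossless_chase(schema, decomposition, fds):
    """Matrix (chase) test for the lossless join of an n-way decomposition."""
    attrs = sorted(schema)
    # ("a",) is the distinguished symbol; ("b", i) is specific to row i
    rows = [{a: ("a",) if a in ri else ("b", i) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    if any(rows[i][x] != rows[j][x] for x in lhs):
                        continue  # rows disagree on the LHS, so the FD says nothing
                    for y in rhs:
                        u, v = rows[i][y], rows[j][y]
                        if u == v:
                            continue
                        keep, drop = min(u, v), max(u, v)  # ("a",) sorts first
                        for row in rows:  # rename drop to keep in column y
                            if row[y] == drop:
                                row[y] = keep
                        changed = True
    return any(all(row[a] == ("a",) for a in attrs) for row in rows)


COMPANY = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [fd(["SSN"], ["ENAME"]), fd(["PNUMBER"], ["PNAME", "PLOCATION"]),
     fd(["SSN", "PNUMBER"], ["HOURS"])]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(lossless_chase(COMPANY, good, F))  # True
print(lossless_chase(COMPANY, bad, F))   # False
```

In the second decomposition the only shared attribute is PLOCATION, which determines nothing, so no symbols are ever equated and no row becomes fully distinguished.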
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other
A multivalued dependency can be further defined as being trivial or nontrivial
An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A U B = R
An MVD is defined as being nontrivial if neither of the above two conditions is satisfied
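Both notions can be checked mechanically. Triviality depends only on the attribute sets. Whether X ->> Y holds in a given relation state can be tested with the tuple-swapping condition: for any two tuples that agree on X, the tuple taking its Y-values from the first and its remaining values from the second must also be present. A minimal Python sketch with our own helper names, on a hypothetical EMP state with attributes ENAME, PNAME and DNAME.

```python
def is_trivial_mvd(x, y, schema):
    """X ->> Y is trivial if Y is a subset of X or X U Y covers the whole schema."""
    x, y = frozenset(x), frozenset(y)
    return y <= x or (x | y) == frozenset(schema)


def mvd_holds(rows, x, y):
    """Test X ->> Y on a relation state given as a list of dicts with equal keys."""
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 agrees with t1 on X and Y, and with t2 on the rest
                t3 = {**t2, **{a: t1[a] for a in y}}
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True


# Hypothetical EMP state: Smith's projects and dependents are independent
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
schema = {"ENAME", "PNAME", "DNAME"}
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, schema))  # False: nontrivial
print(mvd_holds(emp, {"ENAME"}, {"PNAME"}))          # True
print(mvd_holds(emp[:3], {"ENAME"}, {"PNAME"}))      # False: (Smith, Y, John) missing
```

A single state can show that an MVD fails; it cannot prove that the MVD holds, since an MVD is a constraint on every legal state.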
Fourth Normal Form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies
It is used for removing multivalued dependencies
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or project-join normal form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except those whose components are all superkeys
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3
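A join dependency can be checked on a given relation state by projecting onto each Ri and joining the projections back together. The Python sketch below uses hypothetical SUPPLY rows (modelled on the usual textbook example, since the figure itself is not reproduced here) and hypothetical attribute names Sname, Part_name and Proj_name; the helper names are our own. With R1 = {Sname, Part_name}, R2 = {Sname, Proj_name} and R3 = {Part_name, Proj_name}, the three-way join gives back exactly the original seven tuples, while joining only R1 and R2 adds spurious tuples. As with MVDs, one state can refute a JD but cannot prove it.

```python
def project(rows, attrs):
    """Projection as a set of tuples of (attribute, value) pairs, sorted by attribute."""
    return {tuple((a, r[a]) for a in sorted(attrs)) for r in rows}


def natural_join(left, right):
    """Natural join of two projections on their shared attributes."""
    out = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out


def satisfies_jd(rows, components):
    """True if rows equal the join of their projections on the components."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == project(rows, set().union(*components))


# Hypothetical SUPPLY state (illustrative rows, not taken from the slide's figure)
supply = [
    {"Sname": "Smith", "Part_name": "Bolt", "Proj_name": "ProjX"},
    {"Sname": "Smith", "Part_name": "Nut", "Proj_name": "ProjY"},
    {"Sname": "Adamsky", "Part_name": "Bolt", "Proj_name": "ProjY"},
    {"Sname": "Walton", "Part_name": "Nut", "Proj_name": "ProjZ"},
    {"Sname": "Adamsky", "Part_name": "Nail", "Proj_name": "ProjX"},
    {"Sname": "Adamsky", "Part_name": "Bolt", "Proj_name": "ProjX"},
    {"Sname": "Smith", "Part_name": "Bolt", "Proj_name": "ProjY"},
]
R1 = {"Sname", "Part_name"}
R2 = {"Sname", "Proj_name"}
R3 = {"Part_name", "Proj_name"}
print(satisfies_jd(supply, [R1, R2, R3]))  # True
print(satisfies_jd(supply, [R1, R2]))      # False: the join of R1 and R2 has 9 tuples
```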
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp#, Salary). Here SSN -> Emp# -> Salary, and Emp# is a candidate key.
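Because the key determines every attribute, any FD Y -> A in which Y is not a superkey and A is non-prime produces the chain K -> Y -> A that 3NF forbids; when Y is a candidate key, as in the EMP note above, nothing is flagged. The sketch below assumes FDs as Python pairs of attribute sets, finds candidate keys by brute force (fine only for small schemas), and also reports partial dependencies, since 3NF requires 2NF. EMP_DEPT and EMP are this module's examples; the FD Emp# -> SSN is assumed from "Emp# is a candidate key".

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal superkeys, found by brute force over attribute subsets."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if any(k <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attrs):
                keys.append(frozenset(combo))
    return keys


def third_nf_violations(attributes, fds):
    """FDs Y -> A with Y not a superkey and A non-prime (partial or transitive)."""
    prime = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attributes):
            continue  # lhs is a superkey: always allowed
        for a in sorted(set(rhs) - set(lhs)):
            if a not in prime:
                violations.append((sorted(lhs), a))
    return violations


EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F_DEPT = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
          ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(third_nf_violations(EMP_DEPT, F_DEPT))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]

EMP = ["SSN", "Emp#", "Salary"]
F_EMP = [({"SSN"}, {"Emp#"}), ({"Emp#"}, {"SSN"}), ({"Emp#"}, {"Salary"})]
print(third_nf_violations(EMP, F_EMP))  # []
```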
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
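The BCNF definition translates directly into a check: compute the closure of each FD's left side and flag the FD if that closure is not the whole schema. This sketch assumes FDs as Python pairs of attribute sets and checks only the FDs given for R (which suffices for testing R itself; testing decomposed relations would need projected FDs). The TEACH schema and its two FDs are taken from the example that follows in this module.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Nontrivial FDs X -> A whose left side X is not a superkey of the schema."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = set(rhs) - set(lhs)
        if nontrivial and not closure(lhs, fds) >= set(attributes):
            violations.append((sorted(lhs), sorted(nontrivial)))
    return violations


TEACH = ["student", "course", "instructor"]
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2
print(bcnf_violations(TEACH, F))  # [(['instructor'], ['course'])]
```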
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a candidate key.
BCNF does not make this exception: for a relation to be in BCNF, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 is allowed in 3NF because course is a prime attribute, but it violates BCNF because instructor is not a superkey. So this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the 3rd decomposition does not generate spurious tuples after a join (and hence has the non-additivity property).
Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
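The claim about the three TEACH decompositions can be observed on data: project an instance onto each pair of schemas, natural-join the projections, and count tuples that were not in the original. The TEACH rows below are hypothetical (chosen to satisfy fd1 and fd2), and each tuple is stored as a frozenset of (attribute, value) pairs, an implementation choice made for this sketch.

```python
def make_relation(attrs, rows):
    """Store each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(relation, attrs):
    """Projection onto attrs; duplicates disappear because relations are sets."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all common attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


# Hypothetical TEACH instance that satisfies fd1 and fd2
TEACH = make_relation(
    ("student", "course", "instructor"),
    [("Smith", "Database", "Mark"), ("Smith", "OS", "Ammar"),
     ("Wallace", "Database", "Navathe"), ("Wallace", "OS", "Ammar")],
)
DECOMPOSITIONS = {
    1: ({"student", "instructor"}, {"student", "course"}),
    2: ({"course", "instructor"}, {"course", "student"}),
    3: ({"instructor", "course"}, {"instructor", "student"}),
}
results = {}
for n, (r1, r2) in DECOMPOSITIONS.items():
    joined = natural_join(project(TEACH, r1), project(TEACH, r2))
    results[n] = (len(joined), len(joined - TEACH))  # (tuples after join, spurious)
    print(n, results[n])
# 1 (8, 4)
# 2 (6, 2)
# 3 (4, 0)
```

A single instance can only demonstrate lossiness; proving losslessness for all legal states needs a test on the FDs, such as the matrix test for lossless joins.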
Testing for lossless joins
Lossless join algorithm: Example 5.12
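A minimal sketch of the standard matrix (chase) test for the non-additive join property: one row per Ri, an "a" symbol where Ri has the attribute and a distinct "b" symbol elsewhere; FDs then force equal symbols until no change, and the decomposition is lossless if some row becomes all "a". The data layout (symbol tuples, FDs as pairs of attribute sets) is an assumption of this sketch, and Example 5.12 may use different notation.

```python
def is_lossless(attributes, decomposition, fds):
    """Matrix (chase) test: True if the decomposition has the lossless join property."""
    col = {a: j for j, a in enumerate(attributes)}
    # Row i, column j holds ("a", j) if Ri contains attribute j, else ("b", i, j).
    S = [[("a", j) if a in ri else ("b", i, j) for j, a in enumerate(attributes)]
         for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(S)):
                for k in range(i + 1, len(S)):
                    if any(S[i][col[x]] != S[k][col[x]] for x in lhs):
                        continue  # rows do not agree on the left side
                    for y in rhs:
                        c = col[y]
                        if S[i][c] == S[k][c]:
                            continue
                        # ("a", ...) sorts before ("b", ...), so min keeps an "a" if present.
                        keep, drop = min(S[i][c], S[k][c]), max(S[i][c], S[k][c])
                        for row in S:  # rename the dropped symbol throughout the column
                            if row[c] == drop:
                                row[c] = keep
                        changed = True
    return any(all(cell[0] == "a" for cell in row) for row in S)


TEACH = ["student", "course", "instructor"]
F_TEACH = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(is_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F_TEACH))  # False
print(is_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F_TEACH))  # True

EMP_PROJ = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F_EP = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
        ({"SSN", "PNUMBER"}, {"HOURS"})]
print(is_lossless(EMP_PROJ, [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"},
                             {"SSN", "PNUMBER", "HOURS"}], F_EP))  # True
```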
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
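The independence of B and C can be checked on an instance: whenever two tuples agree on A, the tuple that takes A and B from the first and the remaining attributes from the second must also be present. This sketch uses hypothetical EMP(ENAME, PNAME, DNAME) rows stored as plain tuples, and also encodes the triviality conditions above.

```python
from itertools import product


def mvd_holds(rows, attributes, a, b):
    """Check the MVD a ->> b on one instance (rows are tuples ordered like attributes)."""
    idx = {x: i for i, x in enumerate(attributes)}
    present = set(rows)
    for t1, t2 in product(present, repeat=2):
        if all(t1[idx[x]] == t2[idx[x]] for x in a):
            # t3 takes a and b from t1, and every other attribute from t2
            t3 = tuple(t1[i] if (x in a or x in b) else t2[i]
                       for i, x in enumerate(attributes))
            if t3 not in present:
                return False
    return True


def is_trivial_mvd(attributes, a, b):
    """A ->> B is trivial if B is a subset of A or A ∪ B = R."""
    return set(b) <= set(a) or set(a) | set(b) == set(attributes)


EMP_ATTRS = ["ENAME", "PNAME", "DNAME"]
emp = [("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")]  # hypothetical rows
print(mvd_holds(emp, EMP_ATTRS, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(emp[:3], EMP_ATTRS, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) missing
print(is_trivial_mvd(EMP_ATTRS, ["ENAME"], ["PNAME"]))      # False
```

As with FDs, an instance can refute an MVD but cannot establish it for all legal states.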
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies (more precisely, for every nontrivial MVD X ->> Y that holds, X is a superkey).
It is used to remove multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form (figure): the EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME, and its decomposition into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For a relation R with subsets of its attributes denoted A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency unless every component of that dependency is a superkey.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form (figure)
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
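The join-dependency definition above ("R equals the join of its projections") can be checked on a single instance. This sketch assumes R1 = {SNAME, PARTNAME}, R2 = {SNAME, PROJNAME} and R3 = {PARTNAME, PROJNAME} for SUPPLY, and the rows are hypothetical; removing one tuple makes the three-way join produce a tuple that is not in the relation.

```python
from functools import reduce


def make_relation(attrs, rows):
    """Store each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(relation, attrs):
    """Projection onto attrs; duplicates disappear because relations are sets."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on all common attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def jd_holds(relation, components):
    """JD(R1, ..., Rn) holds on this instance iff R equals the join of its projections."""
    joined = reduce(natural_join, (project(relation, c) for c in components))
    return joined == relation


ATTRS = ("SNAME", "PARTNAME", "PROJNAME")
JD = [{"SNAME", "PARTNAME"}, {"SNAME", "PROJNAME"}, {"PARTNAME", "PROJNAME"}]
rows = [("Smith", "Bolt", "ProjY"), ("Smith", "Nut", "ProjX"),
        ("Adamsky", "Bolt", "ProjX"), ("Smith", "Bolt", "ProjX")]  # hypothetical
print(jd_holds(make_relation(ATTRS, rows), JD))       # True
print(jd_holds(make_relation(ATTRS, rows[:3]), JD))   # False
```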
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees, projects, and the WORKS_ON relationship.
They may be used as views, but they cause problems when used as base relations.
2. Redundant Information in Tuples and Update Anomalies
A goal of schema design is to minimize the storage space used by the base relations. Information that is stored redundantly:
- wastes storage;
- causes update anomalies: insertion anomalies, deletion anomalies, and modification anomalies.
Two relation schemas suffering from update anomalies (figure): EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN) and EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
EXAMPLE OF AN INSERT ANOMALY
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
Insert anomaly: we cannot insert a project unless an employee is assigned to it. Conversely, we cannot insert an employee unless he/she is assigned to a project.
EXAMPLE OF A DELETE ANOMALY
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
Delete anomaly: deleting a project also deletes all the employees who work on that project. Conversely, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.
EXAMPLE OF AN UPDATE ANOMALY
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" may require this update to be made for all 100 employees working on project P1.
Guideline for Redundant Information in Tuples and Update Anomalies
GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies.
If any anomalies are present, note them so that applications can be made to take them into account.
In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.
Problems with Nulls
If many attributes are grouped together as a "fat" relation, it gives rise to many nulls in the tuples. This:
- wastes storage;
- causes problems in understanding the meaning of the attributes;
- makes aggregate operators such as COUNT or SUM difficult to use.
3. Null Values in Tuples
Interpretations of nulls:
- attribute not applicable or invalid;
- attribute value unknown (may exist);
- value known to exist but unavailable.
GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (with the primary key).
Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).
Example of Spurious Tuples
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.
The problem is that a natural join of these two relations produces more tuples than the original set of tuples in EMP_PROJ.
These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious or wrong information that is not valid.
This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
Example of Spurious Tuples (contd.)
4. Spurious Tuples
Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) non-additivity, or losslessness, of the corresponding join;
(b) preservation of the functional dependencies.
Property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
- Anomalies that cause redundant work to be done during insertion, modification, and deletion.
- Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values.
- Generation of invalid and spurious data during joins on improperly related base relations.
Functional Dependencies
Functional dependencies (FDs)
An FD is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the "goodness" of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
The values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420
Name -> Length is satisfied by this instance; Continent -> Name is not (NA appears with Superior, Huron and Michigan).
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes, so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
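The three steps above map onto a sort followed by a group-by. This sketch assumes rows are Python dicts keyed by attribute name and reuses the Lakes of the world data from this module; it answers only for this instance, not for the schema.

```python
from itertools import groupby


def satisfies(rows, lhs, rhs):
    """Satisfies algorithm: sort on lhs, then check every group of equal lhs values."""
    def lhs_value(row):
        return tuple(row[a] for a in lhs)

    for _, group in groupby(sorted(rows, key=lhs_value), key=lhs_value):
        if len({tuple(row[b] for b in rhs) for row in group}) > 1:
            return False  # equal lhs values but different rhs values
    return True


COLUMNS = ("Name", "Continent", "Area", "Length")
LAKES = [dict(zip(COLUMNS, row)) for row in [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]]
print(satisfies(LAKES, ["Name"], ["Length"]))     # True
print(satisfies(LAKES, ["Continent"], ["Name"]))  # False
```

Sorting makes the check O(n log n), which is still tedious by hand and motivates the inference rules that follow.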
Relation state of TEACH
Teacher   Course            Text
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz
TEXT -> COURSE may hold, but TEACHER -> COURSE does not (Smith teaches two different courses).
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming.
So inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
If a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
The inferred functional dependencies include:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That F infers X -> Y is denoted by F |= X -> Y.
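In practice, F |= X -> Y is usually decided without listing F+: compute the attribute closure X+ (everything X determines under F) and check that Y ⊆ X+. This is the standard attribute-closure algorithm, not one given on these slides; FDs are assumed to be Python pairs of attribute sets, and F is the example above.

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y is a subset of X+."""
    return set(rhs) <= closure(lhs, fds)


F = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
     ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```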
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): if Y ⊆ X, then X -> Y.
IR2 (Augmentation): if X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z.
IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound, we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).
By complete, we mean that using IR1 through IR3 repeatedly, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: if X -> YZ, then X -> Y and X -> Z.
Union (additive): if X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z.
The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
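As a worked check of the completeness remark, each additional rule follows from IR1 to IR3 in a few steps (XX = X by the XZ notation):

```latex
\begin{aligned}
&\textbf{Decomposition: } X \to YZ \ (\text{given}),\ YZ \to Y \ (\text{IR1})
  \;\Rightarrow\; X \to Y \ (\text{IR3});\ \text{similarly } X \to Z.\\
&\textbf{Union: } X \to Y \;\Rightarrow\; X \to XY \ (\text{IR2, augment by } X);\quad
  X \to Z \;\Rightarrow\; XY \to YZ \ (\text{IR2, augment by } Y);\\
&\qquad X \to XY,\ XY \to YZ \;\Rightarrow\; X \to YZ \ (\text{IR3}).\\
&\textbf{Pseudotransitivity: } X \to Y \;\Rightarrow\; WX \to WY \ (\text{IR2, augment by } W);\quad
  WX \to WY,\ WY \to Z \;\Rightarrow\; WX \to Z \ (\text{IR3}).
\end{aligned}
```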
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
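"F covers E" can be tested one FD at a time: for each X -> Y in E, compute X+ under F and check that Y ⊆ X+; equivalence is then mutual covering. The three FD sets below are hypothetical, and FDs are assumed to be Python pairs of attribute sets.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """F covers E if every FD X -> Y in E has Y inside X+ computed under F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent when each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
E = [({"A"}, {"B", "C"}), ({"B"}, {"C"})]
G = [({"A"}, {"B"}), ({"A"}, {"C"})]
print(equivalent(E, F))  # True
print(equivalent(G, F))  # False: G cannot derive B -> C
```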
Minimal Functional Dependencies
A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies.
F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of the left-hand side).
Extraneous Attributes
The size of the FDs of F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let $A_1A_2 \to B_1B_2 \in F$. Then
$A_1$ is extraneous iff $F \equiv (F - \{A_1A_2 \to B_1B_2\}) \cup \{A_2 \to B_1B_2\}$.
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
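Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: since X -> Z, Z is extraneous in XZ -> R, which becomes X -> R: FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}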
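The three conditions of a minimal cover can be enforced in order: split right-hand sides, drop extraneous left-side attributes, then drop FDs derivable from the others. This sketch follows that common ordering (left-reduce before removing redundancy) and assumes single-letter attribute names so that "XY" means {X, Y}. On the problem above it also left-reduces XZ -> R to X -> R, since X+ already contains Z.

```python
def closure(attrs, fds):
    """X+ for FDs stored as (frozenset lhs, single rhs attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds):
    # (a) split every right-hand side into single attributes; drop duplicates
    g = list(dict.fromkeys((frozenset(lhs), a) for lhs, rhs in fds for a in rhs))
    # (c) left-reduce: B is extraneous in X -> A if A is in (X - {B})+
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, a)
    g = list(dict.fromkeys(g))
    # (b) remove FDs that can be derived from the remaining ones
    for fd in list(g):
        others = [h for h in g if h != fd]
        if fd[1] in closure(fd[0], others):
            g = others
    return g


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
cover = ["".join(sorted(lhs)) + " -> " + a for lhs, a in minimal_cover(F)]
print(cover)  # ['X -> Z', 'XY -> W', 'XY -> P', 'XY -> Q', 'X -> R']
```

A set of FDs can have more than one minimal cover; the order in which FDs are examined can change which one is returned.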
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).
4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of:
- minimizing redundancies;
- minimizing anomalies.
It provides the database designer with:
- a formal framework for analyzing relation schemas based on keys and FDs;
- a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: it ensures that each functional dependency is represented in some individual relation resulting after decomposition.
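Dependency preservation can be tested without computing every projected FD: for each X -> Y in F, grow Z from X by repeatedly adding ((Z ∩ Ri)+ ∩ Ri) for every Ri; the FD is preserved if Y ends up inside Z. This is a standard test that these slides do not present; the FD format (Python pairs of attribute sets) is an assumption. It confirms that the lossless TEACH decomposition loses fd1, while the 2NF decomposition of EMP_PROJ keeps all its FDs.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lost_dependencies(fds, decomposition):
    """FDs of fds that cannot be enforced from the projections onto the decomposition."""
    lost = []
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                gained = (closure(z & set(ri), fds) & set(ri)) - z
                if gained:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            lost.append((sorted(lhs), sorted(rhs)))
    return lost


F_TEACH = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(lost_dependencies(F_TEACH, [{"instructor", "course"}, {"instructor", "student"}]))
# [(['course', 'student'], ['instructor'])]

F_EP = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"}),
        ({"PNUMBER"}, {"PNAME", "PLOCATION"})]
print(lost_dependencies(F_EP, [{"SSN", "PNUMBER", "HOURS"}, {"SSN", "ENAME"},
                               {"PNUMBER", "PNAME", "PLOCATION"}]))  # []
```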
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute must be a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
Considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
There are three techniques to achieve 1NF:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
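Techniques 1 and 2 can be contrasted on a small example. The DEPARTMENT rows below are hypothetical, with DLOCATIONS as the multivalued attribute held as a Python list: technique 1 produces a separate (DNUMBER, location) relation, while technique 2 repeats DNAME once per location.

```python
# Hypothetical unnormalized rows: DLOCATIONS holds several values per department
DEPARTMENTS = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]


def without(row, attr):
    """Copy of row without attribute attr."""
    return {k: v for k, v in row.items() if k != attr}


def separate_relation(rows, key, attr):
    """Technique 1: move the multivalued attribute into its own relation with the key."""
    base = [without(row, attr) for row in rows]
    side = [(row[key], value) for row in rows for value in row[attr]]
    return base, side


def expand_key(rows, attr, atomic_attr):
    """Technique 2: one tuple per value; the other attributes are repeated."""
    return [{**without(row, attr), atomic_attr: value}
            for row in rows for value in row[attr]]


dept, dept_locations = separate_relation(DEPARTMENTS, "DNUMBER", "DLOCATIONS")
print(dept[0])         # {'DNUMBER': 5, 'DNAME': 'Research'}
print(dept_locations)  # [(5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston'), (4, 'Stafford'), (1, 'Houston')]
expanded = expand_key(DEPARTMENTS, "DLOCATIONS", "DLOCATION")
print(sum(row["DNAME"] == "Research" for row in expanded))  # 3 copies of "Research"
```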
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 6042023
A Simplified COMPANY relational database schema
Module 6 7042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_PROJ
EMP_DEPT
Module 6 8042023
Two relation schemas suffering from update anomalies Although there is nothing wrong logically with
these 2 relations they are considered poor designs because they violate guideline 1 by mixing attributes from distinct real world entities
EMP_DEPT mixes attributes of employee and department and EMP_PROJ mixes attributes of employees amp projects and the WORKS_ON relationship
They may be used as views but they cause problems when used as base relations
Module 6 9042023
2Redundant Information in Tuples and Update Anomalies Goal of schema design is to minimize the
storage space used by the base relations Information is stored redundantly Wastes storage
Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies
Module 6 10042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_DEPT
EMP_PROJ
Module 6 11042023
EXAMPLE OF AN INSERT ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Insert Anomaly Cannot insert a project unless an employee is
assigned to it Conversely
Cannot insert an employee unless an heshe is assigned to a project
Module 6 12042023
EXAMPLE OF AN DELETE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Delete Anomaly When a project is deleted it will result in deleting
all the employees who work on that project Alternately if an employee is the sole employee
on a project deleting that employee would result in deleting the corresponding project
Module 6 13042023
EXAMPLE OF AN UPDATE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Update AnomalyChanging the name of project number P1
from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1
Module 6 14042023
Module 6 15042023
Guideline for Redundant Information in Tuples and Update Anomalies
GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, or update anomalies.
If any anomalies are present, note them so that applications can be written to take them into account.
In general, it is advisable to use anomaly-free base relations and to specify views that include the joins, placing together the attributes frequently referenced in important queries.
Problems with Nulls
If many attributes are grouped together in a "fat" relation, many of its tuples will contain nulls. This wastes storage, makes the meaning of the attributes harder to understand, and causes difficulties when nulls are used with aggregate operators such as COUNT or SUM.
3. Null Values in Tuples
Interpretations of nulls: the attribute does not apply to this tuple (not applicable or invalid); the attribute value is unknown (it may exist); or the value is known to exist but is unavailable.
GUIDELINE 3: Design relations so that their tuples have as few NULL values as possible. Attributes that are frequently NULL can be placed in separate relations (together with the primary key).
Example: if only 10% of employees have individual offices, it is better not to include Office_number as an attribute of the EMPLOYEE relation. Instead, create a new relation EMP_OFFICES(Essn, Office_number).
Example of Spurious Tuples
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.
If a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.
These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious (wrong) information that is not valid.
This happens because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
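The effect can be reproduced in a few lines of Python. This is a minimal sketch: the EMP_PROJ tuples are hypothetical values chosen for illustration (not the tuples of the original figure), relations are modelled as lists of dicts, and the helpers `project`, `natural_join`, and `as_key` are our own names.

```python
# Hypothetical EMP_PROJ state (illustrative values only).
EMP_PROJ = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith", "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith", "PNAME": "ProductY", "PLOCATION": "Sugarland"},
    {"SSN": "222", "PNUMBER": 3, "HOURS": 40.0, "ENAME": "Narayan", "PNAME": "ProductZ", "PLOCATION": "Houston"},
    {"SSN": "333", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "English", "PNAME": "ProductX", "PLOCATION": "Bellaire"},
]


def project(rows, attrs):
    """Relational projection: keep only attrs and eliminate duplicate tuples."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(distinct)]


def natural_join(left, right):
    """Natural join: combine every pair of tuples that agree on all shared attributes."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right if all(l[a] == r[a] for a in common)]


def as_key(row):
    """Hashable, attribute-order-independent form of a tuple, for set comparisons."""
    return tuple(sorted(row.items()))


EMP_LOCS = project(EMP_PROJ, ["ENAME", "PLOCATION"])
EMP_PROJ1 = project(EMP_PROJ, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

joined = natural_join(EMP_PROJ1, EMP_LOCS)  # the only shared attribute is PLOCATION
original = {as_key(row) for row in EMP_PROJ}
spurious = [row for row in joined if as_key(row) not in original]

print(len(EMP_PROJ), len(joined), len(spurious))  # 4 6 2
```

Both Bellaire employees get paired with both Bellaire projects, which is exactly where the two spurious tuples come from.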
Example of Spurious Tuples (contd.)
4. Spurious Tuples
Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) the non-additive (lossless) property of the corresponding join, and
(b) preservation of the functional dependencies.
Property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies cause redundant work to be done during insertion, modification, and deletion.
Nulls waste storage space and make aggregation operations and joins difficult to perform.
Joins on improperly related base relations generate invalid, spurious data.
Functional dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, …, An}, where A1, A2, …, An are the attributes.
Definition
FDs are used to specify formal measures of the "goodness" of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). It means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; in other words, the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420
Continent -> Name does not hold (NA is paired with Superior, Huron, and Michigan), whereas this state is consistent with Name -> Length.
Graphical representation of Functional Dependencies
Examples of FD constraints
A social security number uniquely determines an employee's name: SSN -> ENAME.
A project number uniquely determines the project's name and location: PNUMBER -> {PNAME, PLOCATION}.
An employee's SSN and a project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS.
Examples of FD constraints (contd.)
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R, since we never have two distinct tuples with t1[K] = t2[K].
Satisfies algorithm
Purpose: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of r on the A attributes, so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under the A attributes also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
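A direct Python rendering of the three steps, as a sketch that assumes a relation state is a list of dicts; the function name `satisfies` is ours. It is applied to the lakes table from the earlier slide.

```python
from itertools import groupby
from operator import itemgetter


def satisfies(r, lhs, rhs):
    """Return True if relation state r satisfies the FD lhs -> rhs.

    r is a list of dicts (one per tuple); lhs and rhs are lists of attribute names.
    """
    get_a = itemgetter(*lhs)  # value(s) of the A (left-hand side) attributes
    get_b = itemgetter(*rhs)  # value(s) of the B (right-hand side) attributes
    # Step 1: sort on A so that tuples with equal A values become adjacent.
    ordered = sorted(r, key=get_a)
    # Step 2: every run of equal A values must carry a single B value.
    for _, run in groupby(ordered, key=get_a):
        if len({get_b(t) for t in run}) > 1:
            return False  # two tuples agree on A but differ on B
    # Step 3: no violation found.
    return True


LAKES = [
    {"Name": "Caspian Sea", "Continent": "Asia-Europe", "Area": 143244, "Length": 760},
    {"Name": "Superior", "Continent": "NA", "Area": 31700, "Length": 350},
    {"Name": "Victoria", "Continent": "Africa", "Area": 26828, "Length": 250},
    {"Name": "Aral Sea", "Continent": "Asia", "Area": 24904, "Length": 280},
    {"Name": "Huron", "Continent": "NA", "Area": 23000, "Length": 206},
    {"Name": "Michigan", "Continent": "NA", "Area": 22300, "Length": 307},
    {"Name": "Tanganyika", "Continent": "Africa", "Area": 12700, "Length": 420},
]

print(satisfies(LAKES, ["Name"], ["Length"]))     # True
print(satisfies(LAKES, ["Continent"], ["Name"]))  # False: NA -> Superior, Huron, Michigan
```

A True result only says that this particular state does not violate the FD; it cannot prove that the FD holds for the schema.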
Relation state of TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz
This state is consistent with TEXT -> COURSE, but it rules out TEACHER -> COURSE, because Smith teaches two different courses.
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming, and it only examines one particular relation state.
So inference rules (axioms) are used instead to reason about FDs.
Inference Rules for Functional Dependencies
F is the set of functional dependencies specified on relation schema R.
Schema designers specify the most obvious FDs; the other dependencies can be inferred, or deduced, from the FDs in F.
Example of Closure
If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The fact that X -> Y is inferred from F is denoted F |= X -> Y.
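A standard way to decide F |= X -> Y without enumerating F+ is to compute the attribute closure X+ (all attributes determined by X) and check that Y is a subset of X+. That algorithm is not spelled out on these slides; the sketch below is one common formulation, with our own helper names, applied to the F above.

```python
def attribute_closure(attrs, fds):
    """Return X+, the set of attributes functionally determined by attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If X+ already contains the LHS, it must also contain the RHS.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs holds exactly when rhs is a subset of lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(attribute_closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```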
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F; that is, correctly applied axioms cannot derive false dependencies.
By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs (contd.)
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
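As an illustration of that completeness claim, the pseudotransitive rule can be derived from IR2 and IR3 alone:

```latex
\begin{align*}
(1)\;& X \to Y   && \text{given}\\
(2)\;& WX \to WY && \text{IR2: augment (1) with } W\\
(3)\;& WY \to Z  && \text{given}\\
(4)\;& WX \to Z  && \text{IR3: transitivity on (2) and (3)}
\end{align*}
```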
Examples
1. Given F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
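Worked solutions, reading the two (extraction-damaged) sets as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D}; each step cites the Armstrong rule it uses.

```latex
% Exercise 1: F = {A -> B, C -> X, BX -> Z}; derive AC -> Z
\begin{align*}
(1)\;& A \to B   && \text{given}\\
(2)\;& AC \to BC && \text{IR2: augment (1) with } C\\
(3)\;& C \to X   && \text{given}\\
(4)\;& BC \to BX && \text{IR2: augment (3) with } B\\
(5)\;& AC \to BX && \text{IR3 on (2), (4)}\\
(6)\;& BX \to Z  && \text{given}\\
(7)\;& AC \to Z  && \text{IR3 on (5), (6)}
\end{align*}
% Exercise 2: F = {A -> B, C -> D} with C \subseteq B; show F |= A -> D
\begin{align*}
(1)\;& B \to C && \text{IR1, since } C \subseteq B\\
(2)\;& A \to B && \text{given}\\
(3)\;& A \to C && \text{IR3 on (2), (1)}\\
(4)\;& C \to D && \text{given}\\
(5)\;& A \to D && \text{IR3 on (3), (4)}
\end{align*}
```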
Redundant functional dependencies
Given a set F of FDs, an FD A -> B in F is said to be redundant with respect to F if and only if A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from F.
Eliminating redundant FDs allows us to minimize the set of FDs.
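The definition translates into a closure check: A -> B is redundant exactly when B is a subset of A+ computed under F - {A -> B}. A minimal Python sketch; single-letter attribute names (so "AB" means {A, B}), the helper `fd`, and the sample F are assumptions for illustration.

```python
def fd(lhs, rhs):
    """Hypothetical helper: build an FD from two iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(f, fds):
    """A -> B is redundant iff B is contained in A+ computed under F - {A -> B}."""
    lhs, rhs = f
    rest = [g for g in fds if g != f]
    return rhs <= attribute_closure(lhs, rest)


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print([is_redundant(f, F) for f in F])  # [False, False, True]
```

When several FDs are individually redundant, remove them one at a time and re-test the rest: two FDs can each be redundant only because of the other.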
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Equivalence therefore means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions hold: E covers F and F covers E.
For a given set F of FDs, F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+; such sets are equivalent to F.
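Covering and equivalence can be tested the same way, by checking each FD of one set against attribute closures computed under the other. A sketch with our own helper names and single-letter attributes; the sample sets are illustrative.

```python
def fd(lhs, rhs):
    """Hypothetical helper: build an FD from two iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E if every X -> Y in E can be inferred from F, i.e. Y is in X+ under F."""
    return all(rhs <= attribute_closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent (E+ = F+) iff each one covers the other."""
    return covers(e, f) and covers(f, e)


set_e = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
set_f = [fd("A", "CD"), fd("E", "AH")]
print(equivalent(set_e, set_f))  # True
```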
Minimal Functional Dependencies
A minimal cover of F (also called an irreducible set of F) is useful for eliminating unnecessary functional dependencies.
First, F is transformed so that each FD that has more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if it satisfies the following conditions:
(a) Every dependency in F has a single attribute on its RHS.
(b) For no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies).
(c) For no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency whose left-hand side is a subset of the original left-hand side).
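The three conditions suggest the usual algorithm: split right-hand sides, remove extraneous left-hand attributes, then drop redundant FDs. A minimal sketch, assuming single-letter attribute names so that a string such as "XY" stands for {X, Y}; the function names are ours. It is applied to the F of the canonical cover problem that follows. A minimal cover is not unique in general, so other valid answers may differ in which redundant FD is dropped.

```python
def closure(attrs, fds):
    """X+ under fds, where each FD is (frozenset LHS, single RHS attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal cover of fds, given as (lhs, rhs) pairs of attribute iterables."""
    # Condition (a): split every RHS into single attributes, dropping duplicates.
    split = []
    for lhs, rhs in fds:
        for a in rhs:
            f = (frozenset(lhs), a)
            if f not in split:
                split.append(f)

    # Condition (c): remove extraneous LHS attributes. Closures are taken under
    # `split`, which stays equivalent to every intermediate set.
    reduced = []
    for lhs, a in split:
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, split):
                lhs = lhs - {b}
        if (lhs, a) not in reduced:
            reduced.append((lhs, a))

    # Condition (b): drop FDs that can be inferred from the remaining ones.
    cover = list(reduced)
    for f in reduced:
        rest = [g for g in cover if g != f]
        if f[1] in closure(f[0], rest):
            cover = rest
    return cover


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
```

Note that Z is removed from XZ -> R, because X -> Z already makes Z available whenever X is.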
Extraneous Attributes
The size of the FDs in F can be reduced further by removing extraneous left-hand-side attributes or extraneous right-hand-side attributes with respect to F.
Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous in A1A2 -> B1B2 if and only if
$$F \equiv \bigl(F - \{A_1A_2 \to B_1B_2\}\bigr) \cup \{A_2 \to B_1B_2\}$$
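CANONICAL COVER (FC)
A canonical cover FC of F is a set of FDs equivalent to F such that:
1. Every FD of FC is simple: its RHS has a single attribute.
2. FC is left-reduced: no FD in FC has an extraneous LHS attribute.
3. FC is nonredundant: no FD in FC can be derived from the others.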
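Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R, because X -> Z gives X+ = {X, Z, R}: FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}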
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: the lossless join property and dependency preservation.
Normalization of Relations
Normalization, proposed by Codd, is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing insertion, deletion, and update anomalies.
It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations (contd.)
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF, or 4NF).
Denormalization is the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys (contd.)
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is an attribute that is not prime, that is, it is not a member of any candidate key.
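For small schemas, candidate keys and prime attributes can be found by brute force: try attribute sets in increasing size, keep those whose closure is all of R, and skip supersets of keys already found. A sketch only (exponential in the number of attributes, with our own function names), using the TEACH relation and FDs that appear later in this module as sample input.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Enumerate candidate keys: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

keys = candidate_keys(TEACH, F)
prime = set().union(*keys)  # prime attributes: members of some candidate key
print(keys)
print(sorted(prime))
```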
First Normal Form
1NF disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
1NF is considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
There are three techniques to achieve 1NF:
1. Remove the attribute that violates 1NF and place it in a separate relation, along with the primary key.
2. Expand the key so that there is a separate tuple in the original relation for each value. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best, because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
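Technique 1 can be sketched on a DEPARTMENT relation with a multivalued DLOCATIONS attribute. The relation, attribute names, and values are hypothetical illustrations, and `split_multivalued` is our own helper.

```python
# Hypothetical non-1NF data: DLOCATIONS holds a set of values per tuple.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": {"Stafford"}},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": {"Houston"}},
]


def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: move multivalued `attr` into its own relation, keyed by `key`.

    Returns (base, extra): base keeps every other attribute; extra holds one
    tuple (key value, single value) per element of the multivalued attribute.
    """
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    extra = [
        {key: row[key], new_attr: value}
        for row in rows
        for value in sorted(row[attr])
    ]
    return base, extra


DEPT, DEPT_LOCATIONS = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
for row in DEPT_LOCATIONS:
    print(row["DNUMBER"], row["DLOCATION"])
```

The primary key of the new relation is the combination {DNUMBER, DLOCATION}, and no value is repeated except the key value that links it back to DEPT.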
Normalizing nested relations into 1NF
Additional problems: Schaum's series, p. 178, 5.1
Second Normal Form
2NF uses the concepts of FDs and the primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
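A 2NF check amounts to looking for partial dependencies: a nonprime attribute that is already determined by a proper subset of the primary key. A Python sketch using attribute closure, with the EMP_PROJ FDs from the earlier examples as input; the function names are ours.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(primary_key, nonprime, fds):
    """Return (proper subset of the key, nonprime attribute) pairs that violate 2NF."""
    found = set()
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            determined = closure(subset, fds)
            for a in nonprime:
                if a in determined:
                    found.add((frozenset(subset), a))
    return found


F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
violations = partial_dependencies({"SSN", "PNUMBER"}, {"HOURS", "ENAME", "PNAME", "PLOCATION"}, F)
for subset, a in sorted(violations, key=lambda v: (sorted(v[0]), v[1])):
    print(sorted(subset), "->", a)
```

Each reported left-hand side suggests one of the 2NF relations: here {SSN} with ENAME and {PNUMBER} with PNAME and PLOCATION, leaving HOURS with the full key.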
Normalizing into 2NF
[Figure: Conversion to 2NF: R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D)]
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
FD: Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency.
E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: p. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or at least 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey.
BCNF allows it only if X is a superkey.
So a relation in BCNF is always in 3NF, but not the other way around.
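The difference can be expressed as a per-FD test. In the sketch below (our own helper names), a nontrivial FD X -> A is acceptable in BCNF only if X is a superkey, and in 3NF if X is a superkey or A is prime. The TEACH FDs from the next slides are used as input; its candidate keys are {student, course} and {student, instructor}, so all three attributes are prime.

```python
def closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def check_fd(lhs, a, schema, fds, prime):
    """Return (ok_for_BCNF, ok_for_3NF) for a nontrivial FD lhs -> a."""
    is_superkey = closure(lhs, fds) >= set(schema)
    return is_superkey, is_superkey or a in prime


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
PRIME = {"student", "course", "instructor"}

print(check_fd({"student", "course"}, "instructor", TEACH, F, PRIME))  # (True, True)
print(check_fd({"instructor"}, "course", TEACH, F, PRIME))            # (False, True)
```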
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 does not violate 3NF, because course is a prime attribute, but it violates BCNF, because instructor is not a superkey. So this relation is in 3NF but not in BCNF.
A relation that is not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving BCNF by Decomposition (contd.)
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing the preservation of this functional dependency, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additive join property).
Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it. If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, p. 162
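For a decomposition into two relations, a well-known shortcut (not stated on these slides) decides the non-additive join property: {R1, R2} is lossless if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A sketch with our own function names, applied to the three TEACH decompositions:

```python
def closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """Binary test: the common attributes must determine all of R1 - R2 or all of R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
DECOMPOSITIONS = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([binary_lossless(r1, r2, F) for r1, r2 in DECOMPOSITIONS])  # [False, False, True]
```

Only the third decomposition joins on instructor, and instructor -> course makes the shared attribute a key of {instructor, course}.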
Testing for lossless joins
Lossless join algorithm: Example 5.12
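The general lossless join test is usually presented as a matrix: one row per Ri, a distinguished symbol in the columns of the attributes in Ri, and FDs used to equate symbols until nothing changes; the decomposition is lossless if some row becomes entirely distinguished. The following is a sketch of that technique (the symbol encoding and names are our own), checked against the EMP_PROJ decompositions discussed earlier.

```python
def is_lossless(schema, decomposition, fds):
    """Matrix (chase) test for the non-additive join property of a decomposition."""
    # One row per Ri. Cell (i, A) holds the distinguished symbol ("a", A) when A
    # is in Ri, otherwise a unique non-distinguished symbol ("b", i, A).
    rows = [
        {A: ("a", A) if A in ri else ("b", i, A) for A in schema}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every LHS column.
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[A] for A in sorted(lhs)), []).append(row)
            for group in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Equate the column: prefer the "a" symbol, else pick one "b" symbol.
                    target = ("a", A) if ("a", A) in symbols else min(symbols)
                    for row in rows:  # rename the merged symbols everywhere in column A
                        if row[A] in symbols:
                            row[A] = target
                    changed = True
    # Lossless iff some row now consists only of "a" symbols.
    return any(all(row[A] == ("a", A) for A in schema) for row in rows)


SCHEMA = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}), ({"SSN", "PNUMBER"}, {"HOURS"})]
D1 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
D2 = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]

print(is_lossless(SCHEMA, D1, F))  # False: EMP_LOCS / EMP_PROJ1
print(is_lossless(SCHEMA, D2, F))  # True
```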
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B ⊆ A or A ∪ B = R.
An MVD is nontrivial if neither of these two conditions is satisfied.
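The formal condition behind X ->> Y is that whenever two tuples agree on X, swapping their values on the remaining attributes Z = R - (X ∪ Y) yields tuples that are also in the relation. A sketch of that check on a hypothetical EMP state (employee Smith with projects X and Y and dependents John and Anna); the function name is ours.

```python
def satisfies_mvd(r, schema, x, y):
    """Check whether relation state r satisfies the MVD x ->> y.

    r is a list of tuples in `schema` order; Z is every attribute outside X and Y.
    For every pair t1, t2 that agree on X, the tuple taking X and Y from t1 and
    Z from t2 must also be in r.
    """
    z = {A for A in schema if A not in x and A not in y}
    rows = set(r)
    for t1 in rows:
        for t2 in rows:
            if all(t1[schema.index(A)] == t2[schema.index(A)] for A in x):
                t3 = tuple(t2[i] if A in z else t1[i] for i, A in enumerate(schema))
                if t3 not in rows:
                    return False
    return True


SCHEMA = ["ENAME", "PNAME", "DNAME"]
EMP = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]
print(satisfies_mvd(EMP, SCHEMA, {"ENAME"}, {"PNAME"}))      # True
print(satisfies_mvd(EMP[:3], SCHEMA, {"ENAME"}, {"PNAME"}))  # False: (Smith, Y, John) missing
```

The four tuples are forced by the MVD: every project of Smith must appear with every dependent, which is the redundancy that the 4NF decomposition removes.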
Fourth Normal Form (4NF)
A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
4NF is used to remove multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or project-join normal form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
In other words, a 5NF relation has no nontrivial join dependency in which some component is not a superkey.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
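Whether a relation state satisfies a join dependency can be checked directly from the definition: project onto each component, join the projections back, and compare with the original. A sketch, assuming R1 = {SNAME, PART_NAME}, R2 = {SNAME, PROJ_NAME}, R3 = {PART_NAME, PROJ_NAME} and a small illustrative SUPPLY instance (not necessarily the one in the original figure); the function names are ours.

```python
def project(r, schema, attrs):
    """Projection of r (tuples in schema order) onto attrs, duplicates removed."""
    idx = [schema.index(A) for A in attrs]
    return {tuple(t[i] for i in idx) for t in r}


def natural_join(r1, s1, r2, s2):
    """Natural join of r1 (schema s1) and r2 (schema s2) on their shared attributes."""
    common = [A for A in s1 if A in s2]
    extra = [A for A in s2 if A not in s1]
    joined = {
        t1 + tuple(t2[s2.index(A)] for A in extra)
        for t1 in r1
        for t2 in r2
        if all(t1[s1.index(A)] == t2[s2.index(A)] for A in common)
    }
    return joined, s1 + extra


def satisfies_jd(r, schema, components):
    """True if r equals the natural join of its projections on every component."""
    result, result_schema = project(r, schema, components[0]), list(components[0])
    for comp in components[1:]:
        result, result_schema = natural_join(
            result, result_schema, project(r, schema, comp), list(comp))
    # Put the joined tuples back into the original attribute order, then compare.
    return project(result, result_schema, schema) == set(r)


SCHEMA = ["SNAME", "PART_NAME", "PROJ_NAME"]
COMPONENTS = [["SNAME", "PART_NAME"], ["SNAME", "PROJ_NAME"], ["PART_NAME", "PROJ_NAME"]]
SUPPLY = [
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]
print(satisfies_jd(SUPPLY, SCHEMA, COMPONENTS))       # True
print(satisfies_jd(SUPPLY[:-1], SCHEMA, COMPONENTS))  # False: (Smith, Bolt, ProjY) is spurious
```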
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
Module 6 7042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_PROJ
EMP_DEPT
Module 6 8042023
Two relation schemas suffering from update anomalies Although there is nothing wrong logically with
these 2 relations they are considered poor designs because they violate guideline 1 by mixing attributes from distinct real world entities
EMP_DEPT mixes attributes of employee and department and EMP_PROJ mixes attributes of employees amp projects and the WORKS_ON relationship
They may be used as views but they cause problems when used as base relations
Module 6 9042023
2Redundant Information in Tuples and Update Anomalies Goal of schema design is to minimize the
storage space used by the base relations Information is stored redundantly Wastes storage
Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies
Module 6 10042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_DEPT
EMP_PROJ
Module 6 11042023
EXAMPLE OF AN INSERT ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Insert Anomaly Cannot insert a project unless an employee is
assigned to it Conversely
Cannot insert an employee unless an heshe is assigned to a project
Module 6 12042023
EXAMPLE OF AN DELETE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Delete Anomaly When a project is deleted it will result in deleting
all the employees who work on that project Alternately if an employee is the sole employee
on a project deleting that employee would result in deleting the corresponding project
Module 6 13042023
EXAMPLE OF AN UPDATE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Update AnomalyChanging the name of project number P1
from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1
Module 6 14042023
Module 6 15042023
Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2
Design a schema that does not suffer from the insertion deletion and update anomalies
If there are any anomalies present then note them so that applications can be made to take them into account
In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries
Module 6 16042023
Problems with Nulls If many attributes are grouped together
as a fat relation it gives rise to many nulls in the tuples
Waste storage Problems in understanding the
meaning of the attributes Difficult while using Nulls in aggregate
operators like count or sum
Module 6 17042023
3 Null Values in Tuples Interpretations of nulls
Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable
GUIDELINE 3 Relations should be designed such that their
tuples will have as few NULL values as possible Attributes that are NULL frequently could be
placed in separate relations (with the primary key) Example-
if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation
Better create a new relation emp_offices(essn office_number)
Module 6 18042023
Example of Spurious Tuples
Module 6 19042023
Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as
the base relations of EMP_PROJ is not a good schema design
Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid
This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1
Module 6 20042023
Example of Spurious Tuples contd
Module 6 21042023
4 Spurious Tuples Bad designs for a relational database may result
in erroneous results for certain JOIN operations The lossless join property is used to
guarantee meaningful results for join operations
GUIDELINE 4 Design relation schemas so that they can be
joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated
Module 6 22042023
Spurious Tuples
There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies
Note that Property (a) is extremely important and cannot be
sacrificed Property (b) is less stringent and may be sacrificed
Module 6 23042023
Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done
during Insertion Modification Deletion
Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Module 6 24042023
Functional dependencies Functional dependencies (FDs)
Is a constraint between two sets of attributes from the database
Assumption The entire database is a single universal
relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes
Module 6 25042023
Definition
FDs are used to specify formal measures of the
goodness of relational designs keys that are used to define normal forms for
relations constraints that are derived from the meaning and
interrelationships of the data attributes A set of attributes X functionally determines
a set of attributes Y if the value of X determines a unique value for Y
Module 6 26042023
Functional Dependencies
A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If
t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r
depend on or are determined by the values of the X component
The values of the X component functionally determines the values of Y component
FDs are derived from the real-world constraints on the attributes
The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times
Module 6 27042023
Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760
Superior NA 31700 350
Victoria Africa 26828 250
Aral Sea Asia 24904 280
Huron NA 23000 206
Michigan NA 22300 307
Tanganyika Africa 12700 420
Continent -gtName Name -gtLength
Module 6 28042023
Graphical representation of Functional Dependencies
Module 6 29042023
Examples of FD constraints Social security number uniquely determines
employee name SSN -gt ENAME
Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION
Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Module 6 31042023
Satisfies algorithm
Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B
How it works Sort the tuples of the relation r on the A attributes so
that tuples with equal values under A are next to each other
Check that tuples with equal values under attributes A also have equal values under B
If it meets the condition 2 then the output of the algorithm is true else it is false
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three inference rules, as well as any other valid inference rule, can be deduced from IR1, IR2 and IR3 (completeness property).
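As a worked check of that claim, each of the three additional rules can be derived from IR1 to IR3. The derivations below are one possible route and are not taken from the slides:

```latex
\begin{align*}
&\textbf{Decomposition: } X \to YZ \text{ (given)},\; YZ \to Y \text{ (IR1, since } Y \subseteq YZ)
  \;\Rightarrow\; X \to Y \text{ (IR3)}; \text{ likewise } X \to Z.\\
&\textbf{Union: } X \to Y \;\Rightarrow\; X \to XY \text{ (IR2, augment with } X);\quad
  X \to Z \;\Rightarrow\; XY \to YZ \text{ (IR2, augment with } Y);\\
&\qquad X \to XY,\ XY \to YZ \;\Rightarrow\; X \to YZ \text{ (IR3)}.\\
&\textbf{Pseudotransitive: } X \to Y \;\Rightarrow\; WX \to WY \text{ (IR2, augment with } W);\quad
  WX \to WY,\ WY \to Z \;\Rightarrow\; WX \to Z \text{ (IR3)}.
\end{align*}
```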
Examples
1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2 Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
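Reading the two sets as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D} with C ⊆ B (this reading of the extracted text is an assumption), one derivation of each result using only IR1 to IR3 is:

```latex
\begin{align*}
&\text{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\}\\
&1.\ A \to B && \text{given}\\
&2.\ AC \to BC && \text{IR2 on 1, augment with } C\\
&3.\ C \to X && \text{given}\\
&4.\ BC \to BX && \text{IR2 on 3, augment with } B\\
&5.\ AC \to BX && \text{IR3 on 2 and 4}\\
&6.\ BX \to Z && \text{given}\\
&7.\ AC \to Z && \text{IR3 on 5 and 6}\\[4pt]
&\text{Example 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B\\
&1.\ B \to C && \text{IR1, since } C \subseteq B\\
&2.\ A \to C && \text{IR3 on } A \to B \text{ and 1}\\
&3.\ A \to D && \text{IR3 on 2 and } C \to D
\end{align*}
```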
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
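The redundancy test can be sketched by checking whether the RHS of an FD lies in the closure of its LHS computed without that FD. This is a minimal sketch under the assumption that FDs are (LHS, RHS) attribute-set pairs; `fd`, `closure` and `remove_redundant` are hypothetical helper names. Note that when several FDs are mutually redundant, which ones get removed depends on the order in which they are examined.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def remove_redundant(fds: List[FD]) -> List[FD]:
    """Drop, one at a time, each FD that can be derived from the remaining ones."""
    result = list(fds)
    i = 0
    while i < len(result):
        lhs, rhs = result[i]
        others = result[:i] + result[i + 1:]
        if rhs <= closure(lhs, others):   # derivable from F - {X -> Y}
            result = others               # redundant: remove it and re-check index i
        else:
            i += 1
    return result


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(remove_redundant(F) == [fd("A", "B"), fd("B", "C")])  # True: A -> C is redundant
```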
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
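The two covering conditions translate directly into code: "F covers E" means every FD in E is implied by F, which can be tested with attribute closures instead of materializing F+. A minimal sketch; `fd`, `closure`, `covers` and `equivalent` are hypothetical helper names.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def covers(f: List[FD], e: List[FD]) -> bool:
    """F covers E: every FD X -> Y in E satisfies Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e: List[FD], f: List[FD]) -> bool:
    """E+ = F+ iff E covers F and F covers E."""
    return covers(e, f) and covers(f, e)


E = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
F = [fd("A", "B"), fd("B", "C")]
G = [fd("A", "B C")]
print(equivalent(E, F))  # True
print(equivalent(E, G))  # False: G cannot derive B -> C
```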
Minimal Functional Dependencies
A minimal cover, also called an irreducible set of F, is useful in eliminating unnecessary functional dependencies.
F is transformed so that each FD in it that has more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) every dependency in F has a single attribute on its RHS;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
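The three conditions can be checked literally by testing set equivalence with attribute closures. The sketch below is a brute-force checker that follows the definition rather than an efficient algorithm. `is_minimal` and the other helpers are hypothetical names, and the example sets are illustrative.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def covers(f: List[FD], e: List[FD]) -> bool:
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e: List[FD], f: List[FD]) -> bool:
    return covers(e, f) and covers(f, e)


def is_minimal(fds: List[FD]) -> bool:
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]
        if len(rhs) != 1:                      # condition (a): single RHS attribute
            return False
        if equivalent(rest, fds):              # condition (b): X -> A is redundant
            return False
        for k in range(len(lhs)):              # condition (c): every proper subset Z of X
            for z in combinations(sorted(lhs), k):
                if equivalent(rest + [(frozenset(z), rhs)], fds):
                    return False
    return True


print(is_minimal([fd("B", "A"), fd("D", "A"), fd("A B", "D")]))  # False: B -> D can replace AB -> D
print(is_minimal([fd("B", "D"), fd("D", "A")]))                   # True
```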
Extraneous Attributes
The FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 ∈ F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has a single attribute.
2 FC is left-reduced.
3 FC is nonredundant.
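Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1 Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2 Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3 Left-reduce: Z is extraneous in XZ -> R because X -> Z, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}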
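The steps of this problem can be automated with the usual minimal-cover procedure: split right-hand sides, left-reduce, then drop redundant FDs. The procedure is the standard textbook one rather than anything spelled out on the slide, and `canonical_cover` and the other helpers are hypothetical names. Reading the given set as F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}, a sketch:

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def canonical_cover(fds: List[FD]) -> List[FD]:
    # Step 1: split every RHS into single attributes (simple FDs).
    g = [(lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 2: left-reduce; drop an LHS attribute if the rest still determines the RHS.
    for i in range(len(g)):
        lhs, rhs = g[i]
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, rhs)
    # Step 3: remove FDs derivable from the others (this also removes duplicates).
    i = 0
    while i < len(g):
        others = g[:i] + g[i + 1:]
        if g[i][1] <= closure(g[i][0], others):
            g = others
        else:
            i += 1
    return g


F = [fd("X", "Z"), fd("X Y", "W P"), fd("X Y", "Z W Q"), fd("X Z", "R")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# XY -> P, XY -> Q, XY -> W, X -> Z, X -> R (one per line)
```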
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
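Dependency preservation can be tested without computing projections of F+ explicitly, using a well-known closure-based test (assumed here, not described on the slides): grow a set from X by repeatedly adding whatever each Ri can determine from the part of the set it contains, then check whether Y was reached. The sketch uses the TEACH relation and its decomposition into {instructor, course} and {instructor, student}, which appear later in this module. Helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def preserves(dep: FD, schemas: List[FrozenSet[str]], fds: List[FD]) -> bool:
    """Is dep implied by the union of the FDs projected onto the decomposed schemas?"""
    lhs, rhs = dep
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            # Attributes of Ri that F determines from what is already reachable inside Ri.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


F = [fd("STUDENT COURSE", "INSTRUCTOR"), fd("INSTRUCTOR", "COURSE")]
D = [frozenset({"INSTRUCTOR", "COURSE"}), frozenset({"INSTRUCTOR", "STUDENT"})]
print([preserves(dep, D, F) for dep in F])  # [False, True]: fd1 is lost, fd2 is preserved
```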
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
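These definitions can be applied mechanically: S is a superkey when its attribute closure is all of R, and a key is a superkey that contains no smaller key. The brute-force sketch below enumerates candidate keys and prime attributes. It is exponential in the number of attributes, so it suits only small classroom schemas. The TEACH example (with fd1 and fd2 from the BCNF slides of this module) is used for illustration, and all helper names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema: FrozenSet[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Smallest attribute sets whose closure is the whole schema."""
    keys: List[FrozenSet[str]] = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = frozenset(combo)
            if any(k <= s for k in keys):   # contains a smaller key: superkey, not a key
                continue
            if closure(s, fds) >= schema:   # s determines every attribute: a superkey
                keys.append(s)
    return keys


TEACH = frozenset({"STUDENT", "COURSE", "INSTRUCTOR"})
F = [fd("STUDENT COURSE", "INSTRUCTOR"), fd("INSTRUCTOR", "COURSE")]
keys = candidate_keys(TEACH, F)
prime = frozenset().union(*keys)
print(sorted(sorted(k) for k in keys))  # [['COURSE', 'STUDENT'], ['INSTRUCTOR', 'STUDENT']]
print(sorted(TEACH - prime))            # []: every attribute of TEACH is prime
```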
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
To achieve 1NF there are three techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3 If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
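Technique 1 can be sketched on a small hypothetical DEPARTMENT instance whose DLOCATIONS attribute holds several values per tuple. The attribute names and data are illustrative assumptions. The multivalued attribute moves into its own relation together with the key DNUMBER, giving one atomic tuple per location:

```python
from typing import List, Tuple

# Hypothetical unnormalized rows: DLOCATIONS holds several values per department.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]


def to_1nf(rows) -> Tuple[List[Tuple[int, str]], List[Tuple[int, str]]]:
    """Technique 1: move the multivalued attribute into its own relation with the key."""
    dept = [(r["DNUMBER"], r["DNAME"]) for r in rows]
    # One atomic (DNUMBER, DLOCATION) tuple per location; DNUMBER links back to dept.
    dept_locations = [(r["DNUMBER"], loc) for r in rows for loc in r["DLOCATIONS"]]
    return dept, dept_locations


dept, dept_locations = to_1nf(DEPARTMENT)
print(dept)            # [(5, 'Research'), (4, 'Administration'), (1, 'Headquarters')]
print(dept_locations)  # [(5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston'), (4, 'Stafford'), (1, 'Houston')]
```

DNAME is stored once per department, so technique 1 avoids the redundancy that technique 2 introduces.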
Normalizing nested relations into 1NF
Additional problems from the Schaum series: Pg 178 51
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
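A 2NF check looks for nonprime attributes that are already determined by a proper subset of the primary key, that is, partial dependencies. The sketch simplifies by assuming the primary key is the only candidate key, so "nonprime" means "not in the primary key". It uses the EMP_PROJ FDs from the examples in this module, and the helper names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def partial_dependencies(schema: FrozenSet[str], pk: FrozenSet[str],
                         fds: List[FD]) -> Dict[str, FrozenSet[str]]:
    """Map each nonprime attribute to the first proper subset of pk that determines it."""
    found: Dict[str, FrozenSet[str]] = {}
    for attr in sorted(schema - pk):           # assumes pk is the only candidate key
        for k in range(1, len(pk)):            # proper, nonempty subsets of the key
            for sub in combinations(sorted(pk), k):
                if attr in closure(sub, fds):
                    found.setdefault(attr, frozenset(sub))
    return found


EMP_PROJ = frozenset({"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"})
F = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION")]
partial = partial_dependencies(EMP_PROJ, frozenset({"SSN", "PNUMBER"}), F)
for attr, lhs in partial.items():
    print(sorted(lhs), "->", attr)
# ['SSN'] -> ENAME, then ['PNUMBER'] -> PLOCATION and ['PNUMBER'] -> PNAME
```

HOURS is absent from the result because {SSN, PNUMBER} -> HOURS is a full FD. The three partial dependencies show why EMP_PROJ is not in 2NF.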
Normalizing into 2NF
Conversion to 2NF (figure): a relation with attributes A, B, C, D is decomposed into relations with attributes (A, B, C) and (A, D).
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form?
2 Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
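One way to spot the problematic transitive dependencies is to examine each given FD Y -> A. If Y is not a superkey and A is nonprime, A depends on the key through the non-key Y, which is exactly the case the NOTE flags. The sketch applies this test to the given FDs only and takes the set of prime attributes as an input (an assumption, since finding all keys is a separate step). The example is EMP_DEPT from this module, and helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def violations_3nf(schema: FrozenSet[str], fds: List[FD],
                   prime: FrozenSet[str]) -> List[Tuple[FrozenSet[str], str]]:
    """Report Y -> A where Y is not a superkey and A is nonprime."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        for attr in sorted(rhs - lhs):         # ignore the trivial part of the FD
            if not is_superkey and attr not in prime:
                bad.append((lhs, attr))
    return bad


EMP_DEPT = frozenset({"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"})
F = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
for lhs, attr in violations_3nf(EMP_DEPT, F, prime=frozenset({"SSN"})):
    print(sorted(lhs), "->", attr)  # ['DNUMBER'] -> DMGRSSN, then ['DNUMBER'] -> DNAME
```

Decomposing EMP_DEPT on DNUMBER, into (ENAME, SSN, BDATE, ADDRESS, DNUMBER) and (DNUMBER, DNAME, DMGRSSN), removes both reported dependencies.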
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
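The BCNF condition can be checked by computing the closure of each FD's left-hand side. Checking only the given FDs, rather than all of F+, is the usual simplification when testing a whole schema and is assumed here. The example uses the TEACH relation discussed in the next slides, and the helper names are hypothetical.

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs: Iterable[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema: FrozenSet[str], fds: List[FD]) -> List[FD]:
    """Return each given nontrivial FD whose LHS is not a superkey of schema."""
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:                         # trivial FD: never a violation
            continue
        if not closure(lhs, fds) >= schema:    # LHS does not determine every attribute
            bad.append((lhs, rhs))
    return bad


TEACH = frozenset({"STUDENT", "COURSE", "INSTRUCTOR"})
F = [fd("STUDENT COURSE", "INSTRUCTOR"), fd("INSTRUCTOR", "COURSE")]
print(bcnf_violations(TEACH, F) == [fd("INSTRUCTOR", "COURSE")])  # True: fd2 violates BCNF
```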
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute, even when 'X' is not a candidate key (superkey).
BCNF, in contrast, requires 'X' to be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. Since course is a prime attribute, fd2 does not violate 3NF; but instructor is not a superkey, so fd2 violates BCNF. Hence this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive join property after decomposition.
Out of the above three, only the third decomposition will not generate spurious tuples after a join (and hence has the nonadditive join property).
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 511 pg 162
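The difference between lossless and lossy decompositions shows up on a concrete instance. The sketch projects a small, hypothetical TEACH instance (which satisfies fd1 and fd2) onto decompositions 1 and 3 from the previous slide and natural-joins the projections back together. The data, and the helpers `make`, `project` and `natural_join`, are assumptions for illustration.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# A tuple is modeled as a frozenset of (attribute, value) pairs, so projection and
# natural join can be written generically.
Row = FrozenSet[Tuple[str, str]]


def make(attrs: List[str], values: Iterable[Tuple[str, ...]]) -> Set[Row]:
    return {frozenset(zip(attrs, v)) for v in values}


def project(rel: Set[Row], attrs: Set[str]) -> Set[Row]:
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1: Set[Row], r2: Set[Row]) -> Set[Row]:
    out: Set[Row] = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            # Combine two tuples only if they agree on every shared attribute.
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


ATTRS = ["STUDENT", "COURSE", "INSTRUCTOR"]
TEACH = make(ATTRS, [
    ("Narayan", "Database", "Mark"),
    ("Smith", "Database", "Navathe"),
    ("Smith", "Operating Systems", "Ammar"),
    ("Wallace", "Database", "Mark"),
])

d1 = natural_join(project(TEACH, {"STUDENT", "INSTRUCTOR"}), project(TEACH, {"STUDENT", "COURSE"}))
d3 = natural_join(project(TEACH, {"INSTRUCTOR", "COURSE"}), project(TEACH, {"INSTRUCTOR", "STUDENT"}))
print(len(TEACH), len(d1), len(d3))  # 4 6 4
print(d3 == TEACH)                   # True: decomposition 3 recovers TEACH
for t in sorted(tuple(dict(row)[a] for a in ATTRS) for row in d1 - TEACH):
    print("spurious:", t)
```

Decomposition 1 joins on student, which determines neither instructor nor course, so Smith's two courses and two instructors combine into two spurious tuples.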
Testing for lossless joins
Lossless join algorithm: Example 512
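The usual schema-level test is the matrix (tableau) method. Build one row per Ri, holding a distinguished symbol a for attributes in Ri and a distinct symbol b elsewhere. For every FD X -> Y, equate the Y symbols of rows that agree on X, preferring an a symbol. Repeat until nothing changes; the decomposition is lossless iff some row ends up all a's. The sketch below is one way to code this (symbols are renamed throughout a column when equated). The EMP_PROJ schemas, taken from the design-guideline examples of this module, and the helper names are assumptions.

```python
from typing import Dict, FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    return frozenset(lhs.split()), frozenset(rhs.split())


def lossless(schema: FrozenSet[str], decomposition: List[FrozenSet[str]], fds: List[FD]) -> bool:
    """Matrix (tableau) test: lossless iff some row ends up with only 'a' symbols."""
    attrs = sorted(schema)
    # Row i gets the distinguished symbol ('a', A) where A is in Ri, else ('b', i, A).
    rows: List[Dict[str, tuple]] = [
        {a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups: Dict[tuple, List[Dict[str, tuple]]] = {}
            for row in rows:  # group the rows that agree on every LHS column
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in sorted(rhs - lhs):
                    symbols = {row[a] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Prefer the 'a' symbol; otherwise pick one 'b' symbol deterministically.
                    target = ("a", a) if ("a", a) in symbols else min(symbols)
                    for row in rows:  # rename the equated symbols throughout the column
                        if row[a] in symbols and row[a] != target:
                            row[a] = target
                            changed = True
    return any(all(row[a] == ("a", a) for a in attrs) for row in rows)


EMP_PROJ = frozenset({"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"})
F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]
good = [frozenset({"SSN", "ENAME"}), frozenset({"PNUMBER", "PNAME", "PLOCATION"}),
        frozenset({"SSN", "PNUMBER", "HOURS"})]
bad = [frozenset({"ENAME", "PLOCATION"}),
       frozenset({"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"})]
print(lossless(EMP_PROJ, good, F))  # True
print(lossless(EMP_PROJ, bad, F))   # False
```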
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further defined as being trivial or nontrivial.
An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.
An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
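On a given instance, A ->> B holds when, for every two tuples t1 and t2 that agree on A, the "swapped" tuple (A and B values from t1, the remaining values from t2) is also present. A minimal sketch using the standard tuple-swapping definition; the EMP instance and the helper names are illustrative assumptions:

```python
from typing import List, Sequence


def mvd_holds(schema: Sequence[str], rows: List[tuple], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Check lhs ->> rhs on one relation instance."""
    idx = {attr: i for i, attr in enumerate(schema)}
    rest = [a for a in schema if a not in lhs and a not in rhs]
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[idx[a]] == t2[idx[a]] for a in lhs):
                # t3 takes LHS and RHS values from t1 and the remaining values from t2.
                t3 = tuple(t2[idx[a]] if a in rest else t1[idx[a]] for a in schema)
                if t3 not in present:
                    return False
    return True


def is_trivial(schema: Sequence[str], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """A ->> B is trivial if B is a subset of A or A ∪ B = R."""
    return set(rhs) <= set(lhs) or set(lhs) | set(rhs) == set(schema)


SCHEMA = ["ENAME", "PNAME", "DNAME"]
EMP = [("Smith", "X", "John"), ("Smith", "Y", "Anna"), ("Smith", "X", "Anna"), ("Smith", "Y", "John")]
print(mvd_holds(SCHEMA, EMP, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(SCHEMA, EMP[:3], ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing
print(is_trivial(SCHEMA, ["ENAME"], ["PNAME"]))          # False: a nontrivial MVD
```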
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
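The decomposition can be sketched on an illustrative EMP instance (hypothetical data) that satisfies both MVDs. The sketch projects EMP onto EMP_PROJECTS and EMP_DEPENDENTS, checks that joining them on ENAME gives back EMP, and shows the cost of the unnormalized form: adding one project for Smith to EMP requires one new tuple per dependent.

```python
# Hypothetical EMP(ENAME, PNAME, DNAME) instance satisfying ENAME ->> PNAME and ENAME ->> DNAME.
EMP = {("Smith", "X", "John"), ("Smith", "Y", "Anna"), ("Smith", "X", "Anna"), ("Smith", "Y", "John")}

EMP_PROJECTS = {(ename, pname) for ename, pname, _ in EMP}
EMP_DEPENDENTS = {(ename, dname) for ename, _, dname in EMP}

# Natural join on ENAME rebuilds every (project, dependent) combination per employee.
rejoined = {(e1, pname, dname)
            for e1, pname in EMP_PROJECTS
            for e2, dname in EMP_DEPENDENTS if e1 == e2}
print(rejoined == EMP)  # True: the decomposition is lossless

# Adding project "Z" for Smith: EMP needs one tuple per dependent, EMP_PROJECTS needs one.
new_emp_tuples = {("Smith", "Z", dname) for e, dname in EMP_DEPENDENTS if e == "Smith"}
print(len(new_emp_tuples))  # 2
```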
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called project-join normal form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
Informally, a relation in 5NF has no nontrivial join dependency other than those implied by its superkeys.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
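A join dependency can be checked on an instance by joining the projections and comparing the result with the original. The sketch uses an illustrative SUPPLY(SNAME, PARTNAME, PROJNAME) instance (hypothetical data chosen so that JD(R1, R2, R3) holds), with R1 = (SNAME, PARTNAME), R2 = (SNAME, PROJNAME) and R3 = (PARTNAME, PROJNAME). Joining only two of the projections produces spurious tuples; joining all three recovers SUPPLY exactly.

```python
# Hypothetical SUPPLY(SNAME, PARTNAME, PROJNAME) instance.
SUPPLY = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}

R1 = {(s, p) for s, p, _ in SUPPLY}   # (SNAME, PARTNAME)
R2 = {(s, j) for s, _, j in SUPPLY}   # (SNAME, PROJNAME)
R3 = {(p, j) for _, p, j in SUPPLY}   # (PARTNAME, PROJNAME)

r1_r2 = {(s, p, j) for s, p in R1 for s2, j in R2 if s == s2}   # join on SNAME
r1_r2_r3 = {(s, p, j) for s, p, j in r1_r2 if (p, j) in R3}     # then join with R3

print(len(r1_r2))            # 9: two spurious tuples
print(r1_r2_r3 == SUPPLY)    # True: JD(R1, R2, R3) holds on this instance
```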
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
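A 3NF check can be sketched by testing each listed FD X -> A: it is acceptable when X is a superkey or A is prime. This catches both partial and transitive dependencies on the key, and the superkey branch is what makes the NOTE's EMP example acceptable. It is a sketch under stated assumptions, not the slides' method: FDs are (lhs, rhs) pairs of frozensets, only the listed FDs are examined, and "prime" is simplified to "member of the given key". The FD Emp -> {SSN, Salary} is an assumption that encodes "Emp is a candidate key".

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(relation, key, fds):
    """Return (lhs, attr) pairs from fds that break 3NF.

    Test used: every nontrivial X -> A needs X to be a superkey or A to be prime.
    Simplifying assumption: prime means "member of key".
    """
    relation, key = set(relation), set(key)
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= relation
        for attr in sorted(rhs - lhs):  # ignore the trivial part of the FD
            if attr not in key and not lhs_is_superkey:
                violations.append((lhs, attr))
    return violations

EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_DEPT = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(third_nf_violations(EMP_DEPT, {"SSN"}, F_DEPT))

# The NOTE's EMP(SSN, Emp, Salary); Emp -> {SSN, Salary} is assumed because Emp is a candidate key.
EMP = {"SSN", "Emp", "Salary"}
F_EMP = [
    (frozenset({"SSN"}), frozenset({"Emp"})),
    (frozenset({"Emp"}), frozenset({"SSN", "Salary"})),
]
print(third_nf_violations(EMP, {"SSN"}, F_EMP))
```

EMP_DEPT is flagged because DNUMBER -> DNAME and DNUMBER -> DMGRSSN have a non-superkey left-hand side, while EMP produces no violations.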
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form): A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation even when 'X' is not a superkey, provided 'A' is a prime attribute (a member of some candidate key).
BCNF allows it only if 'X' is a superkey.
So a relation in BCNF is definitely in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but 3NF allows it because course is a prime attribute. So this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
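The BCNF test and one decomposition step can be sketched for TEACH: find a listed FD X -> Y whose left-hand side is not a superkey, then split R into X ∪ Y and R - (Y - X). This is a minimal sketch, not the slides' procedure; FDs are assumed to be (lhs, rhs) pairs of frozensets, only the listed FDs are checked, and `bcnf_violations` and `split_on` are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Nontrivial FDs X -> Y whose left-hand side is not a superkey of relation."""
    relation = set(relation)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= relation]

def split_on(relation, lhs, rhs):
    """One decomposition step for a violating X -> Y: R1 = X ∪ Y, R2 = R - (Y - X)."""
    relation = set(relation)
    return set(lhs) | set(rhs), relation - (set(rhs) - set(lhs))

TEACH = {"student", "course", "instructor"}
F_TEACH = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
violations = bcnf_violations(TEACH, F_TEACH)
r1, r2 = split_on(TEACH, *violations[0])
print(violations, r1, r2)
```

Only fd2 is reported, and splitting on it yields {instructor, course} and {instructor, student}.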
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the 3rd decomposition does not generate spurious tuples after a join (and hence has the non-additive join property).
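For a decomposition into two relations there is a short test: {R1, R2} is lossless if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) follows from F. The sketch below applies it to the three TEACH decompositions; it assumes FDs are (lhs, rhs) pairs of frozensets, and `is_lossless_binary` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Binary decomposition test: the common attributes must determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus

F_TEACH = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = {
    1: ({"student", "instructor"}, {"student", "course"}),
    2: ({"course", "instructor"}, {"course", "student"}),
    3: ({"instructor", "course"}, {"instructor", "student"}),
}
results = {n: is_lossless_binary(r1, r2, F_TEACH) for n, (r1, r2) in decompositions.items()}
print(results)
```

Decomposition 3 passes because its common attribute, instructor, determines course through fd2; in the other two, the common attribute (student or course) determines nothing else.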
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
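The slides name the lossless join algorithm without reproducing it. The sketch below assumes it is the standard tableau (chase) test: build one row per Ri, with a distinguished symbol in the columns of Ri, then repeatedly equate symbols forced by the FDs. The decomposition is lossless if some row becomes fully distinguished. FDs are (lhs, rhs) pairs of frozensets. The three-way split of EMP_PROJ and the attribute lists assumed for EMP_PROJ1 and EMP_LOCS are illustrative assumptions.

```python
def is_lossless(attrs, fds, decomposition):
    """Tableau (chase) test for a lossless-join decomposition.

    Row i stands for Ri; a cell holds ('a', A) if A is in Ri, else a private ('b', i, A).
    """
    attrs = list(attrs)
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in rows:  # rows that agree on lhs must agree on rhs
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    if len(symbols) < 2:
                        continue
                    # ('a', A) sorts before any ('b', i, A), so the distinguished symbol wins.
                    target = min(symbols)
                    for row in rows:  # equate the symbols throughout column a
                        if row[a] in symbols:
                            row[a] = target
                    changed = True
    return any(all(row[a][0] == "a" for a in attrs) for row in rows)

EMP_PROJ = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}, {"ENAME", "PLOCATION"}]  # EMP_PROJ1, EMP_LOCS
print(is_lossless(EMP_PROJ, F, good), is_lossless(EMP_PROJ, F, bad))
```

The split into (SSN, ENAME), (PNUMBER, PNAME, PLOCATION), and (SSN, PNUMBER, HOURS) comes out lossless. The EMP_PROJ1/EMP_LOCS split does not, because PLOCATION, the only attribute the two share, determines nothing.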
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
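The MVD definition can be checked on a relation state. X ->> Y holds if, for every pair of tuples agreeing on X, the tuple that takes X and Y from the first and the remaining attributes from the second is also present. The sketch below applies this to EMP and then projects EMP onto EMP_PROJECTS and EMP_DEPENDENTS. The EMP rows (Smith, projects X and Y, dependents John and Anna) are hypothetical, and `mvd_holds` and `is_trivial_mvd` are illustrative names.

```python
from itertools import product

ATTRS = ["ENAME", "PNAME", "DNAME"]

def proj(t, cols):
    """Values of tuple t (a dict) on the given columns."""
    return tuple(t[c] for c in cols)

def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on a state: for t1, t2 agreeing on X, the tuple t3 with
    t3[X] = t1[X], t3[Y] = t1[Y] and t3[Z] = t2[Z] (Z = R - X - Y) must exist."""
    z = [a for a in attrs if a not in x and a not in y]
    present = {proj(t, attrs) for t in rows}
    for t1, t2 in product(rows, repeat=2):
        if proj(t1, x) == proj(t2, x):
            t3 = {**{a: t2[a] for a in z}, **{a: t1[a] for a in list(x) + list(y)}}
            if proj(t3, attrs) not in present:
                return False
    return True

def is_trivial_mvd(x, y, attrs):
    """X ->> Y is trivial if Y is a subset of X or X ∪ Y = R."""
    return set(y) <= set(x) or set(x) | set(y) == set(attrs)

# Hypothetical state: Smith works on projects X and Y and has dependents John and Anna.
EMP = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

# 4NF decomposition: one relation per independent multivalued fact.
EMP_PROJECTS = {proj(t, ["ENAME", "PNAME"]) for t in EMP}
EMP_DEPENDENTS = {proj(t, ["ENAME", "DNAME"]) for t in EMP}
print(EMP_PROJECTS, EMP_DEPENDENTS)
```

The MVD forces all four project/dependent combinations to be stored. Dropping any one of them breaks ENAME ->> PNAME, which is the redundancy that the two-relation decomposition removes.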
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
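The join dependency definition translates directly into a check: project the state onto each component, natural-join the projections, and compare the result with the original. The sketch below assumes the components together cover every attribute, and it uses a hypothetical SUPPLY state, not data from the slides. The components follow the usual (SNAME, PART_NAME), (SNAME, PROJ_NAME), (PART_NAME, PROJ_NAME) split, which is also an assumption.

```python
def project(rows, cols):
    """Duplicate-free projection of a list of dict tuples, returned as dicts."""
    return [dict(zip(cols, t)) for t in {tuple(r[c] for c in cols) for r in rows}]

def natural_join(left, right):
    """Nested-loop natural join on the attributes the two inputs share."""
    out = []
    for l in left:
        for r in right:
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                out.append({**l, **r})
    return out

def satisfies_jd(rows, components):
    """True if rows equals the natural join of its projections on the components."""
    attrs = list(rows[0])
    joined = project(rows, components[0])
    for cols in components[1:]:
        joined = natural_join(joined, project(rows, cols))
    as_set = lambda rs: {tuple(r[a] for a in attrs) for r in rs}
    return as_set(joined) == as_set(rows)

# Hypothetical SUPPLY state.
SUPPLY = [
    {"SNAME": "Smith", "PART_NAME": "Bolt", "PROJ_NAME": "ProjY"},
    {"SNAME": "Smith", "PART_NAME": "Nut", "PROJ_NAME": "ProjX"},
    {"SNAME": "Adamsky", "PART_NAME": "Bolt", "PROJ_NAME": "ProjX"},
    {"SNAME": "Smith", "PART_NAME": "Bolt", "PROJ_NAME": "ProjX"},
]
R1 = ["SNAME", "PART_NAME"]
R2 = ["SNAME", "PROJ_NAME"]
R3 = ["PART_NAME", "PROJ_NAME"]
print(satisfies_jd(SUPPLY, [R1, R2, R3]), satisfies_jd(SUPPLY, [R1, R3]))
```

This state satisfies JD(R1, R2, R3). Joining only R1 and R3 produces the spurious tuple (Adamsky, Bolt, ProjY). Removing (Smith, Bolt, ProjX) leaves the three projections unchanged, so the three-way join would bring that tuple back as a spurious one.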
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or project-join normal form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation is in 5NF when it has no nontrivial join dependency other than those in which every component is a superkey.
Relation SUPPLY with join dependency and conversion to fifth normal form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
2. Redundant Information in Tuples and Update Anomalies
A goal of schema design is to minimize the storage space used by the base relations.
Information that is stored redundantly:
- wastes storage
- causes update anomalies: insertion anomalies, deletion anomalies, and modification anomalies
Two relation schemas suffering from update anomalies
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
EXAMPLE OF AN INSERT ANOMALY
Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).
Insert anomaly: we cannot insert a project unless an employee is assigned to it. Conversely, we cannot insert an employee unless he/she is assigned to a project.
EXAMPLE OF A DELETE ANOMALY
Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).
Delete anomaly: when a project is deleted, all the employees who work on that project are deleted with it. Alternately, if an employee is the sole employee on a project, deleting that employee deletes the corresponding project.
EXAMPLE OF AN UPDATE ANOMALY
Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).
Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" may require this update to be made in the tuples of all 100 employees working on project P1.
Guideline for Redundant Information in Tuples and Update Anomalies
GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies.
If any anomalies are present, note them so that applications can be made to take them into account.
In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.
Problems with Nulls
If many attributes are grouped together as a "fat" relation, it gives rise to many nulls in the tuples. This:
- wastes storage
- causes problems in understanding the meaning of the attributes
- makes it difficult to use aggregate operators such as COUNT or SUM in the presence of nulls
3. Null Values in Tuples
Interpretations of nulls:
- attribute not applicable or invalid
- attribute value unknown (may exist)
- value known to exist but unavailable
GUIDELINE 3: Relations should be designed so that their tuples have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (with the primary key).
Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).
Example of Spurious Tuples
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as base relations in place of EMP_PROJ is not a good schema design.
The problem is that a natural join of these two relations produces more tuples than the original set of tuples in EMP_PROJ.
These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious or wrong information that is not valid.
This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
Example of Spurious Tuples (contd.)
4. Spurious Tuples
Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) non-additive (lossless) join of the corresponding relations
(b) preservation of the functional dependencies
Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
- Anomalies that cause redundant work to be done during insertion, modification, and deletion
- Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values
- Generation of invalid and spurious data during joins on improperly related base relations
Functional dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, …, An}, where A1, A2, …, An are the attributes.
Definition
FDs are used to specify formal measures of the "goodness" of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420
Continent -> Name: not satisfied by this state (NA appears with Superior, Huron, and Michigan).
Name -> Length: satisfied by this state.
Graphical representation of Functional Dependencies
Examples of FD constraints
- Social security number uniquely determines employee name: SSN -> ENAME
- Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
- Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
Relation state of TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     ooad              Horowitz
TEACHER -> COURSE: not satisfied (Smith appears with two different courses).
TEXT -> COURSE: satisfied by this state.
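The three steps of the satisfies algorithm map onto a short function: sort on the left-hand side, then compare neighbouring tuples. Comparing neighbours is enough because equal values are transitive within each sorted group. This is a sketch, not the slides' code; rows are assumed to be dicts, and `satisfies` is a hypothetical name.

```python
def satisfies(rows, lhs, rhs):
    """Sort-based test of whether a relation state satisfies the FD lhs -> rhs."""
    key = lambda row: tuple(row[a] for a in lhs)
    ordered = sorted(rows, key=key)                   # step 1: bring equal lhs values together
    for prev, cur in zip(ordered, ordered[1:]):       # step 2: compare neighbouring tuples
        if key(prev) == key(cur) and [prev[b] for b in rhs] != [cur[b] for b in rhs]:
            return False                              # step 3: a violating pair was found
    return True

TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "ooad", "TEXT": "Horowitz"},
]
print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))
```

A single state can only refute an FD. TEXT -> COURSE being satisfied here does not prove that it holds for every legal state of TEACH.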
Drawbacks of the satisfies algorithm
Using this algorithm is tedious and time-consuming, so inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
Other dependencies can be inferred, or deduced, from the FDs in F.
Example of Closure
A department has one manager (DEPT_NO -> MGR_SSN), and a manager has a unique phone number (MGR_SSN -> MGR_PHONE). These two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That X -> Y is inferred from F is denoted F |= X -> Y.
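The usual way to decide F |= X -> Y in practice is to compute the attribute closure X+ (all attributes X determines under F) and check whether Y is a subset of X+. The sketch below applies this to the example F. It assumes FDs are (lhs, rhs) pairs of frozensets, and `closure` and `implies` are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, each FD a (lhs, rhs) pair of frozensets."""
    result = set(attrs)  # reflexivity: X -> X
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs  # X -> lhs already follows, so X -> rhs by transitivity
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y is a subset of X+."""
    return set(rhs) <= closure(lhs, fds)

F = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))
```

SSN+ contains every attribute, which confirms the inferred SSN -> {DNAME, DMGRSSN}. DNUMBER+ is only {DNUMBER, DNAME, DMGRSSN}, so DNUMBER -> SSN is not implied.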
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that applying IR1 through IR3 repeatedly until no more dependencies can be inferred yields the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
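A worked derivation for both exercises using only IR1 to IR3. It assumes that exercise 1's set is F = {A -> B, C -> X, BX -> Z} and exercise 2's set is F = {A -> B, C -> D} with C ⊆ B; the garbled slide text is read this way.

```latex
\begin{align*}
&\text{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\}\\
&1.\ A \to B   && \text{given}\\
&2.\ AC \to BC && \text{IR2 on 1, augmenting by } C\\
&3.\ C \to X   && \text{given}\\
&4.\ BC \to BX && \text{IR2 on 3, augmenting by } B\\
&5.\ AC \to BX && \text{IR3 on 2 and 4}\\
&6.\ BX \to Z  && \text{given}\\
&7.\ AC \to Z  && \text{IR3 on 5 and 6}\\[4pt]
&\text{Example 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B\\
&1.\ B \to C && \text{IR1, since } C \subseteq B\\
&2.\ A \to C && \text{IR3 on } A \to B \text{ and 1}\\
&3.\ A \to D && \text{IR3 on 2 and } C \to D
\end{align*}
```

Steps 2 to 5 of example 1 together amount to one application of the pseudotransitive rule.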
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is said to be covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
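"F covers E" can be tested without computing F+: for each X -> Y in E, check that Y is a subset of X+ computed under F. Equivalence is then coverage in both directions. The sets F, E, and G below are hypothetical examples over single-letter attributes, FDs are assumed to be (lhs, rhs) pairs of frozensets, and the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, e):
    """F covers E when every FD X -> Y in E satisfies Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """E and F are equivalent when each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)

def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))

F = [fd("A", "B"), fd("B", "C")]
E = [fd("A", "B"), fd("B", "C"), fd("A", "C")]  # adds a redundant A -> C
G = [fd("A", "B"), fd("A", "C")]
print(equivalent(E, F), covers(F, G), covers(G, F))
```

E is equivalent to F because its extra A -> C is implied. G is covered by F but does not cover it, since B -> C cannot be inferred from G.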
A minimal set of functional dependencies (minimal cover), also called an irreducible set of F, is useful in eliminating unnecessary functional dependencies.
To obtain it, F is transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) every dependency in F has a single attribute on its RHS;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
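Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Make every RHS a single attribute: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R because X -> Z, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}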
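The three properties of a canonical cover suggest a direct procedure: split right-hand sides, drop extraneous left attributes, then drop FDs that the others imply. The sketch below runs this on the problem's F. It is a sketch, not the slides' algorithm. It assumes this step order, which left-reduces before removing redundancy (the worked problem does it the other way round, and both orders give the same cover for this F). FDs are (lhs, rhs) pairs of frozensets, and `canonical_cover` and `fd` are hypothetical helpers.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    # 1. Split every right-hand side into single attributes, dropping duplicates.
    simple = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            dep = (frozenset(lhs), frozenset({a}))
            if dep not in simple:
                simple.append(dep)
    # 2. Left-reduce: drop B from X -> A when (X - {B})+ already contains A.
    reduced = []
    for lhs, rhs in simple:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, simple):
                lhs.discard(b)
        dep = (frozenset(lhs), rhs)
        if dep not in reduced:
            reduced.append(dep)
    # 3. Remove redundant FDs: X -> A goes if the remaining FDs still imply it.
    cover = list(reduced)
    for dep in list(cover):
        others = [g for g in cover if g != dep]
        if dep[1] <= closure(dep[0], others):
            cover = others
    return cover

def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))

F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
```

The result is {X -> Z, X -> R, XY -> P, XY -> W, XY -> Q}, the same cover reached in step 3 of the problem.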
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd.
Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
It provides the database designer with:
- a formal framework for analyzing relation schemas based on keys and FDs
- a series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
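Candidate keys, and from them the prime attributes, can be found for small schemas by trying attribute sets in order of increasing size. A set is kept as a key when its closure is all of R and it contains no smaller key already found. The sketch below uses TEACH with fd1 and fd2 from the BCNF discussion. It is a brute-force sketch, exponential in the number of attributes, with FDs assumed to be (lhs, rhs) pairs of frozensets and `candidate_keys` a hypothetical name.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """All minimal superkeys, by brute force over attribute subsets (small schemas only)."""
    attrs = sorted(relation)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)  # a superkey with no smaller key inside it
    return keys

TEACH = {"student", "course", "instructor"}
F_TEACH = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
keys = candidate_keys(TEACH, F_TEACH)
prime = set().union(*keys)
print(keys, prime, TEACH - prime)
```

TEACH has two candidate keys, {student, course} and {student, instructor}, so every attribute is prime. This is why instructor -> course does not violate 3NF even though it violates BCNF.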
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
To achieve 1NF, there are 3 techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The 1st solution is considered the best because it does not suffer from redundancy and is completely general, placing no limit on the maximum number of values.
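Technique 1 can be sketched as a split of each non-1NF tuple: everything except the multivalued attribute stays in the base relation, and each individual value becomes its own tuple in a new relation keyed by the primary key plus that value. The DEPARTMENT rows below are hypothetical, and `split_multivalued` is an illustrative name, not an established API.

```python
def split_multivalued(rows, key, attr, new_name=None):
    """1NF technique 1: move multivalued attribute `attr` into its own relation with `key`."""
    name = new_name or attr
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    detail = [{key: row[key], name: value} for row in rows for value in row[attr]]
    return base, detail

# Hypothetical non-1NF DEPARTMENT state with a multivalued DLOCATIONS attribute.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]
DEPT, DEPT_LOCATIONS = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(DEPT)
for row in DEPT_LOCATIONS:
    print(row)
```

No department data is repeated, and a department may have any number of locations. These are the two reasons the slides give for preferring this technique.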
Module 6 10042023
Two relation schemas suffering from update anomalies
ENAME SSN BDATEADDRES
SDNUMBE
RDNAME
DMGRSSN
PLOCATION
SSNPNUMBE
RHOURS ENAME PNAME
EMP_DEPT
EMP_PROJ
Module 6 11042023
EXAMPLE OF AN INSERT ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Insert Anomaly Cannot insert a project unless an employee is
assigned to it Conversely
Cannot insert an employee unless an heshe is assigned to a project
Module 6 12042023
EXAMPLE OF AN DELETE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Delete Anomaly When a project is deleted it will result in deleting
all the employees who work on that project Alternately if an employee is the sole employee
on a project deleting that employee would result in deleting the corresponding project
Module 6 13042023
EXAMPLE OF AN UPDATE ANOMALY Consider the relation
EMP_PROJ(Emp Proj Ename Pname No_hours)
Update AnomalyChanging the name of project number P1
from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1
Module 6 14042023
Module 6 15042023
Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2
Design a schema that does not suffer from the insertion deletion and update anomalies
If there are any anomalies present then note them so that applications can be made to take them into account
In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries
Module 6 16042023
Problems with Nulls If many attributes are grouped together
as a fat relation it gives rise to many nulls in the tuples
Waste storage Problems in understanding the
meaning of the attributes Difficult while using Nulls in aggregate
operators like count or sum
Module 6 17042023
3 Null Values in Tuples Interpretations of nulls
Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable
GUIDELINE 3 Relations should be designed such that their
tuples will have as few NULL values as possible Attributes that are NULL frequently could be
placed in separate relations (with the primary key) Example-
if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation
Better create a new relation emp_offices(essn office_number)
Module 6 18042023
Example of Spurious Tuples
Module 6 19042023
Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as
the base relations of EMP_PROJ is not a good schema design
Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid
This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1
Module 6 20042023
Example of Spurious Tuples contd
4 Spurious Tuples
Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4
Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) Non-additivity, or losslessness, of the corresponding join
(b) Preservation of the functional dependencies
Property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies that cause redundant work to be done during insertion, modification and deletion
Waste of storage space due to nulls, and difficulty in performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Functional dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is described by a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they also have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). It means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; in other words, the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name | Continent | Area | Length
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420
Continent -/-> Name (does not hold: NA appears with Superior, Huron and Michigan)
Name -> Length
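The definition t1[X] = t2[X] implies t1[Y] = t2[Y] can be checked directly on a relation state. The sketch below (a hypothetical helper, fd_holds) remembers the first Y-value seen for each X-value and reports a violation when a later tuple disagrees. A True result only means the state does not violate the FD; since an FD is a property of the schema, a state can refute it but never prove it.

```python
# Lakes relation state from the slide: (Name, Continent, Area, Length).
LAKES = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]
COLUMNS = ("Name", "Continent", "Area", "Length")

def fd_holds(rows, columns, lhs, rhs):
    """True if no two tuples in this state agree on lhs but differ on rhs."""
    idx = {c: i for i, c in enumerate(columns)}
    seen = {}  # lhs value -> rhs value first seen with it
    for row in rows:
        x = tuple(row[idx[a]] for a in lhs)
        y = tuple(row[idx[a]] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # same X-value, different Y-value
    return True

print(fd_holds(LAKES, COLUMNS, ["Continent"], ["Name"]))  # False
print(fd_holds(LAKES, COLUMNS, ["Name"], ["Length"]))     # True
```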
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints (contd.)
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Purpose: to determine whether a relation state r satisfies a given functional dependency A -> B.
How it works:
1. Sort the tuples of r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.
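Relation state of TEACH
TEACH
Teacher | Course | Text
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | OOAD | Horowitz
TEACHER -/-> COURSE (does not hold: Smith teaches two different courses)
TEXT -> COURSE (may hold: no two tuples in this state violate it)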
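The three steps of the Satisfies algorithm map naturally onto a sort followed by Python's itertools.groupby. This is a sketch, not the textbook's code: the function name satisfies and the dict-per-tuple representation are assumptions. It is applied to the TEACH state above.

```python
from itertools import groupby

TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]

def satisfies(r, a, b):
    """Satisfies algorithm: does relation state r satisfy the FD a -> b?"""
    def key_a(t):
        return tuple(t[x] for x in a)

    # Step 1: sort on the A attributes so equal A-values are adjacent.
    ordered = sorted(r, key=key_a)
    # Step 2: within each run of equal A-values, all B-values must be equal.
    for _, group in groupby(ordered, key=key_a):
        b_values = {tuple(t[x] for x in b) for t in group}
        if len(b_values) > 1:
            return False  # two tuples agree on A but differ on B
    # Step 3: condition 2 held for every group.
    return True

print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```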
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming.
So inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred, or deduced, from the FDs in F.
Example of Closure
If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME
To infer dependencies in a systematic way, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. The fact that X -> Y can be inferred from F is denoted F |= X -> Y.
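Since F+ can be very large, a common practical shortcut is to compute the attribute closure X+ (all attributes determined by X under F) and use the fact that F |= X -> Y exactly when Y is contained in X+. A minimal sketch, with FDs represented as (lhs set, rhs set) pairs (a representation chosen here for illustration), applied to the F above:

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the whole LHS is in the closure, its RHS can be added too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is contained in lhs+ under F."""
    return set(rhs) <= closure(lhs, fds)

F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
```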
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): If Y is a subset of X, then X -> Y
IR2 (Augmentation): If X -> Y, then XZ -> YZ (notation: XZ stands for X U Z)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z
IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we can infer from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs (contd.)
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union (additive): If X -> Y and X -> Z, then X -> YZ
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z
These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
Examples
1. Given F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C a subset of B, show that F |= A -> D.
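One possible derivation for each exercise, using only IR1 to IR3. It assumes the garbled first set reads F = {A -> B, C -> X, BX -> Z}; here X and Z are ordinary attribute names, not the placeholders used in the rule statements.

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\text{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\}} \\
(1) & A \to B   & \text{given} \\
(2) & AC \to BC & \text{IR2 on (1), augmenting with } C \\
(3) & C \to X   & \text{given} \\
(4) & BC \to BX & \text{IR2 on (3), augmenting with } B \\
(5) & AC \to BX & \text{IR3 on (2) and (4)} \\
(6) & BX \to Z  & \text{given} \\
(7) & AC \to Z  & \text{IR3 on (5) and (6)} \\[4pt]
\multicolumn{3}{l}{\text{Example 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B} \\
(1) & A \to B & \text{given} \\
(2) & B \to C & \text{IR1, since } C \subseteq B \\
(3) & A \to C & \text{IR3 on (1) and (2)} \\
(4) & C \to D & \text{given} \\
(5) & A \to D & \text{IR3 on (3) and (4)}
\end{array}
```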
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F if and only if A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from F.
Eliminating redundant FDs allows us to minimize the set of FDs.
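The definition translates into a closure test: A -> B is redundant when B is contained in A+ computed under F - {A -> B}. A minimal sketch; the helper fd and the single-letter attribute convention ("AB" means {A, B}) are assumptions made for brevity.

```python
def fd(lhs, rhs):
    """Build an FD from strings of single-letter attributes, e.g. fd("AB", "C")."""
    return (frozenset(lhs), frozenset(rhs))

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_redundant(f, fds):
    """f = X -> Y is redundant in fds iff Y is contained in X+ computed under fds - {f}."""
    lhs, rhs = f
    rest = [g for g in fds if g != f]
    return rhs <= closure(lhs, rest)

def remove_redundant(fds):
    """Drop redundant FDs one at a time; each removal keeps the set equivalent to fds."""
    result = list(fds)
    for f in list(result):
        if is_redundant(f, result):
            result.remove(f)
    return result

F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(is_redundant(fd("A", "C"), F))  # True: A -> B and B -> C already give A -> C
```

The result of remove_redundant can depend on the order in which FDs are examined, which is one reason minimal sets of FDs are not unique.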
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Such sets are equivalent to F.
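Covering can be tested without ever building E+ or F+: F covers E if, for every X -> Y in E, Y is contained in X+ computed under F. A minimal sketch with hypothetical sets E, F and G over single-letter attributes:

```python
def fd(lhs, rhs):
    """Build an FD from strings of single-letter attributes, e.g. fd("A", "BC")."""
    return (frozenset(lhs), frozenset(rhs))

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, e):
    """F covers E iff every X -> Y in E satisfies Y contained in X+ under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """E and F are equivalent iff each covers the other."""
    return covers(e, f) and covers(f, e)

E = [fd("A", "B"), fd("B", "C")]
F = [fd("A", "BC"), fd("B", "C")]
G = [fd("A", "B"), fd("A", "C")]
print(equivalent(E, F))  # True
print(equivalent(E, G))  # False: G cannot derive B -> C
```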
Minimal Functional Dependencies
A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F.
F is transformed so that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing either extraneous left-hand-side attributes or extraneous right-hand-side attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous iff
$F \equiv (F - \{A_1A_2 \to B_1B_2\}) \cup \{A_2 \to B_1B_2\}$
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
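Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce XZ -> R: since X -> Z, the attribute Z is extraneous, so XZ -> R becomes X -> R: FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}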
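The same computation can be sketched in code following the usual minimal-cover order: split right-hand sides, remove extraneous left-hand-side attributes, then remove redundant FDs. This is an illustrative sketch, not a reference implementation; attributes are assumed to be single letters so that "XY" means {X, Y}, and because minimal covers are not unique, a different processing order could return a different (equivalent) answer.

```python
def closure(attrs, fds):
    """X+ under FDs given as (frozenset LHS, single RHS attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result

def minimal_cover(fds):
    """fds: list of (lhs, rhs) strings of single-letter attributes, e.g. ("XY", "WP")."""
    # Step 1: split every RHS into single attributes, dropping exact duplicates.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            item = (frozenset(lhs), a)
            if item not in g:
                g.append(item)
    # Step 2: left-reduce. B is extraneous in X -> A if A is in (X - {B})+.
    for i in range(len(g)):
        lhs, a = g[i]
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs = lhs - {b}
                g[i] = (lhs, a)
    g = list(dict.fromkeys(g))  # left-reduction can create duplicates
    # Step 3: drop FDs that the remaining FDs already imply.
    for f in list(g):
        rest = [h for h in g if h != f]
        if f[1] in closure(f[0], rest):
            g = rest
    return g

F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
for lhs, a in minimal_cover(F):
    print("".join(sorted(lhs)), "->", a)
```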
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
Minimizing redundancy
Minimizing anomalies
It provides the database designer with:
A formal framework for analyzing relation schemas based on their keys and FDs
A series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations (contd.)
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF or 4NF).
Denormalization is the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S, a subset of R, with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys (contd.)
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
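These definitions suggest a direct (if exponential) way to find the keys: a set S is a superkey when its closure is all of R, and a key when no smaller superkey is contained in it. The brute-force sketch below is only meant for small textbook schemas; the EMP_PROJ attributes and FDs are taken from the earlier FD examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # Supersets of a key already found are superkeys, but not minimal.
            if any(k <= s for k in keys):
                continue
            if closure(s, fds) == schema:
                keys.append(s)
    return keys

EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

keys = candidate_keys(EMP_PROJ, F)
prime = set().union(*keys)
nonprime = EMP_PROJ - prime
print(keys, sorted(prime), sorted(nonprime))
```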
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
There are three techniques to achieve 1NF:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there is a separate tuple in the original relation for each value of the violating attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and is completely general, placing no limit on the maximum number of values.
Normalization of nested relations into 1NF
Additional problems: Schaum series, pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
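The 2NF test amounts to looking for partial dependencies: a nonprime attribute that already lies in the closure of some proper subset of the primary key. A minimal sketch using the EMP_PROJ FDs from the examples above (the function name partial_dependencies is hypothetical):

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(primary_key, nonprime, fds):
    """(proper subset of the key, nonprime attribute) pairs that violate 2NF."""
    violations = []
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            determined = closure(set(subset), fds)
            for a in sorted(nonprime & determined):
                violations.append((subset, a))
    return violations

KEY = {"SSN", "PNUMBER"}
NONPRIME = {"HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

found = partial_dependencies(KEY, NONPRIME, F)
for subset, attr in found:
    print(f"{subset} -> {attr} is a partial dependency")
```

HOURS does not appear in the output, matching the slide: {SSN, PNUMBER} -> HOURS is a full FD, while ENAME, PNAME and PLOCATION would have to move into separate 2NF relations.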
Normalizing into 2NF
Conversion to 2NF
(Figure: R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D))
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> prog_Pac_name
1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
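The NOTE is captured by the general form of the 3NF test: every nontrivial X -> A must have X a superkey or A a prime attribute. The sketch below applies that test to an EMP_DEPT-style schema (attributes from the earlier FD examples) and to the EMP example from the NOTE, where the attribute name EMP_NO stands in for "Emp" and the FD EMP_NO -> SSN is assumed so that Emp is a candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def prime_attributes(schema, fds):
    """Union of all candidate keys (brute force, fine for small schemas)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == schema:
                keys.append(s)
    return set().union(*keys)

def violations_3nf(schema, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    prime = prime_attributes(schema, fds)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == schema
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad

EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F1 = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
EMP = {"SSN", "EMP_NO", "SALARY"}
F2 = [({"SSN"}, {"EMP_NO"}), ({"EMP_NO"}, {"SSN", "SALARY"})]

print(violations_3nf(EMP_DEPT, F1))  # transitive dependencies through DNUMBER
print(violations_3nf(EMP, F2))       # [] : EMP_NO is a candidate key
```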
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute, even when 'X' is not a superkey.
To be in BCNF, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. Since course is a prime attribute, fd2 does not violate 3NF, so the relation is in 3NF; but instructor is not a superkey, so the relation is not in BCNF.
A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
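A minimal sketch of the BCNF test on TEACH: a nontrivial FD violates BCNF when the closure of its left-hand side is not the whole schema. Checking only the given FDs is enough to test the schema itself; testing the relations produced by a decomposition would require FDs from F+ projected onto them, which this sketch does not attempt.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y whose LHS X is not a superkey of the schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != schema
    ]

TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # fd1
    ({"INSTRUCTOR"}, {"COURSE"}),             # fd2
]

print(bcnf_violations(TEACH, F))  # only fd2: INSTRUCTOR is not a superkey
```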
Achieving BCNF by Decomposition (contd.)
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.
Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the nonadditivity property).
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
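For the special case of splitting R into two relations R1 and R2, there is a well-known shortcut (not necessarily the algorithm used in Example 5.12): the decomposition is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The sketch applies this binary test to the three TEACH decompositions listed above.

```python
def closure(attrs, fds):
    """Attribute closure: everything determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """Binary test: lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus

F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # fd1
    ({"INSTRUCTOR"}, {"COURSE"}),             # fd2
]
decompositions = {
    1: ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    2: ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    3: ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
}
results = {n: is_lossless_binary(r1, r2, F) for n, (r1, r2) in decompositions.items()}
print(results)
```

Only decomposition 3 passes, because its common attribute INSTRUCTOR determines COURSE via fd2.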
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C, where the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A U B = R.
An MVD is nontrivial if neither of these two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS
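An MVD X ->> Y can be checked on a relation state: for each X-value, every Y-value must appear together with every value of the remaining attributes Z. The EMP tuples below are invented for illustration (one employee with two projects and two dependents). The sketch also confirms that joining EMP_PROJECTS and EMP_DEPENDENTS back on ENAME reproduces EMP.

```python
from collections import defaultdict
from itertools import product

EMP = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]
COLUMNS = ("ENAME", "PNAME", "DNAME")

def mvd_holds(rows, columns, x, y):
    """Check X ->> Y in this state, with Z = all remaining columns."""
    idx = {c: i for i, c in enumerate(columns)}
    z = [c for c in columns if c not in x and c not in y]
    groups = defaultdict(set)  # X-value -> set of (Y-value, Z-value) pairs
    for row in rows:
        key = tuple(row[idx[c]] for c in x)
        groups[key].add((tuple(row[idx[c]] for c in y), tuple(row[idx[c]] for c in z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # For a fixed X-value, every Y-value must appear with every Z-value.
        if pairs != set(product(ys, zs)):
            return False
    return True

# 4NF decomposition: the two independent facts go into separate relations.
emp_projects = {(e, p) for e, p, _ in EMP}
emp_dependents = {(e, d) for e, _, d in EMP}
rejoined = {(e, p, d) for e, p in emp_projects for e2, d in emp_dependents if e == e2}

print(mvd_holds(EMP, COLUMNS, ["ENAME"], ["PNAME"]), rejoined == set(EMP))
```

Deleting any one of the four tuples breaks the MVD in this state, which is exactly the kind of update anomaly 4NF removes.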
Fifth Normal Form (5NF)
Join dependency (JD): describes a type of dependency. For a relation R with subsets of the attributes of R denoted A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
In other words, the relation has no nontrivial join dependency other than those in which every component is a superkey.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
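Whether a state satisfies JD(R1, R2, R3) can be checked by joining its three projections and comparing with the original. The SUPPLY tuples below are invented for illustration, and R1, R2, R3 are assumed to be {SNAME, PARTNAME}, {SNAME, PROJNAME} and {PARTNAME, PROJNAME}. The sketch builds the join as all value combinations whose projection onto every component appears in that component's projection, which equals the natural join when the components together cover all columns.

```python
from itertools import product

def satisfies_jd(rows, columns, components):
    """JD holds in this state iff the join of the projections gives back exactly rows.

    Assumes the components together cover every column.
    """
    idx = {c: i for i, c in enumerate(columns)}

    def project(t, comp):
        return tuple(t[idx[c]] for c in comp)

    projections = [{project(row, comp) for row in rows} for comp in components]
    domains = [sorted({row[i] for row in rows}) for i in range(len(columns))]
    # A candidate tuple is in the join iff each of its projections is present.
    joined = {
        t for t in product(*domains)
        if all(project(t, comp) in p for comp, p in zip(components, projections))
    }
    return joined == set(rows)

COLUMNS = ("SNAME", "PARTNAME", "PROJNAME")
COMPONENTS = [("SNAME", "PARTNAME"), ("SNAME", "PROJNAME"), ("PARTNAME", "PROJNAME")]
SUPPLY = [
    ("Smith", "Bolt", "ProjY"),
    ("Smith", "Nut", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjX"),  # the tuple the JD forces to exist
]

print(satisfies_jd(SUPPLY, COLUMNS, COMPONENTS))      # True
print(satisfies_jd(SUPPLY[:3], COLUMNS, COMPONENTS))  # False: join adds a spurious tuple
```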
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
If there are any anomalies present then note them so that applications can be made to take them into account
In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries
Module 6 16042023
Problems with Nulls If many attributes are grouped together
as a fat relation it gives rise to many nulls in the tuples
Waste storage Problems in understanding the
meaning of the attributes Difficult while using Nulls in aggregate
operators like count or sum
Module 6 17042023
3 Null Values in Tuples Interpretations of nulls
Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable
GUIDELINE 3 Relations should be designed such that their
tuples will have as few NULL values as possible Attributes that are NULL frequently could be
placed in separate relations (with the primary key) Example-
if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation
Better create a new relation emp_offices(essn office_number)
Module 6 18042023
Example of Spurious Tuples
Module 6 19042023
Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as
the base relations of EMP_PROJ is not a good schema design
Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid
This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1
Module 6 20042023
Example of Spurious Tuples contd
Module 6 21042023
4 Spurious Tuples Bad designs for a relational database may result
in erroneous results for certain JOIN operations The lossless join property is used to
guarantee meaningful results for join operations
GUIDELINE 4 Design relation schemas so that they can be
joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated
Module 6 22042023
Spurious Tuples
There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies
Note that Property (a) is extremely important and cannot be
sacrificed Property (b) is less stringent and may be sacrificed
Module 6 23042023
Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done
during Insertion Modification Deletion
Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Module 6 24042023
Functional dependencies Functional dependencies (FDs)
Is a constraint between two sets of attributes from the database
Assumption The entire database is a single universal
relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes
Module 6 25042023
Definition
FDs are used to specify formal measures of the
goodness of relational designs keys that are used to define normal forms for
relations constraints that are derived from the meaning and
interrelationships of the data attributes A set of attributes X functionally determines
a set of attributes Y if the value of X determines a unique value for Y
Module 6 26042023
Functional Dependencies
A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If
t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r
depend on or are determined by the values of the X component
The values of the X component functionally determines the values of Y component
FDs are derived from the real-world constraints on the attributes
The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times
Module 6 27042023
Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760
Superior NA 31700 350
Victoria Africa 26828 250
Aral Sea Asia 24904 280
Huron NA 23000 206
Michigan NA 22300 307
Tanganyika Africa 12700 420
Continent -gtName Name -gtLength
Module 6 28042023
Graphical representation of Functional Dependencies
Module 6 29042023
Examples of FD constraints Social security number uniquely determines
employee name SSN -gt ENAME
Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION
Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Module 6 31042023
Satisfies algorithm
Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B
How it works Sort the tuples of the relation r on the A attributes so
that tuples with equal values under A are next to each other
Check that tuples with equal values under attributes A also have equal values under B
If it meets the condition 2 then the output of the algorithm is true else it is false
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is an attribute that is not prime, that is, it is not a member of any candidate key.
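The superkey and key definitions can be checked mechanically with attribute closures: S is a superkey when S+ contains every attribute of R, and a candidate key is a superkey with no smaller key inside it. The brute-force Python sketch below (exponential in the number of attributes, so only for small schemas) uses the EMP_PROJ attributes and the FDs SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION} and {SSN, PNUMBER} -> HOURS that appear as examples in this module; the helper names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names, e.g. fd("SSN PNUMBER", "HOURS")."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(s, attrs, fds):
    """S is a superkey of R if S+ contains every attribute of R."""
    return closure(s, fds) >= set(attrs)


def candidate_keys(attrs, fds):
    """Test attribute sets by increasing size and keep the minimal superkeys."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if is_superkey(s, attrs, fds):
                keys.append(s)
    return keys


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]

keys = candidate_keys(EMP_PROJ, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['PNUMBER', 'SSN']]
print(sorted(EMP_PROJ - prime))   # ['ENAME', 'HOURS', 'PLOCATION', 'PNAME']
```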
First Normal Form
1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF, there are 3 techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, then replace that attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
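Technique 1 can be sketched in a few lines of Python. The DEPARTMENT rows below are hypothetical, with DLOCATIONS playing the role of the multivalued attribute; the function name split_multivalued is also hypothetical. The new relation pairs the primary key DNUMBER with one atomic DLOCATION value per tuple, so its key is {DNUMBER, DLOCATION}.

```python
# Hypothetical unnormalized DEPARTMENT rows: DLOCATIONS holds several values.
DEPARTMENT = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]


def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: move attr into its own relation, paired with the primary key."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    detail = [{key: row[key], new_attr: value} for row in rows for value in row[attr]]
    return base, detail


department, dept_locations = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
for row in dept_locations:
    print(row)
```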
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: pg 178, 5.1
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute - an attribute that is a member of the primary key K.
Full functional dependency - an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
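The 2NF definition can be tested by looking for partial dependencies: a nonprime attribute that is already determined by a proper subset of the primary key. The Python sketch below follows the primary-key-based definition given above (prime means "member of the primary key") and uses the EMP_PROJ example FDs from this module; the helper names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attrs, fds, key):
    """Pairs (proper subset of the primary key, nonprime attribute it determines)."""
    key = frozenset(key)
    nonprime = set(attrs) - key
    found = []
    for size in range(1, len(key)):
        for sub in combinations(sorted(key), size):
            for a in sorted(nonprime & closure(sub, fds)):
                found.append((frozenset(sub), a))
    return found


def is_2nf(attrs, fds, key):
    """2NF: every nonprime attribute is fully functionally dependent on the primary key."""
    return not partial_dependencies(attrs, fds, key)


F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
for sub, a in partial_dependencies(EMP_PROJ, F, {"SSN", "PNUMBER"}):
    print(sorted(sub), "->", a)  # the partial dependencies that violate 2NF
print(is_2nf({"SSN", "PNUMBER", "HOURS"}, F, {"SSN", "PNUMBER"}))  # True
```

The last line checks the relation (SSN, PNUMBER, HOURS) that 2NF normalization would split off; its only nonprime attribute, HOURS, depends on the whole key.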
Normalizing into 2NF
Conversion to 2NF
Figure: relation (A, B, C, D) converted into relations (A, B, C) and (A, D)
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form of Prog_task?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency - an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a superkey.
BCNF does not allow this exception: X must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but it satisfies 3NF because course is a prime attribute. So this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
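The difference between the two tests can be seen by running both on TEACH. In the Python sketch below, BCNF requires every nontrivial FD to have a superkey on its left-hand side, while 3NF also accepts the FD when every attribute on its right-hand side is prime. Only the given FDs are checked, and all helper names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal superkeys, found by testing attribute sets in order of size."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def is_bcnf(attrs, fds):
    """Every nontrivial given FD X -> Y must have X a superkey."""
    return all(rhs <= lhs or closure(lhs, fds) >= set(attrs) for lhs, rhs in fds)


def is_3nf(attrs, fds):
    """Like BCNF, but X -> Y is also allowed when every attribute of Y - X is prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    return all(closure(lhs, fds) >= set(attrs) or (rhs - lhs) <= prime for lhs, rhs in fds)


TEACH = {"student", "course", "instructor"}
F = [fd("student course", "instructor"),  # fd1
     fd("instructor", "course")]          # fd2

print([sorted(k) for k in candidate_keys(TEACH, F)])  # [['course', 'student'], ['instructor', 'student']]
print(is_3nf(TEACH, F), is_bcnf(TEACH, F))            # True False
```

TEACH has a second candidate key, {student, instructor}, so every attribute is prime; that is why fd2 passes the 3NF test while failing BCNF.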
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.
Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the nonadditive join property).
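For a decomposition into exactly two relations R1 and R2 there is a simple test: the decomposition is nonadditive if the common attributes functionally determine all of R1 or all of R2, i.e. (R1 ∩ R2)+ contains R1 or R2. The Python sketch below applies it to the three TEACH decompositions; the helper names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary decomposition test: (R1 ∩ R2)+ must contain all of R1 or all of R2."""
    common_plus = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_plus or set(r2) <= common_plus


F = [fd("student course", "instructor"), fd("instructor", "course")]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for r1, r2 in decompositions:
    print(sorted(r1), sorted(r2), lossless_binary(r1, r2, F))  # False, False, True
```

Only the third decomposition passes, because its common attribute instructor determines course and therefore all of {instructor, course}.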
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, then the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
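For decompositions into any number of relations, the lossless join test is commonly presented as a matrix (chase) algorithm: build one row per Ri, put a distinguished "a" symbol in column Aj when Aj is in Ri and a row-specific "b" symbol otherwise; for each FD X -> Y, make the Y columns equal in rows that agree on X (preferring an "a" symbol); the join is lossless if some row ends up all "a". The Python sketch below is our own rendering of that procedure, not necessarily identical to the textbook's version. The three-relation decomposition of EMP_PROJ and the attribute lists of EMP_LOCS (ENAME, PLOCATION) and EMP_PROJ1 (SSN, PNUMBER, HOURS, PNAME, PLOCATION) are assumptions, and all names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def lossless_chase(attrs, decomposition, fds):
    cols = sorted(attrs)
    # ("a", A) is the distinguished symbol for column A; ("b", i, A) belongs to row i only.
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in cols}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if any(r1[x] != r2[x] for x in lhs):
                        continue
                    for y in rhs:
                        if r1[y] != r2[y]:
                            # ("a", ...) sorts before ("b", ...), so an "a" symbol always wins.
                            keep, drop = min(r1[y], r2[y]), max(r1[y], r2[y])
                            for r in rows:  # rename the dropped symbol throughout column y
                                if r[y] == drop:
                                    r[y] = keep
                            changed = True
    return any(all(r[a] == ("a", a) for a in cols) for r in rows)


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(lossless_chase(EMP_PROJ, good, F))  # True
print(lossless_chase(EMP_PROJ, bad, F))   # False: EMP_LOCS and EMP_PROJ1 are lossy
```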
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF)
A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
4NF is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
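An MVD X ->> Y holds in a relation state r if, for any two tuples t1 and t2 that agree on X, the tuple taking its X and Y values from t1 and its remaining values from t2 is also in r. The Python sketch below checks that condition and the triviality rule above. The EMP rows are hypothetical, since the original figure is not reproduced here, and the function names are also hypothetical.

```python
from itertools import product


def mvd_holds(rows, attrs, x, y):
    """X ->> Y holds if, for all t1, t2 agreeing on X, the tuple with X and Y
    from t1 and every other attribute from t2 is also present."""
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            t3 = tuple(t1[a] if a in x or a in y else t2[a] for a in attrs)
            if t3 not in present:
                return False
    return True


def is_trivial(x, y, attrs):
    """X ->> Y is trivial if Y is a subset of X or X ∪ Y is all of R."""
    return set(y) <= set(x) or set(x) | set(y) == set(attrs)


ATTRS = ["ENAME", "PNAME", "DNAME"]
EMP = [dict(zip(ATTRS, t)) for t in [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]]
print(mvd_holds(EMP, ATTRS, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(EMP[:3], ATTRS, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing
print(is_trivial(["ENAME"], ["PNAME"], ATTRS))          # False, so EMP is not in 4NF
```

Because ENAME ->> PNAME is nontrivial and ENAME is not a superkey of EMP, the relation is split into EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME), where each MVD becomes trivial.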
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency that is not implied by its keys.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
EXAMPLE OF A DELETE ANOMALY
Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).
Delete Anomaly: When a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.
EXAMPLE OF AN UPDATE ANOMALY
Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).
Update Anomaly: Changing the name of project number P1 from "Billing" to "Customer-Accounting" may cause this update to be made for all 100 employees working on project P1.
Guideline to Redundant Information in Tuples and Update Anomalies
GUIDELINE 2
Design a schema that does not suffer from insertion, deletion and update anomalies.
If any anomalies are present, note them so that applications can be made to take them into account.
In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.
Problems with Nulls
If many attributes are grouped together as a fat relation, it gives rise to many nulls in the tuples. This wastes storage, causes problems in understanding the meaning of the attributes, and makes it difficult to use aggregate operators such as COUNT or SUM over attributes containing nulls.
3. Null Values in Tuples
Interpretations of nulls:
- Attribute not applicable or invalid
- Attribute value unknown (may exist)
- Value known to exist but unavailable
GUIDELINE 3
Relations should be designed such that their tuples will have as few NULL values as possible.
Attributes that are NULL frequently could be placed in separate relations (with the primary key).
Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).
Example of Spurious Tuples
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.
The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid.
This is because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
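The effect can be reproduced with a tiny hypothetical EMP_PROJ state in which two employees work on different projects at the same location. The Python sketch below assumes the usual attribute lists EMP_LOCS(ENAME, PLOCATION) and EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION); the data and helper names are hypothetical.

```python
def project(rows, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Join on all attributes the two relations share (here only PLOCATION)."""
    common = [a for a in r[0] if a in s[0]]
    return [{**t, **u} for t in r for u in s if all(t[a] == u[a] for a in common)]


# Hypothetical EMP_PROJ state: two employees on different projects at one location.
EMP_PROJ = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Smith", "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 20, "ENAME": "Wong", "PNAME": "ProductY", "PLOCATION": "Bellaire"},
]
EMP_LOCS = project(EMP_PROJ, ["ENAME", "PLOCATION"])
EMP_PROJ1 = project(EMP_PROJ, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

joined = natural_join(EMP_PROJ1, EMP_LOCS)
spurious = [t for t in joined if t not in EMP_PROJ]
print(len(joined), len(spurious))  # 4 2
```

The two spurious tuples pair employee 111 with the name Wong and employee 222 with the name Smith, because the join can only match on the shared location.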
Example of Spurious Tuples (contd.)
4. Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4
Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) Nonadditive or lossless join
(b) Preservation of the functional dependencies
Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
- Anomalies that cause redundant work to be done during insertion, modification and deletion
- Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values
- Generation of invalid and spurious data during joins on improperly related base relations
Functional dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs, keys that are used to define normal forms for relations, and constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X] then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420
Continent -/-> Name (the continent NA appears with Superior, Huron and Michigan)
Name -> Length
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
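Relation state of TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz
TEACHER -/-> COURSE (Smith teaches two different courses, so this state violates the FD)
TEXT -> COURSE (satisfied by this state: each text appears with only one course)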
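The Satisfies algorithm can be sketched in Python directly from its three steps: sort on A, then compare neighbouring tuples. The data below are the Lakes and TEACH relation states shown above, and the function name is hypothetical. As stated earlier, an FD is a property of the schema, so a state that passes the test only shows that this state does not violate the FD.

```python
def satisfies(rows, x, y):
    """Sort on X, then check that neighbours with equal X values also agree on Y."""
    ordered = sorted(rows, key=lambda t: tuple(t[a] for a in x))
    for prev, cur in zip(ordered, ordered[1:]):
        if all(prev[a] == cur[a] for a in x) and any(prev[b] != cur[b] for b in y):
            return False
    return True


LAKES = [dict(zip(("Name", "Continent", "Area", "Length"), t)) for t in [
    ("Caspian Sea", "Asia-Europe", 143244, 760), ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250), ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206), ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]]
TEACH = [dict(zip(("TEACHER", "COURSE", "TEXT"), t)) for t in [
    ("Smith", "Data Structures", "Bartram"), ("Smith", "Data Management", "Martin"),
    ("Hall", "Compilers", "Hoffmann"), ("Brown", "OOAD", "Horowitz"),
]]

print(satisfies(LAKES, ["Continent"], ["Name"]))  # False
print(satisfies(LAKES, ["Name"], ["Length"]))     # True
print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```

Sorting makes tuples with equal X values adjacent, so comparing neighbours is enough: if every adjacent pair in a group agrees on Y, the whole group does.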
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time consuming, so inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The statement that X -> Y is inferred from F is denoted by F |= X -> Y.
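In practice, rather than generating F+, one computes the attribute closure X+ (all attributes functionally determined by X under F); F |= X -> Y holds exactly when Y is contained in X+. The Python sketch below computes X+ for the F of this example; the helper names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+: start from X and add the RHS of every FD whose LHS is already contained."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y is contained in X+ under F."""
    return set(rhs.split()) <= closure(lhs.split(), fds)


F = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
print(sorted(closure({"SSN"}, F)))
print(implies(F, "SSN", "DNAME DMGRSSN"))  # True
print(implies(F, "DNUMBER", "ENAME"))      # False
```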
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y
IR2 (Augmentation): If X -> Y, then XZ -> YZ (Notation: XZ stands for X ∪ Z)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z
IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union (additive): If X -> Y and X -> Z, then X -> YZ
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z
The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
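Redundancy of a single FD X -> Y is tested by computing X+ under F - {X -> Y}. One caution when minimizing: redundant FDs should be removed one at a time, re-testing against what remains, because two FDs can each be redundant only thanks to the other. The Python sketch below shows both points; the sets F and G and the helper names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(dep, fds):
    """X -> Y is redundant if Y is contained in X+ computed under F - {X -> Y}."""
    rest = [d for d in fds if d != dep]
    return dep[1] <= closure(dep[0], rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time, re-testing against what is left."""
    result = list(fds)
    for dep in fds:
        if is_redundant(dep, result):
            result.remove(dep)
    return result


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print([is_redundant(d, F) for d in F])  # [False, False, True]

# A -> C and B -> C are each redundant, but removing both at once would lose C.
G = [fd("A", "B"), fd("B", "A"), fd("A", "C"), fd("B", "C")]
print(len(remove_redundant(G)))  # 3
```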
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: a relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
Informally: a relation that has no nontrivial join dependency.
Relation SUPPLY with join dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
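A small Python sketch of the join dependency on SUPPLY. Both the state and the attribute layout R1 = (Sname, Part_name), R2 = (Sname, Proj_name), R3 = (Part_name, Proj_name) are assumptions for illustration, since the figure is not reproduced. Joining only two of the projections produces spurious tuples, while joining all three reproduces SUPPLY exactly, which is what JD(R1, R2, R3) asserts.

```python
# Hypothetical SUPPLY(Sname, Part_name, Proj_name) state.
SUPPLY = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}

# Projections onto the assumed 5NF relations.
R1 = {(s, part) for s, part, proj in SUPPLY}     # (Sname, Part_name)
R2 = {(s, proj) for s, part, proj in SUPPLY}     # (Sname, Proj_name)
R3 = {(part, proj) for s, part, proj in SUPPLY}  # (Part_name, Proj_name)

# R1 join R2: natural join on Sname.
r1_r2 = {(s, part, proj) for s, part in R1 for s2, proj in R2 if s == s2}
# (R1 join R2) join R3: keep tuples whose (part, proj) pair occurs in R3.
r1_r2_r3 = {t for t in r1_r2 if (t[1], t[2]) in R3}

print(len(r1_r2), sorted(r1_r2 - SUPPLY))
# 9 [('Adamsky', 'Nail', 'ProjY'), ('Smith', 'Nut', 'ProjX')]
print(r1_r2_r3 == SUPPLY)  # True
```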
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence, 5NF is rarely used in practice.
EXAMPLE OF AN UPDATE ANOMALY
Consider the relation
EMP_PROJ(Emp, Proj, Ename, Pname, No_hours)
Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" means this update must be made for all 100 employees working on project P1.
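The update anomaly can be reproduced with Python's built-in sqlite3 module. The 100 employee rows and the PROJECT table are hypothetical data that mirror the example; the point is the number of rows each UPDATE must touch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Unnormalized design: the project name is repeated in every row of the project.
conn.execute("CREATE TABLE EMP_PROJ (Emp TEXT, Proj TEXT, Ename TEXT, "
             "Pname TEXT, No_hours REAL, PRIMARY KEY (Emp, Proj))")
conn.executemany("INSERT INTO EMP_PROJ VALUES (?, ?, ?, ?, ?)",
                 [(f"E{i}", "P1", f"Employee {i}", "Billing", 10.0)
                  for i in range(1, 101)])

# Renaming P1 has to modify all 100 rows.
rows_touched = conn.execute(
    "UPDATE EMP_PROJ SET Pname = 'Customer-Accounting' WHERE Proj = 'P1'"
).rowcount

# Simulate an update that missed one row by restoring E1's old name:
# P1 now has two different names.
conn.execute("UPDATE EMP_PROJ SET Pname = 'Billing' WHERE Emp = 'E1'")
names_for_p1 = conn.execute(
    "SELECT COUNT(DISTINCT Pname) FROM EMP_PROJ WHERE Proj = 'P1'"
).fetchone()[0]

# Decomposed design: the project name is stored exactly once.
conn.execute("CREATE TABLE PROJECT (Proj TEXT PRIMARY KEY, Pname TEXT)")
conn.execute("INSERT INTO PROJECT VALUES ('P1', 'Billing')")
rows_touched_normalized = conn.execute(
    "UPDATE PROJECT SET Pname = 'Customer-Accounting' WHERE Proj = 'P1'"
).rowcount

print(rows_touched, names_for_p1, rows_touched_normalized)  # 100 2 1
```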
Guideline for Redundant Information in Tuples and Update Anomalies
GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies.
If any anomalies are present, note them clearly so that applications can be written to take them into account.
In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.
Problems with Nulls
If many attributes are grouped together as a fat relation, it gives rise to many nulls in the tuples. This leads to:
wasted storage;
problems in understanding the meaning of the attributes;
difficulties when nulls appear in aggregate operators such as COUNT or SUM.
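A short sqlite3 sketch of the aggregate problem, using a hypothetical EMPLOYEE table in which one salary is NULL. COUNT(*) counts every row, while COUNT, SUM, and AVG on the column silently skip the NULL, so the average is taken over two employees rather than three.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT PRIMARY KEY, Salary INTEGER)")
# Python's None is stored as SQL NULL.
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("111", 30000), ("222", None), ("333", 50000)])

count_rows, count_salaries, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary) FROM EMPLOYEE"
).fetchone()
print(count_rows, count_salaries, total, average)  # 3 2 80000 40000.0
```

Whether 40000 is the "right" average depends on what the NULL means (not applicable, unknown, or unavailable), which the database cannot tell.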
3 Null Values in Tuples
Interpretations of nulls:
The attribute does not apply to this tuple (not applicable or invalid).
The attribute value is unknown (it may exist).
The value is known to exist but is unavailable.
GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key).
Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation.
Instead, create a new relation emp_offices(essn, office_number).
Example of Spurious Tuples
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.
The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid.
This is because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
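A simplified sqlite3 sketch of the spurious-tuple problem. The two-row EMP_PROJ state and the reduced column lists (HOURS and PNAME omitted) are assumptions; what matters is that the only shared column is PLOCATION, so NATURAL JOIN pairs every employee with every project at the same location.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical EMP_PROJ state (SSN, ENAME, PNUMBER, PLOCATION): two employees
# on different projects that happen to share the location 'Bellaire'.
emp_proj = [("123", "Smith", "P1", "Bellaire"),
            ("453", "English", "P2", "Bellaire")]

conn.execute("CREATE TABLE EMP_LOCS (ENAME TEXT, PLOCATION TEXT)")
conn.execute("CREATE TABLE EMP_PROJ1 (SSN TEXT, PNUMBER TEXT, PLOCATION TEXT)")
conn.executemany("INSERT INTO EMP_LOCS VALUES (?, ?)",
                 [(name, loc) for _, name, _, loc in emp_proj])
conn.executemany("INSERT INTO EMP_PROJ1 VALUES (?, ?, ?)",
                 [(ssn, pno, loc) for ssn, _, pno, loc in emp_proj])

# NATURAL JOIN matches on PLOCATION, the only column the tables share.
joined = conn.execute(
    "SELECT SSN, ENAME, PNUMBER, PLOCATION FROM EMP_PROJ1 NATURAL JOIN EMP_LOCS"
).fetchall()
spurious = set(joined) - set(emp_proj)
print(len(joined), sorted(spurious))
# 4 [('123', 'English', 'P1', 'Bellaire'), ('453', 'Smith', 'P2', 'Bellaire')]
```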
Example of Spurious Tuples (contd.)
4 Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) Non-additive (lossless) join of the corresponding relations.
(b) Preservation of the functional dependencies.
Note that property (a) is extremely important and cannot be sacrificed; property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies that cause redundant work to be done during insertion, modification, and deletion.
Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values.
Generation of invalid and spurious data during joins on improperly related base relations.
Functional dependencies
Functional dependencies (FDs)
An FD is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
The values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name | Continent | Area | Length
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420
Continent -> Name does not hold (NA appears with Superior, Huron, and Michigan); Name -> Length holds in this state.
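The FD definition can be checked directly against the lakes table: remember, for each X value, the Y value seen first, and fail on any mismatch. This Python sketch uses the table's data; `fd_holds` is an illustrative name. A relation state can refute an FD, but it cannot establish that the FD holds for all legal states.

```python
LAKES = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]
COLUMNS = ("Name", "Continent", "Area", "Length")
ROWS = [dict(zip(COLUMNS, t)) for t in LAKES]


def fd_holds(rows, X, Y):
    """True if this state satisfies X -> Y: equal X values imply equal Y values."""
    seen = {}
    for r in rows:
        x_val = tuple(r[a] for a in X)
        y_val = tuple(r[a] for a in Y)
        # setdefault stores the first Y seen for this X and returns it.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


print(fd_holds(ROWS, ["Continent"], ["Name"]))  # False
print(fd_holds(ROWS, ["Name"], ["Length"]))     # True
```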
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME.
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}.
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS.
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
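A Python sketch of the Satisfies algorithm following the three steps, applied to the TEACH relation state that follows. `itertools.groupby` only groups adjacent equal keys, which is exactly why the rows are sorted on A first. The function name is illustrative.

```python
from itertools import groupby

TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]


def satisfies(rows, A, B):
    """Satisfies algorithm: sort on A, then check each run of equal A values."""
    def a_value(r):
        return tuple(r[x] for x in A)

    # Step 1: sorting puts tuples with equal A values next to each other.
    for _, run in groupby(sorted(rows, key=a_value), key=a_value):
        # Step 2: all tuples in the run must agree on B.
        if len({tuple(r[x] for x in B) for r in run}) > 1:
            return False  # Step 3: condition violated
    return True


print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```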
Relation state of TEACH
Teacher | Course | Text
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | OOAD | Horowitz
TEACHER -> COURSE does not hold (Smith teaches two different courses).
TEXT -> COURSE may hold (no two tuples with the same text differ in course).
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time consuming.
So inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
A department has one manager (DEPT_NO -> MGR_SSN), and a manager has a unique phone number (MGR_SSN -> MGR_PHONE); these two dependencies together imply DEPT_NO -> MGR_PHONE.
This defines a concept called closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer dependencies in a systematic way, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The fact that X -> Y is inferred from F is denoted F |= X -> Y.
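Enumerating F+ is exponential, so a standard technique (not shown on the slides) is to compute the attribute closure X+, the set of attributes determined by X under F; then F |= X -> Y exactly when Y is a subset of X+. A minimal Python sketch, run on the two example sets above; the helper names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the LHS is already determined, the whole RHS is too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs  iff  rhs is contained in lhs+ computed under F."""
    return set(rhs) <= closure(lhs, fds)


F_DEPT = [({"DEPT_NO"}, {"MGR_SSN"}), ({"MGR_SSN"}, {"MGR_PHONE"})]
F_EMP = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
         ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]

print(implies(F_DEPT, {"DEPT_NO"}, {"MGR_PHONE"}))    # True
print(implies(F_EMP, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F_EMP, {"DNUMBER"}, {"ENAME"}))         # False
```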
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): If Y subset-of X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X U Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state that satisfies the dependencies in F (in other words, if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C subset-of B, show that F |= A -> D.
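A worked derivation for the two examples, written as a LaTeX block. It assumes the garbled sets are read as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D} with C subset-of B; each step cites the Armstrong rule it uses.

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\textbf{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \models AC \to Z} \\
1. & A \to B   & \text{given} \\
2. & AC \to BC & \text{IR2 on 1, augmenting with } C \\
3. & C \to X   & \text{given} \\
4. & BC \to BX & \text{IR2 on 3, augmenting with } B \\
5. & AC \to BX & \text{IR3 on 2 and 4} \\
6. & BX \to Z  & \text{given} \\
7. & AC \to Z  & \text{IR3 on 5 and 6} \\[6pt]
\multicolumn{3}{l}{\textbf{Example 2: } F = \{A \to B,\; C \to D\},\ C \subseteq B \;\Rightarrow\; F \models A \to D} \\
1. & A \to B & \text{given} \\
2. & B \to C & \text{IR1, since } C \subseteq B \\
3. & A \to C & \text{IR3 on 1 and 2} \\
4. & C \to D & \text{given} \\
5. & A \to D & \text{IR3 on 3 and 4}
\end{array}
```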
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
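The redundancy test can be mechanized with attribute closure: A -> B is redundant iff B is a subset of A+ computed under F - {A -> B}. In this Python sketch, the third FD of the hypothetical set is flagged because it follows transitively from the other two, as in the closure example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def redundant_fds(fds):
    """Every FD X -> Y of F whose Y is contained in X+ under F - {X -> Y}."""
    return [(lhs, rhs) for i, (lhs, rhs) in enumerate(fds)
            if rhs <= closure(lhs, fds[:i] + fds[i + 1:])]


F = [({"DEPT_NO"}, {"MGR_SSN"}),
     ({"MGR_SSN"}, {"MGR_PHONE"}),
     ({"DEPT_NO"}, {"MGR_PHONE"})]
print(redundant_fds(F))  # [({'DEPT_NO'}, {'MGR_PHONE'})]
```

Each FD is tested against all of the others, so if several are reported, remove them one at a time and re-check: two FDs can each be redundant given the other, yet not both be removable.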
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
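Coverage and equivalence can also be tested with attribute closure: F covers E iff, for every X -> Y in E, Y is a subset of X+ computed under F. The sets E and F below are hypothetical, and single-letter attribute names let `set("AC")` stand for {A, C}.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(F, E):
    """F covers E: every X -> Y in E has Y contained in X+ computed under F."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E+ = F+ exactly when each set covers the other."""
    return covers(E, F) and covers(F, E)


# Hypothetical sets; set("AC") is {"A", "C"} because names are single letters.
E = [(set("A"), set("C")), (set("AC"), set("D")),
     (set("E"), set("AD")), (set("E"), set("H"))]
F = [(set("A"), set("CD")), (set("E"), set("AH"))]

print(equivalent(E, F))      # True
print(equivalent(E[:3], F))  # False: without E -> H, E no longer determines H
```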
Minimal Functional Dependencies
A minimal set of functional dependencies (minimal cover) is useful in eliminating unnecessary functional dependencies. It is also called the irreducible set of F.
F is transformed such that each FD in it that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) every dependency in F has a single attribute for its RHS;
(b) no redundancies: for no X -> A in F is the set F - {X -> A} equivalent to F;
(c) no dependency may be replaced by a dependency that involves a subset of its left-hand side: for no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F.
Extraneous Attributes
The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let $A_1A_2 \to B_1B_2 \in F$. Then $A_1$ is extraneous iff $F \equiv \big(F - \{A_1A_2 \to B_1B_2\}\big) \cup \{A_2 \to B_1B_2\}$.
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
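Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R because X -> Z, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}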
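A Python sketch of the canonical-cover procedure (split right-hand sides, left-reduce, remove redundant FDs), assuming single-letter attribute names. Run on the problem's F, it returns X -> Z, XY -> P, XY -> W, XY -> Q, and X -> R: the left-reduction step also replaces XZ -> R by X -> R, because X -> Z already holds. The function names are illustrative, and the result is not unique in general: processing FDs in a different order can give a different, equivalent cover.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """Split RHS, left-reduce, drop redundant FDs (single-letter attributes)."""
    # 1. Make every RHS a single attribute, skipping duplicates.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset(a))
            if fd not in g:
                g.append(fd)
    # 2. Left-reduce: B is extraneous in X -> A if A is in (X - {B})+.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left reduction can create duplicates
    # 3. Remove every FD implied by the remaining ones.
    for fd in list(g):
        rest = [h for h in g if h != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    return g


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
# X -> Z
# XY -> P
# XY -> W
# XY -> Q
# X -> R
```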
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
It provides the database designer with:
a formal framework for analyzing relation schemas based on keys and FDs;
a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or nonadditive join property: it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: it ensures that each functional dependency is represented in some individual relation resulting after decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
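Candidate keys can be found with attribute closure by testing attribute subsets in order of size and keeping the minimal ones whose closure is all of R. This brute-force Python sketch is exponential in the number of attributes, so it only suits small schemas; it is run on the EMP_PROJ attributes with the FDs listed earlier in this section (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(R, fds):
    """Minimal superkeys of R, found by testing subsets in order of size."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            S = set(combo)
            if any(k <= S for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(S, fds) == set(R):
                keys.append(S)
    return keys


R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]
print([sorted(k) for k in candidate_keys(R, F)])  # [['PNUMBER', 'SSN']]
```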
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute must be a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
Normalizing nested relations into 1NF
Additional problems from the Schaum series: pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
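A minimal Python sketch of the 2NF test: for each proper subset of the primary key, compute its closure and report the non-prime attributes it already determines. The schema and FDs are the EMP_PROJ examples from this section; the function name is illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(R, key, fds):
    """(subset, attribute) pairs where a proper subset of the primary key
    already determines a non-prime attribute."""
    nonprime = set(R) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(nonprime & closure(subset, fds)):
                found.append((subset, attr))
    return found


R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]

for subset, attr in partial_dependencies(R, ("SSN", "PNUMBER"), F):
    print(", ".join(subset), "->", attr)
# PNUMBER -> PLOCATION
# PNUMBER -> PNAME
# SSN -> ENAME
```

Each reported partial dependency suggests a relation of its own, giving (SSN, ENAME) and (PNUMBER, PNAME, PLOCATION), with (SSN, PNUMBER, HOURS) keeping the full dependency.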
Normalizing into 2NF
Conversion to 2NF: (A, B, C, D) is decomposed into (A, B, C) and (A, D)
Convert to
Additional problem on second normal form:
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a candidate key.
For a relation to be in BCNF, 'X' must be a superkey (a candidate key or a superset of one).
So a relation in BCNF will definitely be in 3NF, but not the other way around.
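The difference can be tested mechanically. This Python sketch checks every given FD X -> A: BCNF requires X to be a superkey, while 3NF also accepts A when it is prime. For a single schema it suffices to check the given FDs rather than all of F+. TEACH uses the FDs from the BCNF example; EMP_DEPT uses the FDs from the closure example. The brute-force key search and the function names are illustrative and only meant for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(R, fds):
    """Minimal attribute sets whose closure is R (brute force)."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            S = set(combo)
            if not any(k <= S for k in keys) and closure(S, fds) == set(R):
                keys.append(S)
    return keys


def normal_form_check(R, fds):
    """Return (in_3nf, in_bcnf) by testing each given FD X -> A."""
    prime = set().union(*candidate_keys(R, fds))
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(R):
            continue  # X is a superkey: allowed by both normal forms
        for a in rhs - lhs:  # ignore the trivial part of the FD
            in_bcnf = False  # BCNF needs X to be a superkey
            if a not in prime:
                in_3nf = False  # 3NF also tolerates a prime A
    return in_3nf, in_bcnf


TEACH = {"student", "course", "instructor"}
F_TEACH = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
EMP_DEPT = {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_EMP_DEPT = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
              ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]

print(normal_form_check(TEACH, F_TEACH))        # (True, False)
print(normal_form_check(EMP_DEPT, F_EMP_DEPT))  # (False, False)
```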
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. The relation is in 3NF, because the right-hand side of fd2 (course) is a prime attribute, but it is not in BCNF, because instructor is not a superkey.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
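A Python sketch of one common BCNF decomposition procedure: while some relation Ri contains a set X that determines other attributes of Ri without being a superkey of Ri, replace Ri by (X+ restricted to Ri) and Ri minus (X+ - X). Each split is lossless because the shared attributes X determine one side. Subsets are enumerated by brute force, so this is only suitable for small schemas; the names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(Ri, fds):
    """Return (X, X+ restricted to Ri) for an X that determines other attributes
    of Ri without being a superkey of Ri; None if Ri is in BCNF."""
    for size in range(1, len(Ri)):
        for combo in combinations(sorted(Ri), size):
            X = frozenset(combo)
            X_plus = frozenset(closure(X, fds) & Ri)
            if X_plus != X and X_plus != Ri:
                return X, X_plus
    return None


def bcnf_decompose(R, fds):
    """Split relations until none has a BCNF violation."""
    result = [frozenset(R)]
    i = 0
    while i < len(result):
        violation = find_violation(result[i], fds)
        if violation is None:
            i += 1
            continue
        X, X_plus = violation
        Ri = result[i]
        # Lossless split: the two parts share exactly X, and X -> X_plus.
        result[i:i + 1] = [X_plus, Ri - (X_plus - X)]
    return result


R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print([sorted(r) for r in bcnf_decompose(R, F)])
# [['course', 'instructor'], ['instructor', 'student']]
```

For TEACH it produces {instructor, course} and {instructor, student}, the third decomposition; fd1 is no longer enforceable within a single relation, which is the dependency preservation that has to be given up.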
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 14042023
Module 6 15042023
Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2
Design a schema that does not suffer from the insertion deletion and update anomalies
If there are any anomalies present then note them so that applications can be made to take them into account
In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries
Module 6 16042023
Problems with Nulls If many attributes are grouped together
as a fat relation it gives rise to many nulls in the tuples
Waste storage Problems in understanding the
meaning of the attributes Difficult while using Nulls in aggregate
operators like count or sum
Module 6 17042023
3 Null Values in Tuples Interpretations of nulls
Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable
GUIDELINE 3 Relations should be designed such that their
tuples will have as few NULL values as possible Attributes that are NULL frequently could be
placed in separate relations (with the primary key) Example-
if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation
Better create a new relation emp_offices(essn office_number)
Module 6 18042023
Example of Spurious Tuples
Module 6 19042023
Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as
the base relations of EMP_PROJ is not a good schema design
Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid
This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1
Module 6 20042023
Example of Spurious Tuples contd
Module 6 21042023
4 Spurious Tuples Bad designs for a relational database may result
in erroneous results for certain JOIN operations The lossless join property is used to
guarantee meaningful results for join operations
GUIDELINE 4 Design relation schemas so that they can be
joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated
Module 6 22042023
Spurious Tuples
There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies
Note that Property (a) is extremely important and cannot be
sacrificed Property (b) is less stringent and may be sacrificed
Module 6 23042023
Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done
during Insertion Modification Deletion
Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Module 6 24042023
Functional dependencies Functional dependencies (FDs)
Is a constraint between two sets of attributes from the database
Assumption The entire database is a single universal
relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes
Module 6 25042023
Definition
FDs are used to specify formal measures of the
goodness of relational designs keys that are used to define normal forms for
relations constraints that are derived from the meaning and
interrelationships of the data attributes A set of attributes X functionally determines
a set of attributes Y if the value of X determines a unique value for Y
Module 6 26042023
Functional Dependencies
A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If
t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r
depend on or are determined by the values of the X component
The values of the X component functionally determines the values of Y component
FDs are derived from the real-world constraints on the attributes
The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times
Module 6 27042023
Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760
Superior NA 31700 350
Victoria Africa 26828 250
Aral Sea Asia 24904 280
Huron NA 23000 206
Michigan NA 22300 307
Tanganyika Africa 12700 420
Continent -gtName Name -gtLength
Module 6 28042023
Graphical representation of Functional Dependencies
Module 6 29042023
Examples of FD constraints Social security number uniquely determines
employee name SSN -gt ENAME
Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION
Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Module 6 31042023
Satisfies algorithm
Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B
How it works Sort the tuples of the relation r on the A attributes so
that tuples with equal values under A are next to each other
Check that tuples with equal values under attributes A also have equal values under B
If it meets the condition 2 then the output of the algorithm is true else it is false
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
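Redundancy can be tested directly from the definition. Remove the FD, then check whether its RHS still lies in the closure of its LHS. The sketch below uses hypothetical helper names. It applies the test to the DEPT_NO / MGR_SSN / MGR_PHONE closure example, with DEPT_NO -> MGR_PHONE added explicitly to F.

```python
# Sketch: detecting and removing redundant FDs (helper names are hypothetical).

def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(f, fds):
    """X -> Y is redundant iff Y is a subset of X+ computed under F - {X -> Y}."""
    rest = [g for g in fds if g != f]
    return f[1] <= closure(f[0], rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time, testing each against the FDs still kept."""
    kept = list(fds)
    for f in list(kept):
        if is_redundant(f, kept):
            kept.remove(f)
    return kept


F = [
    fd("DEPT_NO", "MGR_SSN"),
    fd("MGR_SSN", "MGR_PHONE"),
    fd("DEPT_NO", "MGR_PHONE"),  # derivable from the two FDs above by IR3
]

print(is_redundant(F[2], F))     # True
print(is_redundant(F[0], F))     # False
print(len(remove_redundant(F)))  # 2
```

When several FDs can each be derived from the others, which ones survive depends on the order in which they are tested. Every order still yields a set equivalent to F.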
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Equivalence therefore means that every FD in E can be inferred from F and every FD in F can be inferred from E. In other words, E is equivalent to F if both conditions hold: E covers F and F covers E.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find a set that contains fewer FDs than F and still generates all the FDs of F+. Such a set is equivalent to F.
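Covering and equivalence both reduce to attribute closures. F covers E when Y ⊆ X+ holds for each X -> Y in E, with X+ computed under F. This is a minimal sketch with single-letter attributes, and the sets E, F and G are made-up examples.

```python
# Sketch: "F covers E" and equivalence of FD sets via attribute closure.

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))  # single-letter attribute names


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E iff, for every X -> Y in E, Y is a subset of X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)


E = [fd("A", "B"), fd("B", "C")]
F = [fd("A", "BC"), fd("B", "C")]
G = [fd("A", "B"), fd("A", "C")]

print(equivalent(E, F))            # True
print(covers(E, G), covers(G, E))  # True False
```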
Minimal Functional Dependencies
A minimal cover (also called an irreducible set of F) is useful for eliminating unnecessary functional dependencies.
F is transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is a minimal cover if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
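The definition above compares whole FD sets. Standard treatments give an equivalent and cheaper test, assumed here: A1 is extraneous in A1A2 -> B1B2 exactly when B1B2 ⊆ (A2)+ under F. This works because the reduced FD A2 -> B1B2 always implies the original one by augmentation. The sketch checks both forms on the FD set from step 2 of the canonical cover problem.

```python
# Sketch: testing whether an LHS attribute is extraneous.
# Single-letter attributes: fd("XZ", "R") means XZ -> R.

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    return covers(e, f) and covers(f, e)


def extraneous_by_definition(a, f, fds):
    """Slide definition: F must be equivalent to F with a dropped from f's LHS."""
    lhs, rhs = f
    reduced = [g for g in fds if g != f] + [(lhs - {a}, rhs)]
    return equivalent(fds, reduced)


def extraneous(a, f, fds):
    """Shortcut: a is extraneous in X -> Y iff Y is a subset of (X - {a})+ under F."""
    lhs, rhs = f
    return a in lhs and rhs <= closure(lhs - {a}, fds)


F = [fd("X", "Z"), fd("XY", "W"), fd("XY", "P"), fd("XY", "Q"), fd("XZ", "R")]
xz_r = fd("XZ", "R")

print(extraneous("Z", xz_r, F), extraneous_by_definition("Z", xz_r, F))  # True True
print(extraneous("X", xz_r, F), extraneous_by_definition("X", xz_r, F))  # False False
```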
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
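Problem
Given the set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove redundant FDs (XY -> Z follows from X -> Z, and XY -> W appears twice): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}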
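One common way to compute a canonical cover has three steps: split the right-hand sides, left-reduce, then drop redundant FDs. The sketch below follows that order, which is one standard procedure and not necessarily the one used in class, and runs it on the problem's F. Left reduction also turns XZ -> R into X -> R, because X -> Z already holds. The step-2 set above therefore still needs a left-reduction pass.

```python
# Sketch: canonical cover = split RHS, left-reduce, remove redundant FDs.
# Single-letter attributes: fd("XY", "WP") means XY -> WP.

def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dedupe(fds):
    out = []
    for f in fds:
        if f not in out:
            out.append(f)
    return out


def canonical_cover(fds):
    # 1. Simple FDs: one attribute on every RHS (decomposition rule).
    g = dedupe([(lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)])
    # 2. Left-reduce: drop LHS attribute a when (X - {a})+ still contains the RHS.
    for i, (lhs, rhs) in enumerate(g):
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, g):
                lhs = lhs - {a}
                g[i] = (lhs, rhs)
    g = dedupe(g)
    # 3. Nonredundant: drop any FD derivable from the remaining ones.
    for f in list(g):
        rest = [h for h in g if h != f]
        if f[1] <= closure(f[0], rest):
            g.remove(f)
    return g


def show(f):
    return "".join(sorted(f[0])) + " -> " + "".join(sorted(f[1]))


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
print([show(f) for f in canonical_cover(F)])
# ['X -> Z', 'XY -> P', 'XY -> W', 'XY -> Q', 'X -> R']
```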
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization means analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
minimizing redundancy;
minimizing insertion, deletion and update anomalies.
It provides the database designer with:
a formal framework for analyzing relation schemas based on their keys and FDs;
a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
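Candidate keys can be found by brute force from the definitions above. An attribute set S is a superkey when S+ is the whole schema, and it is a key when no smaller key lies inside it. The sketch below enumerates subsets by size, which takes exponential time and only suits small classroom schemas. The FDs for EMP_PROJ are the ones given with the FD examples. TEACH uses its two FDs from the BCNF slides.

```python
from itertools import combinations


def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: check attribute sets in order of size, keep the minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Prime attributes are the members of some candidate key."""
    return frozenset().union(*candidate_keys(schema, fds))


EMP_PROJ = frozenset("SSN PNUMBER HOURS ENAME PNAME PLOCATION".split())
F_PROJ = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]

TEACH = frozenset({"STUDENT", "COURSE", "INSTRUCTOR"})
F_TEACH = [fd("STUDENT COURSE", "INSTRUCTOR"), fd("INSTRUCTOR", "COURSE")]

print([sorted(k) for k in candidate_keys(EMP_PROJ, F_PROJ)])  # [['PNUMBER', 'SSN']]
print([sorted(k) for k in candidate_keys(TEACH, F_TEACH)])
# [['COURSE', 'STUDENT'], ['INSTRUCTOR', 'STUDENT']]
print(sorted(prime_attributes(TEACH, F_TEACH)))  # ['COURSE', 'INSTRUCTOR', 'STUDENT']
```

In TEACH every attribute is prime.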
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
Considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for the attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
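Technique 1 can be sketched on a department whose location attribute holds several values. The rows are hypothetical and loosely modelled on the COMPANY schema. The names DEPARTMENT, DEPT_LOCATIONS, DLOCATIONS and DLOCATION are also assumed. Python sets stand in for a non-atomic attribute value.

```python
# Sketch of 1NF technique 1 on hypothetical data: a set-valued attribute is
# moved into its own relation together with the primary key.

department = [  # not in 1NF: DLOCATIONS holds a set of values
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": {"Stafford"}},
]


def split_multivalued(rows, key, attr, new_attr):
    """Return (base rows without attr, one (key, new_attr) row per value of attr)."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], new_attr: value}
            for row in rows for value in sorted(row[attr])]
    return base, side


DEPARTMENT, DEPT_LOCATIONS = split_multivalued(
    department, "DNUMBER", "DLOCATIONS", "DLOCATION")

for row in DEPARTMENT:
    print(row)
for row in DEPT_LOCATIONS:
    print(row)
```

Each location becomes its own tuple keyed by (DNUMBER, DLOCATION). DNAME is stored once per department, and no NULL placeholders are needed.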
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
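A partial dependency appears when a proper subset of the primary key already determines some nonprime attribute. The sketch below computes the closure of every proper subset of the key. Like the slide's definition, it assumes the primary key is the only candidate key, so every other attribute counts as nonprime. The EMP_PROJ FDs are the ones listed with the FD examples.

```python
from itertools import combinations


def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """Map each proper subset of the primary key to the nonprime attributes it determines."""
    nonprime = schema - key  # assumes the primary key is the only candidate key
    found = {}
    for size in range(1, len(key)):
        for combo in combinations(sorted(key), size):
            sub = frozenset(combo)
            dependents = closure(sub, fds) & nonprime
            if dependents:
                found[sub] = frozenset(dependents)
    return found


EMP_PROJ = frozenset("SSN PNUMBER HOURS ENAME PNAME PLOCATION".split())
KEY = frozenset({"SSN", "PNUMBER"})
F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]

for sub, deps in partial_dependencies(EMP_PROJ, KEY, F).items():
    print(sorted(sub), "->", sorted(deps))
```

Each partial dependency suggests its own relation. For EMP_PROJ this gives (SSN, ENAME), (PNUMBER, PNAME, PLOCATION) and (SSN, PNUMBER, HOURS) as a 2NF decomposition.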
Normalizing into 2NF
Conversion to 2NF
[Figure: a relation with attributes A, B, C, D converted to 2NF as two relations with attributes A, B, C and A, D]
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> prog_Pac_name
1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
For X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
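The sketch below uses the general form of the 3NF test: every nontrivial X -> A in F must have X as a superkey or A as a prime attribute. For the two single-key examples here, it agrees with the 2NF-plus-no-transitive-dependency definition above. EMP_DEPT carries the transitive dependency through DNUMBER. EMP(SSN, Emp, Salary) from the NOTE passes because Emp is a candidate key, which the sketch models with the assumed FD Emp -> SSN.

```python
from itertools import combinations


def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def third_nf_violations(schema, fds):
    """(X, A) pairs where X -> A is nontrivial, X is not a superkey and A is nonprime."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        for a in sorted(rhs - lhs):  # skip the trivial part of the RHS
            if not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad


EMP_DEPT = frozenset("ENAME SSN BDATE ADDRESS DNUMBER DNAME DMGRSSN".split())
F_DEPT = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]

EMP = frozenset(["SSN", "Emp", "Salary"])
# Emp -> SSN is an assumption that encodes "Emp is a candidate key".
F_EMP = [fd("SSN", "Emp"), fd("Emp", "SSN"), fd("Emp", "Salary")]

for lhs, a in third_nf_violations(EMP_DEPT, F_DEPT):
    print(sorted(lhs), "->", a)
print(third_nf_violations(EMP, F_EMP))  # []
```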
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one:
every 2NF relation is in 1NF;
every 3NF relation is in 2NF;
every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute, even when 'X' is not a superkey.
BCNF does not allow this: 'X' must be a superkey (it must contain a candidate key).
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 satisfies 3NF because course is a prime attribute, but it violates BCNF because instructor is not a superkey. So this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
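A BCNF check only requires the LHS of each nontrivial FD to be a superkey, so for TEACH it flags fd2. This is a minimal Python sketch in which the attribute names are upper-cased versions of those on the slide. Testing only the given FDs is enough when checking R itself.

```python
# Sketch: BCNF test for TEACH, using the given FDs only.

def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs in fds whose LHS is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


TEACH = frozenset({"STUDENT", "COURSE", "INSTRUCTOR"})
fd1 = fd("STUDENT COURSE", "INSTRUCTOR")
fd2 = fd("INSTRUCTOR", "COURSE")

print(bcnf_violations(TEACH, [fd1, fd2]) == [fd2])  # True: only fd2 violates BCNF
# The relation (INSTRUCTOR, COURSE) keeps fd2 and is in BCNF on its own.
print(bcnf_violations(frozenset({"INSTRUCTOR", "COURSE"}), [fd2]))  # []
```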
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the third decomposition does not generate spurious tuples after the join (and hence has the non-additivity property).
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
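For a decomposition into two relations R1 and R2 there is a well-known test that is not on the slide. The join is lossless iff R1 ∩ R2 -> R1 - R2 or R1 ∩ R2 -> R2 - R1 is in F+. The sketch applies this test to the three TEACH decompositions and confirms that only the third is lossless.

```python
# Sketch: binary lossless-join test via the closure of the common attributes.

def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2)+ contains R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [fd("STUDENT COURSE", "INSTRUCTOR"), fd("INSTRUCTOR", "COURSE")]
S = frozenset
DECOMPOSITIONS = [
    (S({"STUDENT", "INSTRUCTOR"}), S({"STUDENT", "COURSE"})),
    (S({"COURSE", "INSTRUCTOR"}), S({"COURSE", "STUDENT"})),
    (S({"INSTRUCTOR", "COURSE"}), S({"INSTRUCTOR", "STUDENT"})),
]

for number, (r1, r2) in enumerate(DECOMPOSITIONS, start=1):
    print(number, lossless_binary(r1, r2, F))
# prints: 1 False / 2 False / 3 True
```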
Testing for lossless joins
Lossless join algorithm: Example 5.12
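The general lossless-join test for n relations builds a tableau with one row per Ri. Column A gets a distinguished symbol when A ∈ Ri and a unique symbol otherwise. The FDs are then used repeatedly to equate symbols, and the decomposition is lossless iff some row becomes all distinguished. This sketch renames symbols throughout a column, as in the usual chase formulation. It is checked on TEACH and on two splits of EMP_PROJ. The attribute lists for EMP_LOCS and EMP_PROJ1 are assumed from the textbook example.

```python
# Sketch of the tableau (chase) test for a lossless join into R1, ..., Rn.

def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def chase_lossless(schema, parts, fds):
    cols = sorted(schema)
    # Row i: distinguished ('a', A) if A is in R_i, else a unique ('b', i, A).
    table = [{a: (("a", a) if a in part else ("b", i, a)) for a in cols}
             for i, part in enumerate(parts)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in table:  # rows that agree on every LHS column
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for rows in groups.values():
                for a in sorted(rhs):
                    symbols = {row[a] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols: prefer the 'a' symbol, else pick one 'b'.
                    target = ("a", a) if ("a", a) in symbols else min(symbols)
                    for row in table:  # rename throughout the column
                        if row[a] in symbols:
                            row[a] = target
                    changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(row[a] == ("a", a) for a in cols) for row in table)


TEACH = frozenset({"STUDENT", "COURSE", "INSTRUCTOR"})
F_TEACH = [fd("STUDENT COURSE", "INSTRUCTOR"), fd("INSTRUCTOR", "COURSE")]
d1 = [{"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}]
d3 = [{"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}]

EMP_PROJ = frozenset("SSN ENAME PNUMBER PNAME PLOCATION HOURS".split())
F_EP = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]
three_way = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
# Assumed EMP_LOCS and EMP_PROJ1 attribute sets:
two_way = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]

print(chase_lossless(TEACH, d1, F_TEACH))         # False
print(chase_lossless(TEACH, d3, F_TEACH))         # True
print(chase_lossless(EMP_PROJ, three_way, F_EP))  # True
print(chase_lossless(EMP_PROJ, two_way, F_EP))    # False
```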
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B ⊆ A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used to remove multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations:
EMP_PROJECTS and EMP_DEPENDENTS
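An MVD X ->> Y can be checked on a relation state with a swap condition. Whenever t1 and t2 agree on X, the tuple that takes Y from t1 and the remaining attributes from t2 must also be present. The EMP rows below are hypothetical. The sketch verifies both MVDs, then checks that joining EMP_PROJECTS and EMP_DEPENDENTS on ENAME returns exactly the original rows.

```python
# Sketch: checking MVDs on a relation state and the 4NF split of EMP.
ATTRS = ("ENAME", "PNAME", "DNAME")

# Hypothetical state: Smith works on projects X and Y and has dependents John and Anna.
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}


def satisfies_mvd(rows, attrs, x, y):
    """X ->> Y holds iff, for all t1, t2 agreeing on X, the tuple with Y from t1
    and every other attribute from t2 is also in rows."""
    xi = [attrs.index(a) for a in x]
    yi = {attrs.index(a) for a in y}
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in xi):
                t3 = tuple(t1[i] if i in yi else t2[i] for i in range(len(attrs)))
                if t3 not in rows:
                    return False
    return True


def is_trivial_mvd(attrs, x, y):
    """X ->> Y is trivial if Y is a subset of X or X and Y together cover R."""
    return set(y) <= set(x) or set(x) | set(y) == set(attrs)


def project(rows, attrs, keep):
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


print(satisfies_mvd(EMP, ATTRS, ["ENAME"], ["PNAME"]))  # True
print(satisfies_mvd(EMP, ATTRS, ["ENAME"], ["DNAME"]))  # True
print(is_trivial_mvd(ATTRS, ["ENAME"], ["PNAME"]))      # False

EMP_PROJECTS = project(EMP, ATTRS, ["ENAME", "PNAME"])
EMP_DEPENDENTS = project(EMP, ATTRS, ["ENAME", "DNAME"])
# Natural join on ENAME, written back in (ENAME, PNAME, DNAME) order.
rejoined = {(e, p, d) for (e, p) in EMP_PROJECTS
            for (e2, d) in EMP_DEPENDENTS if e == e2}
print(rejoined == EMP)  # True
```

The MVD check succeeds only because all four combinations of project and dependent are stored. Deleting any one of them breaks ENAME ->> PNAME.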
Fifth Normal Form (5NF)
Join dependency:
Describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
In other words, a 5NF relation has no nontrivial join dependency other than those implied by its keys.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
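A join dependency can be tested on a given state by projecting onto each component and joining the projections back together. The SUPPLY rows here are hypothetical sample data. The components are assumed to be R1 = (SNAME, PART_NAME), R2 = (SNAME, PROJ_NAME) and R3 = (PART_NAME, PROJ_NAME). Joining all three projections reproduces this state. Joining only R1 and R2 adds spurious tuples. A True result only shows that this particular state satisfies the JD.

```python
# Sketch: checking a join dependency JD(R1, R2, R3) on one state of SUPPLY.
ATTRS = ("SNAME", "PART_NAME", "PROJ_NAME")
SUPPLY = {  # hypothetical sample rows
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}
R1, R2, R3 = ("SNAME", "PART_NAME"), ("SNAME", "PROJ_NAME"), ("PART_NAME", "PROJ_NAME")


def project(attrs, rows, keep):
    idx = [attrs.index(a) for a in keep]
    return (tuple(keep), {tuple(r[i] for i in idx) for r in rows})


def natural_join(left, right):
    (la, lrows), (ra, rrows) = left, right
    common = [a for a in la if a in ra]
    extra = [a for a in ra if a not in la]
    out = set()
    for l in lrows:
        for r in rrows:
            if all(l[la.index(a)] == r[ra.index(a)] for a in common):
                out.add(l + tuple(r[ra.index(a)] for a in extra))
    return (la + tuple(extra), out)


def join_of_projections(attrs, rows, components):
    result = project(attrs, rows, components[0])
    for comp in components[1:]:
        result = natural_join(result, project(attrs, rows, comp))
    res_attrs, res_rows = result
    # Reorder columns back to the original attribute order.
    return {tuple(t[res_attrs.index(a)] for a in attrs) for t in res_rows}


def satisfies_jd(attrs, rows, components):
    return join_of_projections(attrs, rows, components) == set(rows)


print(satisfies_jd(ATTRS, SUPPLY, [R1, R2, R3]))          # True
print(len(join_of_projections(ATTRS, SUPPLY, [R1, R2])))  # 9 (original has 7)
```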
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
Guideline to Redundant Information in Tuples and Update Anomalies
GUIDELINE 2
Design a schema that does not suffer from insertion, deletion and update anomalies.
If any anomalies are present, note them so that applications can be made to take them into account.
In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.
Problems with Nulls
If many attributes are grouped together as a fat relation, many nulls arise in the tuples. This:
wastes storage;
causes problems in understanding the meaning of the attributes;
makes aggregate operators such as COUNT or SUM difficult to use.
3. Null Values in Tuples
Interpretations of nulls:
the attribute is not applicable (or invalid) for this tuple;
the attribute value is unknown (it may exist);
the value is known to exist but is unavailable.
GUIDELINE 3
Relations should be designed so that their tuples will have as few NULL values as possible.
Attributes that are NULL frequently could be placed in separate relations (with the primary key).
Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).
Example of Spurious Tuples
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.
The problem is that a natural join of these two relations produces more tuples than the original set of tuples in EMP_PROJ.
These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious or wrong information that is not valid.
This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
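The effect can be reproduced on a tiny hypothetical state in which EMP_PROJ is cut down to SSN, ENAME, PNUMBER and PLOCATION. Two employees work on different projects at the same location. Joining the projections on PLOCATION therefore pairs each employee with the other's project.

```python
# Sketch: spurious tuples from joining on a non-key attribute (hypothetical data).
# Columns of EMP_PROJ kept here: (SSN, ENAME, PNUMBER, PLOCATION).
EMP_PROJ = {
    ("123", "Smith", 1, "Bellaire"),
    ("453", "English", 2, "Bellaire"),   # different project, same location
}

# Projections playing the roles of EMP_LOCS and EMP_PROJ1.
EMP_LOCS = {(ename, loc) for (_, ename, _, loc) in EMP_PROJ}
EMP_PROJ1 = {(ssn, pno, loc) for (ssn, _, pno, loc) in EMP_PROJ}

# Natural join on PLOCATION, written back in EMP_PROJ column order.
joined = {(ssn, ename, pno, loc)
          for (ssn, pno, loc) in EMP_PROJ1
          for (ename, loc2) in EMP_LOCS
          if loc == loc2}

spurious = joined - EMP_PROJ
print(len(EMP_PROJ), len(joined))  # 2 4
for row in sorted(spurious):
    print("spurious:", row)
```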
Example of Spurious Tuples (contd.)
4. Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4
Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) non-additivity, or losslessness, of the corresponding join;
(b) preservation of the functional dependencies.
Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies cause redundant work to be done during insertion, modification and deletion.
Nulls waste storage space and make aggregation operations and joins difficult to perform.
Invalid and spurious data are generated during joins on improperly related base relations.
Functional dependencies
Functional dependencies (FDs):
An FD is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, …, An}, where A1, A2, …, An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
The values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name         Continent    Area     Length
Caspian Sea  Asia-Europe  143244   760
Superior     NA           31700    350
Victoria     Africa       26828    250
Aral Sea     Asia         24904    280
Huron        NA           23000    206
Michigan     NA           22300    307
Tanganyika   Africa       12700    420
Continent -> Name
Name -> Length
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
Relation state of TEACH
TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     ooad              Horowitz
TEACHER -> COURSE
TEXT -> COURSE
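The Satisfies algorithm translates almost line by line into code: sort on the LHS, then compare neighbouring tuples that share an LHS value. Comparing neighbours is enough because equal-LHS tuples end up in one contiguous run. This Python sketch (the function name `satisfies` is an assumption) runs the algorithm on the TEACH state above.

```python
# Sketch of the Satisfies algorithm: does relation state `rows` satisfy lhs -> rhs?

def satisfies(rows, lhs, rhs):
    def key(row):
        return tuple(row[a] for a in lhs)

    ordered = sorted(rows, key=key)                   # step 1: sort on the LHS
    for prev, cur in zip(ordered, ordered[1:]):       # step 2: compare neighbours
        if key(prev) == key(cur) and any(prev[a] != cur[a] for a in rhs):
            return False
    return True                                       # step 3: no violation found


TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "ooad", "TEXT": "Horowitz"},
]

print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```

TEACHER -> COURSE fails because Smith appears with two different courses. TEXT -> COURSE is satisfied by this state, but that only shows the state does not violate it. It does not prove the FD holds for the schema.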
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming.
So inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Achieving BCNF by Decomposition: Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additive join property).
Lossless or lossy decompositions: When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
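For a decomposition into exactly two relations, a standard test says that {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The Python sketch below applies that test to the three TEACH decompositions. The helper names `closure` and `is_lossless_binary` are hypothetical.

```python
def closure(attrs, fds):
    """Return attrs+ under fds (FDs are pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True iff the common attributes determine all of r1 - r2 or all of r2 - r1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),     # decomposition 1
    ({"course", "instructor"}, {"course", "student"}),      # decomposition 2
    ({"instructor", "course"}, {"instructor", "student"}),  # decomposition 3
]
for number, (r1, r2) in enumerate(decompositions, start=1):
    print(number, is_lossless_binary(r1, r2, F))
# prints 1 False, 2 False, 3 True (one per line)
```

Only the third decomposition passes, since its common attribute instructor determines course.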
Testing for lossless joins
Lossless join algorithm: Example 5.12
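For a decomposition into any number of relations, the general lossless (nonadditive) join test is usually run on a matrix of symbols, often called the chase:

- Row i holds a distinguished symbol a in the columns of the attributes of Ri, and a unique symbol b everywhere else.
- The FDs are applied repeatedly to equate symbols until nothing changes.
- The decomposition is lossless iff some row ends up containing only a symbols.

Below is a Python sketch of that procedure. The tuple encoding of symbols and the name `chase_is_lossless` are assumptions made for illustration; the sketch is not a reproduction of the textbook example.

```python
def chase_is_lossless(attributes, decomposition, fds):
    """Matrix (chase) test: True iff the decomposition has the lossless join property."""
    attrs = sorted(attributes)
    # Row i: ('a', attr) if attr belongs to R_i, otherwise the unique symbol ('b', i, attr)
    table = [
        {attr: ("a", attr) if attr in ri else ("b", i, attr) for attr in attrs}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    row_i, row_j = table[i], table[j]
                    if any(row_i[a] != row_j[a] for a in lhs):
                        continue  # the two rows disagree on the left-hand side
                    for attr in rhs:
                        if row_i[attr] == row_j[attr]:
                            continue
                        # Equate the two symbols, preferring the distinguished 'a' symbol
                        if row_i[attr][0] == "a":
                            new, old = row_i[attr], row_j[attr]
                        else:
                            new, old = row_j[attr], row_i[attr]
                        for row in table:  # rename old -> new everywhere in this column
                            if row[attr] == old:
                                row[attr] = new
                        changed = True
    return any(all(row[a][0] == "a" for a in attrs) for row in table)


TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
print(chase_is_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F))  # True
print(chase_is_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F))    # False
```

Renaming a symbol throughout its column, rather than only in the two rows being compared, keeps rows that share that symbol consistent. Each rename removes one distinct symbol, so the loop always terminates.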
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): Represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
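An MVD X ->> Y holds in a relation state when the following condition is met for every pair of tuples t1 and t2 that agree on X: the tuple that takes X ∪ Y from t1 and the remaining attributes from t2 must also be present. This Python sketch checks that condition, together with the triviality test above. The EMP tuples are invented for illustration: one employee Smith, projects X and Y, dependents John and Anna. A single state can only show that an MVD is violated; it cannot show that the MVD holds in general.

```python
def is_trivial_mvd(x, y, schema):
    """X ->> Y is trivial if Y is a subset of X or X ∪ Y = R."""
    return y <= x or (x | y) == schema


def satisfies_mvd(rows, x, y, schema):
    """Instance check of X ->> Y over a list of dict tuples."""
    rest = schema - x - y
    order = sorted(schema)
    present = {tuple(t[a] for a in order) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if any(t1[a] != t2[a] for a in x):
                continue  # only pairs that agree on X matter
            t3 = {a: t1[a] for a in x | y}         # X and Y values from t1
            t3.update({a: t2[a] for a in rest})   # remaining values from t2
            if tuple(t3[a] for a in order) not in present:
                return False
    return True


EMP_SCHEMA = {"ENAME", "PNAME", "DNAME"}
EMP = [  # hypothetical state
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(satisfies_mvd(EMP, {"ENAME"}, {"PNAME"}, EMP_SCHEMA))      # True
print(satisfies_mvd(EMP[:3], {"ENAME"}, {"PNAME"}, EMP_SCHEMA))  # False: (Smith, Y, John) is missing
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, EMP_SCHEMA))          # False
```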
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
4NF is used to remove multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form: The EMP relation has two MVDs, ENAME ->> PNAME and ENAME ->> DNAME. Decomposing the EMP relation yields two 4NF relations:
EMP_PROJECTS and EMP_DEPENDENTS
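This sketch projects a hypothetical EMP state onto EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME), then rejoins the two on ENAME to check that nothing is lost. The data are made up. Smith has three projects and two dependents, so EMP needs 3 × 2 rows while the two 4NF relations need only 3 + 2.

```python
# Hypothetical EMP(ENAME, PNAME, DNAME) state where ENAME ->> PNAME and ENAME ->> DNAME hold
EMP = {
    ("Smith", "X", "John"), ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "Z", "John"), ("Smith", "Z", "Anna"),
}

# 4NF decomposition: one relation per independent multivalued fact
EMP_PROJECTS = {(ename, pname) for ename, pname, _ in EMP}
EMP_DEPENDENTS = {(ename, dname) for ename, _, dname in EMP}

# Natural join on ENAME rebuilds EMP exactly, so the decomposition is lossless
rejoined = {
    (ename, pname, dname)
    for ename, pname in EMP_PROJECTS
    for ename2, dname in EMP_DEPENDENTS
    if ename == ename2
}
print(rejoined == EMP)                                    # True
print(len(EMP), len(EMP_PROJECTS) + len(EMP_DEPENDENTS))  # 6 5
```

In the decomposed form, adding a new dependent for Smith means inserting one tuple rather than one tuple per project.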
Fifth Normal Form (5NF)
Join dependency: Describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: A property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, R has no nontrivial join dependency other than those whose components are all superkeys.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
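A join dependency can be checked on a given state by joining its projections and comparing the result with the original. This sketch uses an invented SUPPLY(SNAME, PARTNAME, PROJNAME) state in the spirit of the SUPPLY figure (it is not claimed to be the figure's data), with R1 = (SNAME, PARTNAME), R2 = (SNAME, PROJNAME) and R3 = (PARTNAME, PROJNAME). Joining only two of the projections produces spurious tuples, while joining all three restores SUPPLY. This is why the relation needs a three-way decomposition rather than a binary one.

```python
# Hypothetical SUPPLY(SNAME, PARTNAME, PROJNAME) state
SUPPLY = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}

R1 = {(s, p) for s, p, _ in SUPPLY}  # (SNAME, PARTNAME)
R2 = {(s, j) for s, _, j in SUPPLY}  # (SNAME, PROJNAME)
R3 = {(p, j) for _, p, j in SUPPLY}  # (PARTNAME, PROJNAME)

# Joining only R1 and R2 (on SNAME) is lossy
r1_r2 = {(s, p, j) for s, p in R1 for s2, j in R2 if s == s2}
# Joining that result with R3 (on PARTNAME, PROJNAME) restores SUPPLY exactly
r1_r2_r3 = {(s, p, j) for s, p, j in r1_r2 if (p, j) in R3}

print(len(SUPPLY), len(r1_r2))  # 7 9
print(r1_r2_r3 == SUPPLY)       # True: this state satisfies JD(R1, R2, R3)
```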
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
Problems with Nulls: If many attributes are grouped together as a "fat" relation, it gives rise to many nulls in the tuples. This:
wastes storage;
causes problems in understanding the meaning of the attributes;
makes it difficult to use aggregate operators such as COUNT or SUM.
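The aggregate problem can be seen directly in SQL. This sketch uses Python's built-in `sqlite3` module with a hypothetical three-row employee table. COUNT(*) counts rows, COUNT(column), SUM and AVG skip NULLs, so the average salary below is taken over two employees rather than three. Other DBMSs follow the same standard SQL rule for these aggregates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary INTEGER, office_number TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("111", 30000, "A-101"), ("222", 40000, None), ("333", None, None)],  # None is stored as NULL
)
row = conn.execute(
    "SELECT COUNT(*), COUNT(salary), SUM(salary), AVG(salary), COUNT(office_number) FROM employee"
).fetchone()
print(row)  # (3, 2, 70000, 35000.0, 1)
conn.close()
```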
3. Null Values in Tuples: Interpretations of nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
Value known to exist but unavailable
GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key).
Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation.
Instead, create a new relation emp_offices(essn, office_number).
Example of Spurious Tuples
Generation of spurious tuples: Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.
The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.
These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious or wrong information that is not valid.
This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
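The effect can be reproduced on a small, made-up EMP_PROJ state. This sketch projects it onto EMP_LOCS(ENAME, PLOCATION) and EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION), then natural-joins the two on PLOCATION. The tuple values are invented for illustration; only the attribute lists come from the slides.

```python
# Hypothetical EMP_PROJ state: (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)
EMP_PROJ = {
    ("123", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("123", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("453", 1, 20.0, "English", "ProductX", "Bellaire"),
    ("453", 2, 20.0, "English", "ProductY", "Sugarland"),
    ("666", 3, 40.0, "Narayan", "ProductZ", "Houston"),
}

# The poor decomposition into EMP_LOCS and EMP_PROJ1
EMP_LOCS = {(ename, ploc) for _, _, _, ename, _, ploc in EMP_PROJ}
EMP_PROJ1 = {(ssn, pno, hours, pname, ploc) for ssn, pno, hours, _, pname, ploc in EMP_PROJ}

# Natural join on PLOCATION, the only attribute the two relations share
joined = {
    (ssn, pno, hours, ename, pname, ploc)
    for ssn, pno, hours, pname, ploc in EMP_PROJ1
    for ename, ploc2 in EMP_LOCS
    if ploc == ploc2
}
spurious = joined - EMP_PROJ
print(len(EMP_PROJ), len(joined), len(spurious))  # 5 9 4
```

For example, ("123", 1, 32.5, "English", "ProductX", "Bellaire") is spurious. It pairs Smith's project hours with English simply because both work on a project located in Bellaire.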
Example of Spurious Tuples (contd.)
4. Spurious Tuples: Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) Non-additivity, or losslessness, of the corresponding join
(b) Preservation of the functional dependencies
Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies that cause redundant work to be done during insertion, modification and deletion
Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Functional Dependencies
Functional dependencies (FDs): An FD is a constraint between two sets of attributes from the database.
Assumption: The entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
The values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of an FD is to describe R by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name | Continent | Area | Length
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420
Continent does not determine Name (NA appears with Superior, Huron and Michigan); Name -> Length
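This sketch tests the lakes table directly against the FD definition (t1[X] = t2[X] implies t1[Y] = t2[Y]) by comparing every pair of tuples. The helper name `find_violation` is hypothetical. A single relation state can refute an FD, as it does for Continent -> Name, but it cannot prove that an FD holds, because an FD is a property of the schema.

```python
LAKES = [
    {"Name": "Caspian Sea", "Continent": "Asia-Europe", "Area": 143244, "Length": 760},
    {"Name": "Superior", "Continent": "NA", "Area": 31700, "Length": 350},
    {"Name": "Victoria", "Continent": "Africa", "Area": 26828, "Length": 250},
    {"Name": "Aral Sea", "Continent": "Asia", "Area": 24904, "Length": 280},
    {"Name": "Huron", "Continent": "NA", "Area": 23000, "Length": 206},
    {"Name": "Michigan", "Continent": "NA", "Area": 22300, "Length": 307},
    {"Name": "Tanganyika", "Continent": "Africa", "Area": 12700, "Length": 420},
]


def find_violation(rows, lhs, rhs):
    """Return (t1, t2) with t1[lhs] == t2[lhs] but t1[rhs] != t2[rhs], or None if none exists."""
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs):
                return t1, t2
    return None


print(find_violation(LAKES, ["Name"], ["Length"]))  # None
t1, t2 = find_violation(LAKES, ["Continent"], ["Name"])
print(t1["Name"], t2["Name"])  # Superior Huron
```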
Graphical representation of Functional Dependencies
Examples of FD constraints: A social security number uniquely determines an employee name: SSN -> ENAME
A project number uniquely determines the project name and location: PNUMBER -> {PNAME, PLOCATION}
An employee SSN and a project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints: An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: To determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If condition 2 is met for every group of tuples, the output of the algorithm is true; otherwise it is false.
Relation state of TEACH
TEACHER | COURSE | TEXT
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | OOAD | Horowitz
TEACHER -> COURSE does not hold (Smith teaches two different courses).
TEXT -> COURSE may hold (this state does not violate it).
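The three steps of the Satisfies algorithm map onto a sort followed by a group-by. This Python sketch uses `itertools.groupby` on the TEACH state: sort on the A attributes, then reject as soon as one group of equal A values carries more than one B value. The function name `satisfies` is illustrative.

```python
from itertools import groupby


def satisfies(rows, lhs, rhs):
    """Sort-based Satisfies test: True iff the relation state obeys lhs -> rhs."""
    def lhs_key(t):
        return tuple(t[a] for a in lhs)

    ordered = sorted(rows, key=lhs_key)  # step 1: equal lhs values become adjacent
    for _, group in groupby(ordered, key=lhs_key):
        # step 2: every tuple in the group must carry the same rhs value
        if len({tuple(t[b] for b in rhs) for t in group}) > 1:
            return False  # step 3: one failing group is enough
    return True


TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]
print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```

Sorting makes the check O(n log n) instead of comparing every pair of tuples.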
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming.
So inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure: If a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}. Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The statement that X -> Y can be inferred from F is denoted F |= X -> Y.
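Rather than generating F+ itself, membership of X -> Y in F+ is normally tested with the attribute closure X+: F |= X -> Y exactly when Y is a subset of X+. This Python sketch computes X+ for the EMPLOYEE example above. The FD representation (pairs of frozensets) and the names `closure` and `implies` are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure: keep applying FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is contained in lhs+ computed under F."""
    return set(rhs) <= closure(lhs, fds)


F = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```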
Inference Rules for FDs: Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y is a subset of X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2 and IR3 form a sound and complete set of inference rules. By sound we mean that any dependency that we infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs: Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three inference rules, as well as any other valid inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
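Each of the three additional rules follows from IR1 to IR3 in two or three steps, which makes the claim above concrete:

```latex
\begin{aligned}
&\text{Decomposition:} && X \to YZ \text{ (given)},\quad YZ \to Y \text{ (IR1)} \;\Rightarrow\; X \to Y \text{ (IR3)};\ \text{likewise } X \to Z\\
&\text{Union:} && X \to Y \;\Rightarrow\; X \to XY \text{ (IR2 with } X\text{, since } XX = X)\\
& && X \to Z \;\Rightarrow\; XY \to YZ \text{ (IR2 with } Y)\\
& && X \to XY,\ XY \to YZ \;\Rightarrow\; X \to YZ \text{ (IR3)}\\
&\text{Pseudotransitive:} && X \to Y \;\Rightarrow\; WX \to WY \text{ (IR2 with } W)\\
& && WX \to WY,\ WY \to Z \;\Rightarrow\; WX \to Z \text{ (IR3)}
\end{aligned}
```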
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C a subset of B, show that F |= A -> D.
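The derivations below work through the two exercises. They read the extracted sets as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D}; that reading is reconstructed from garbled text, so treat it as an assumption.

```latex
\begin{aligned}
&\textbf{1. } F = \{A \to B,\ C \to X,\ BX \to Z\} \models AC \to Z:\\
&\quad A \to B \;\Rightarrow\; AC \to BC && \text{(IR2, augment by } C)\\
&\quad C \to X \;\Rightarrow\; BC \to BX && \text{(IR2, augment by } B)\\
&\quad AC \to BC,\ BC \to BX \;\Rightarrow\; AC \to BX && \text{(IR3)}\\
&\quad AC \to BX,\ BX \to Z \;\Rightarrow\; AC \to Z && \text{(IR3)}\\[4pt]
&\textbf{2. } F = \{A \to B,\ C \to D\},\ C \subseteq B \;\models\; A \to D:\\
&\quad C \subseteq B \;\Rightarrow\; B \to C && \text{(IR1)}\\
&\quad A \to B,\ B \to C \;\Rightarrow\; A \to C && \text{(IR3)}\\
&\quad A \to C,\ C \to D \;\Rightarrow\; A \to D && \text{(IR3)}
\end{aligned}
```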
Redundant functional dependencies: Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
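The redundancy test reduces to an attribute closure: A -> B is redundant iff B is contained in A+ computed under F - {A -> B}. In the sketch below the attributes are single letters, so `frozenset("AB")` is {"A", "B"}. The set F = {A -> B, B -> C, A -> C} is a made-up example.

```python
def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fds, i):
    """True iff fds[i] can be derived from the remaining FDs."""
    lhs, rhs = fds[i]
    rest = fds[:i] + fds[i + 1:]
    return rhs <= closure(lhs, rest)


# Hypothetical F = {A -> B, B -> C, A -> C}
F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("B"), frozenset("C")),
    (frozenset("A"), frozenset("C")),
]
print([is_redundant(F, i) for i in range(len(F))])  # [False, False, True]
```

Redundant FDs should be removed one at a time, re-testing after each removal. Two FDs can each be redundant in the presence of the other, yet not both be removable.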
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Such sets are said to be equivalent to F.
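Coverage can be tested without building F+ or E+: F covers E iff, for every X -> Y in E, Y is contained in X+ computed under F. This Python sketch uses single-letter attributes and three made-up FD sets.

```python
def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E: every FD of E is in F+ (tested with attribute closure under F)."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent when each covers the other."""
    return covers(e, f) and covers(f, e)


E = [(frozenset("A"), frozenset("BC"))]
F = [(frozenset("A"), frozenset("B")), (frozenset("A"), frozenset("C"))]
G = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
print(equivalent(E, F))  # True
print(equivalent(E, G))  # False: G implies B -> C, which E does not
```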
Minimal Functional Dependencies: A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover: A set F of FDs is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
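The equivalence test for left-side attributes can be replaced by a single closure check. The reason is that A2 -> B1B2 implies A1A2 -> B1B2 (augment with A1, then decompose), so swapping in the smaller FD can only strengthen F. The two sets are therefore equivalent exactly when F already implies A2 -> B1B2. The sketch below covers only extraneous left attributes and uses a made-up F = {X -> Z, XZ -> R}.

```python
def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_extraneous_lhs(fds, i, attr):
    """attr is extraneous in the LHS of fds[i] iff (LHS - {attr}) -> RHS already follows from fds."""
    lhs, rhs = fds[i]
    return attr in lhs and len(lhs) > 1 and rhs <= closure(lhs - {attr}, fds)


F = [(frozenset("X"), frozenset("Z")), (frozenset("XZ"), frozenset("R"))]
print(is_extraneous_lhs(F, 1, "Z"))  # True: X -> Z, so X alone determines R
print(is_extraneous_lhs(F, 1, "X"))  # False: Z+ is just {Z}
```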
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
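Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Drop the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: because X -> Z, Z is extraneous in XZ -> R, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}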
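This sketch runs the three usual steps in order: split right-hand sides, remove extraneous left-hand-side attributes, then drop redundant FDs. Run on this problem's F, it also left-reduces XZ -> R to X -> R, because X -> Z makes Z extraneous. Attributes are single letters, so `frozenset("XY")` is {"X", "Y"}. The function names are illustrative.

```python
def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """Split RHSs, remove extraneous LHS attributes, then drop redundant FDs."""
    # Step 1: a single attribute on every right-hand side (duplicates skipped)
    cover = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset({attr}))
            if fd not in cover:
                cover.append(fd)
    # Step 2: left-reduce; attr is extraneous if rhs still follows from lhs - {attr}
    for i in range(len(cover)):
        lhs, rhs = cover[i]
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, cover):
                lhs = lhs - {attr}
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # step 2 can create duplicates; keep the first copy
    # Step 3: remove redundant FDs one at a time, re-testing against the shrinking set
    i = 0
    while i < len(cover):
        rest = cover[:i] + cover[i + 1:]
        lhs, rhs = cover[i]
        if rhs <= closure(lhs, rest):
            cover = rest
        else:
            i += 1
    return cover


def show(fd):
    lhs, rhs = fd
    return "".join(sorted(lhs)) + " -> " + "".join(sorted(rhs))


F = [
    (frozenset("X"), frozenset("Z")),
    (frozenset("XY"), frozenset("WP")),
    (frozenset("XY"), frozenset("ZWQ")),
    (frozenset("XZ"), frozenset("R")),
]
print(sorted(show(fd) for fd in canonical_cover(F)))
# ['X -> R', 'X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W']
```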
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)
4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization: analyzing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
Provides the database designer with:
a formal framework for analyzing relation schemas based on keys and FDs;
a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
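Dependency preservation can be tested without computing F+ explicitly. The standard approach, for each X -> Y in F, starts from X and repeatedly adds whatever each Ri can contribute, namely closure(result ∩ Ri) ∩ Ri under F. The FD is preserved iff Y is eventually reached. This Python sketch of that test reports which TEACH dependencies a decomposition loses. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lost_dependencies(fds, decomposition):
    """Return the FDs of fds not implied by the union of the FDs projected onto the decomposition."""
    lost = []
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What R_i alone can derive from the attributes known so far
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            lost.append((lhs, rhs))
    return lost


TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decomposition_3 = [{"instructor", "course"}, {"instructor", "student"}]
for lhs, rhs in lost_dependencies(F, decomposition_3):
    print(sorted(lhs), "->", sorted(rhs), "is not preserved")
# prints: ['course', 'student'] -> ['instructor'] is not preserved
```

This matches the TEACH discussion: the lossless BCNF decomposition gives up fd1.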
Practical Use of Normal Forms: Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S, a subset of R, with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
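For small schemas, candidate keys can be found by brute force. The search tries attribute sets in order of increasing size. A set is kept as a key if its closure is the whole schema (it is a superkey) and it contains no key already found (it is minimal). This Python sketch applies that search to TEACH and then derives the prime attributes. The search is exponential in the number of attributes, so it is only meant for classroom-sized examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys of schema, smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = frozenset(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
keys = candidate_keys(TEACH, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['course', 'student'], ['instructor', 'student']]
print(sorted(TEACH - prime))      # [] (TEACH has no nonprime attributes)
```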
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
Considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF: To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
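Technique 1 amounts to a projection plus an unnest. This Python sketch applies it to a hypothetical DEPARTMENT state whose DLOCATIONS attribute is multivalued. The values are invented; only the shape of the transformation follows the text.

```python
# Hypothetical DEPARTMENT state with the multivalued attribute DLOCATIONS (not in 1NF)
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DLOCATIONS": ["Stafford"]},
]

# Technique 1: keep only the atomic attributes in DEPARTMENT ...
DEPARTMENT_1NF = [(d["DNUMBER"], d["DNAME"]) for d in DEPARTMENT]
# ... and move DLOCATIONS into its own relation together with the primary key DNUMBER
DEPT_LOCATIONS = [(d["DNUMBER"], loc) for d in DEPARTMENT for loc in d["DLOCATIONS"]]

print(DEPARTMENT_1NF)  # [(5, 'Research'), (4, 'Administration')]
print(DEPT_LOCATIONS)
# [(5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston'), (4, 'Stafford')]
```

The new relation's key is the combination (DNUMBER, DLOCATION). A department with any number of locations fits without NULLs or repeated department names.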
Normalizing nested relations into 1NF
Additional problems from the Schaum series: Pg 178, 5.1
Second Normal Form: Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
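A 2NF check looks for partial dependencies: a nonprime attribute that is already determined by some proper subset of the primary key. This sketch runs that check on EMP_PROJ, using the FDs from the FD examples earlier (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS). It follows the slide's definition of a prime attribute as a member of the primary key. The function name is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """List (proper subset of key, attribute) pairs where a nonprime attribute depends on part of the key."""
    found = []
    for attr in sorted(schema - key):
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if attr in closure(set(part), fds):
                    found.append((part, attr))
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
KEY = {"SSN", "PNUMBER"}
F = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
for part, attr in partial_dependencies(EMP_PROJ, KEY, F):
    print(part, "->", attr)
# ('SSN',) -> ENAME
# ('PNUMBER',) -> PLOCATION
# ('PNUMBER',) -> PNAME
```

An empty result would mean the schema is in 2NF. Here the partial dependencies suggest the usual split into (SSN, PNUMBER, HOURS), (SSN, ENAME) and (PNUMBER, PNAME, PLOCATION).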
Normalizing into 2NF
[Figure: Conversion to 2NF. Recoverable attribute groups: (A, B, C, D); (A, B, C); (A, D)]
Convert to
Additional problem on 2nd normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form: A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
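A commonly used general form of the 3NF test flags a nontrivial X -> A when X is not a superkey and A is not prime. This sketch applies it to EMP_DEPT, assuming SSN is its only candidate key, so the prime attributes are passed in by hand. It then applies the same test to TEACH, where every attribute is prime, to show why TEACH passes 3NF even though it fails BCNF. The function name is hypothetical.

```python
def closure(attrs, fds):
    """Return attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(schema, fds, prime):
    """Return (X, A) pairs where X -> A is nontrivial, X is not a superkey and A is nonprime."""
    violations = []
    for lhs, rhs in fds:
        is_superkey = schema <= closure(lhs, fds)
        for attr in sorted(rhs - lhs):  # split the RHS, ignoring trivial parts
            if not is_superkey and attr not in prime:
                violations.append((lhs, attr))
    return violations


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_EMP_DEPT = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(third_nf_violations(EMP_DEPT, F_EMP_DEPT, prime={"SSN"}))
# [(frozenset({'DNUMBER'}), 'DMGRSSN'), (frozenset({'DNUMBER'}), 'DNAME')]

TEACH = {"student", "course", "instructor"}
F_TEACH = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
print(third_nf_violations(TEACH, F_TEACH, prime=TEACH))  # []
```

The EMP_DEPT violations are exactly the transitive dependencies through DNUMBER. Moving DNAME and DMGRSSN into a relation keyed by DNUMBER removes them.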
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 17042023
3 Null Values in Tuples Interpretations of nulls
Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable
GUIDELINE 3 Relations should be designed such that their
tuples will have as few NULL values as possible Attributes that are NULL frequently could be
placed in separate relations (with the primary key) Example-
if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation
Better create a new relation emp_offices(essn office_number)
Module 6 18042023
Example of Spurious Tuples
Module 6 19042023
Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as
the base relations of EMP_PROJ is not a good schema design
Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ
These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid
This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1
Module 6 20042023
Example of Spurious Tuples contd
Module 6 21042023
4 Spurious Tuples Bad designs for a relational database may result
in erroneous results for certain JOIN operations The lossless join property is used to
guarantee meaningful results for join operations
GUIDELINE 4 Design relation schemas so that they can be
joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated
Module 6 22042023
Spurious Tuples
There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies
Note that Property (a) is extremely important and cannot be
sacrificed Property (b) is less stringent and may be sacrificed
Module 6 23042023
Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done
during Insertion Modification Deletion
Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Module 6 24042023
Functional dependencies Functional dependencies (FDs)
Is a constraint between two sets of attributes from the database
Assumption The entire database is a single universal
relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes
Module 6 25042023
Definition
FDs are used to specify formal measures of the
goodness of relational designs keys that are used to define normal forms for
relations constraints that are derived from the meaning and
interrelationships of the data attributes A set of attributes X functionally determines
a set of attributes Y if the value of X determines a unique value for Y
Module 6 26042023
Functional Dependencies
A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If
t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r
depend on or are determined by the values of the X component
The values of the X component functionally determines the values of Y component
FDs are derived from the real-world constraints on the attributes
The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times
Module 6 27042023
Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760
Superior NA 31700 350
Victoria Africa 26828 250
Aral Sea Asia 24904 280
Huron NA 23000 206
Michigan NA 22300 307
Tanganyika Africa 12700 420
Continent -gtName Name -gtLength
Module 6 28042023
Graphical representation of Functional Dependencies
Module 6 29042023
Examples of FD constraints Social security number uniquely determines
employee name SSN -gt ENAME
Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION
Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Module 6 31042023
Satisfies algorithm
Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B
How it works Sort the tuples of the relation r on the A attributes so
that tuples with equal values under A are next to each other
Check that tuples with equal values under attributes A also have equal values under B
If it meets the condition 2 then the output of the algorithm is true else it is false
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union or additive: If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
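To see how a rule such as union follows from IR1 to IR3, the axioms can be written as small functions and chained together. This is a minimal Python sketch. It assumes FDs are (lhs, rhs) pairs of frozensets, and the function names are hypothetical. IR3 is applied only when the right side of the first FD literally equals the left side of the second.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def reflexive(x, y):
    """IR1: if Y is a subset of X, then X -> Y."""
    x, y = frozenset(x.split()), frozenset(y.split())
    if not y <= x:
        raise ValueError("IR1 requires Y to be a subset of X")
    return (x, y)


def augment(f, z):
    """IR2: from X -> Y infer XZ -> YZ."""
    z = frozenset(z.split())
    return (f[0] | z, f[1] | z)


def transitive(f, g):
    """IR3: from X -> Y and Y -> Z infer X -> Z."""
    if f[1] != g[0]:
        raise ValueError("IR3 requires the RHS of f to equal the LHS of g")
    return (f[0], g[1])


# Closure example: DEPT_NO -> MGR_SSN and MGR_SSN -> MGR_PHONE give DEPT_NO -> MGR_PHONE.
dept_phone = transitive(fd("DEPT_NO", "MGR_SSN"), fd("MGR_SSN", "MGR_PHONE"))

# Union derived from IR2 and IR3: X -> Y and X -> Z give X -> YZ.
step1 = augment(fd("X", "Y"), "X")  # XX -> XY, i.e. X -> XY
step2 = augment(fd("X", "Z"), "Y")  # XY -> YZ
union = transitive(step1, step2)    # X -> YZ

print(sorted(dept_phone[0]), "->", sorted(dept_phone[1]))
print(sorted(union[0]), "->", sorted(union[1]))
```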
Examples
1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2 Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
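Below is one possible derivation for each exercise, using the rule names given above. It reads the flattened sets as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D}.

```latex
\begin{aligned}
&\textbf{1.}\; F=\{A\to B,\ C\to X,\ BX\to Z\}\ \vdash\ AC\to Z\\
&\quad (1)\ AC\to BC && \text{IR2 on } A\to B \text{ with } C\\
&\quad (2)\ BC\to BX && \text{IR2 on } C\to X \text{ with } B\\
&\quad (3)\ AC\to BX && \text{IR3 on (1), (2)}\\
&\quad (4)\ AC\to Z  && \text{IR3 on (3) and } BX\to Z\\[4pt]
&\textbf{2.}\; F=\{A\to B,\ C\to D\},\ C\subseteq B\ \vdash\ A\to D\\
&\quad (1)\ B\to C && \text{IR1, since } C\subseteq B\\
&\quad (2)\ A\to C && \text{IR3 on } A\to B \text{ and (1)}\\
&\quad (3)\ A\to D && \text{IR3 on (2) and } C\to D
\end{aligned}
```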
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
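The redundancy test can be checked with attribute closure: X -> Y is redundant exactly when Y ⊆ X+ computed under the remaining FDs. This is a minimal Python sketch. The FD set and the helper names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(f, fds):
    """X -> Y is redundant iff Y is a subset of X+ computed under F - {X -> Y}."""
    i = fds.index(f)
    rest = fds[:i] + fds[i + 1:]
    return f[1] <= closure(f[0], rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time; the result can depend on the order examined."""
    kept = list(fds)
    i = 0
    while i < len(kept):
        rest = kept[:i] + kept[i + 1:]
        if kept[i][1] <= closure(kept[i][0], rest):
            kept = rest  # kept[i] follows from the others, so drop it
        else:
            i += 1
    return kept


# Hypothetical set: A -> C is implied by A -> B and B -> C.
F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(is_redundant(fd("A", "C"), F))  # True
print(len(remove_redundant(F)))       # 2
```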
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
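The cover and equivalence conditions above can be tested without computing F+. For each FD X -> Y in one set, check whether Y ⊆ X+ under the other set. This is a minimal Python sketch. The three FD sets and the helper names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f_set, e_set):
    """F covers E iff every X -> Y in E has Y inside X+ computed under F."""
    return all(rhs <= closure(lhs, f_set) for lhs, rhs in e_set)


def equivalent(e_set, f_set):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(e_set, f_set) and covers(f_set, e_set)


# Hypothetical sets.
F = [fd("A", "B C"), fd("B", "C")]
E = [fd("A", "B"), fd("B", "C")]
G = [fd("A", "B")]
print(equivalent(E, F))            # True
print(covers(F, G), covers(G, F))  # True False
```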
Minimal Functional Dependencies
A minimal cover (also called an irreducible set) of F is useful for eliminating unnecessary functional dependencies.
F is transformed so that each FD that has more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
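Both kinds of extraneous attribute can be detected with closures instead of comparing whole FD sets. A left attribute b is extraneous in X -> Y when Y ⊆ (X - {b})+ under F. A right attribute a is extraneous when a ∈ X+ under the set where X -> Y is replaced by X -> (Y - {a}). This is a minimal Python sketch of these standard closure tests. The example sets and helper names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lhs_extraneous(f, attr, fds):
    """attr is extraneous in the LHS of X -> Y iff Y is inside (X - {attr})+ under F,
    i.e. F is equivalent to (F - {X -> Y}) U {(X - {attr}) -> Y}."""
    return f[1] <= closure(f[0] - {attr}, fds)


def rhs_extraneous(f, attr, fds):
    """attr is extraneous in the RHS of X -> Y iff attr is in X+ under
    (F - {X -> Y}) U {X -> (Y - {attr})}."""
    i = fds.index(f)
    replaced = fds[:i] + [(f[0], f[1] - {attr})] + fds[i + 1:]
    return attr in closure(f[0], replaced)


# Hypothetical sets for illustration.
F1 = [fd("X", "Z"), fd("X Z", "R")]
print(lhs_extraneous(fd("X Z", "R"), "Z", F1))  # True: X alone already gives Z
print(lhs_extraneous(fd("X Z", "R"), "X", F1))  # False

F2 = [fd("A", "B C"), fd("B", "C")]
print(rhs_extraneous(fd("A", "B C"), "C", F2))  # True: C still follows via B
```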
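CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has one attribute.
2 FC is left-reduced.
3 FC is nonredundant.
Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1 Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2 Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3 Left-reduce: Z is extraneous in XZ -> R because X -> Z, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}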
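The steps above can be automated. This is a minimal Python sketch. It assumes the usual textbook order: split right-hand sides, left-reduce with the closure test, then remove redundant FDs. The function names are hypothetical. Applied to this problem, the sketch also left-reduces XZ -> R to X -> R, because X -> Z holds.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def dedupe(fds):
    """Remove exact duplicates, keeping the first occurrence."""
    out = []
    for f in fds:
        if f not in out:
            out.append(f)
    return out


def canonical_cover(fds):
    # Step 1: split right-hand sides so every FD is simple.
    work = dedupe([(lhs, frozenset({a})) for lhs, rhs in fds for a in sorted(rhs)])
    # Step 2: left-reduce; b is extraneous in X -> A when A is in (X - {b})+.
    for i, (lhs, rhs) in enumerate(work):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, work):
                lhs = lhs - {b}
                work[i] = (lhs, rhs)
    work = dedupe(work)  # e.g. XY -> Z may have become a second copy of X -> Z
    # Step 3: drop any FD implied by the remaining ones.
    i = 0
    while i < len(work):
        rest = work[:i] + work[i + 1:]
        if work[i][1] <= closure(work[i][0], rest):
            work = rest
        else:
            i += 1
    return work


def show(f):
    return "".join(sorted(f[0])) + " -> " + "".join(sorted(f[1]))


F = [fd("X", "Z"), fd("X Y", "W P"), fd("X Y", "Z W Q"), fd("X Z", "R")]
print([show(f) for f in canonical_cover(F)])
# ['X -> Z', 'XY -> P', 'XY -> W', 'XY -> Q', 'X -> R']
```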
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes Participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Normalization, proposed by Codd, is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing insertion, deletion and modification anomalies.
It provides the database designer with:
a formal framework for analyzing relation schemas based on their keys and FDs;
a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
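When the constraints on R are given as FDs, S is a superkey exactly when S+ contains every attribute of R. Candidate keys are then the minimal such sets. This is a minimal Python sketch that uses brute force over attribute subsets, so it is only practical for small schemas. It uses EMP_PROJ with the FDs listed in the "Examples of FD constraints" slide. The function names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(s, schema, fds):
    """S is a superkey when S+ contains every attribute of R."""
    return closure(s, fds) >= set(schema)


def is_key(k, schema, fds):
    """A key is a superkey that stops being one if any attribute is removed."""
    k = set(k)
    return is_superkey(k, schema, fds) and all(
        not is_superkey(k - {a}, schema, fds) for a in k
    )


def candidate_keys(schema, fds):
    """Brute force over subsets, smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # A superkey that contains no smaller key found so far is minimal.
            if is_superkey(s, schema, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]

print(is_superkey({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, F))  # True
print(is_key({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, F))       # False: HOURS can be dropped
print(is_key({"SSN", "PNUMBER"}, EMP_PROJ, F))                # True
```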
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
First Normal Form
1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
1NF is considered to be part of the formal definition of a relation.
Normalization into 1NF
To achieve 1NF, there are three techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3 If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
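Technique 1 amounts to a projection that "unnests" the multivalued attribute into its own relation, keyed by the original primary key plus the single value. This is a minimal Python sketch. The DEPARTMENT rows, the DLOCATIONS / DLOCATION attribute names and the function name are hypothetical.

```python
# Hypothetical DEPARTMENT rows that violate 1NF: DLOCATIONS holds several values.
department = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]


def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: move the multivalued attribute into a separate relation with the key.

    Returns (base, side): base keeps every other attribute; side holds one tuple
    per (key value, single attribute value) pair.
    """
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], new_attr: value} for row in rows for value in row[attr]]
    return base, side


dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
for t in dept_locations:
    print(t)
```

Unlike technique 2, no non-key value is repeated. Unlike technique 3, no NULL padding is needed, however many locations a department has.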
Normalizing nested relations into 1NF
Additional problems from the Schaum series: Pg 178 51
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
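Full dependency and the 2NF condition can be tested with attribute closure. An FD lhs -> A is full when A ∈ lhs+ but A is not in the closure of any proper subset of lhs. This is a minimal Python sketch. It follows the slide's simplification that "prime" means "member of the primary key", and it uses the EMP_PROJ FDs from the examples above. The function names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, attr, fds):
    """lhs -> attr is full if it holds and no proper subset of lhs determines attr."""
    lhs = set(lhs)
    if attr not in closure(lhs, fds):
        return False
    return all(
        attr not in closure(set(sub), fds)
        for size in range(len(lhs))
        for sub in combinations(sorted(lhs), size)
    )


def partial_dependencies(schema, pk, fds):
    """Nonprime attributes not fully dependent on the primary key pk (2NF violations)."""
    nonprime = [a for a in schema if a not in pk]
    return sorted(a for a in nonprime if not is_full_fd(pk, a, fds))


EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]
PK = ["SSN", "PNUMBER"]

print(is_full_fd(PK, "HOURS", F))             # True
print(is_full_fd(PK, "ENAME", F))             # False: SSN -> ENAME
print(partial_dependencies(EMP_PROJ, PK, F))  # ['ENAME', 'PLOCATION', 'PNAME']
```

An empty result from partial_dependencies means the schema satisfies this 2NF definition.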
Normalizing into 2NF
Conversion to 2NF (figure): R(A, B, C, D) is decomposed into (A, B, C) and (A, D).
Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form of Prog_task?
2 Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
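This minimal Python sketch uses the general textbook form of the 3NF test rather than the primary-key form. Every nontrivial X -> A in F must have X a superkey or A prime, where prime means a member of any candidate key. This form also covers the candidate-key case in the NOTE. The sketch checks only the FDs given in F, with right-hand sides split. Both examples are assumptions: EMP_DEPT uses the FDs from the transitive-dependency examples, and the EMP example adds Emp -> SSN so that Emp is a candidate key, as the NOTE states. The function names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by brute force over subsets (small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def third_nf_violations(schema, fds):
    """Nontrivial X -> A in F where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):
            if not closure(lhs, fds) >= set(schema) and a not in prime:
                bad.append((" ".join(sorted(lhs)), a))
    return sorted(bad)


EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F_DEPT = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
print(third_nf_violations(EMP_DEPT, F_DEPT))
# [('DNUMBER', 'DMGRSSN'), ('DNUMBER', 'DNAME')]

EMP = ["SSN", "Emp", "Salary"]
F_EMP = [fd("SSN", "Emp"), fd("Emp", "SSN Salary")]
print(third_nf_violations(EMP, F_EMP))  # []: Emp is a candidate key
```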
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation when A is a prime attribute, even if X is not a candidate key.
BCNF allows it only if X is a superkey, whether or not A is prime.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but 3NF allows it because course is a prime attribute. So this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
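The difference between the two tests shows up directly on TEACH. This minimal Python sketch checks each FD in F. A BCNF violation is a nontrivial X -> A whose X is not a superkey. A 3NF violation is the same, except that a prime right-hand attribute is tolerated. The function names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys by brute force (small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def bcnf_violations(schema, fds):
    """Nontrivial X -> A in F where X is not a superkey of R."""
    return sorted(
        (" ".join(sorted(lhs)), a)
        for lhs, rhs in fds
        for a in sorted(rhs - lhs)
        if not closure(lhs, fds) >= set(schema)
    )


def third_nf_violations(schema, fds):
    """Same as BCNF, but a prime right-hand attribute is allowed."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(x, a) for x, a in bcnf_violations(schema, fds) if a not in prime]


TEACH = ["student", "course", "instructor"]
F = [fd("student course", "instructor"), fd("instructor", "course")]  # fd1, fd2

print(bcnf_violations(TEACH, F))      # [('instructor', 'course')]
print(third_nf_violations(TEACH, F))  # []
```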
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.
Of the above three, only the third decomposition will not generate spurious tuples after a join (and hence has the nonadditivity property).
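Both claims can be checked mechanically. The binary nonadditive join test is standard: a decomposition {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. Dependency preservation can be checked by growing X with closures restricted to each Ri, which avoids computing F+. This minimal Python sketch is not taken from the slides. It applies both tests to the three TEACH decompositions, and the function names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+ under fds (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def attrs(s):
    return frozenset(s.split())


def lossless_binary(r1, r2, fds):
    """Lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


def preserved(f, parts, fds):
    """X -> Y is preserved iff Y is reached by closures taken inside each Ri only."""
    result = set(f[0])
    changed = True
    while changed:
        changed = False
        for ri in parts:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return f[1] <= result


fd1 = fd("student course", "instructor")
fd2 = fd("instructor", "course")
F = [fd1, fd2]
decompositions = [
    (attrs("student instructor"), attrs("student course")),
    (attrs("course instructor"), attrs("course student")),
    (attrs("instructor course"), attrs("instructor student")),
]
for n, (r1, r2) in enumerate(decompositions, start=1):
    print(n, lossless_binary(r1, r2, F), preserved(fd1, [r1, r2], F))
# prints: 1 False False, then 2 False False, then 3 True False
```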
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 511 pg 162
Testing for lossless joins
Lossless join algorithm Example 512
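For decompositions into more than two relations, the lossless-join test is commonly presented as a matrix (tableau, or "chase") algorithm. The sketch below is one rendering of it and not necessarily the exact formulation used in Example 512. Row i starts with the distinguished symbol a for each attribute of Ri and a unique b symbol elsewhere. Each FD forces rows that agree on its left side to agree on its right side, preferring an a symbol, and the renaming is applied to the whole column. A row made entirely of a symbols means the join is lossless. The attribute lists for EMP_LOCS and EMP_PROJ1 and the three-way decomposition are assumptions based on the usual textbook version of the spurious-tuple example. The function names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def chase_lossless(schema, parts, fds):
    """Matrix test: lossless iff some row ends up with only distinguished symbols."""
    schema = sorted(schema)
    # Row i: ("a", A) for attributes of R_i, a unique ("b", i, A) otherwise.
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in schema}
            for i, ri in enumerate(parts)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every LHS column.
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in sorted(rhs):
                    symbols = {row[a] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Prefer the distinguished symbol; otherwise pick one b symbol.
                    target = ("a", a) if ("a", a) in symbols else min(symbols)
                    # Rename every occurrence of those symbols in this column.
                    for row in rows:
                        if row[a] in symbols:
                            row[a] = target
                    changed = True
    return any(all(row[a] == ("a", a) for a in schema) for row in rows)


EMP_PROJ = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]
D1 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
D2 = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(chase_lossless(EMP_PROJ, D1, F))  # False: EMP_LOCS / EMP_PROJ1 is lossy
print(chase_lossless(EMP_PROJ, D2, F))  # True
```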
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
4NF is used to remove multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
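The 4NF test depends on telling trivial MVDs from nontrivial ones, and on whether an MVD holds in a relation state. This is a minimal Python sketch. The trivial test follows the definition above. The state test uses the standard formal definition of an MVD, which is more precise than the informal wording on the slides: for t1 and t2 agreeing on X, the tuple t3 with t3[X] = t1[X], t3[Y] = t1[Y] and t3[Z] = t2[Z] must also be present. The EMP rows and function names are hypothetical.

```python
def is_trivial_mvd(x, y, schema):
    """X ->> Y is trivial if Y is a subset of X or X U Y = R."""
    x, y = set(x), set(y)
    return y <= x or (x | y) == set(schema)


def mvd_holds(rows, x, y, schema):
    """Check X ->> Y on a relation state, with Z = R - X - Y."""
    z = [a for a in schema if a not in x and a not in y]
    present = {tuple(t[a] for a in schema) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y values from t1 and Z values from t2.
                t3 = {**{a: t1[a] for a in list(x) + list(y)}, **{a: t2[a] for a in z}}
                if tuple(t3[a] for a in schema) not in present:
                    return False
    return True


# Hypothetical EMP state: Smith works on projects X and Y and has dependents John and Anna.
SCHEMA = ["ENAME", "PNAME", "DNAME"]
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(is_trivial_mvd(["ENAME"], ["PNAME"], SCHEMA))      # False: nontrivial
print(mvd_holds(emp, ["ENAME"], ["PNAME"], SCHEMA))      # True
print(mvd_holds(emp[:3], ["ENAME"], ["PNAME"], SCHEMA))  # False: (Smith, Y, John) missing
```

Because every project must be paired with every dependent, EMP stores redundant combinations. Projecting onto (ENAME, PNAME) and (ENAME, DNAME) removes this redundancy.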
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
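The join-dependency definition can be checked directly on a relation state: project r onto each component, join the projections back together, and compare the result with r. This is a minimal Python sketch. The SUPPLY state is an illustrative assumption, and the function names are hypothetical. For this state, the three-way join returns r exactly, while joining only two of the projections adds spurious tuples.

```python
def project(rows, attrs):
    """Projection with duplicates removed; each tuple becomes a frozenset of (attr, value)."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two sets of frozenset-tuples on their shared attributes."""
    out = set()
    for left_t in left:
        for right_t in right:
            ld, rd = dict(left_t), dict(right_t)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def satisfies_jd(rows, components):
    """r satisfies JD(R1, ..., Rn) iff r equals the join of its projections on R1..Rn."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == {frozenset(row.items()) for row in rows}


# Illustrative SUPPLY state (values assumed).
supply = [
    {"SNAME": s, "PARTNAME": p, "PROJNAME": j}
    for s, p, j in [
        ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
        ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
        ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
        ("Smith", "Bolt", "ProjY"),
    ]
]
R1, R2, R3 = ["SNAME", "PARTNAME"], ["SNAME", "PROJNAME"], ["PARTNAME", "PROJNAME"]
print(satisfies_jd(supply, [R1, R2, R3]))  # True
print(satisfies_jd(supply, [R1, R2]))      # False: the binary join adds spurious tuples
```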
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Example of Spurious Tuples
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.
Problem: if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.
The additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious or wrong information that is not valid.
This happens because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
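The effect can be reproduced on a tiny state. This minimal Python sketch assumes two employees working on different projects at the same location. It projects EMP_PROJ onto EMP_LOCS and EMP_PROJ1 and joins them again on PLOCATION. All rows, the attribute lists and the function names are hypothetical.

```python
def project(rows, attrs):
    """Projection with duplicates removed; each tuple becomes a frozenset of (attr, value)."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two sets of frozenset-tuples on their shared attributes."""
    out = set()
    for left_t in left:
        for right_t in right:
            ld, rd = dict(left_t), dict(right_t)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


# Hypothetical EMP_PROJ state: two employees on different projects at the same location.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32, "ENAME": "Smith",
     "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 20, "ENAME": "Wong",
     "PNAME": "ProductY", "PLOCATION": "Bellaire"},
]
emp_locs = project(emp_proj, ["ENAME", "PLOCATION"])
emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

joined = natural_join(emp_locs, emp_proj1)  # the only shared attribute is PLOCATION
original = {frozenset(t.items()) for t in emp_proj}
spurious = joined - original
print(len(joined), len(spurious))  # 4 2
for t in spurious:
    print(dict(t))  # e.g. Smith paired with SSN 222 and ProductY
```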
Example of Spurious Tuples (contd.)
4 Spurious Tuples
Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4
Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) nonadditivity, or losslessness, of the corresponding join;
(b) preservation of the functional dependencies.
Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies cause redundant work to be done during insertion, modification and deletion.
Nulls waste storage space and make aggregation operations and joins difficult to perform.
Invalid and spurious data are generated during joins on improperly related base relations.
Functional dependencies
Functional dependencies (FDs):
An FD is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs, keys that are used to define normal forms for relations, and constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
The values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420
Name -> Length is satisfied by this state, but Continent -> Name is not: NA appears with three different lake names.
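The pairwise definition of an FD (t1[X] = t2[X] implies t1[Y] = t2[Y]) translates directly into a check on a relation state. A single state can refute an FD but cannot prove that it holds for the schema. This is a minimal Python sketch using the lake rows above, and the function name is hypothetical.

```python
def fd_holds(rows, x, y):
    """X -> Y holds in this state iff any two tuples that agree on X also agree on Y."""
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x) and any(t1[a] != t2[a] for a in y):
                return False
    return True


lakes = [
    {"Name": n, "Continent": c, "Area": area, "Length": length}
    for n, c, area, length in [
        ("Caspian Sea", "Asia-Europe", 143244, 760),
        ("Superior", "NA", 31700, 350),
        ("Victoria", "Africa", 26828, 250),
        ("Aral Sea", "Asia", 24904, 280),
        ("Huron", "NA", 23000, 206),
        ("Michigan", "NA", 22300, 307),
        ("Tanganyika", "Africa", 12700, 420),
    ]
]
print(fd_holds(lakes, ["Name"], ["Length"]))     # True
print(fd_holds(lakes, ["Continent"], ["Name"]))  # False
```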
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME.
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}.
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS.
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1 Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2 Check that tuples with equal values under attributes A also have equal values under B.
3 If condition 2 is met, the output of the algorithm is true; otherwise it is false.
Relation state of TEACH
TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz
TEXT -> COURSE is possible in this state; TEACHER -> COURSE is ruled out, because Smith teaches two different courses.
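The Satisfies algorithm can be sketched in Python as follows. The sort places equal A-values next to each other, so comparing only adjacent tuples within each run is enough. This is a minimal sketch using the TEACH state above, and the function name is hypothetical.

```python
def satisfies(rows, a, b):
    """Satisfies algorithm: sort on A, then compare B within runs of equal A-values."""
    def key(t):
        return tuple(t[attr] for attr in a)

    ordered = sorted(rows, key=key)  # equal A-values become adjacent
    for prev, cur in zip(ordered, ordered[1:]):
        # Inside a run of equal A-values, adjacent tuples must also agree on B.
        if key(prev) == key(cur) and any(prev[attr] != cur[attr] for attr in b):
            return False
    return True


teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]
print(satisfies(teach, ["TEXT"], ["COURSE"]))     # True
print(satisfies(teach, ["TEACHER"], ["COURSE"]))  # False: Smith has two courses
```

Sorting costs O(n log n), compared with O(n^2) for checking every pair of tuples. Either way, the check only covers this one state.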
Hence 5NF is rarely used in practice
Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.
The problem: if a NATURAL JOIN is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.
These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious (wrong) information that is not valid.
This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
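A minimal Python sketch of the problem described above. The EMP_PROJ rows are hypothetical and trimmed to four attributes (SSN, PNUMBER, ENAME, PLOCATION); relation, project and natural_join are illustrative helpers, not part of the slides. Projecting EMP_PROJ onto EMP_LOCS and EMP_PROJ1 and joining back on PLOCATION yields extra rows:

```python
# Each row is a frozenset of (attribute, value) pairs: hashable and order-free.

def relation(*rows):
    return {frozenset(row.items()) for row in rows}

def project(r, attrs):
    """Projection onto `attrs`; duplicate rows vanish because r is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, s):
    """Combine every pair of rows that agree on all shared attributes."""
    out = set()
    for t1 in r:
        d1 = dict(t1)
        for t2 in s:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# Hypothetical EMP_PROJ state: two employees on different projects at one location.
emp_proj = relation(
    {"SSN": "111", "PNUMBER": 1, "ENAME": "Smith", "PLOCATION": "Bellaire"},
    {"SSN": "222", "PNUMBER": 2, "ENAME": "Wong", "PLOCATION": "Bellaire"},
)
emp_locs = project(emp_proj, {"ENAME", "PLOCATION"})
emp_proj1 = project(emp_proj, {"SSN", "PNUMBER", "PLOCATION"})

rejoined = natural_join(emp_proj1, emp_locs)   # the only shared attribute is PLOCATION
spurious = rejoined - emp_proj
print(len(emp_proj), len(rejoined), len(spurious))  # 2 4 2
```

Both employees share one location, so every EMP_PROJ1 row matches both EMP_LOCS rows; the two extra rows pair each employee with the other employee's project.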
Example of Spurious Tuples (contd.)
4. Spurious Tuples
Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) non-additivity, or losslessness, of the corresponding join
(b) preservation of the functional dependencies
Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies that cause redundant work to be done during insertion, modification and deletion
Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Functional Dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, …, An}, where A1, A2, …, An are the attributes.
Definition
FDs are used to specify:
formal measures of the goodness of relational designs
keys that are used to define normal forms for relations
constraints that are derived from the meaning and interrelationships of the data attributes
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X] then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of an FD is to describe R by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420
Continent -> Name does not hold (NA is paired with Superior, Huron and Michigan); Name -> Length holds in this instance.
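The definition (t1[X] = t2[X] implies t1[Y] = t2[Y]) can be checked directly against one relation state by comparing every pair of tuples. A small Python sketch using the lakes table above; fd_holds is a hypothetical helper name. An instance can only refute an FD: passing the check does not prove that the FD holds for every state of the schema.

```python
from itertools import combinations

# Lakes of the world, copied from the table above.
LAKES = [
    {"Name": "Caspian Sea", "Continent": "Asia-Europe", "Area": 143244, "Length": 760},
    {"Name": "Superior", "Continent": "NA", "Area": 31700, "Length": 350},
    {"Name": "Victoria", "Continent": "Africa", "Area": 26828, "Length": 250},
    {"Name": "Aral Sea", "Continent": "Asia", "Area": 24904, "Length": 280},
    {"Name": "Huron", "Continent": "NA", "Area": 23000, "Length": 206},
    {"Name": "Michigan", "Continent": "NA", "Area": 22300, "Length": 307},
    {"Name": "Tanganyika", "Continent": "Africa", "Area": 12700, "Length": 420},
]

def fd_holds(rows, X, Y):
    """True iff no pair of tuples agrees on all of X but differs on some attribute of Y."""
    for t1, t2 in combinations(rows, 2):
        if all(t1[a] == t2[a] for a in X) and any(t1[b] != t2[b] for b in Y):
            return False
    return True

print(fd_holds(LAKES, ["Name"], ["Length"]))     # True
print(fd_holds(LAKES, ["Continent"], ["Name"]))  # False
```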
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints (contd.)
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
Relation state of TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz
TEXT -> COURSE is not violated in this state, so it may hold; TEACHER -> COURSE does not hold (Smith appears with two different courses).
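A Python sketch of the Satisfies algorithm, applied to the TEACH state above; satisfies is a hypothetical function name. Sorting on A makes tuples with equal A-values adjacent, so step 2 only needs to compare neighbouring tuples:

```python
def satisfies(rows, A, B):
    """Satisfies algorithm: does this relation state obey A -> B?"""
    key_a = lambda t: tuple(t[a] for a in A)
    ordered = sorted(rows, key=key_a)            # step 1: equal A-values become adjacent
    for prev, cur in zip(ordered, ordered[1:]):
        same_a = key_a(prev) == key_a(cur)
        same_b = all(prev[b] == cur[b] for b in B)
        if same_a and not same_b:                # step 2 fails for this pair
            return False
    return True                                  # step 3: condition met everywhere

TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
```

Comparing only neighbours is enough because, within a sorted group, agreement of each adjacent pair on B implies agreement of the whole group.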
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming.
So inference axioms are used.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the functional dependencies that can be inferred from F are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME
To infer new dependencies from a given set of dependencies systematically, we need a set of inference rules. The statement that X -> Y can be inferred from F is denoted F |= X -> Y.
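The slides do not name a procedure for deciding F |= X -> Y, but a standard one is to compute the attribute closure X+ (all attributes determined by X under F) and check whether Y ⊆ X+. A minimal Python sketch, assuming FDs are written as (lhs, rhs) pairs of attribute sets, applied to the SSN/DNUMBER example above; closure and implies are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the left side is determined, so is the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True iff F |= lhs -> rhs."""
    return set(rhs) <= closure(lhs, fds)

F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```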
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): if Y ⊆ X, then X -> Y
IR2 (Augmentation): if X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z
IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we can infer from F using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F; in other words, if the axioms are correctly applied, they cannot derive false dependencies.
By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs (contd.)
Some additional inference rules that are useful:
Decomposition: if X -> YZ, then X -> Y and X -> Z
Union (additive): if X -> Y and X -> Z, then X -> YZ
Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z
These three inference rules, as well as any other valid inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
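One possible derivation for each exercise, reading F in Example 1 as {A -> B, C -> X, BX -> Z}; other orders of rule application also work:

```latex
\begin{aligned}
&\textbf{Example 1: } F=\{A\to B,\; C\to X,\; BX\to Z\} \\
&1.\; A\to B   && \text{given}\\
&2.\; AC\to BC && \text{IR2 on 1 (augment by } C)\\
&3.\; C\to X   && \text{given}\\
&4.\; BC\to BX && \text{IR2 on 3 (augment by } B)\\
&5.\; AC\to BX && \text{IR3 on 2, 4}\\
&6.\; BX\to Z  && \text{given}\\
&7.\; AC\to Z  && \text{IR3 on 5, 6}\\[4pt]
&\textbf{Example 2: } F=\{A\to B,\; C\to D\},\; C\subseteq B \\
&1.\; A\to B   && \text{given}\\
&2.\; B\to C   && \text{IR1, since } C\subseteq B\\
&3.\; A\to C   && \text{IR3 on 1, 2}\\
&4.\; C\to D   && \text{given}\\
&5.\; A\to D   && \text{IR3 on 3, 4}
\end{aligned}
```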
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
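Redundancy can be tested mechanically: A -> B is redundant iff B ⊆ A+ computed under F - {A -> B}. A Python sketch on a hypothetical set F = {A -> B, B -> C, A -> C}, using single-letter attribute names so that fd("AB", ...) stands for {A, B}; fd, closure, is_redundant and remove_redundant are illustrative helpers. Which FDs end up removed can depend on the order in which they are examined.

```python
def fd(lhs, rhs):
    """Build an FD from strings of single-letter attributes: "AB" means {A, B}."""
    return (frozenset(lhs), frozenset(rhs))

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_redundant(fds, i):
    """fds[i] = X -> Y is redundant iff F - {X -> Y} still implies it."""
    lhs, rhs = fds[i]
    return rhs <= closure(lhs, fds[:i] + fds[i + 1:])

def remove_redundant(fds):
    """Drop redundant FDs one at a time, re-testing against what remains."""
    fds = list(fds)
    i = 0
    while i < len(fds):
        if is_redundant(fds, i):
            del fds[i]
        else:
            i += 1
    return fds

F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(remove_redundant(F) == F[:2])  # True: A -> C follows from A -> B and B -> C
```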
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, we say that E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent to F.
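A sketch of the covering test in Python: F covers E iff, for every X -> Y in E, Y ⊆ X+ computed under F; E and F are equivalent iff each covers the other. The sets E, F and G below are a small hypothetical example with single-letter attributes, and the helper names are illustrative.

```python
def fd(lhs, rhs):
    """Single-letter attributes: fd("AC", "D") is AC -> D."""
    return (frozenset(lhs), frozenset(rhs))

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(F, E):
    """F covers E iff every FD X -> Y in E satisfies Y ⊆ X+ under F."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in E)

def equivalent(E, F):
    return covers(E, F) and covers(F, E)

E = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
F = [fd("A", "CD"), fd("E", "AH")]
G = [fd("A", "C"), fd("E", "H")]
print(equivalent(E, F))  # True
print(equivalent(F, G))  # False: G cannot derive A -> D
```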
Minimal Functional Dependencies
A minimal cover is useful in eliminating unnecessary functional dependencies; it is also called an irreducible set of F.
F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) the RHS of every dependency in F is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be further reduced in size by removing extraneous left-hand-side attributes or extraneous right-hand-side attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has exactly one attribute.
2. FC is left-reduced: no FD has an extraneous attribute on its LHS.
3. FC is nonredundant.
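Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove redundant FDs (XY -> Z is implied by X -> Z, and XY -> W appears twice): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R because X -> Z, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}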
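A Python sketch of the usual three steps (split right-hand sides, remove extraneous left-hand-side attributes, remove redundant FDs), run on the F of this problem; the helper names are hypothetical. Unlike step 2 of the original solution, it also left-reduces XZ -> R to X -> R, because X -> Z makes Z extraneous. Minimal covers are not unique in general, so a different processing order can give a different but equivalent result.

```python
def fd(lhs, rhs):
    """Single-letter attributes: fd("XY", "WP") is XY -> WP."""
    return (frozenset(lhs), frozenset(rhs))

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    # 1. Split every RHS into single attributes, skipping exact duplicates.
    F = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            if (lhs, frozenset({a})) not in F:
                F.append((lhs, frozenset({a})))
    # 2. Left-reduce: B is extraneous in X -> A if A is in (X - {B})+ under F.
    for i in range(len(F)):
        lhs, rhs = F[i]
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, F):
                lhs = lhs - {b}
                F[i] = (lhs, rhs)
    F = list(dict.fromkeys(F))  # reduced FDs may now coincide
    # 3. Remove FDs implied by the remaining ones.
    i = 0
    while i < len(F):
        lhs, rhs = F[i]
        if rhs <= closure(lhs, F[:i] + F[i + 1:]):
            del F[i]
        else:
            i += 1
    return F

F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```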
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)
4NF: based on keys and multivalued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations (contd.)
Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
minimizing redundancy
minimizing insertion, deletion and update anomalies
It provides the database designer with:
a formal framework for analyzing relation schemas based on their keys and FDs
a series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations (contd.)
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
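Dependency preservation can be tested without computing F+: start from X and, for each sub-schema Ri, repeatedly add ((result ∩ Ri)+ ∩ Ri) until nothing changes; X -> Y is preserved iff Y ends up in the result. A Python sketch of this standard test (preserved is a hypothetical name), using the TEACH FDs fd1 and fd2 discussed in the BCNF section and the decomposition {instructor, course}, {instructor, student}:

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def preserved(dep, fds, schemas):
    """Is X -> Y implied by the FDs that can be enforced inside single sub-schemas?"""
    lhs, rhs = dep
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for Ri in schemas:
            gained = closure(result & Ri, fds) & Ri   # what Ri alone can determine
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result

fd1 = ({"student", "course"}, {"instructor"})
fd2 = ({"instructor"}, {"course"})
F = [fd1, fd2]
decomposition = [{"instructor", "course"}, {"instructor", "student"}]
print(preserved(fd1, F, decomposition))  # False
print(preserved(fd2, F, decomposition))  # True
```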
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (in practice, usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys (contd.)
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is an attribute that is not prime, that is, it is not a member of any candidate key.
First Normal Form
1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
1NF is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
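Technique 1 can be sketched in Python as follows. The DEPARTMENT rows with a multivalued DLOCATIONS attribute are hypothetical, and to_1nf is an illustrative helper: the multivalued attribute moves, together with the key DNUMBER, into a separate relation (here DEPT_LOCATIONS), one tuple per value.

```python
# Hypothetical non-1NF DEPARTMENT rows: DLOCATIONS holds a set of values.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": {"Stafford"}},
]

def to_1nf(rows, key, multivalued):
    """Technique 1: drop the multivalued attribute from the base relation and
    store (key, value) pairs for it in a separate relation."""
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    split = sorted((row[key], value) for row in rows for value in row[multivalued])
    return base, split

dept, dept_locations = to_1nf(department, "DNUMBER", "DLOCATIONS")
print(dept)
print(dept_locations)  # [(4, 'Stafford'), (5, 'Bellaire'), (5, 'Houston')]
```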
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and the primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds
Second Normal Form (contd.)
A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
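A Python sketch of the 2NF test: a nonprime attribute violates 2NF if the closure of some proper subset of the primary key already contains it. The schema and FDs are those of EMP_PROJ from the FD examples earlier in this module; closure and partial_dependencies are hypothetical helpers.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(pk, nonprime, fds):
    """(subset, A) pairs where nonprime A depends on a proper subset of the key."""
    found = []
    for size in range(1, len(pk)):
        for subset in combinations(sorted(pk), size):
            for a in sorted(closure(subset, fds) & nonprime):
                found.append((subset, a))
    return found

PK = {"SSN", "PNUMBER"}
NONPRIME = {"HOURS", "ENAME", "PNAME", "PLOCATION"}
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
partial = partial_dependencies(PK, NONPRIME, FDS)
print(partial)
print("2NF" if not partial else "not in 2NF")  # not in 2NF
```

The partial dependencies group naturally by key subset, which suggests the usual 2NF decomposition into (SSN, PNUMBER, HOURS), (SSN, ENAME) and (PNUMBER, PNAME, PLOCATION).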
Normalizing into 2NF
Conversion to 2NF (figure over attributes A, B, C, D)
Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form of Prog_task?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z
Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME
Third Normal Form (contd.)
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute, even when 'X' is not a candidate key (or any other superkey).
BCNF has no such exception: 'X' must be a superkey (for example, a candidate key).
So a relation in BCNF will definitely be in 3NF, but not the other way around.
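The difference can be checked mechanically on TEACH. In this Python sketch (helper names hypothetical), an FD X -> A whose X is not a superkey violates BCNF, and also violates 3NF only when A is not prime. Candidate keys are found by brute force, which is fine only for small schemas, and only the FDs as given are checked.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, fds):
    """Minimal attribute sets whose closure is all of R (brute force)."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if closure(s, fds) >= R and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def check_bcnf_3nf(R, fds):
    """Return the (X, A) pairs that violate BCNF and those that violate 3NF."""
    prime = set().union(*candidate_keys(R, fds))
    bcnf, threenf = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= R:        # X is a superkey: fine for both forms
            continue
        for a in sorted(rhs - lhs):       # ignore trivial parts of X -> A
            bcnf.append((lhs, a))
            if a not in prime:            # 3NF still allows X -> A when A is prime
                threenf.append((lhs, a))
    return bcnf, threenf

TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),   # fd1
     ({"instructor"}, {"course"})]               # fd2
bcnf, threenf = check_bcnf_3nf(TEACH, F)
print(bcnf)     # [({'instructor'}, 'course')]
print(threenf)  # []
```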
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. Because course is a prime attribute, fd2 does not violate 3NF; but instructor is not a superkey, so fd2 violates BCNF. Hence this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving BCNF by Decomposition (contd.)
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Out of the above three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
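For a decomposition into exactly two relations there is a standard shortcut, not stated on the slides: {R1, R2} has the non-additive join property iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A Python sketch applying it to the three TEACH decompositions (lossless_binary is a hypothetical name):

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(R1, R2, fds):
    """Binary test: the shared attributes must determine one side entirely."""
    common = closure(R1 & R2, fds)
    return (R1 - R2) <= common or (R2 - R1) <= common

F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = {
    1: ({"student", "instructor"}, {"student", "course"}),
    2: ({"course", "instructor"}, {"course", "student"}),
    3: ({"instructor", "course"}, {"instructor", "student"}),
}
for n, (R1, R2) in decompositions.items():
    print(n, lossless_binary(R1, R2, F))
# 1 False
# 2 False
# 3 True
```

Only the third decomposition passes, because its shared attribute instructor determines course through fd2.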
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
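The general lossless join test is usually stated as a matrix (chase) algorithm; the referenced example itself is not reproduced here. A Python sketch of that algorithm (lossless_chase is a hypothetical name): build one row per sub-schema, repeatedly equate symbols in rows that agree on the left side of some FD, and report lossless if some row becomes all distinguished symbols. It is checked against the TEACH decompositions.

```python
def lossless_chase(R, schemas, fds):
    """Matrix (chase) test: is the decomposition of R into `schemas` lossless?"""
    attrs = sorted(R)
    # Row i holds the distinguished symbol ("a", A) where A is in schemas[i],
    # and a symbol ("b", i, A) unique to that cell everywhere else.
    rows = [{A: ("a", A) if A in Ri else ("b", i, A) for A in attrs}
            for i, Ri in enumerate(schemas)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue
                    for A in rhs:
                        s1, s2 = r1[A], r2[A]
                        if s1 == s2:
                            continue
                        # Equate the two symbols, preferring a distinguished one.
                        keep, old = (s2, s1) if s2[0] == "a" else (s1, s2)
                        for row in rows:          # rename it everywhere in column A
                            if row[A] == old:
                                row[A] = keep
                        changed = True
    # Lossless iff some row now consists only of distinguished symbols.
    return any(all(row[A] == ("a", A) for A in attrs) for row in rows)

R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(lossless_chase(R, [{"instructor", "course"}, {"instructor", "student"}], F))  # True
print(lossless_chase(R, [{"student", "instructor"}, {"student", "course"}], F))     # False
```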
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF) (contd.)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
It is used to remove multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS
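X ->> Y holds in a relation state iff, for every two tuples t1 and t2 that agree on X, the tuple that takes its Y-values from t1 and its remaining values from t2 is also present. A Python sketch with a hypothetical EMP state (Smith, projects X and Y, dependents John and Anna); mvd_holds is an illustrative helper.

```python
def mvd_holds(rows, attrs, X, Y):
    """X ->> Y: for all t1, t2 agreeing on X, the tuple with X and Y from t1
    and the remaining attributes Z = R - X - Y from t2 must also be present."""
    Z = [a for a in attrs if a not in X and a not in Y]
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in X):
                t3 = {**{a: t1[a] for a in X + Y}, **{a: t2[a] for a in Z}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True

ATTRS = ["ENAME", "PNAME", "DNAME"]
# Every project of Smith is paired with every dependent of Smith.
EMP = [{"ENAME": "Smith", "PNAME": p, "DNAME": d}
       for p in ("X", "Y") for d in ("John", "Anna")]
print(mvd_holds(EMP, ATTRS, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(EMP[:3], ATTRS, ["ENAME"], ["PNAME"]))  # False
```

Dropping one row breaks ENAME ->> PNAME, which is why keeping two independent multivalued facts in one relation forces every combination to be stored; EMP_PROJECTS and EMP_DEPENDENTS store each fact once.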
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF) (contd.)
Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
Informally, a 5NF relation has no nontrivial join dependency except those whose components are all superkeys.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
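A relation state satisfies JD(R1, R2, R3) iff it equals the natural join of its projections on R1, R2 and R3. A Python sketch with a small hypothetical SUPPLY state; the attribute names SNAME, PART and PROJ and the helper functions are assumptions, not taken from the original figure. The state satisfies the three-way JD, although joining only two of the projections produces a spurious tuple.

```python
from functools import reduce

def project(r, attrs):
    """Projection of r (a set of frozensets of (attr, value) pairs) onto attrs."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, s):
    """Combine every pair of rows that agree on all shared attributes."""
    out = set()
    for t1 in r:
        d1 = dict(t1)
        for t2 in s:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def satisfies_jd(r, components):
    """r satisfies JD(R1, ..., Rn) iff r equals the join of its projections."""
    return reduce(natural_join, (project(r, c) for c in components)) == r

def rows(*triples):
    return {frozenset(zip(("SNAME", "PART", "PROJ"), t)) for t in triples}

SUPPLY = rows(("s1", "p1", "j2"), ("s1", "p2", "j1"),
              ("s2", "p1", "j1"), ("s1", "p1", "j1"))
R1, R2, R3 = {"SNAME", "PART"}, {"PART", "PROJ"}, {"SNAME", "PROJ"}
print(satisfies_jd(SUPPLY, [R1, R2, R3]))  # True
print(satisfies_jd(SUPPLY, [R1, R2]))      # False: adds the spurious (s2, p1, j2)
```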
Fifth Normal Form (5NF) (contd.)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
Module 6 20042023
Example of Spurious Tuples contd
Module 6 21042023
4 Spurious Tuples Bad designs for a relational database may result
in erroneous results for certain JOIN operations The lossless join property is used to
guarantee meaningful results for join operations
GUIDELINE 4 Design relation schemas so that they can be
joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated
Module 6 22042023
Spurious Tuples
There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies
Note that Property (a) is extremely important and cannot be
sacrificed Property (b) is less stringent and may be sacrificed
Module 6 23042023
Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done
during Insertion Modification Deletion
Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Module 6 24042023
Functional dependencies Functional dependencies (FDs)
Is a constraint between two sets of attributes from the database
Assumption The entire database is a single universal
relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes
Module 6 25042023
Definition
FDs are used to specify formal measures of the
goodness of relational designs keys that are used to define normal forms for
relations constraints that are derived from the meaning and
interrelationships of the data attributes A set of attributes X functionally determines
a set of attributes Y if the value of X determines a unique value for Y
Module 6 26042023
Functional Dependencies
A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If
t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r
depend on or are determined by the values of the X component
The values of the X component functionally determines the values of Y component
FDs are derived from the real-world constraints on the attributes
The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times
Module 6 27042023
Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760
Superior NA 31700 350
Victoria Africa 26828 250
Aral Sea Asia 24904 280
Huron NA 23000 206
Michigan NA 22300 307
Tanganyika Africa 12700 420
Continent -gtName Name -gtLength
Module 6 28042023
Graphical representation of Functional Dependencies
Module 6 29042023
Examples of FD constraints Social security number uniquely determines
employee name SSN -gt ENAME
Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION
Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Module 6 31042023
Satisfies algorithm
Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B
How it works Sort the tuples of the relation r on the A attributes so
that tuples with equal values under A are next to each other
Check that tuples with equal values under attributes A also have equal values under B
If it meets the condition 2 then the output of the algorithm is true else it is false
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF and BCNF: based on keys and FDs of a relation schema (relational design by analysis)
4NF: based on keys and multivalued dependencies (MVDs)
5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd.
Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
Provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S, a subset of R, with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is an attribute that is not prime, that is, it is not a member of any candidate key.
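Candidate keys, and hence prime attributes, can be found with attribute closure. The brute-force Python sketch below tries attribute subsets by increasing size, so it is exponential and only meant for small textbook schemas; the function names and the FD sets for EMP_PROJ and TEACH (taken from the examples elsewhere in this module) are assumptions of the sketch.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper: fd("SSN PNUMBER", "HOURS") builds {SSN, PNUMBER} -> HOURS."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    attributes = frozenset(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not a candidate key
            if closure(s, fds) == attributes:
                keys.append(s)
    return keys


def prime_attributes(attributes, fds):
    """Prime attributes are exactly the members of some candidate key."""
    return frozenset().union(*candidate_keys(attributes, fds))


EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F_EMP_PROJ = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION")]
TEACH = ["STUDENT", "COURSE", "INSTRUCTOR"]
F_TEACH = [fd("STUDENT COURSE", "INSTRUCTOR"), fd("INSTRUCTOR", "COURSE")]

print(candidate_keys(EMP_PROJ, F_EMP_PROJ))  # a single key: {SSN, PNUMBER}
print(candidate_keys(TEACH, F_TEACH))        # two keys, so every TEACH attribute is prime
```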
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.
Considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
To achieve 1NF there are three techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2 Expand the key so that there will be a separate tuple in the original relation for each value of the attribute. This has the disadvantage of introducing redundancy.
3 If the maximum number of values is known for the attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best, because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
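Technique 1 can be illustrated with a small Python sketch. The DEPARTMENT rows, the DLOCATIONS attribute and the function split_multivalued are hypothetical examples chosen for illustration; the multivalued attribute is modelled as a Python list.

```python
# Hypothetical non-1NF DEPARTMENT rows: DLOCATIONS is multivalued (a Python list).
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DLOCATIONS": ["Stafford"]},
]


def split_multivalued(rows, key, attr):
    """Technique 1: move the multivalued attribute into its own relation with the key."""
    # The remaining relation keeps every attribute except the multivalued one.
    base = [{name: value for name, value in row.items() if name != attr} for row in rows]
    # New relation: one atomic (key, value) tuple per value; its key is the whole pair.
    detail = [(row[key], value) for row in rows for value in row[attr]]
    return base, detail


department, dept_locations = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS")
print(department)      # no list-valued attribute remains
print(dept_locations)  # one (DNUMBER, DLOCATION) tuple per location
```

No redundancy is introduced and any number of locations per department is allowed, which is why the notes prefer this technique.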
Normalizing nested relations into 1NF
Additional problems: Schaum's Outline series, pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and the primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
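Whether an FD is full or partial can be tested with attribute closure. In this hypothetical Python sketch, the FDs assumed for EMP_PROJ are the ones used in the examples of this module: {SSN, PNUMBER} -> HOURS, SSN -> ENAME and PNUMBER -> {PNAME, PLOCATION}.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper building an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and no attribute can be removed from Y without losing it."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("the FD does not follow from fds")
    # Checking one-attribute removals suffices: closure is monotone, so if a smaller
    # subset determined Z, some (Y - {a}) containing it would determine Z as well.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


F = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION")]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, F))  # full dependency
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, F))  # partial, because SSN -> ENAME
```

A partial dependency of a non-prime attribute on the primary key is exactly what the 2NF test rejects.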
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF (figure): R(A, B, C, D) decomposed into R1(A, B, C) and R2(A, D)
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form of Prog_task?
2 Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
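A 3NF check can be sketched in Python. Note that the sketch swaps in the general definition of 3NF (for every nontrivial X -> A, either X is a superkey or A is a prime attribute) instead of the primary-key and transitive-dependency formulation used above; it also inspects only the listed FDs, which is enough when they are the given set F for the whole schema R. The EMP_DEPT and EMP FD sets, including Emp -> SSN to make Emp a candidate key as the NOTE states, are assumptions of the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper building an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets whose closure is the schema."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == attributes:
                keys.append(s)
    return keys


def violations_3nf(attributes, fds):
    """List (X, A) pairs where X is not a superkey and A is not prime."""
    attrs = frozenset(attributes)
    prime = frozenset().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= attrs
        for a in sorted(rhs - lhs):  # only nontrivial parts of the FD matter
            if not superkey and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F_EMP_DEPT = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
EMP = ["SSN", "Emp", "Salary"]
F_EMP = [fd("SSN", "Emp"), fd("Emp", "SSN"), fd("Emp", "Salary")]

print(violations_3nf(EMP_DEPT, F_EMP_DEPT))  # DNUMBER -> DNAME and DNUMBER -> DMGRSSN
print(violations_3nf(EMP, F_EMP))            # empty: Emp is a candidate key
```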
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a candidate key.
To be in BCNF, X must be a superkey (it must contain a candidate key).
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation, and so is {student, instructor}, so every attribute is prime and the relation is in 3NF. It is not in BCNF, because instructor, the left-hand side of fd2, is not a superkey.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
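The BCNF condition for TEACH can be checked with a short Python sketch (hypothetical helper names, FDs represented as pairs of frozensets). Checking only the given FDs is enough for the original schema R; for a decomposed schema, the FDs projected onto it would in general have to be considered. fd2 passes the 3NF test because course is prime, but fails BCNF because instructor is not a superkey.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper building an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left-hand side is not a superkey of R."""
    attrs = frozenset(attributes)
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= attrs]


TEACH = ["STUDENT", "COURSE", "INSTRUCTOR"]
fd1 = fd("STUDENT COURSE", "INSTRUCTOR")
fd2 = fd("INSTRUCTOR", "COURSE")
print(bcnf_violations(TEACH, [fd1, fd2]) == [fd2])  # only fd2 violates BCNF
```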
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additive join property).
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
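The lossless join test can be sketched with the matrix (tableau) method: build one row per decomposed relation, put a distinguished symbol in the columns that relation contains, repeatedly equate symbols in rows that agree on the left-hand side of some FD, and declare the decomposition lossless if some row ends up with only distinguished symbols. This Python sketch is our own illustration and is not claimed to follow the referenced example step by step; it confirms that only the third TEACH decomposition is lossless.

```python
def fd(lhs, rhs):
    """Hypothetical helper building an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def is_lossless(attributes, decomposition, fds):
    """Matrix (tableau) test for the non-additive join property."""
    attrs = list(attributes)
    # Row i, column A: distinguished ("a", A) if A is in R_i, otherwise ("b", i, A).
    s = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
         for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            key_cols = sorted(lhs)
            # Group the rows that currently agree on every LHS column.
            groups = {}
            for row in s:
                groups.setdefault(tuple(row[c] for c in key_cols), []).append(row)
            for rows in groups.values():
                for col in rhs:
                    symbols = {row[col] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols: prefer the distinguished one, else keep one "b".
                    target = ("a", col) if ("a", col) in symbols else min(symbols)
                    for row in s:
                        if row[col] in symbols:
                            row[col] = target
                    changed = True
    # Lossless iff some row now consists only of distinguished symbols.
    return any(all(row[a] == ("a", a) for a in attrs) for row in s)


TEACH = ["STUDENT", "COURSE", "INSTRUCTOR"]
F_TEACH = [fd("STUDENT COURSE", "INSTRUCTOR"), fd("INSTRUCTOR", "COURSE")]
for d in ([{"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}],
          [{"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}],
          [{"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}]):
    print(is_lossless(TEACH, d, F_TEACH))
```

For a two-relation decomposition this agrees with the simpler rule that R1 ∩ R2 must functionally determine R1 - R2 or R2 - R1; the matrix form also handles decompositions into more than two relations.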
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A U B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
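The trivial/nontrivial distinction, and whether an MVD holds in a given relation state, can be sketched in Python. The EMP state (Smith with projects X and Y and dependents John and Anna) is hypothetical sample data, and the function names are our own. mvd_holds uses the standard tuple-swapping definition: whenever t1 and t2 agree on X, the tuple combining t1's X and Y values with t2's remaining values must also be present.

```python
def is_trivial_mvd(a, b, schema):
    """A ->> B is trivial in R when B is a subset of A or A U B = R."""
    a, b, schema = frozenset(a), frozenset(b), frozenset(schema)
    return b <= a or (a | b) == schema


def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on one relation state; rows are tuples ordered like attrs."""
    idx = {name: i for i, name in enumerate(attrs)}
    z = [n for n in attrs if n not in x and n not in y]  # the remaining attributes
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[idx[n]] == t2[idx[n]] for n in x):
                # t3 takes its X and Y values from t1 and its Z values from t2.
                t3 = tuple(t2[idx[n]] if n in z else t1[idx[n]] for n in attrs)
                if t3 not in present:
                    return False
    return True


# Hypothetical EMP state: Smith works on projects X and Y and has dependents John and Anna.
ATTRS = ["ENAME", "PNAME", "DNAME"]
EMP = [("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")]

print(is_trivial_mvd({"ENAME"}, {"PNAME"}, ATTRS))      # nontrivial
print(mvd_holds(EMP, ATTRS, {"ENAME"}, {"PNAME"}))      # holds on all four tuples
print(mvd_holds(EMP[:3], ATTRS, {"ENAME"}, {"PNAME"}))  # fails: (Smith, Y, John) is missing
```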
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
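A join dependency can be checked on a single relation state by projecting and re-joining. This Python sketch uses a hypothetical SUPPLY-like state over attributes S, P and J and hypothetical helper names; the state satisfies the three-way JD, while joining only two of the projections produces a spurious tuple.

```python
def project(relation, onto):
    """Projection; a relation is a pair (attribute tuple, set of value tuples)."""
    attrs, rows = relation
    idx = [attrs.index(a) for a in onto]
    return tuple(onto), {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, right):
    """Natural join on all shared attribute names (nested loops, for illustration only)."""
    (la, lrows), (ra, rrows) = left, right
    common = [a for a in la if a in ra]
    extra = [a for a in ra if a not in la]
    out = set()
    for lt in lrows:
        for rt in rrows:
            if all(lt[la.index(a)] == rt[ra.index(a)] for a in common):
                out.add(lt + tuple(rt[ra.index(a)] for a in extra))
    return la + tuple(extra), out


def satisfies_jd(relation, components):
    """r satisfies JD(R1, ..., Rn) iff r equals the join of its projections on R1, ..., Rn."""
    attrs, rows = relation
    joined = project(relation, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(relation, comp))
    j_attrs, j_rows = joined
    order = [j_attrs.index(a) for a in attrs]  # restore the original column order
    return {tuple(row[i] for i in order) for row in j_rows} == rows


# Hypothetical SUPPLY-like state over (S, P, J).
SUPPLY = (("S", "P", "J"),
          {("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1"), ("s1", "p1", "j1")})

print(satisfies_jd(SUPPLY, [("S", "P"), ("P", "J"), ("S", "J")]))  # three-way JD holds
print(satisfies_jd(SUPPLY, [("S", "P"), ("P", "J")]))  # spurious (s2, p1, j2) appears
```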
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than those whose components are all superkeys.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence, 5NF is rarely used in practice.
4 Spurious Tuples
Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.
Spurious Tuples
There are two important properties of decompositions:
(a) Non-additivity or losslessness of the corresponding join
(b) Preservation of the functional dependencies
Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.
Summary and Discussion of Design Guidelines
Problems pointed out:
Anomalies cause redundant work to be done during insertion, modification and deletion.
Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values.
Generation of invalid and spurious data during joins on improperly related base relations.
Functional dependencies
Functional dependencies (FDs): a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
The values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420
Continent -/-> Name (Continent does not determine Name: NA, for example, appears with three different lakes)
Name -> Length
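The two claims about the lakes table can be checked directly against the FD definition with a small Python sketch (fd_holds is a hypothetical helper; the rows are copied from the table above). A single relation state can refute an FD but cannot prove it, so Name -> Length is only shown to be consistent with this state.

```python
LAKES = [
    {"Name": "Caspian Sea", "Continent": "Asia-Europe", "Area": 143244, "Length": 760},
    {"Name": "Superior", "Continent": "NA", "Area": 31700, "Length": 350},
    {"Name": "Victoria", "Continent": "Africa", "Area": 26828, "Length": 250},
    {"Name": "Aral Sea", "Continent": "Asia", "Area": 24904, "Length": 280},
    {"Name": "Huron", "Continent": "NA", "Area": 23000, "Length": 206},
    {"Name": "Michigan", "Continent": "NA", "Area": 22300, "Length": 307},
    {"Name": "Tanganyika", "Continent": "Africa", "Area": 12700, "Length": 420},
]


def fd_holds(rows, lhs, rhs):
    """X -> Y holds in this state iff no two rows agree on X but differ on Y."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:  # the first Y seen for this X must repeat
            return False
    return True


print(fd_holds(LAKES, ["Continent"], ["Name"]))  # refuted: NA maps to three lakes
print(fd_holds(LAKES, ["Name"], ["Length"]))     # consistent with this state
```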
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1 Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2 Check that tuples with equal values under attributes A also have equal values under B.
3 If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
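The Satisfies algorithm can be sketched in Python as below (a hypothetical implementation, not a library routine). Comparing neighbouring tuples after sorting is enough, because tuples with equal A-values end up in one contiguous run. The sample rows are the TEACH relation state that follows.

```python
def satisfies(rows, a, b):
    """Sort-based Satisfies test for the FD A -> B on one relation state."""
    def a_value(row):
        return tuple(row[x] for x in a)

    ordered = sorted(rows, key=a_value)  # step 1: equal A-values become adjacent
    for prev, cur in zip(ordered, ordered[1:]):  # step 2: compare neighbours
        if a_value(prev) == a_value(cur) and any(prev[y] != cur[y] for y in b):
            return False  # same A-value but a different B-value
    return True  # step 3: the condition of step 2 is met


TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "ooad", "TEXT": "Horowitz"},
]
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # consistent with TEXT -> COURSE
print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # Smith teaches two courses
```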
Relation state of TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     ooad              Horowitz
TEXT -> COURSE (may hold: no text is used for two different courses)
TEACHER -/-> COURSE (ruled out: Smith teaches two different courses)
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time consuming.
Therefore, inference rules (axioms) are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
The inferred functional dependencies include:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That X -> Y is inferred from F is denoted by F |= X -> Y.
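A systematic alternative to applying the inference rules by hand is the attribute closure X+: F |= X -> Y exactly when Y is a subset of X+. This Python sketch (hypothetical helper names) computes SSN+ for the F above and confirms the inferred dependencies; because IR1 to IR3 are sound and complete, this agrees with what the axioms can derive.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper building an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def implies(fds, lhs, rhs):
    """F |= X -> Y iff Y is contained in X+ computed under F."""
    return set(rhs) <= closure(lhs, fds)


F = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
print(sorted(closure({"SSN"}, F)))                # SSN determines every attribute
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # inferred
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # inferred
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # not inferred
```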
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): If Y is a subset of X, then X -> Y
IR2 (Augmentation): If X -> Y, then XZ -> YZ (notation: XZ stands for X U Z)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z
IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union (additive): If X -> Y and X -> Z, then X -> YZ
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z
The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
Examples
1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2 Given F = {A -> B, C -> D} with C a subset of B, show that F |= A -> D.
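Both exercises can be verified with attribute closure, and the comments record the corresponding derivations with IR1 to IR3. Modelling "C is a subset of B" with concrete attribute names (a, b, c, d) is our own assumption for this sketch.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper building an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


# Exercise 1: IR2 on A -> B gives AC -> BC; IR2 on C -> X gives BC -> BX;
# IR3 gives AC -> BX; IR3 with BX -> Z gives AC -> Z.
F1 = [fd("A", "B"), fd("C", "X"), fd("B X", "Z")]
print("Z" in closure({"A", "C"}, F1))

# Exercise 2: IR1 gives B -> C (C is a subset of B); IR3 gives A -> C, then A -> D.
# Hypothetical modelling: A = {a}, B = {b, c}, C = {c}, D = {d}.
F2 = [fd("a", "b c"), fd("c", "d")]
print("d" in closure({"a"}, F2))
```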
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
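Testing a single FD for redundancy, and dropping redundant FDs one at a time, can be sketched in Python (hypothetical helper names, FDs as pairs of frozensets). The order in which FDs are examined can change which equivalent set results.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper building an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def is_redundant(dep, fds):
    """dep = X -> Y is redundant iff Y is inside X+ computed under F - {dep}."""
    lhs, rhs = dep
    rest = [d for d in fds if d != dep]
    return rhs <= closure(lhs, rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time; the result can depend on the order tried."""
    result = list(fds)
    for dep in list(fds):
        if is_redundant(dep, result):
            result.remove(dep)
    return result


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(is_redundant(fd("A", "C"), F))  # A -> C follows from A -> B and B -> C
print(remove_redundant(F) == F[:2])   # only A -> C is removed
```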
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 22042023
Spurious Tuples
There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies
Note that Property (a) is extremely important and cannot be
sacrificed Property (b) is less stringent and may be sacrificed
Module 6 23042023
Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done
during Insertion Modification Deletion
Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values
Generation of invalid and spurious data during joins on improperly related base relations
Module 6 24042023
Functional dependencies Functional dependencies (FDs)
Is a constraint between two sets of attributes from the database
Assumption The entire database is a single universal
relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes
Module 6 25042023
Definition
FDs are used to specify formal measures of the
goodness of relational designs keys that are used to define normal forms for
relations constraints that are derived from the meaning and
interrelationships of the data attributes A set of attributes X functionally determines
a set of attributes Y if the value of X determines a unique value for Y
Module 6 26042023
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.
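Lakes of the World
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420
Continent -> Name does not hold: NA is paired with three different lakes (Superior, Huron, Michigan).
Name -> Length holds: each lake name appears once, so it determines a single length.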
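A small Python check of the two FDs on the lakes state. `fd_holds` is a hypothetical helper that applies the definition directly. Since an FD is a property of the schema, a single relation state can only refute an FD; passing the test only means the state is consistent with it.

```python
def fd_holds(rows, lhs, rhs):
    """True if relation state `rows` does not violate the FD lhs -> rhs.

    Direct use of the definition: any two tuples that agree on lhs must also
    agree on rhs. A dict remembers the rhs value first seen for each lhs value.
    """
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False  # two tuples agree on lhs but differ on rhs
    return True


COLUMNS = ("Name", "Continent", "Area", "Length")
lakes = [dict(zip(COLUMNS, values)) for values in [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]]

print(fd_holds(lakes, ["Continent"], ["Name"]))  # False: NA has three different lakes
print(fd_holds(lakes, ["Name"], ["Length"]))     # True in this state
```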
Graphical representation of Functional Dependencies
Examples of FD Constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD Constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation state r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
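A Python sketch of the three Satisfies steps, applied to the TEACH relation state (data copied from the TEACH slide). The function name `satisfies` is hypothetical.

```python
def satisfies(rows, a_attrs, b_attrs):
    """Sort-based Satisfies algorithm: does relation state `rows` satisfy A -> B?"""

    def a_value(row):
        return tuple(row[x] for x in a_attrs)

    def b_value(row):
        return tuple(row[x] for x in b_attrs)

    # Step 1: sort on A so that tuples with equal A-values are adjacent.
    ordered = sorted(rows, key=a_value)
    # Step 2: within a run of equal A-values it is enough to compare neighbours,
    # because B-equality between neighbours chains through the whole run.
    for prev, cur in zip(ordered, ordered[1:]):
        if a_value(prev) == a_value(cur) and b_value(prev) != b_value(cur):
            return False  # Step 3: condition violated
    return True


teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]

print(satisfies(teach, ["TEACHER"], ["COURSE"]))  # False: Smith has two courses
print(satisfies(teach, ["TEXT"], ["COURSE"]))     # True
```

Sorting makes the check O(n log n) instead of comparing every pair of tuples.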
Relation state of TEACH
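TEACH
Teacher   Course            Text
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz
TEACHER -> COURSE does not hold: Smith teaches both Data Structures and Data Management.
TEXT -> COURSE may hold: in this state, each text is used for exactly one course.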
Drawbacks of the Satisfies Algorithm
Using this algorithm is tedious and time-consuming.
So inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
A department has one manager (DEPT_NO -> MGR_SSN), and a manager has a unique phone number (MGR_SSN -> MGR_PHONE). These two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The fact that X -> Y is inferred from F is denoted F |= X -> Y.
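F+ is usually far too large to list. A standard technique, not introduced on these slides and so offered here as a supplementary sketch, is to compute the attribute closure X+ of a set X under F by repeatedly applying every FD whose left-hand side is already contained in the result; then F |= X -> Y exactly when Y ⊆ X+. The function names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+: all attributes determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:  # repeat until no FD adds a new attribute
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"SSN"}, {"SSN"}))               # True (reflexivity)
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```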
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F; that is, if the axioms are correctly applied, they cannot derive false dependencies.
By complete we mean that using IR1 through IR3 repeatedly, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
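One possible derivation for the two exercises, using only IR1 to IR3. It assumes Example 1 reads F = {A -> B, C -> X, BX -> Z}, since the arrows were lost in the original notation.

```latex
\begin{align*}
&\text{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \\
&1.\; A \to B   && \text{given} \\
&2.\; AC \to BC && \text{IR2 on (1), augmenting by } C \\
&3.\; C \to X   && \text{given} \\
&4.\; BC \to BX && \text{IR2 on (3), augmenting by } B \\
&5.\; AC \to BX && \text{IR3 on (2) and (4)} \\
&6.\; BX \to Z  && \text{given} \\
&7.\; AC \to Z  && \text{IR3 on (5) and (6)} \\[6pt]
&\text{Example 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B \\
&1.\; B \to C && \text{IR1, since } C \subseteq B \\
&2.\; A \to C && \text{IR3 on } A \to B \text{ and (1)} \\
&3.\; A \to D && \text{IR3 on (2) and } C \to D
\end{align*}
```

Step 7 of Example 1 could equally be obtained in one move with the pseudotransitive rule.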
Redundant Functional Dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F if and only if A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
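A sketch of finding and removing redundant FDs, assuming attribute closure is used to decide whether A -> B follows from F - {A -> B} (the slides define redundancy but give no procedure). The sample F reuses the DEPT_NO, MGR_SSN, MGR_PHONE dependencies from the closure example, with DEPT_NO -> MGR_PHONE listed explicitly so that it is redundant; `remove_redundant` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def remove_redundant(fds):
    """Drop redundant FDs one at a time, re-checking against the FDs kept so far.

    Re-checking matters: two identical FDs are each redundant given the other,
    but only one of them may be removed.
    """
    kept = list(fds)
    i = 0
    while i < len(kept):
        lhs, rhs = kept[i]
        others = kept[:i] + kept[i + 1:]
        if set(rhs) <= closure(lhs, others):
            kept = others  # kept[i] is derivable from the rest: drop it
        else:
            i += 1
    return kept


F = [({"DEPT_NO"}, {"MGR_SSN"}),
     ({"MGR_SSN"}, {"MGR_PHONE"}),
     ({"DEPT_NO"}, {"MGR_PHONE"})]   # implied by the first two

print(remove_redundant(F))  # only the first two FDs remain
```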
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
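Cover and equivalence can be tested without building F+ by checking each FD with attribute closure. In this sketch the FD sets E, F, and G are hypothetical examples, and `covers` and `equivalent` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """True if F covers E: every FD X -> Y in E is in F+ (Y lies within X+ under F)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff E covers F and F covers E, i.e. E+ = F+."""
    return covers(e, f) and covers(f, e)


# Hypothetical FD sets over attributes A, C, D, E, H.
E = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
F = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
G = [({"A"}, {"C"})]

print(equivalent(E, F))            # True, although F has fewer FDs than E
print(covers(F, G), covers(G, F))  # True False: F covers G, but not vice versa
```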
Minimal Functional Dependencies
A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies.
F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if it satisfies the following conditions:
(a) The RHS of every dependency in F is a single attribute.
(b) For no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies).
(c) For no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F.
A1 is extraneous if and only if F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
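A sketch of the left-extraneous test, assuming attribute closure is used to decide whether F already implies the FD with the attribute removed. The sample F (X -> Z and XZ -> R) is taken from the canonical-cover problem; `left_extraneous` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def left_extraneous(lhs, rhs, fds):
    """Attributes of lhs that are (individually) extraneous in the FD lhs -> rhs of F.

    Attribute a is extraneous iff F already implies (lhs - {a}) -> rhs, which is
    exactly when (F - {lhs -> rhs}) U {(lhs - {a}) -> rhs} is equivalent to F.
    Removing several attributes at once requires re-checking after each removal.
    """
    lhs, rhs = set(lhs), set(rhs)
    return {a for a in lhs if rhs <= closure(lhs - {a}, fds)}


F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
print(left_extraneous({"X", "Z"}, {"R"}, F))  # {'Z'}: X -> Z and XZ -> R give X -> R
```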
CANONICAL COVER (FC)
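A canonical cover FC of F is a set of FDs equivalent to F such that:
1. Every FD of FC is simple: its RHS has exactly one attribute.
2. FC is left-reduced: no FD has an extraneous attribute on its LHS.
3. FC is nonredundant: no FD of FC can be derived from the others.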
Problem
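Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Make every RHS a single attribute: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: in XZ -> R, Z is extraneous because X -> Z and XZ -> R already give X -> R. So FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}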
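A Python sketch of the usual three-step canonical-cover procedure (split right-hand sides, left-reduce, remove redundant FDs), run on the F of this problem. The function names are hypothetical, and a different processing order can produce a different but equivalent cover.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    """Canonical cover: split RHS, left-reduce, then drop redundant FDs."""
    # Step 1: every RHS becomes a single attribute; identical FDs are kept once.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in cover:
                cover.append(fd)
    # Step 2: left-reduce. Each replacement keeps the set equivalent to F,
    # so closures can be computed against the step-1 set throughout.
    reduced = []
    for lhs, rhs in cover:
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, cover):
                lhs = lhs - {b}  # b is extraneous
        if (lhs, rhs) not in reduced:
            reduced.append((lhs, rhs))
    # Step 3: drop FDs implied by the others, re-checking after each removal.
    i = 0
    while i < len(reduced):
        lhs, rhs = reduced[i]
        others = reduced[:i] + reduced[i + 1:]
        if rhs <= closure(lhs, others):
            reduced = others
        else:
            i += 1
    return reduced


def show(fd):
    """Format an FD as e.g. 'XY -> W'."""
    lhs, rhs = fd
    return "".join(sorted(lhs)) + " -> " + "".join(sorted(rhs))


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
print(", ".join(show(fd) for fd in canonical_cover(F)))
```

It prints `X -> Z, XY -> P, XY -> W, XY -> Q, X -> R`, in which XZ -> R has been left-reduced to X -> R because X -> Z.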
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF, and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
minimizing redundancy
minimizing insertion, deletion, and modification anomalies
It provides the database designer with:
a formal framework for analyzing relation schemas based on their keys and FDs
a series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: it ensures that each functional dependency is represented in some individual relation resulting after decomposition.
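A sketch of a standard polynomial-time test for dependency preservation (not given on these slides): grow a set Z from the left-hand side using only what each decomposed schema can see, then check whether the right-hand side was reached. The example uses the TEACH relation and its decomposition {instructor, course}, {instructor, student} from the BCNF discussion; `preserved` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(fd, fds, schemas):
    """Can X -> Y of F still be enforced from the projections of F onto `schemas`?

    Start with Z = X and repeat Z = Z | (closure(Z & Ri) & Ri) for every Ri until
    Z stops growing. The FD is preserved iff Y ends up inside Z. The projections
    of F+ are never built explicitly.
    """
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


F = [({"student", "course"}, {"instructor"}),   # fd1
     ({"instructor"}, {"course"})]              # fd2
D3 = [{"instructor", "course"}, {"instructor", "student"}]

print([preserved(fd, F, D3) for fd in F])  # [False, True]: fd1 is lost
```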
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is an attribute that is not prime, that is, it is not a member of any candidate key.
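A brute-force sketch (exponential, so only suitable for small schemas) that finds all candidate keys via attribute closure and then derives the prime and nonprime attributes. The schema and FDs are those of EMP_PROJ from the FD examples; `candidate_keys` is a hypothetical name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal superkeys, found by trying attribute sets in order of size."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it cannot be minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)  # a superkey with no smaller key inside it
    return keys


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]

keys = candidate_keys(EMP_PROJ, F)
prime = set().union(*keys)
nonprime = EMP_PROJ - prime
print(keys, sorted(prime), sorted(nonprime))
```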
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF, there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
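A sketch of technique 1 in Python. The DEPARTMENT state with a multivalued DLOCATIONS attribute is hypothetical sample data, and `split_multivalued` is an illustrative helper.

```python
def split_multivalued(rows, key, attr, new_name):
    """Technique 1: move multivalued `attr` into a separate relation with the key.

    Returns (base, detail): `base` keeps every attribute except `attr`, and
    `detail` holds one (key, value) row per element of the multivalued attribute.
    """
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    detail = [{key: row[key], new_name: value}
              for row in rows for value in row[attr]]
    return base, detail


# Hypothetical non-1NF state: DLOCATIONS holds several values per department.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]

dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept)
print(dept_locations)
```

Every value in both results is atomic, there is no fixed limit on locations per department, and the department name is not repeated.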
Normalizing nested relations into 1NF
Additional problems from the Schaum series: Pg 178 51
Second Normal Form
Uses the concepts of FDs and primary keys.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
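A simplified Python sketch that finds the partial dependencies on the primary key of EMP_PROJ and forms the resulting 2NF schemas. It assumes the primary key is the only candidate key, so every other attribute is nonprime; the function name is hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(primary_key, attrs, fds):
    """Map proper subsets of the primary key to the nonprime attributes they determine.

    Assumes the primary key is the only candidate key (nonprime = attrs - key).
    An attribute is assigned to the first, smallest subset that determines it.
    """
    pk = sorted(primary_key)
    nonprime = set(attrs) - set(pk)
    found, already = {}, set()
    for size in range(1, len(pk)):
        for subset in combinations(pk, size):
            determined = (closure(set(subset), fds) & nonprime) - already
            if determined:
                found[subset] = determined
                already |= determined
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
PK = {"SSN", "PNUMBER"}
F = [({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]

partial = partial_dependencies(PK, EMP_PROJ, F)
moved = set().union(*partial.values())
# 2NF schemas: the key with its fully dependent attributes, plus one per partial FD.
schemas = [PK | (EMP_PROJ - PK - moved)]
schemas += [set(subset) | dependents for subset, dependents in partial.items()]
print(partial)
print(schemas)
```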
Normalizing into 2NF
Conversion to 2NF: R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).
Additional problem on second normal form:
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form?
2. Transform it into the next highest normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
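A sketch that lists the FDs of EMP_DEPT that violate 3NF, meaning a nonprime attribute is determined by an attribute set that is not a superkey. It assumes SSN is the only candidate key, so SSN is the only prime attribute; `violations_3nf` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_3nf(attrs, fds, prime):
    """(X, A) pairs from F where X is not a superkey and A is not prime."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attrs)
        for a in sorted(set(rhs) - set(lhs)):  # skip trivial parts of the FD
            if not is_superkey and a not in prime:
                bad.append((set(lhs), a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
     ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
PRIME = {"SSN"}  # assumption: SSN is the only candidate key

print(violations_3nf(EMP_DEPT, F, PRIME))
# DNUMBER -> DNAME and DNUMBER -> DMGRSSN make SSN -> DNAME, DMGRSSN transitive
```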
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd normal form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How Is BCNF Different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation even when 'X' is not a superkey, provided 'A' is a prime attribute.
BCNF does not allow this exception: to be in BCNF, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH(student, course, instructor):
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but it is allowed by 3NF because course is a prime attribute. So this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
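A sketch that tests TEACH against both definitions by checking the FDs given in F (for a schema tested against its own F, checking F rather than F+ is sufficient). The prime attributes are supplied by hand: since instructor -> course, {student, instructor} is also a candidate key, so every attribute of TEACH is prime. `normal_form_checks` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def normal_form_checks(attrs, fds, prime):
    """Return (in_3nf, in_bcnf) by testing each FD X -> Y of F.

    BCNF: every nontrivial X -> A needs X to be a superkey.
    3NF:  the same, except that X -> A is also allowed when A is prime.
    """
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attrs):
            continue  # X is a superkey: acceptable for both forms
        nontrivial = set(rhs) - set(lhs)
        if nontrivial:
            in_bcnf = False
        if nontrivial - set(prime):
            in_3nf = False
    return in_3nf, in_bcnf


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),   # fd1
     ({"instructor"}, {"course"})]              # fd2
PRIME = {"student", "course", "instructor"}

print(normal_form_checks(TEACH, F, PRIME))  # (True, False): 3NF but not BCNF
```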
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.
Of the above three, only the third decomposition will not generate spurious tuples after a join (and hence has the nonadditive join property).
Lossless or Lossy Decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, then the decomposition is lossless; otherwise it is lossy.
Example 511 pg 162
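For a decomposition into two relations, a standard test (assumed here; it is not stated on these slides) is that R1 ∩ R2 must functionally determine R1 - R2 or R2 - R1. Applied to the three TEACH decompositions, this sketch reproduces the claim that only the third is nonadditive. `lossless_binary` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """{R1, R2} is nonadditive iff (R1 & R2) -> (R1 - R2) or -> (R2 - R1) is in F+."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [({"student", "course"}, {"instructor"}),   # fd1
     ({"instructor"}, {"course"})]              # fd2
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

for r1, r2 in decompositions:
    print(sorted(r1), sorted(r2), lossless_binary(r1, r2, F))
```

Only the third line prints `True`: the common attribute instructor determines course, so instructor is a key of {instructor, course}.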
Testing for lossless joins
Lossless join algorithm Example 512
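A Python sketch of the general tableau (chase) test for a decomposition into any number of relations, the usual form of the lossless join algorithm. The function name and the symbol encoding are assumptions. It is checked on two TEACH decompositions and on a three-way decomposition of EMP_PROJ.

```python
def lossless_chase(attrs, schemas, fds):
    """Tableau ("chase") test for the nonadditive join property of {R1, ..., Rm}."""
    cols = sorted(attrs)
    # Row i holds the distinguished symbol ("a", col) for columns in Ri and a
    # row-specific symbol ("b", i, col) elsewhere; "a" tuples sort before "b" ones.
    table = [{c: ("a", c) if c in ri else ("b", i, c) for c in cols}
             for i, ri in enumerate(schemas)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    if r1 is r2 or any(r1[c] != r2[c] for c in lhs):
                        continue
                    for c in rhs:
                        if r1[c] != r2[c]:
                            # Equate the two symbols throughout column c,
                            # keeping the "a" symbol whenever one is present.
                            keep, drop = sorted((r1[c], r2[c]))
                            for row in table:
                                if row[c] == drop:
                                    row[c] = keep
                            changed = True
    # Lossless iff some row now consists only of distinguished symbols.
    return any(all(row[c][0] == "a" for c in cols) for row in table)


TEACH = {"student", "course", "instructor"}
F_TEACH = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
D1 = [{"student", "instructor"}, {"student", "course"}]
D3 = [{"instructor", "course"}, {"instructor", "student"}]

EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F_EMP = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
         ({"SSN", "PNUMBER"}, {"HOURS"})]
D_EMP = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"},
         {"SSN", "PNUMBER", "HOURS"}]

print(lossless_chase(TEACH, D1, F_TEACH))      # False
print(lossless_chase(TEACH, D3, F_TEACH))      # True
print(lossless_chase(EMP_PROJ, D_EMP, F_EMP))  # True
```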
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further defined as being trivial or nontrivial.
An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.
An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
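A sketch that checks the MVD definition on a relation state and classifies an MVD as trivial or nontrivial. The EMP state (employee, project, dependent) is hypothetical sample data, and the function names are illustrative.

```python
def is_trivial_mvd(attrs, x, y):
    """X ->> Y is trivial if Y is a subset of X or X U Y is the whole schema R."""
    return set(y) <= set(x) or set(x) | set(y) == set(attrs)


def mvd_holds(attrs, rows, x, y):
    """Check X ->> Y on a relation state: a set of tuples ordered like `attrs`.

    For every t1, t2 that agree on X there must be a tuple t3 that takes its
    X and Y values from t1 and all remaining attributes from t2.
    """
    pos = {a: i for i, a in enumerate(attrs)}
    rest = [a for a in attrs if a not in x and a not in y]
    for t1 in rows:
        for t2 in rows:
            if all(t1[pos[a]] == t2[pos[a]] for a in x):
                t3 = tuple(t2[pos[a]] if a in rest else t1[pos[a]] for a in attrs)
                if t3 not in rows:
                    return False
    return True


ATTRS = ("ENAME", "PNAME", "DNAME")
emp = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}

print(is_trivial_mvd(ATTRS, ["ENAME"], ["PNAME"]))  # False: nontrivial
print(mvd_holds(ATTRS, emp, ["ENAME"], ["PNAME"]))  # True
print(mvd_holds(ATTRS, emp - {("Smith", "Y", "John")}, ["ENAME"], ["PNAME"]))  # False
```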
Fourth Normal Form (4NF)
A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
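The 4NF decomposition can be sketched with Python sets: project a hypothetical EMP state onto EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME), then rejoin on ENAME. The data is illustrative only.

```python
# Hypothetical EMP state satisfying ENAME ->> PNAME and ENAME ->> DNAME.
emp = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}

# Project onto the two 4NF relations (sets, so duplicate pairs disappear).
emp_projects = {(ename, pname) for ename, pname, _ in emp}
emp_dependents = {(ename, dname) for ename, _, dname in emp}

# Natural join on ENAME recovers EMP exactly: the decomposition is lossless.
rejoined = {(e, p, d) for e, p in emp_projects
            for e2, d in emp_dependents if e == e2}

print(len(emp), len(emp_projects), len(emp_dependents))  # 4 2 2
print(rejoined == emp)                                     # True
```

In EMP, giving Smith a new project means adding one tuple per dependent; in EMP_PROJECTS it is a single tuple.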
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called project-join normal form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than those whose components are all superkeys.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
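A sketch that checks the join dependency JD(R1, R2, R3) on a hypothetical SUPPLY state with attributes SNAME, PARTNAME, PROJNAME. It assumes R1 = {SNAME, PARTNAME}, R2 = {SNAME, PROJNAME}, and R3 = {PARTNAME, PROJNAME}, which is an assumption about the projections pictured in the figure.

```python
# Hypothetical SUPPLY state: (SNAME, PARTNAME, PROJNAME).
supply = {
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}

r1 = {(s, p) for s, p, _ in supply}  # R1(SNAME, PARTNAME)
r2 = {(s, j) for s, _, j in supply}  # R2(SNAME, PROJNAME)
r3 = {(p, j) for _, p, j in supply}  # R3(PARTNAME, PROJNAME)

# Join R1 and R2 on SNAME, then join the result with R3 on (PARTNAME, PROJNAME).
two_way = {(s, p, j) for s, p in r1 for s2, j in r2 if s == s2}
three_way = {(s, p, j) for s, p, j in two_way if (p, j) in r3}

print(len(supply), len(two_way), len(three_way))  # 7 9 7
print(three_way == supply)                          # True
```

Joining only R1 and R2 adds two spurious tuples, while joining all three projections restores SUPPLY exactly, so this state satisfies the JD.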
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Functional dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, …, An}, where A1, A2, …, An are the attributes.
Definition
FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.
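Lakes of the world
Name | Continent | Area (sq mi) | Length (mi)
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420
Continent -> Name does not hold: NA appears with Superior, Huron and Michigan.
Name -> Length is satisfied: each name appears once, so it determines a single length.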
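The formal definition (for any t1, t2: t1[X] = t2[X] implies t1[Y] = t2[Y]) can be checked literally on the lakes state by comparing every pair of tuples. The Python sketch below is illustrative only; the helper names `project` and `fd_holds` are hypothetical. Because an FD is a property of the schema, a single state can refute an FD but never prove it.

```python
from itertools import combinations

# Lakes-of-the-world rows from the slide: (Name, Continent, Area, Length)
COLUMNS = ("Name", "Continent", "Area", "Length")
LAKES = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]


def project(row, attrs):
    """Values of `row` for the attribute names in `attrs`."""
    return tuple(row[COLUMNS.index(a)] for a in attrs)


def fd_holds(rows, lhs, rhs):
    """Pairwise check: t1[X] = t2[X] must imply t1[Y] = t2[Y]."""
    for t1, t2 in combinations(rows, 2):
        if project(t1, lhs) == project(t2, lhs) and project(t1, rhs) != project(t2, rhs):
            return False
    return True


print(fd_holds(LAKES, ["Continent"], ["Name"]))  # False: NA has three lakes
print(fd_holds(LAKES, ["Name"], ["Length"]))     # True in this state
```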
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether or not a relation r satisfies a given functional dependency A -> B.
How it works:
1. Sort the tuples of relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.
Relation state of TEACH
TEACHER | COURSE | TEXT
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | ooad | Horowitz
TEXT -> COURSE is satisfied by this state.
TEACHER -> COURSE does not hold: Smith teaches both Data Structures and Data Management.
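The three steps of the Satisfies algorithm can be sketched in Python on the TEACH state above. The function name `satisfies` and the dictionary-per-tuple representation are assumptions made for illustration.

```python
from itertools import groupby

# TEACH relation state from the slide, one dictionary per tuple
TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "ooad", "TEXT": "Horowitz"},
]


def satisfies(rows, a, b):
    """Satisfies algorithm for the FD a -> b (a, b: lists of attribute names)."""
    def key_a(t):
        return tuple(t[x] for x in a)

    # Step 1: sort on the A attributes so equal A-values become adjacent.
    ordered = sorted(rows, key=key_a)
    # Step 2: within each run of equal A-values, all B-values must agree.
    for _, run in groupby(ordered, key=key_a):
        b_values = {tuple(t[y] for y in b) for t in run}
        if len(b_values) > 1:
            return False  # Step 3: condition 2 violated
    return True  # Step 3: condition 2 met


print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```

Sorting costs O(n log n), which is cheaper than comparing every pair of tuples.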
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming, so inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. The notation F |= X -> Y denotes that the FD X -> Y is inferred from F.
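A common way to decide F |= X -> Y without enumerating all of F+ is to compute the attribute closure X+ (every attribute that X determines under F) and check whether Y ⊆ X+. This technique is not shown on the slides; the Python sketch below assumes FDs are stored as (lhs, rhs) pairs of sets, and the names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the LHS is determined, the whole RHS is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs  iff  rhs is contained in lhs+ under F."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```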
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F; that is, if the axioms are correctly applied, they cannot derive false dependencies.
By complete we mean that using IR1 through IR3 repeatedly, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
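Reading the exercise sets as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D} (an interpretation, since the arrows were lost in extraction), both results follow from IR1 to IR3 alone. One possible derivation of each:

```latex
\begin{align*}
&\textbf{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\} \\
&(1)\ A \to B   && \text{given} \\
&(2)\ AC \to BC && \text{IR2 on (1), augmenting with } C \\
&(3)\ C \to X   && \text{given} \\
&(4)\ BC \to BX && \text{IR2 on (3), augmenting with } B \\
&(5)\ AC \to BX && \text{IR3 on (2) and (4)} \\
&(6)\ BX \to Z  && \text{given} \\
&(7)\ AC \to Z  && \text{IR3 on (5) and (6)} \\[6pt]
&\textbf{Example 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B \\
&(1)\ B \to C   && \text{IR1, since } C \subseteq B \\
&(2)\ A \to C   && \text{IR3 on } A \to B \text{ and (1)} \\
&(3)\ A \to D   && \text{IR3 on (2) and } C \to D
\end{align*}
```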
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F − {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
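The redundancy test can be sketched by computing the closure of A under F − {A -> B} and checking whether it contains B. The Python below is a hypothetical illustration using the DEPT_NO / MGR_SSN / MGR_PHONE dependencies from the closure example; the helper names are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def without(fds, fd):
    """Copy of fds with one occurrence of fd removed."""
    rest = list(fds)
    rest.remove(fd)
    return rest


def is_redundant(fd, fds):
    """fd = (A, B) is redundant iff A -> B is derivable from F - {A -> B}."""
    lhs, rhs = fd
    return rhs <= closure(lhs, without(fds, fd))


def remove_redundant(fds):
    """Drop redundant FDs one at a time (the result can depend on the order)."""
    kept = list(fds)
    for fd in list(kept):
        if is_redundant(fd, kept):
            kept.remove(fd)
    return kept


F = [
    ({"DEPT_NO"}, {"MGR_SSN"}),
    ({"MGR_SSN"}, {"MGR_PHONE"}),
    ({"DEPT_NO"}, {"MGR_PHONE"}),
]
print(is_redundant(F[2], F))     # True: follows by transitivity
print(len(remove_redundant(F)))  # 2
```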
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
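Covering can be tested one FD at a time: F covers E if, for every X -> Y in E, Y ⊆ X+ computed under F. A minimal Python sketch follows; the FD sets E, F and G over attributes A, B, C are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """True if F covers E: every FD X -> Y in E has Y inside X+ under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff E covers F and F covers E."""
    return covers(e, f) and covers(f, e)


# Hypothetical FD sets over attributes A, B, C
E = [({"A"}, {"B"}), ({"B"}, {"C"})]
F = [({"A"}, {"B", "C"}), ({"B"}, {"C"})]
G = [({"A"}, {"B"}), ({"A"}, {"C"})]
print(equivalent(E, F))  # True
print(equivalent(E, G))  # False: G cannot derive B -> C
```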
Minimal Functional Dependencies
The minimal cover (also called the irreducible set of F) is useful in eliminating unnecessary functional dependencies.
F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) every dependency in F has a single attribute on its RHS;
(b) for no X -> A in F is the set F − {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F − {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F.
A1 is extraneous iff F ≡ (F − {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
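The definition above can be applied literally by replacing the FD and testing equivalence with the original set. The Python sketch below does exactly that; the helper names and the two-FD example set (taken from the canonical-cover problem) are illustrative assumptions. In practice the test reduces to checking whether the RHS lies in the closure of the reduced LHS under F, because the reduced FD always implies the original one.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    return covers(e, f) and covers(f, e)


def is_extraneous_left(attr, fd, fds):
    """attr is extraneous in fd = (X, Y) iff F is equivalent to
    (F - {X -> Y}) U {(X - attr) -> Y}."""
    lhs, rhs = fd
    replaced = [g for g in fds if g != fd] + [(lhs - {attr}, rhs)]
    return equivalent(fds, replaced)


F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
print(is_extraneous_left("Z", F[1], F))  # True: X -> Z lets X alone determine R
print(is_extraneous_left("X", F[1], F))  # False: Z alone does not determine R
```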
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced: no FD has an extraneous left attribute.
3. FC is nonredundant.
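Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (it follows from X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: X -> Z and XZ -> R give X -> R, so Z is extraneous in XZ -> R: FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}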
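The procedure can be sketched in Python as split, left-reduce, then remove redundant FDs. This is one common ordering of the steps, assumed here rather than taken from the slides; on this F it should arrive at {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}. The helpers `dedupe`, `canonical_cover` and `show` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under fds given as (frozenset lhs, single attr) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def dedupe(fds):
    seen, out = set(), []
    for fd in fds:
        if fd not in seen:
            seen.add(fd)
            out.append(fd)
    return out


def canonical_cover(fds):
    # 1. Split every RHS into single attributes.
    cover = dedupe([(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)])
    # 2. Left-reduce: drop an LHS attribute if the rest still determines the RHS.
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs = lhs - {b}
                cover[i] = (lhs, a)
    cover = dedupe(cover)
    # 3. Remove redundant FDs: drop X -> A if the others already imply it.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] in closure(fd[0], rest):
            cover = rest
    return cover


def show(fds):
    return ["".join(sorted(lhs)) + " -> " + a for lhs, a in fds]


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
print(show(canonical_cover(F)))
```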
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).
4NF: based on keys and multivalued dependencies (MVDs). 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
It provides the database designer with:
a formal framework for analyzing relation schemas based on their keys and FDs;
a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
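Dependency preservation can be tested without computing the projections of F explicitly: starting from X, repeatedly add whatever each Ri can derive from the attributes it shares with the current result, then check whether Y was reached. This is a standard textbook test, not taken from these slides. The Python sketch below applies it to a relation TEACH with FDs {student, course} -> instructor and instructor -> course decomposed into {instructor, course} and {instructor, student}; these inputs are assumed for illustration.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, fds, decomposition):
    """Is X -> Y enforced by the projections of F onto the Ri?"""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # What Ri alone can derive from the attributes found so far.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


def is_dependency_preserving(fds, decomposition):
    return all(preserves(fd, fds, decomposition) for fd in fds)


TEACH_F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
D = [{"instructor", "course"}, {"instructor", "student"}]
print(preserves(TEACH_F[0], TEACH_F, D))  # False: {student, course} -> instructor is lost
print(preserves(TEACH_F[1], TEACH_F, D))  # True
```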
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
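Given a set of FDs, candidate keys can be found by brute force: an attribute set S is a superkey when its closure is all of R, and a key when no smaller key is contained in it. The Python sketch below is exponential in the number of attributes, so it only suits small schemas. It uses EMP_PROJ with the FDs SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION} and {SSN, PNUMBER} -> HOURS; the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Test attribute sets by increasing size, keeping the minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
keys = candidate_keys(EMP_PROJ, F)
prime = set().union(*keys)
nonprime = EMP_PROJ - prime
print(keys, sorted(prime), sorted(nonprime))
```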
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations, i.e. attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF there are 3 techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
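Technique 1 can be sketched in Python on a hypothetical DEPARTMENT relation whose DLOCATIONS attribute holds several values per tuple. The relation, its sample rows and the helper `split_multivalued` are assumptions for illustration.

```python
# Hypothetical unnormalized DEPARTMENT state: DLOCATIONS holds several values.
DEPARTMENT = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]


def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: remove the multivalued `attr` and place its values in a
    new relation together with the primary key `key`."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    extracted = [{key: row[key], new_attr: value}
                 for row in rows for value in row[attr]]
    return base, extracted


DEPT, DEPT_LOCATIONS = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
for t in DEPT_LOCATIONS:
    print(t["DNUMBER"], t["DLOCATION"])
```

Each location now occupies its own tuple in DEPT_LOCATIONS, keyed by {DNUMBER, DLOCATION}, and DEPT keeps only atomic values.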
Normalizing nested relations into 1NF
Additional problems from Schaum's series: pg 178, 51
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
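Following the slide's definition (prime attributes are members of the primary key), partial dependencies can be found by taking the closure of every proper subset of the key. The Python sketch below does that for EMP_PROJ and performs a simplified 2NF split, with one relation per partial dependency plus the key and its fully dependent attributes. The helper names are hypothetical, and the split ignores corner cases such as keys with overlapping partial dependencies.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """(proper subset of key, nonprime attributes it determines) pairs."""
    nonprime = set(schema) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds) & nonprime
            if determined:
                found.append((set(subset), determined))
    return found


def decompose_2nf(schema, key, fds):
    """Simplified 2NF split driven by the partial dependencies."""
    parts = partial_dependencies(schema, key, fds)
    moved = set().union(*(det for _, det in parts))
    return [set(schema) - moved] + [lhs | det for lhs, det in parts]


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
for rel in decompose_2nf(EMP_PROJ, {"SSN", "PNUMBER"}, F):
    print(sorted(rel))
```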
Normalizing into 2NF
Conversion to 2NF (figure): R(A, B, C, D) is decomposed into (A, B, C) and (A, D).
Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form?
2. Transform it into the next highest form.
Third Normal Form
Definition: transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
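Instead of enumerating every chain X -> Y -> Z, a sketch can flag each given FD Y -> A where A is nonprime and Y is not a superkey. The key determines every such Y, so each hit is a partial or transitive dependency of A on the key, and the NOTE's exception (Y a candidate key) is skipped automatically. The Python below checks only the given FDs and treats members of the primary key as prime, as on the slides. For the EMP example it assumes Emp -> SSN, so that Emp is a candidate key.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(schema, key, fds):
    """Given FDs X -> A with A nonprime and X not a superkey."""
    nonprime = set(schema) - set(key)
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # X is a superkey (e.g. a candidate key): no problem
        for a in sorted((rhs - lhs) & nonprime):
            violations.append((set(lhs), a))
    return violations


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(third_nf_violations(EMP_DEPT, {"SSN"}, F))
# [({'DNUMBER'}, 'DMGRSSN'), ({'DNUMBER'}, 'DNAME')]

EMP = {"SSN", "Emp", "Salary"}
F_EMP = [({"SSN"}, {"Emp"}), ({"Emp"}, {"Salary", "SSN"})]  # Emp -> SSN assumed
print(third_nf_violations(EMP, {"SSN"}, F_EMP))  # []
```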
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For a nontrivial FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a candidate key.
BCNF makes no such exception: X must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. The relation is in 3NF (in fd2, course is a prime attribute) but not in BCNF (instructor is not a superkey).
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
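The BCNF condition can be checked mechanically on TEACH: every nontrivial FD must have a superkey on its left-hand side. The Python sketch below checks only the given FDs, which is enough to detect a BCNF violation in R itself. The function name `bcnf_violations` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial given FDs X -> A whose X is not a superkey of schema."""
    violations = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # ignore the trivial part of the FD
        if extra and not closure(lhs, fds) >= set(schema):
            violations.append((set(lhs), extra))
    return violations


TEACH = {"student", "course", "instructor"}
TEACH_F = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
print(bcnf_violations(TEACH, TEACH_F))  # [({'instructor'}, {'course'})]
```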
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
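For a decomposition into two relations there is a standard test, not spelled out on the slides: {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 − R2) or (R1 ∩ R2) -> (R2 − R1) is in F+. The Python sketch below applies it to the three TEACH decompositions. The function name `nonadditive_binary` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def nonadditive_binary(r1, r2, fds):
    """Lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


TEACH_F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
DECOMPOSITIONS = {
    1: ({"student", "instructor"}, {"student", "course"}),
    2: ({"course", "instructor"}, {"course", "student"}),
    3: ({"instructor", "course"}, {"instructor", "student"}),
}
for n, (r1, r2) in DECOMPOSITIONS.items():
    print(n, nonadditive_binary(r1, r2, TEACH_F))  # only 3 prints True
```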
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 511 pg 162
Testing for lossless joins
Lossless join algorithm Example 512
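The general lossless-join test works on a matrix with one row per Ri and one column per attribute. A cell holds a distinguished 'a' symbol if Ri contains the attribute, and a unique 'b' symbol otherwise. The FDs are then applied repeatedly: rows that agree on an FD's LHS have their RHS symbols made equal, preferring 'a'. The decomposition is lossless iff some row becomes all 'a'. The Python sketch below is one possible rendering of that matrix method; its presentation may differ from the referenced example, and all names are hypothetical.

```python
def lossless_chase(schema, decomposition, fds):
    """Matrix test for the lossless (nonadditive) join property."""
    cols = sorted(schema)
    # ('a', A) if Ri contains A, otherwise a unique ('b', i, A).
    rows = [{A: ("a", A) if A in ri else ("b", i, A) for A in cols}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}  # rows that agree on every LHS column
            for row in rows:
                groups.setdefault(tuple(row[A] for A in sorted(lhs)), []).append(row)
            for group in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in group}
                    if len(symbols) > 1:
                        target = min(symbols)  # an 'a' symbol sorts before any 'b'
                        for row in rows:  # rename these symbols everywhere in column A
                            if row[A] in symbols:
                                row[A] = target
                        changed = True
    return any(all(row[A][0] == "a" for A in cols) for row in rows)


TEACH = {"student", "course", "instructor"}
TEACH_F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(lossless_chase(TEACH, [{"instructor", "course"}, {"instructor", "student"}], TEACH_F))  # True
print(lossless_chase(TEACH, [{"student", "instructor"}, {"student", "course"}], TEACH_F))     # False

EMP_PROJ = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
EP_F = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
        ({"SSN", "PNUMBER"}, {"HOURS"})]
EP_D = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(lossless_chase(EMP_PROJ, EP_D, EP_F))  # True
```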
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C, and the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
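On a given state, A ->> B holds when, for every A-value, each B-value appears with each combination of the remaining attributes. The Python sketch below checks this, together with the triviality rule, on a hypothetical EMP state in which Smith works on projects X and Y and has dependents John and Anna. It then projects EMP onto EMP_PROJECTS and EMP_DEPENDENTS and rejoins them. The sample data and all helper names are assumptions.

```python
from itertools import product

COLS = ("ENAME", "PNAME", "DNAME")
# Hypothetical EMP state
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}


def is_trivial(a, b, schema=COLS):
    """A ->> B is trivial if B is a subset of A or A U B = R."""
    return set(b) <= set(a) or set(a) | set(b) == set(schema)


def mvd_holds(rows, a, b):
    """For each A-value, B-values and remaining values must form a full cross product."""
    rest = [c for c in COLS if c not in a and c not in b]
    groups = {}
    for r in rows:
        row = dict(zip(COLS, r))
        pair = (tuple(row[x] for x in b), tuple(row[x] for x in rest))
        groups.setdefault(tuple(row[x] for x in a), set()).add(pair)
    for pairs in groups.values():
        b_vals = {p[0] for p in pairs}
        rest_vals = {p[1] for p in pairs}
        if pairs != set(product(b_vals, rest_vals)):
            return False
    return True


# 4NF decomposition: project onto (ENAME, PNAME) and (ENAME, DNAME), then rejoin.
EMP_PROJECTS = {(e, p) for e, p, _ in EMP}
EMP_DEPENDENTS = {(e, d) for e, _, d in EMP}
rejoined = {(e, p, d) for e, p in EMP_PROJECTS for e2, d in EMP_DEPENDENTS if e == e2}
print(mvd_holds(EMP, ["ENAME"], ["PNAME"]), rejoined == EMP)  # True True
```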
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For a relation R with subsets of the attributes of R denoted A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation that has no nontrivial join dependency except those in which every component is a superkey.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
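Whether a state satisfies JD(R1, R2, R3) can be checked directly from the definition: project onto each Ri, natural-join the projections, and compare the result with the original state. The Python sketch below uses an illustrative SUPPLY state (sample data, not from the slides) with R1 = (SNAME, PARTNAME), R2 = (SNAME, PROJNAME) and R3 = (PARTNAME, PROJNAME). Joining only R1 and R2 produces spurious tuples, while joining all three recovers the state.

```python
COLS = ("SNAME", "PARTNAME", "PROJNAME")
# Illustrative SUPPLY state (sample data)
SUPPLY = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}
R1 = ("SNAME", "PARTNAME")
R2 = ("SNAME", "PROJNAME")
R3 = ("PARTNAME", "PROJNAME")


def project(rows, attrs):
    """Projection; each tuple becomes a frozenset of (attribute, value) pairs."""
    return {frozenset((a, r[COLS.index(a)]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for t in left:
        for u in right:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return out


def satisfies_jd(rows, components):
    """JD holds if joining the projections gives back exactly `rows`."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == {frozenset(zip(COLS, r)) for r in rows}


print(satisfies_jd(SUPPLY, [R1, R2, R3]))  # True
print(satisfies_jd(SUPPLY, [R1, R2]))      # False: spurious tuples appear
```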
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Functional Dependencies
A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
The values of the X component functionally determine the values of the Y component.
FDs are derived from the real-world constraints on the attributes.
The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.
Lakes of the world
Name          Continent     Area (sq mi)   Length (mi)
Caspian Sea   Asia-Europe   143244         760
Superior      NA            31700          350
Victoria      Africa        26828          250
Aral Sea      Asia          24904          280
Huron         NA            23000          206
Michigan      NA            22300          307
Tanganyika    Africa        12700          420
Name -> Length holds in this relation state; Continent -> Name does not (NA appears with Superior, Huron and Michigan).
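The lakes example can be checked mechanically. Below is a minimal Python sketch (the function name fd_holds is ours, not from the slides): it records the first Y value seen for each X value and fails as soon as the same X value appears with a different Y value. Keep in mind that a single relation state can only refute an FD; passing the check here does not prove that Name -> Length holds in every legal state.

```python
# Relation state from the "Lakes of the world" table: (Name, Continent, Area, Length)
LAKES = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]
COLUMNS = ("Name", "Continent", "Area", "Length")


def fd_holds(rows, lhs, rhs):
    """Return True if X -> Y is not violated in this relation state."""
    seen = {}  # X value -> first Y value observed with it
    for row in rows:
        t = dict(zip(COLUMNS, row))
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        # setdefault stores y for a new x, or returns the Y already stored
        if seen.setdefault(x, y) != y:
            return False
    return True


print(fd_holds(LAKES, ["Name"], ["Length"]))     # True
print(fd_holds(LAKES, ["Continent"], ["Name"]))  # False
```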
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.
Relation state of TEACH
TEACH
Teacher   Course            Text
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz
TEACHER -> COURSE does not hold in this state (Smith teaches two courses).
TEXT -> COURSE holds in this state.
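The satisfies algorithm can be sketched in Python and run on the TEACH state above. The function name satisfies and the list-of-dicts representation are assumptions for illustration. itertools.groupby only groups adjacent items, which is why step 1 sorts on the A attributes first.

```python
from itertools import groupby


def satisfies(rows, lhs, rhs):
    """Satisfies algorithm: does relation state `rows` satisfy lhs -> rhs?"""
    def a_values(t):
        return tuple(t[a] for a in lhs)

    # Step 1: sort so that tuples with equal A values are next to each other
    ordered = sorted(rows, key=a_values)
    # Step 2: within each run of equal A values, the B values must all be equal
    for _, run in groupby(ordered, key=a_values):
        if len({tuple(t[b] for b in rhs) for t in run}) > 1:
            return False
    # Step 3: condition 2 held for every run
    return True


TEACH = [
    {"Teacher": "Smith", "Course": "Data Structures", "Text": "Bartram"},
    {"Teacher": "Smith", "Course": "Data Management", "Text": "Martin"},
    {"Teacher": "Hall", "Course": "Compilers", "Text": "Hoffmann"},
    {"Teacher": "Brown", "Course": "OOAD", "Text": "Horowitz"},
]

print(satisfies(TEACH, ["Teacher"], ["Course"]))  # False
print(satisfies(TEACH, ["Text"], ["Course"]))     # True
```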
Drawbacks of Satisfies algorithm
Using this algorithm is tedious and time-consuming.
So inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
A department has one manager (DEPT_NO -> MGR_SSN), and a manager has a unique phone number (MGR_SSN -> MGR_PHONE); these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that derive new dependencies from a given set of dependencies. That X -> Y can be inferred from F is denoted F |= X -> Y.
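The slides do not give a procedure for finding inferred FDs; a standard one, assumed here, is the attribute closure X+: start from X and repeatedly add the RHS of every FD whose LHS is already contained in the result. X -> Y is in F+ exactly when Y ⊆ X+. A minimal Python sketch on the example F (helper names are ours):

```python
def attribute_closure(attrs, fds):
    """Return X+, every attribute functionally determined by `attrs` under `fds`.

    fds is a list of (lhs, rhs) pairs of frozensets of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


F = [
    fd("SSN", "ENAME BDATE ADDRESS DNUMBER"),
    fd("DNUMBER", "DNAME DMGRSSN"),
]
ssn_plus = attribute_closure({"SSN"}, F)
print(sorted(ssn_plus))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print({"DNAME", "DMGRSSN"} <= ssn_plus)  # True, so SSN -> {DNAME, DMGRSSN} is in F+
```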
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state that satisfies the dependencies in F; that is, if the axioms are correctly applied, they cannot derive false dependencies.
By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
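As an illustration of that claim, here is one possible derivation of each additional rule from IR1 to IR3:

```latex
\begin{align*}
\text{Decomposition:}\quad & X \to YZ \ \text{(given)},\quad YZ \to Y \ \text{(IR1)},\quad X \to Y \ \text{(IR3)};\ \text{similarly } X \to Z\\
\text{Union:}\quad & X \to XY \ \text{(IR2 on } X \to Y\text{, augment with } X),\quad XY \to YZ \ \text{(IR2 on } X \to Z\text{, augment with } Y),\\
& X \to YZ \ \text{(IR3)}\\
\text{Pseudotransitivity:}\quad & WX \to WY \ \text{(IR2 on } X \to Y\text{, augment with } W),\quad WY \to Z \ \text{(given)},\quad WX \to Z \ \text{(IR3)}
\end{align*}
```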
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
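One possible solution to each example, assuming F = {A -> B, C -> X, BX -> Z} in Example 1 and F = {A -> B, C -> D} with C ⊆ B in Example 2:

```latex
\begin{align*}
&\text{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\}\\
&\quad AC \to BC && \text{IR2 on } A \to B\text{, augment with } C\\
&\quad BC \to BX && \text{IR2 on } C \to X\text{, augment with } B\\
&\quad AC \to BX && \text{IR3}\\
&\quad AC \to Z  && \text{IR3 with } BX \to Z\\[4pt]
&\text{Example 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B\\
&\quad B \to C && \text{IR1, since } C \subseteq B\\
&\quad A \to C && \text{IR3 on } A \to B \text{ and } B \to C\\
&\quad A \to D && \text{IR3 with } C \to D
\end{align*}
```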
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
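A minimal Python sketch of the redundancy test, assuming the attribute-closure method (not spelled out in the slides): A -> B is redundant iff B ⊆ A+ computed from F - {A -> B}. The set F = {A -> B, B -> C, A -> C} is a hypothetical example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def redundant_fds(fds):
    """Return the FDs that can be derived from the other FDs in `fds`."""
    found = []
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]  # F - {lhs -> rhs}
        if rhs <= closure(lhs, rest):
            found.append(f"{''.join(sorted(lhs))}->{''.join(sorted(rhs))}")
    return found


# Hypothetical F = {A -> B, B -> C, A -> C} with single-letter attributes
F = [(frozenset("A"), frozenset("B")),
     (frozenset("B"), frozenset("C")),
     (frozenset("A"), frozenset("C"))]
print(redundant_fds(F))  # ['A->C']
```

When several FDs are flagged, remove them one at a time and re-test: two FDs that each follow from the other cannot both be dropped.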
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
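Both covering conditions reduce to closure computations: F covers E iff, for every X -> Y in E, Y ⊆ X+ computed under F. A minimal Python sketch under that assumption; the sets E, F and G are illustrative examples, not from the slides.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def parse(text):
    """Parse 'A->C, AC->D' (single-letter attributes) into (lhs, rhs) frozensets."""
    fds = []
    for item in text.split(","):
        lhs, rhs = item.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds


def covers(f, e):
    """F covers E: for every X -> Y in E, Y is a subset of X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    return covers(e, f) and covers(f, e)


E = parse("A->C, AC->D, E->AD, E->H")
F = parse("A->CD, E->AH")
G = parse("A->C, E->AH")
print(equivalent(E, F))  # True
print(equivalent(F, G))  # False: G cannot derive A -> D
```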
Minimal Functional Dependencies
A minimal cover is useful in eliminating unnecessary functional dependencies; it is also called the irreducible set of F. F is transformed so that each FD in it that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) every dependency in F has a single attribute as its RHS;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of the left-hand side).
Extraneous Attributes
The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
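Because A2 -> B1B2 implies A1A2 -> B1B2 by augmentation, the equivalence test above reduces to checking whether B1B2 ⊆ A2+ under F. A minimal Python sketch of that check, using the FD set of the canonical-cover Problem (read here as F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}); the function name is hypothetical, and each attribute is tested on its own.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def extraneous_lhs(lhs, rhs, fds):
    """LHS attributes of lhs -> rhs that are extraneous with respect to fds.

    A1 is extraneous iff rhs is a subset of (lhs - {A1})+ computed under fds.
    """
    if len(lhs) < 2:
        return []  # removing the only attribute would leave an empty LHS
    return sorted(a for a in lhs if rhs <= closure(lhs - {a}, fds))


F = [(frozenset("X"), frozenset("Z")),
     (frozenset("XY"), frozenset("WP")),
     (frozenset("XY"), frozenset("ZWQ")),
     (frozenset("XZ"), frozenset("R"))]
print(extraneous_lhs(frozenset("XZ"), frozenset("R"), F))   # ['Z']
print(extraneous_lhs(frozenset("XY"), frozenset("WP"), F))  # []
```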
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
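Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Make every RHS a single attribute: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Drop the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce XZ -> R (Z is extraneous because X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}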
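The three steps can be sketched in Python as follows, assuming the usual order (split RHS, left-reduce, drop redundant FDs) and single-letter attributes; the helper names are ours. Run on the Problem's F, the left-reduction step turns XZ -> R into X -> R because X -> Z already holds. In general a canonical cover is not unique, and the result can depend on the order in which FDs are processed.

```python
def closure(attrs, fds):
    """Attribute closure; here fds is a list of (lhs frozenset, single attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def canonical_cover(fds):
    """fds: list of (lhs, rhs) strings with single-letter attributes, e.g. ("XY", "WP")."""
    # 1. Simple FDs: split every RHS into single attributes, dropping duplicates
    g = sorted({(frozenset(l), a) for l, r in fds for a in r},
               key=lambda f: (sorted(f[0]), f[1]))
    # 2. Left-reduce: drop B from X -> A when A is already in (X - {B})+
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, a)
    g = list(dict.fromkeys(g))  # left reduction can create duplicates
    # 3. Nonredundant: drop X -> A when A is in X+ computed from the other FDs
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, rest):
            g = rest
        else:
            i += 1
    return ["".join(sorted(lhs)) + "->" + a for lhs, a in g]


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
print(canonical_cover(F))
# ['X->Z', 'XY->P', 'XY->Q', 'XY->W', 'X->R']
```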
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
Provides the database designer with a formal framework for analyzing relation schemas based on keys and FDs, and a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
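Keys and prime attributes can be found mechanically: S is a superkey iff S+ contains every attribute of R, and a key is a superkey with no smaller key inside it. The brute-force Python sketch below (exponential in the number of attributes, so only for textbook-sized schemas) uses the TEACH relation (student, course, instructor) and its two FDs from the BCNF slides; function names are ours.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for minimal superkeys, smallest attribute sets first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= schema:  # S+ = R means S is a superkey
                keys.append(s)
    return keys


TEACH = {"student", "course", "instructor"}
FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
keys = candidate_keys(TEACH, FDS)
prime = set().union(*keys)
print(sorted(sorted(k) for k in keys))  # [['course', 'student'], ['instructor', 'student']]
print(sorted(TEACH - prime))            # []: every attribute is prime
```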
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
Considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
To achieve 1NF there are 3 techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the violating attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
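Technique 1 can be sketched in Python as follows. The DEPARTMENT state, its multivalued DLOCATIONS attribute and the new attribute name DLOCATION are hypothetical illustration data; the new relation's key would be {DNUMBER, DLOCATION}.

```python
# Hypothetical DEPARTMENT state with a multivalued DLOCATIONS attribute (not in 1NF)
DEPARTMENT = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]


def move_to_own_relation(rows, key, attr, new_attr):
    """Technique 1: drop `attr` from the relation and store each of its values
    in a separate relation together with the primary key."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], new_attr: value}
            for row in rows for value in row[attr]]
    return base, side


DEPT, DEPT_LOCATIONS = move_to_own_relation(
    DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(DEPT)
print(DEPT_LOCATIONS)
```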
Normalizing nested relations into 1NF
Additional problems from the Schaum series: Pg 178 51
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
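Partial dependencies can be found by computing the closure of every proper subset of the primary key. A minimal Python sketch using the EMP_PROJ FDs from the FD examples earlier in the module (function names are ours). HOURS does not appear in the result, so it is fully functionally dependent on {SSN, PNUMBER}; each group of partially dependent attributes would move into its own relation during 2NF normalization.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """Map each nonprime attribute that depends on a proper subset of `key`
    to the first (smallest) such subset found."""
    found = {}
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds)
            for a in sorted(nonprime):
                if a in determined and a not in found:
                    found[a] = subset
    return found


# EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION), primary key {SSN, PNUMBER}
FDS = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
partial = partial_dependencies({"SSN", "PNUMBER"},
                               {"HOURS", "ENAME", "PNAME", "PLOCATION"}, FDS)
print(partial)
# {'PLOCATION': ('PNUMBER',), 'PNAME': ('PNUMBER',), 'ENAME': ('SSN',)}
```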
Normalizing into 2NF
Conversion to 2NF: R(A, B, C, D) is converted into R1(A, B, C) and R2(A, D).
Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form of Prog_task?
2. Transform it into the next highest normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE: In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
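A sketch of the test in Python, assuming the commonly used general form of 3NF: every nontrivial X -> A must have X a superkey or A prime. This matches the NOTE, since a transitive dependency through a candidate key is not flagged. The EMP_DEPT FDs are the F used in the closure example; for the NOTE's EMP relation we assume Emp -> {Salary, SSN}, which is what makes Emp a candidate key.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(schema, fds, prime):
    """List nontrivial FDs X -> A where X is not a superkey and A is nonprime.

    With the primary key K, such an FD gives a transitive dependency K -> X -> A.
    """
    out = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey (e.g. a candidate key): allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                out.append(f"{', '.join(sorted(lhs))} -> {a}")
    return out


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
EMP_DEPT_FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(violations_3nf(EMP_DEPT, EMP_DEPT_FDS, prime={"SSN"}))
# ['DNUMBER -> DMGRSSN', 'DNUMBER -> DNAME']

# The NOTE's EMP(SSN, Emp, Salary): Emp is a candidate key, so nothing is flagged
EMP = {"SSN", "Emp", "Salary"}
EMP_FDS = [
    (frozenset({"SSN"}), frozenset({"Emp"})),
    (frozenset({"Emp"}), frozenset({"Salary", "SSN"})),
]
print(violations_3nf(EMP, EMP_FDS, prime={"SSN", "Emp"}))  # []
```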
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute, even when 'X' is not a candidate key.
BCNF allows it only if 'X' is a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation, so course is a prime attribute and 3NF allows fd2. However, instructor is not a superkey, so fd2 violates BCNF. Hence this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
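Both tests can be run on TEACH with a small Python sketch (function names are ours). It checks only the FDs as given, which is enough for the BCNF and 3NF tests; candidate keys are found by brute force, giving {student, course} and {student, instructor}, so every attribute of TEACH is prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= schema


def prime_attributes(schema, fds):
    """Union of all candidate keys (brute force over attribute subsets)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and is_superkey(s, schema, fds):
                keys.append(s)
    return set().union(*keys)


def normal_form_report(schema, fds):
    """Return (in_3nf, in_bcnf), checking only the FDs as given."""
    prime = prime_attributes(schema, fds)
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        if is_superkey(lhs, schema, fds):
            continue  # allowed by both BCNF and 3NF
        for a in rhs - lhs:  # nontrivial part of the FD
            in_bcnf = False  # BCNF needs every determinant to be a superkey
            if a not in prime:
                in_3nf = False  # 3NF still tolerates a prime RHS attribute
    return in_3nf, in_bcnf


TEACH = {"student", "course", "instructor"}
FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(normal_form_report(TEACH, FDS))  # (True, False)
```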
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
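The claim that all three decompositions lose fd1 can be checked without computing projections, using a standard dependency-preservation test (assumed here; it is not given in the slides): grow Z from the FD's LHS by repeatedly adding ((Z ∩ Ri)+ ∩ Ri) for each Ri, and check whether the RHS ends up in Z. A minimal Python sketch:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(fd, decomposition, fds):
    """Is lhs -> rhs implied by the FDs that hold on the individual Ri?

    Z only grows by what some Ri can see on its own: Z |= (Z & Ri)+ & Ri.
    """
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gain = closure(z & ri, fds) & ri
            if not gain <= z:
                z |= gain
                changed = True
    return rhs <= z


FD1 = (frozenset({"student", "course"}), frozenset({"instructor"}))
FD2 = (frozenset({"instructor"}), frozenset({"course"}))
FDS = [FD1, FD2]
DECOMPOSITIONS = [
    [{"student", "instructor"}, {"student", "course"}],
    [{"course", "instructor"}, {"course", "student"}],
    [{"instructor", "course"}, {"instructor", "student"}],
]
print([preserved(FD1, d, FDS) for d in DECOMPOSITIONS])  # [False, False, False]
print([preserved(FD2, d, FDS) for d in DECOMPOSITIONS])  # [False, True, True]
```

fd2 survives in decompositions 2 and 3 because instructor and course stay together in one relation.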
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 511 pg 162
Testing for lossless joins
Lossless join algorithm Example 512
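The slide names the lossless join algorithm without reproducing it. The Python sketch below is the matrix (chase) test as commonly presented in textbooks, offered as an assumption rather than as the exact procedure of Example 512: build one row per Ri with distinguished 'a' symbols in the columns of Ri, equate symbols whenever rows agree on an FD's LHS columns, and report lossless iff some row becomes all 'a'. Applied to the three TEACH decompositions, it agrees that only the 3rd is lossless.

```python
def lossless(attrs, decomposition, fds):
    """Matrix (chase) test for a lossless, non-additive join decomposition.

    Row i starts with the distinguished symbol ("a", A) in every column A of Ri
    and a unique ("b", i, A) elsewhere; FDs are applied until nothing changes.
    """
    rows = [{A: ("a", A) if A in ri else ("b", i, A) for A in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every LHS column
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[A] for A in sorted(lhs)), []).append(row)
            for group in groups.values():
                for A in sorted(rhs):
                    symbols = {row[A] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Prefer the 'a' symbol; otherwise keep one of the 'b' symbols
                    target = ("a", A) if ("a", A) in symbols else min(symbols)
                    # Equate these symbols everywhere in column A
                    for row in rows:
                        if row[A] in symbols and row[A] != target:
                            row[A] = target
                            changed = True
    # Lossless iff some row has become all 'a' symbols
    return any(all(row[A][0] == "a" for A in attrs) for row in rows)


ATTRS = ["student", "course", "instructor"]
FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
DECOMPOSITIONS = [
    [{"student", "instructor"}, {"student", "course"}],
    [{"course", "instructor"}, {"course", "student"}],
    [{"instructor", "course"}, {"instructor", "student"}],
]
print([lossless(ATTRS, d, FDS) for d in DECOMPOSITIONS])  # [False, False, True]
```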
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C; however, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
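An MVD can be checked on a relation state: for any t1, t2 that agree on X, the tuple taking X and Y from t1 and the remaining attributes from t2 must also be present. A minimal Python sketch; the EMP state (one employee, two projects, two dependents) is hypothetical illustration data, and the function names are ours.

```python
from itertools import product

ATTRS = ("ENAME", "PNAME", "DNAME")


def satisfies_mvd(rows, x, y):
    """Check X ->> Y on a relation state (tuples ordered as ATTRS).

    For every t1, t2 that agree on X, the tuple taking X and Y from t1 and
    the remaining attributes from t2 must also be in the relation.
    """
    rel = set(rows)
    pos = {a: i for i, a in enumerate(ATTRS)}
    for t1, t2 in product(rel, repeat=2):
        if all(t1[pos[a]] == t2[pos[a]] for a in x):
            t3 = tuple(t1[pos[a]] if a in x or a in y else t2[pos[a]]
                       for a in ATTRS)
            if t3 not in rel:
                return False
    return True


def is_trivial(x, y):
    """X ->> Y is trivial if Y is a subset of X, or X together with Y is all of R."""
    return set(y) <= set(x) or set(x) | set(y) == set(ATTRS)


EMP = [("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")]
print(satisfies_mvd(EMP, {"ENAME"}, {"PNAME"}))      # True
print(satisfies_mvd(EMP[:3], {"ENAME"}, {"PNAME"}))  # False: (Smith, Y, John) missing
print(is_trivial({"ENAME"}, {"PNAME"}))              # False
```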
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
It is used to remove multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
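The 4NF decomposition can be checked on an illustrative EMP state (hypothetical data): project onto EMP_PROJECTS and EMP_DEPENDENTS and join back on ENAME. The join reproduces EMP exactly, so no information is lost; with m projects and n dependents per employee, EMP needs m × n tuples while the two 4NF relations need only m + n.

```python
ATTRS = ("ENAME", "PNAME", "DNAME")
# Hypothetical EMP state: Smith works on projects X and Y and has dependents John and Anna
EMP = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}


def project(rows, keep):
    """Projection onto the attributes in `keep` (duplicates vanish in the set)."""
    idx = [ATTRS.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


EMP_PROJECTS = project(EMP, ("ENAME", "PNAME"))
EMP_DEPENDENTS = project(EMP, ("ENAME", "DNAME"))

# Natural join of the two 4NF relations on ENAME
rejoined = {(e, p, d)
            for e, p in EMP_PROJECTS
            for e2, d in EMP_DEPENDENTS if e == e2}
print(sorted(EMP_PROJECTS))  # [('Smith', 'X'), ('Smith', 'Y')]
print(rejoined == EMP)       # True
```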
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except those in which every component is a superkey.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
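Because the slide's figure is not reproduced, the sketch below uses an illustrative SUPPLY state (assumed data) and takes R1, R2, R3 to be the projections on (SNAME, PARTNAME), (SNAME, PROJNAME) and (PARTNAME, PROJNAME). Joining only R1 and R2 produces spurious tuples, but joining all three gives back SUPPLY exactly, which is what JD(R1, R2, R3) asserts.

```python
# Illustrative SUPPLY state (SNAME, PARTNAME, PROJNAME); assumed data
SUPPLY = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}

# Projections R1(SNAME, PARTNAME), R2(SNAME, PROJNAME), R3(PARTNAME, PROJNAME)
R1 = {(s, p) for s, p, _ in SUPPLY}
R2 = {(s, j) for s, _, j in SUPPLY}
R3 = {(p, j) for _, p, j in SUPPLY}

# Joining only R1 and R2 (on SNAME) creates spurious tuples ...
r1_r2 = {(s, p, j) for s, p in R1 for s2, j in R2 if s == s2}
# ... which the join with the third projection removes
r1_r2_r3 = {(s, p, j) for s, p, j in r1_r2 if (p, j) in R3}

print(sorted(r1_r2 - SUPPLY))  # [('Adamsky', 'Nail', 'ProjY'), ('Smith', 'Nut', 'ProjX')]
print(r1_r2_r3 == SUPPLY)      # True: SUPPLY satisfies JD(R1, R2, R3)
```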
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 27042023
Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760
Superior NA 31700 350
Victoria Africa 26828 250
Aral Sea Asia 24904 280
Huron NA 23000 206
Michigan NA 22300 307
Tanganyika Africa 12700 420
Continent -gtName Name -gtLength
Module 6 28042023
Graphical representation of Functional Dependencies
Module 6 29042023
Examples of FD constraints Social security number uniquely determines
employee name SSN -gt ENAME
Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION
Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Module 6 31042023
Satisfies algorithm
Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B
How it works Sort the tuples of the relation r on the A attributes so
that tuples with equal values under A are next to each other
Check that tuples with equal values under attributes A also have equal values under B
If it meets the condition 2 then the output of the algorithm is true else it is false
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y
IR2 (Augmentation): If X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z
IR1, IR2 and IR3 form a sound and complete set of inference rules
By sound we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (if the axioms are correctly applied, they cannot derive false dependencies)
By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all dependencies that can be inferred from F
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union (additive): If X -> Y and X -> Z, then X -> YZ
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z
These three rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property)
Examples
1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms
2 Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D
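One possible derivation for each exercise, using only IR1 to IR3. It reads the garbled sets as F = {A -> B, C -> X, BX -> Z} for exercise 1 and F = {A -> B, C -> D} for exercise 2; that reading is a reconstruction, so treat it as an assumption.

```latex
\begin{aligned}
&\textbf{Exercise 1: } F = \{A \to B,\ C \to X,\ BX \to Z\} \\
&1.\ A \to B && \text{given} \\
&2.\ AC \to BC && \text{IR2 on (1), augmenting with } C \\
&3.\ C \to X && \text{given} \\
&4.\ BC \to BX && \text{IR2 on (3), augmenting with } B \\
&5.\ AC \to BX && \text{IR3 on (2) and (4)} \\
&6.\ AC \to Z && \text{IR3 on (5) and } BX \to Z \\[6pt]
&\textbf{Exercise 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B \\
&1.\ B \to C && \text{IR1, since } C \subseteq B \\
&2.\ A \to C && \text{IR3 on } A \to B \text{ and (1)} \\
&3.\ A \to D && \text{IR3 on (2) and } C \to D
\end{aligned}
```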
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to F iff A -> B can be derived from the set of FDs F - {A -> B}
Redundant FDs are extra and unnecessary, and can be safely removed from F
Eliminating redundant FDs allows us to minimize the set of FDs
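The redundancy test above can be automated with attribute closure: A -> B is redundant when B ⊆ A+ computed under F - {A -> B}. A minimal Python sketch, assuming FDs are pairs of frozensets and single-letter attribute names; `closure`, `is_redundant`, `remove_redundant` and `fd` are hypothetical helper names. FDs are removed one at a time because two FDs can each be redundant on their own while removing both would lose information.

```python
# Redundant-FD test and removal (sketch; helper names are illustrative).
# An FD X -> Y is a pair (X, Y) of frozensets of attribute names.


def closure(attrs, fds):
    """Attribute closure X+ under the list of FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_redundant(dep, fds):
    """dep is redundant iff it can be derived from F - {dep}."""
    rest = [f for f in fds if f != dep]
    lhs, rhs = dep
    return rhs <= closure(lhs, rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time, re-testing against the shrinking set."""
    result = list(dict.fromkeys(fds))  # remove exact duplicates, keep order
    for dep in list(result):
        if is_redundant(dep, result):
            result.remove(dep)
    return result


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(is_redundant(fd("A", "C"), F))  # True: A -> B and B -> C give A -> C
print(remove_redundant(F) == [fd("A", "B"), fd("B", "C")])  # True
```

With F2 = {A -> B, B -> A, A -> C, B -> C}, both A -> C and B -> C are redundant in isolation, but the one-at-a-time loop keeps B -> C after dropping A -> C.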
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; E is equivalent to F if both conditions, E covers F and F covers E, hold
For a given set F of FDs, F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+; such sets are equivalent to F
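Covering and equivalence can be checked without building E+ or F+: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. A minimal Python sketch under that approach, assuming FDs are pairs of frozensets and single-letter attributes; `closure`, `covers`, `equivalent` and `fd` are hypothetical names, and the sets E, F, G are illustrative.

```python
# Cover and equivalence tests for sets of FDs (sketch; names are illustrative).


def closure(attrs, fds):
    """Attribute closure X+ under the list of FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


def covers(f, e):
    """F covers E iff every X -> Y in E has Y within X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff E covers F and F covers E."""
    return covers(e, f) and covers(f, e)


E = [fd("A", "BC")]
F = [fd("A", "B"), fd("A", "C")]
G = [fd("A", "B"), fd("B", "C")]

print(equivalent(E, F))  # True: union and decomposition rules
print(covers(G, E), covers(E, G))  # True False: E cannot derive B -> C
```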
Minimal Functional Dependencies
A minimal set of FDs (minimal cover) is useful in eliminating unnecessary functional dependencies; it is also called an irreducible set of F
F is transformed so that each FD in it that has more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS
Minimal cover
A set of FDs F is minimal if:
(a) the RHS of every dependency in F is a single attribute
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies)
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side)
Extraneous Attributes
Further reduces the size of the FDs of F by removing either extraneous left attributes or extraneous right attributes with respect to F
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F
A1 is extraneous iff $F \equiv (F - \{A_1A_2 \to B_1B_2\}) \cup \{A_2 \to B_1B_2\}$
CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has one attribute
2 FC is left-reduced
3 FC is nonredundant
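Problem
Given the set F of FDs below, find a canonical cover for F
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1 Split the RHS: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2 Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3 Left-reduce: Z is extraneous in XZ -> R because X -> Z, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}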
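The canonical-cover steps can be automated. A minimal Python sketch of the usual minimal-cover procedure (split right-hand sides, remove extraneous left-hand attributes, then remove redundant FDs), assuming single-letter attribute names so a string such as "XY" stands for {X, Y}; `closure` and `canonical_cover` are hypothetical helper names. Because it also left-reduces, it turns XZ -> R into X -> R for this problem (Z is extraneous since X -> Z).

```python
# Canonical (minimal) cover: split RHS, left-reduce, then drop redundant FDs.
# Sketch only; attributes are single letters, so "XY" means {X, Y}.


def closure(attrs, fds):
    """X+ under FDs given as (lhs frozenset, single RHS attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return frozenset(result)


def canonical_cover(fds):
    # Step 1: one attribute per RHS (decomposition rule), skipping duplicates
    cover = []
    for lhs, rhs in fds:
        for a in rhs:
            dep = (frozenset(lhs), a)
            if dep not in cover:
                cover.append(dep)

    # Step 2: left-reduce; B is extraneous in X -> A if A is in (X - {B})+
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, cover):
                lhs = lhs - {b}
                cover[i] = (lhs, a)
    cover = list(dict.fromkeys(cover))  # left-reduction can create duplicates

    # Step 3: drop X -> A if A is still in X+ computed without it
    for dep in list(cover):
        rest = [d for d in cover if d != dep]
        if dep[1] in closure(dep[0], rest):
            cover.remove(dep)
    return set(cover)


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
fc = canonical_cover(F)
print(sorted("".join(sorted(lhs)) + " -> " + a for lhs, a in fc))
# ['X -> R', 'X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W']
```

Left-reducing before removing redundant FDs is the conventional order; doing it the other way round can leave an FD that becomes redundant only after its left side shrinks.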
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes Participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)
4NF: based on keys and multivalued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation
Normalization of Relations
Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies
Provides the database designer with a formal framework for analyzing relation schemas based on keys and FDs, and a series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and indicates the degree to which it has been normalized
Normalization of Relations
Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from decomposition
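Whether a decomposition preserves dependencies can be tested without computing F+. A minimal Python sketch of the standard closure-based test (swapped in here; the slides give no algorithm): starting from X, repeatedly add whatever the attributes already reached inside each Ri determine within Ri, and check that Y is reached. It is applied to the TEACH relation with fd1: {student, course} -> instructor and fd2: instructor -> course and its three candidate decompositions; `closure` and `is_preserved` are hypothetical names.

```python
# Dependency-preservation test via projected closures (sketch; names are illustrative).


def closure(attrs, fds):
    """Attribute closure X+ under the list of FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_preserved(dep, parts, fds):
    """True if X -> Y can be enforced using only the projections of F onto the parts."""
    lhs, rhs = dep
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in parts:
            # What the reached attributes inside Ri determine, restricted to Ri
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


fd1 = (frozenset({"student", "course"}), frozenset({"instructor"}))
fd2 = (frozenset({"instructor"}), frozenset({"course"}))
F = [fd1, fd2]
decompositions = [
    [frozenset({"student", "instructor"}), frozenset({"student", "course"})],
    [frozenset({"course", "instructor"}), frozenset({"course", "student"})],
    [frozenset({"instructor", "course"}), frozenset({"instructor", "student"})],
]
for parts in decompositions:
    print(is_preserved(fd1, parts, F), is_preserved(fd2, parts, F))
# False False
# False True
# False True
```

fd1 is lost by every decomposition, which matches the observation that BCNF decomposition of TEACH has to give up dependency preservation.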
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF)
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys
A prime attribute is a member of some candidate key
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key
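With FDs in hand, S is a superkey exactly when its closure S+ is the whole schema, and a key is a superkey with no smaller key inside it. A minimal brute-force Python sketch (exponential in the number of attributes, so only for small schemas), using the TEACH relation with fd1: {student, course} -> instructor and fd2: instructor -> course as an illustration; `closure`, `candidate_keys` and `prime_attributes` are hypothetical names.

```python
from itertools import combinations

# Candidate keys and prime attributes by brute force (sketch; fine for small schemas).


def closure(attrs, fds):
    """Attribute closure X+ under the list of FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # a superset of a key is a superkey but not a key
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Attributes that belong to at least one candidate key."""
    return frozenset().union(*candidate_keys(schema, fds))


TEACH = frozenset({"student", "course", "instructor"})
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),  # fd2
]
print([sorted(k) for k in candidate_keys(TEACH, F)])
# [['course', 'student'], ['instructor', 'student']]
print(sorted(prime_attributes(TEACH, F)))  # ['course', 'instructor', 'student']
```

Searching subsets in increasing size guarantees that any superkey containing an already-found key is skipped, so only minimal sets are reported.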
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations, i.e., attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations, or relations as attribute values within tuples
Considered to be part of the definition of a relation
Normalization into 1NF
Normalization into 1NF
To achieve 1NF there are three techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key
2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute; this has the disadvantage of introducing redundancy
3 If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes; this has the disadvantage of introducing NULL values
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values
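Technique 1 can be sketched in a few lines of Python. The DEPARTMENT rows below are hypothetical sample data with a multivalued DLOCATIONS attribute, and `split_multivalued` is a hypothetical helper name; the new relation's key would be {DNUMBER, DLOCATION}.

```python
# Technique 1 for 1NF: move a multivalued attribute into its own relation.
# The DEPARTMENT rows below are hypothetical sample data.

department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]


def split_multivalued(rows, key, attr, new_attr):
    """Return (base relation without attr, new relation of (key, one value) rows)."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], new_attr: value} for row in rows for value in row[attr]]
    return base, side


dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
for row in dept_locations:
    print(row["DNUMBER"], row["DLOCATION"])
```

Every value in both results is atomic, and no department attribute other than the key is repeated, which is why this technique avoids the redundancy of technique 2 and the NULLs of technique 3.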
Normalizing nested relations into 1NF
Additional problems from schaum series Pg 178 51
Second Normal Form
Uses the concepts of FDs and the primary key
Definitions:
Prime attribute: an attribute that is a member of the primary key K
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds
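The full-dependency test can be phrased with attribute closure: Y -> Z is full if no subset of Y with one attribute removed still has Z in its closure. A minimal Python sketch using the EMP_PROJ dependencies from this module; `closure` and `is_full_fd` are hypothetical names.

```python
# Full vs partial functional dependency test (sketch; names are illustrative).


def closure(attrs, fds):
    """Attribute closure X+ under the list of FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and no proper subset of Y still determines Z.

    Checking the subsets with one attribute removed is enough: closure is
    monotone, so if a smaller subset determined Z, these larger ones would too.
    """
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("the FD does not hold under the given FDs")
    return not any(rhs <= closure(lhs - {a}, fds) for a in lhs)


EMP_PROJ_FDS = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, EMP_PROJ_FDS))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, EMP_PROJ_FDS))  # False (partial)
```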
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Normalizing into 2NF
Conversion to 2NF (figure): a relation with attributes A, B, C, D is split into one relation with A, B, C and another with A, D
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, Prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form of Prog_task?
2 Transform it into the next higher normal form
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z
Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key
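The 3NF condition, including the NOTE above, can be checked mechanically with the general candidate-key formulation (swapped in here for the primary-key wording): each nontrivial X -> A in F needs X to be a superkey or A to be a prime attribute. A minimal Python sketch, assuming FDs are pairs of frozensets; `closure`, `candidate_keys`, `third_nf_violations` and `fd` are hypothetical names. For the NOTE's EMP example it additionally assumes Emp -> {SSN, Salary}, since Emp is stated to be a candidate key.

```python
from itertools import combinations

# General 3NF test (sketch): each nontrivial X -> A in F needs X to be a superkey
# or A to be a prime attribute. Checking only the FDs listed in F is enough when
# the whole schema R is being tested.


def closure(attrs, fds):
    """Attribute closure X+ under the list of FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def third_nf_violations(schema, fds):
    """List (X, A) pairs that break 3NF, with X shown as a comma-separated string."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        for a in sorted(rhs - lhs):  # ignore trivial parts of the RHS
            if not is_superkey and a not in prime:
                violations.append((", ".join(sorted(lhs)), a))
    return violations


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


EMP_DEPT = frozenset({"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"})
F_EMP_DEPT = [
    fd({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    fd({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(third_nf_violations(EMP_DEPT, F_EMP_DEPT))
# [('DNUMBER', 'DMGRSSN'), ('DNUMBER', 'DNAME')]

# NOTE case: Emp is assumed to be a candidate key, so Emp -> SSN is added too
EMP = frozenset({"SSN", "Emp", "Salary"})
F_EMP = [fd({"SSN"}, {"Emp", "Salary"}), fd({"Emp"}, {"SSN", "Salary"})]
print(third_nf_violations(EMP, F_EMP))  # []
```

The EMP_DEPT result points at exactly the transitive chain SSN -> DNUMBER -> {DNAME, DMGRSSN}, which 3NF normalization would move into its own relation.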
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a superkey
In BCNF, X must be a superkey
So a relation in BCNF will definitely be in 3NF, but not the other way around
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. In fd2, instructor is not a superkey, but course is a prime attribute, so this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
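The BCNF test for TEACH can be sketched in Python: every nontrivial X -> A in F must have X+ equal to the whole schema. This is a minimal sketch assuming FDs are pairs of frozensets; `closure` and `bcnf_violations` are hypothetical names. It flags fd2, which 3NF still tolerates because course belongs to the candidate key {student, course}.

```python
# BCNF test (sketch): every nontrivial X -> A in F needs X to be a superkey.


def closure(attrs, fds):
    """Attribute closure X+ under the list of FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema, fds):
    """List (X, A) pairs from F where X -> A is nontrivial and X is not a superkey."""
    return [
        (lhs, a)
        for lhs, rhs in fds
        for a in sorted(rhs - lhs)
        if not closure(lhs, fds) >= schema
    ]


TEACH = frozenset({"student", "course", "instructor"})
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),  # fd2
]
for lhs, a in bcnf_violations(TEACH, F):
    print(sorted(lhs), "->", a)  # ['instructor'] -> course
```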
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive join property after decomposition
Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the nonadditive join property)
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy
Example 511 pg 162
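For a decomposition into two relations there is a well-known shortcut (a standard result, not stated on these slides): {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A minimal Python sketch applying it to the three TEACH decompositions; `closure` and `is_lossless_binary` are hypothetical names.

```python
# Binary lossless-join test (sketch): a decomposition {R1, R2} of R is lossless
# iff the shared attributes functionally determine R1 - R2 or R2 - R1.


def closure(attrs, fds):
    """Attribute closure X+ under the list of FDs fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_lossless_binary(r1, r2, fds):
    """True if, for every state satisfying fds, joining the projections adds no tuples."""
    shared = closure(r1 & r2, fds)
    return (r1 - r2) <= shared or (r2 - r1) <= shared


F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),  # fd2
]
decompositions = [
    (frozenset({"student", "instructor"}), frozenset({"student", "course"})),
    (frozenset({"course", "instructor"}), frozenset({"course", "student"})),
    (frozenset({"instructor", "course"}), frozenset({"instructor", "student"})),
]
print([is_lossless_binary(r1, r2, F) for r1, r2 in decompositions])
# [False, False, True]
```

Only the third decomposition passes, because its shared attribute instructor determines course (fd2); this agrees with the claim that the 3rd decomposition is the nonadditive one.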
Testing for lossless joins
Lossless join algorithm Example 512
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C; however, the sets of values for B and C are independent of each other
A multivalued dependency can be further defined as being trivial or nontrivial
An MVD A ->> B in relation R is defined as trivial if B ⊆ A or A ∪ B = R
An MVD is defined as nontrivial if neither of the above two conditions is satisfied
Fourth Normal Form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies
It is used for removing multivalued dependencies
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS
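An MVD X ->> Y can be checked on a relation state directly from its definition: for any two tuples agreeing on X, the tuple that takes X and Y from the first and all remaining attributes from the second must also be present. A minimal Python sketch on a hypothetical EMP(ENAME, PNAME, DNAME) state, which also confirms that the EMP_PROJECTS and EMP_DEPENDENTS projections join back to EMP; `satisfies_mvd` and `project` are hypothetical names.

```python
# Check an MVD on a relation state, then decompose EMP into 4NF projections.
# The EMP tuples are hypothetical sample data chosen for illustration.

ATTRS = ("ENAME", "PNAME", "DNAME")
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}


def satisfies_mvd(rows, attrs, x, y):
    """r satisfies X ->> Y if, for all t1, t2 agreeing on X, the tuple taking
    X and Y from t1 and every other attribute from t2 is also in r."""
    for t1 in rows:
        for t2 in rows:
            if all(t1[attrs.index(a)] == t2[attrs.index(a)] for a in x):
                t3 = tuple(
                    t1[i] if a in x or a in y else t2[i] for i, a in enumerate(attrs)
                )
                if t3 not in rows:
                    return False
    return True


def project(rows, attrs, keep):
    """Projection onto the attributes in keep, with duplicates removed."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}


emp_projects = project(EMP, ATTRS, ("ENAME", "PNAME"))
emp_dependents = project(EMP, ATTRS, ("ENAME", "DNAME"))
# Natural join of the two projections on ENAME
rejoined = {
    (e, p, d) for (e, p) in emp_projects for (e2, d) in emp_dependents if e == e2
}
print(satisfies_mvd(EMP, ATTRS, {"ENAME"}, {"PNAME"}))  # True
print(rejoined == EMP)  # True
```

Removing any one tuple, for example (Smith, Y, John), breaks ENAME ->> PNAME, which is why the original EMP has to store every project/dependent combination redundantly.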
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R
In other words, a relation that has no nontrivial join dependency apart from those whose components are all superkeys
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3
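A join dependency can be verified on a relation state by projecting and re-joining. A minimal Python sketch on a hypothetical SUPPLY(SNAME, PARTNAME, PROJNAME) state (column names and tuples are assumptions for illustration): joining only R1 and R2 produces spurious tuples, while joining all three projections reproduces SUPPLY exactly, so JD(R1, R2, R3) holds for this state.

```python
# Check a join dependency JD(R1, R2, R3) on a SUPPLY state.
# The tuples are hypothetical sample data chosen for illustration.

SUPPLY = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}


def project(rows, idx):
    """Projection onto the column positions in idx, duplicates removed."""
    return {tuple(row[i] for i in idx) for row in rows}


r1 = project(SUPPLY, (0, 1))  # (SNAME, PARTNAME)
r2 = project(SUPPLY, (0, 2))  # (SNAME, PROJNAME)
r3 = project(SUPPLY, (1, 2))  # (PARTNAME, PROJNAME)

# Binary join R1 * R2 on SNAME alone
r1_r2 = {(s, p, j) for (s, p) in r1 for (s2, j) in r2 if s == s2}
# Joining the result with R3 on (PARTNAME, PROJNAME)
r1_r2_r3 = {t for t in r1_r2 if (t[1], t[2]) in r3}

print(len(SUPPLY), len(r1_r2), len(r1_r2_r3))  # 7 9 7
print(r1_r2_r3 == SUPPLY)  # True
```

The two spurious tuples from the binary join, (Smith, Nut, ProjX) and (Adamsky, Nail, ProjY), are filtered out only by the third projection, which is why no two-way decomposition of SUPPLY is lossless here.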
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Graphical representation of Functional Dependencies
Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS
Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B
How it works:
1 Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other
2 Check that tuples with equal values under attributes A also have equal values under B
3 If condition 2 is met, the output of the algorithm is true; otherwise it is false
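The Satisfies algorithm maps directly onto a sort followed by a grouping pass. A minimal Python sketch using the TEACH relation state from this section (with the course "ooad" written as OOAD); `satisfies` is a hypothetical name, and the FD is given as lists of LHS and RHS attribute names.

```python
from itertools import groupby

# The Satisfies algorithm: sort r on the LHS, then check each group of equal LHS values.

ATTRS = ("TEACHER", "COURSE", "TEXT")
TEACH = [
    ("Smith", "Data Structures", "Bartram"),
    ("Smith", "Data Management", "Martin"),
    ("Hall", "Compilers", "Hoffmann"),
    ("Brown", "OOAD", "Horowitz"),
]


def satisfies(rows, attrs, lhs, rhs):
    """Return True if relation state rows satisfies the FD lhs -> rhs."""
    li = [attrs.index(a) for a in lhs]
    ri = [attrs.index(a) for a in rhs]

    def lhs_value(t):
        return tuple(t[i] for i in li)

    # Step 1: sort so tuples with equal LHS values are adjacent
    for _, group in groupby(sorted(rows, key=lhs_value), key=lhs_value):
        # Step 2: every tuple in a run of equal LHS values must agree on the RHS
        if len({tuple(t[i] for i in ri) for t in group}) > 1:
            return False
    return True


print(satisfies(TEACH, ATTRS, ["TEACHER"], ["COURSE"]))  # False: Smith has two courses
print(satisfies(TEACH, ATTRS, ["TEXT"], ["COURSE"]))  # True for this state
```

`itertools.groupby` only merges adjacent equal keys, which is exactly why step 1 sorts first. A True result only says this one state is consistent with the FD; it cannot prove the FD holds on the schema.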
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 29042023
Examples of FD constraints Social security number uniquely determines
employee name SSN -gt ENAME
Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION
Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
If condition 2 is met, the output of the algorithm is true; otherwise it is false.
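Relation state of TEACH
TEACH
Teacher | Course | Text
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | OOAD | Horowitz
TEACHER -> COURSE does not hold in this state, because Smith teaches two different courses.
TEXT -> COURSE may hold, because no text appears with two different courses in this state. A single state cannot prove that an FD holds in general.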
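The two steps of the Satisfies algorithm can be sketched in Python. The sketch sorts on the A attributes, then groups adjacent tuples with equal A values and checks that each group has a single B value. Tuples are assumed to be dicts keyed by attribute name. The function name `satisfies` and this representation are illustrative choices, and the rows reproduce the TEACH state above.

```python
from itertools import groupby
from operator import itemgetter


def satisfies(rows, lhs, rhs):
    """Sketch of the Satisfies test: does the relation state `rows` (a list of dicts)
    satisfy lhs -> rhs? lhs and rhs are non-empty lists of attribute names."""
    key_a = itemgetter(*lhs)
    key_b = itemgetter(*rhs)
    # Step 1: sort so that tuples with equal A values are next to each other.
    ordered = sorted(rows, key=key_a)
    # Step 2: inside each run of equal A values, the B values must all be equal.
    for _, run in groupby(ordered, key=key_a):
        if len({key_b(t) for t in run}) > 1:
            return False
    return True


teach = [
    {"Teacher": "Smith", "Course": "Data Structures", "Text": "Bartram"},
    {"Teacher": "Smith", "Course": "Data Management", "Text": "Martin"},
    {"Teacher": "Hall", "Course": "Compilers", "Text": "Hoffmann"},
    {"Teacher": "Brown", "Course": "OOAD", "Text": "Horowitz"},
]
print(satisfies(teach, ["Teacher"], ["Course"]))  # False: Smith has two courses
print(satisfies(teach, ["Text"], ["Course"]))     # True for this state only
```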
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming.
Therefore, inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
If a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that include F, as well as all dependencies that can be inferred from F, is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}. The inferred functional dependencies include:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. We write F |= X -> Y to denote that the FD X -> Y is inferred from F.
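A standard way to test F |= X -> Y mechanically is to compute the attribute closure X+ under F and check whether Y ⊆ X+. This technique is not shown on these slides. The Python sketch below assumes FDs are (lhs, rhs) pairs of attribute-name sets. The names `attribute_closure` and `implies` are illustrative. The sample F is the SSN/DNUMBER example above.

```python
def attribute_closure(attrs, fds):
    """X+ under F: every attribute functionally determined by attrs.
    fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the closure already contains the LHS, it must also contain the RHS.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is contained in lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
     ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
```

For example, `implies(F, {"SSN"}, {"DNAME", "DMGRSSN"})` confirms the first inferred FD listed above.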
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we infer from F by using IR1 through IR3 holds in every relation state that satisfies the dependencies in F. In other words, if the axioms are correctly applied, they cannot derive false dependencies.
By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
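Here is one possible derivation for each exercise using only IR1 to IR3. It assumes the reading F = {A -> B, C -> X, BX -> Z} for the first exercise and F = {A -> B, C -> D} for the second, because the arrows are lost in the source.

```latex
\begin{aligned}
&\text{Exercise 1:}\\
&1.\ A \to B    && \text{given}\\
&2.\ AC \to BC  && \text{IR2 on (1), augmenting with } C\\
&3.\ C \to X    && \text{given}\\
&4.\ BC \to BX  && \text{IR2 on (3), augmenting with } B\\
&5.\ AC \to BX  && \text{IR3 on (2) and (4)}\\
&6.\ BX \to Z   && \text{given}\\
&7.\ AC \to Z   && \text{IR3 on (5) and (6)}\\[6pt]
&\text{Exercise 2 } (C \subseteq B)\text{:}\\
&1.\ A \to B    && \text{given}\\
&2.\ B \to C    && \text{IR1, since } C \subseteq B\\
&3.\ A \to C    && \text{IR3 on (1) and (2)}\\
&4.\ C \to D    && \text{given}\\
&5.\ A \to D    && \text{IR3 on (3) and (4)}
\end{aligned}
```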
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
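The redundancy definition turns directly into a test. Remove the FD, compute the closure of its left-hand side under the remaining FDs, and see whether the right-hand side is still reached. This Python sketch assumes FDs are (lhs, rhs) pairs of sets. The function names and the three-FD example set are hypothetical.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_redundant(fd, fds):
    """fd = (lhs, rhs) is redundant iff it can be derived from F - {fd}."""
    rest = [f for f in fds if f != fd]
    lhs, rhs = fd
    return set(rhs) <= attribute_closure(lhs, rest)


# Hypothetical F: A -> C follows from A -> B and B -> C by transitivity.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
```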
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, E is said to be covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E. That is, E is equivalent to F if both conditions hold: E covers F, and F covers E.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find a set that contains fewer FDs than F and still generates all the FDs of F+. Such a set is equivalent to F.
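Covering and equivalence reduce to closure computations. F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. This Python sketch assumes FDs are (lhs, rhs) pairs of sets. The names `covers` and `equivalent` and the example sets are hypothetical.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def covers(f, e):
    """F covers E iff every FD X -> Y in E has Y inside X+ computed under F."""
    return all(set(rhs) <= attribute_closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)


# Hypothetical sets: E contains the transitively implied A -> C, F does not.
E = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B", "C"})]
```

Here E and F are equivalent even though F has fewer FDs. E covers G, but G does not cover E, because B -> C cannot be inferred from G.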
Minimal Functional Dependencies (Minimal Cover)
A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F.
F is transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs, each with a single attribute on the RHS.
Minimal cover
A set of FDs F is minimal if it satisfies the following conditions:
(a) The RHS of every dependency in F is a single attribute.
(b) For no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies).
(c) For no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
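For a left-hand attribute, the equivalence test above reduces to one closure check. An attribute a in X is extraneous in X -> Y exactly when Y ⊆ (X - {a})+ under F. The Python sketch below assumes FDs are (lhs, rhs) pairs of sets. The function name is illustrative, and the two-FD sample is hypothetical.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def extraneous_lhs(fd, fds):
    """Attributes a of X (for fd X -> Y in F) such that Y is inside (X - {a})+ under F."""
    lhs, rhs = fd
    return {a for a in lhs if set(rhs) <= attribute_closure(set(lhs) - {a}, fds)}


# Hypothetical F: since X -> Z, the Z in XZ -> R adds nothing.
F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
```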
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
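Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z, which follows from X -> Z: FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R, because X -> Z and XZ -> R give X -> R by pseudotransitivity. Hence FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}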
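The steps above can be automated. This Python sketch follows the usual minimal-cover procedure: split right-hand sides, left-reduce, then drop redundant FDs. It assumes FDs are (lhs, rhs) pairs of sets, and the function names are illustrative. Different processing orders can produce different, equally valid minimal covers.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def minimal_cover(fds):
    # Step 1: one attribute on every RHS; exact duplicates are dropped.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in g:
                g.append(fd)
    # Step 2: left-reduce. B is extraneous in X -> A if A is in (X - {B})+ under g.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= attribute_closure(smaller, g):
                lhs = smaller
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction can create duplicates
    # Step 3: drop every FD that the remaining FDs already imply.
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] <= attribute_closure(fd[0], rest):
            g = rest
    return g


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
```

Running `minimal_cover(F)` on the problem above gives the five FDs X -> Z, X -> R, XY -> W, XY -> P, XY -> Q.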
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, and BCNF are based on keys and the FDs of a relation schema (relational design by analysis).
4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
It provides the database designer with a formal framework for analyzing relation schemas based on keys and FDs, and with a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or nonadditive join property: it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: it ensures that each functional dependency is represented in some individual relation resulting after decomposition.
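Dependency preservation can be tested without computing F+. For each X -> Y in F, repeatedly grow X using closures restricted to each decomposed schema Ri. The FD is preserved iff Y ends up in the grown set. This is a standard polynomial-time test that is not given on these slides. The Python sketch below assumes FDs are (lhs, rhs) pairs of sets and uses the TEACH FDs from the BCNF discussion. The function name is illustrative.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def preserves_dependencies(fds, decomposition):
    """True if every FD of F is implied by the FDs that hold on the individual Ri."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Only what is visible inside Ri may be used to grow the result.
                gained = attribute_closure(result & set(ri), fds) & set(ri)
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
```

Decomposing TEACH into {instructor, course} and {instructor, student} fails this test, because fd1 {student, course} -> instructor is lost.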
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
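For small schemas, candidate keys and prime attributes can be found by brute force. Try attribute sets in order of increasing size, and keep those whose closure is the whole schema and that contain no smaller key. The Python sketch below assumes FDs are (lhs, rhs) pairs of sets. The function names are illustrative, and the approach is exponential in the number of attributes. The TEACH FDs come from the BCNF discussion in this module.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # a superset of a key is a superkey but not minimal
            if attribute_closure(candidate, fds) >= schema:
                keys.append(candidate)
    return keys


def prime_attributes(schema, fds):
    """Prime attributes are exactly those appearing in some candidate key."""
    return set().union(*candidate_keys(schema, fds))


teach_schema = {"student", "course", "instructor"}
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
```

For TEACH this finds two candidate keys, {student, course} and {student, instructor}, so every attribute is prime.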
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF, there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there is a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best, because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
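The first technique can be sketched in Python. The multivalued attribute moves into its own relation together with the primary key, and the remaining attributes stay in the base relation. Rows are assumed to be dicts, with the multivalued attribute holding a list. The function name and the DEPARTMENT rows are hypothetical illustrations.

```python
def split_multivalued(rows, key, mv_attr):
    """Technique 1: move mv_attr into its own relation together with the key;
    (key, value) becomes the key of that new relation."""
    base = [{k: v for k, v in row.items() if k != mv_attr} for row in rows]
    detail = sorted({(row[key], value) for row in rows for value in row[mv_attr]})
    return base, detail


# Hypothetical DEPARTMENT rows with a multivalued DLOCATIONS attribute.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]
dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS")
```

Each resulting relation holds only atomic values, and no location value is repeated beyond its own (DNUMBER, location) pair.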
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and the primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
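Partial dependencies, the cause of 2NF violations, can be found by computing the closure of every proper subset of the primary key. This Python sketch assumes the given key is the only candidate key, so every other attribute is nonprime. FDs are (lhs, rhs) pairs of sets. The function name is illustrative. The EMP_PROJ FDs are the SSN/PNUMBER examples from this module.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def partial_dependencies(schema, fds, key):
    """Map each nonprime attribute determined by a proper subset of the key
    to the first such subset found (assumes `key` is the only candidate key)."""
    nonprime = set(schema) - set(key)
    violations = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for a in sorted(attribute_closure(subset, fds) & nonprime):
                violations.setdefault(a, set(subset))
    return violations


emp_proj = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
emp_proj_fds = [({"SSN", "PNUMBER"}, {"HOURS"}),
                ({"SSN"}, {"ENAME"}),
                ({"PNUMBER"}, {"PNAME", "PLOCATION"})]
```

HOURS does not appear in the result because it depends on the whole key. The other nonprime attributes point to the smaller relations that 2NF normalization would split off.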
Normalizing into 2NF
Conversion to 2NF (figure): R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).
Additional problem on second normal form:
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE: In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
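A check that is easy to automate uses the general definition of 3NF, which considers every candidate key rather than only the primary key. Under that definition, for every nontrivial FD X -> A in F, either X is a superkey or A is a prime attribute. This Python sketch assumes the designer supplies the candidate keys and that FDs are (lhs, rhs) pairs of sets. The function name is illustrative. The EMP_DEPT FDs follow the SSN/DNUMBER example in this module.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def third_nf_violations(schema, fds, keys):
    """(X, A) pairs that break 3NF: X is not a superkey and A is nonprime.
    `keys` must list every candidate key of the schema."""
    prime = set().union(*keys)
    violations = []
    for lhs, rhs in fds:
        if attribute_closure(lhs, fds) >= set(schema):
            continue  # X is a superkey, so the FD is allowed
        for a in sorted(set(rhs) - set(lhs)):  # ignore the trivial part of the FD
            if a not in prime:
                violations.append((frozenset(lhs), a))
    return violations


emp_dept = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
emp_dept_fds = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
                ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
```

The transitive path SSN -> DNUMBER -> {DNAME, DMGRSSN} shows up as two violations with left-hand side DNUMBER. Moving DNUMBER, DNAME, and DMGRSSN into their own relation removes them.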
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd Normal Form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation even if 'X' is not a superkey, provided 'A' is a prime attribute (a member of some candidate key).
For a relation to be in BCNF, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but course is a prime attribute, so fd2 is allowed in 3NF. Hence this relation is in 3NF but not in BCNF.
A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
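The contrast between BCNF and 3NF on TEACH can be checked mechanically. BCNF flags every nontrivial FD whose left-hand side is not a superkey. 3NF tolerates such an FD when all of its right-hand attributes are prime. This Python sketch assumes FDs are (lhs, rhs) pairs of sets and that the candidate keys are supplied by hand. The function names are illustrative.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y in F whose left-hand side X is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        if rhs <= lhs:
            continue  # trivial FD, always allowed
        if not attribute_closure(lhs, fds) >= set(schema):
            violations.append((lhs, rhs))
    return violations


def is_3nf(schema, fds, keys):
    """3NF also accepts a non-superkey LHS when every RHS attribute is prime."""
    prime = set().union(*keys)
    return all((rhs - lhs) <= prime for lhs, rhs in bcnf_violations(schema, fds))


teach = {"student", "course", "instructor"}
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
teach_keys = [{"student", "course"}, {"student", "instructor"}]
```

Only fd2 is reported as a BCNF violation, and `is_3nf` still returns True because course is prime.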
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive join property after decomposition.
Of the three, only the third decomposition does not generate spurious tuples after a join (and hence has the nonadditive join property).
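For a decomposition into two relations, a standard test (not stated on these slides) is the following. (R1, R2) is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The Python sketch below applies this test to the three TEACH decompositions. It assumes FDs are (lhs, rhs) pairs of sets, and the function name is illustrative.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def nonadditive_binary(r1, r2, fds):
    """Lossless iff the shared attributes determine all of R1 - R2 or all of R2 - R1."""
    shared = attribute_closure(set(r1) & set(r2), fds)
    return (set(r1) - set(r2)) <= shared or (set(r2) - set(r1)) <= shared


teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
```

Only the third decomposition passes: the shared attribute instructor determines course, which is all of {instructor, course} - {instructor, student}.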
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
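For decompositions into any number of relations, the usual lossless-join test builds a matrix with one row per Ri. A cell holds a distinguished symbol a_j when attribute j is in Ri, and a unique b symbol otherwise. The test then equates symbols wherever rows agree on the left-hand side of an FD. The decomposition is lossless iff some row becomes all a symbols. This Python sketch assumes FDs are (lhs, rhs) pairs of sets. The function name and the symbol encoding are illustrative.

```python
def lossless_join(schema, fds, decomposition):
    """Matrix (chase) test for a nonadditive join decomposition of `schema`."""
    attrs = sorted(schema)
    col = {a: j for j, a in enumerate(attrs)}
    # Row i stands for R_i: ('a', j) if attribute j is in R_i, else the unique ('b', i, j).
    s = [[("a", j) if a in ri else ("b", i, j) for j, a in enumerate(attrs)]
         for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lcols = [col[a] for a in sorted(lhs)]
            groups = {}
            for row in s:  # collect rows that agree on every LHS column
                groups.setdefault(tuple(row[c] for c in lcols), []).append(row)
            for rows in groups.values():
                for a in sorted(rhs):
                    c = col[a]
                    symbols = {row[c] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols, preferring the 'a' symbol when one is present.
                    target = ("a", c) if ("a", c) in symbols else min(symbols)
                    for row in s:  # rename the old symbols everywhere in this column
                        if row[c] in symbols:
                            row[c] = target
                    changed = True
    # Lossless iff some row now consists only of 'a' symbols.
    return any(all(row[j] == ("a", j) for j in range(len(attrs))) for row in s)


teach = {"student", "course", "instructor"}
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
```

On TEACH this agrees with the binary test. It also handles three-way splits, such as EMP_PROJ into (SSN, ENAME), (PNUMBER, PNAME, PLOCATION), and (SSN, PNUMBER, HOURS).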
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further defined as being trivial or nontrivial.
An MVD A ->> B in relation R is defined as trivial if B is a subset of A or A ∪ B = R.
An MVD is defined as nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
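An MVD X ->> Y can be checked on one relation state as follows. For any two tuples t1 and t2 that agree on X, the tuple that takes X and Y from t1 and all remaining attributes from t2 must also be present. This Python sketch assumes relations are sets of tuples over an ordered attribute list. The EMP rows are hypothetical, and as with FDs, one state can only refute an MVD, not prove it.

```python
def satisfies_mvd(rows, attrs, x, y):
    """Check X ->> Y on the state `rows` (set of tuples over attrs); Z = attrs - X - Y."""
    xi = [attrs.index(a) for a in x]
    yi = [attrs.index(a) for a in y if a not in x]
    zi = [i for i, a in enumerate(attrs) if a not in x and a not in y]
    table = set(rows)
    for t1 in table:
        for t2 in table:
            if all(t1[i] == t2[i] for i in xi):
                # t3 takes its X and Y values from t1 and its Z values from t2.
                t3 = [None] * len(attrs)
                for i in xi + yi:
                    t3[i] = t1[i]
                for i in zi:
                    t3[i] = t2[i]
                if tuple(t3) not in table:
                    return False
    return True


# Hypothetical EMP state: Smith works on projects X and Y and has dependents John and Anna.
ATTRS = ("ENAME", "PNAME", "DNAME")
emp = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}
```

Every project must appear with every dependent, which is exactly the redundancy that splitting EMP into EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME) removes.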
Module 6 30042023
Examples of FD constraints A FD is a property of the attributes in the
schema R not of a particular legal relation state r of R
It must be defined explicitly by someone who knows the semantics of the attributes of R
The constraint must hold on every relation instance r(R)
If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with
t1[K]=t2[K])
Module 6 31042023
Satisfies algorithm
Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B
How it works Sort the tuples of the relation r on the A attributes so
that tuples with equal values under A are next to each other
Check that tuples with equal values under attributes A also have equal values under B
If it meets the condition 2 then the output of the algorithm is true else it is false
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF)
Join dependency: for a relation R with subsets of the attributes of R denoted A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
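The definition translates directly into a check on one relation state: project onto each component, natural-join the projections, and compare with the original. The sketch below is only an instance-level check under that reading (a JD is a schema constraint, so one state can refute it but not prove it). The helper names and the S/P/J rows are hypothetical.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; tuples are dicts."""
    distinct = {frozenset((a, row[a]) for a in attrs) for row in rows}
    return [dict(t) for t in distinct]


def natural_join(left, right):
    """Nested-loop natural join on the attributes the two relations share."""
    result = []
    for t in left:
        for u in right:
            shared = t.keys() & u.keys()
            if all(t[a] == u[a] for a in shared):
                result.append({**t, **u})
    return result


def satisfies_jd(rows, components):
    """True if joining the projections on all components gives back exactly rows."""
    joined = project(rows, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(rows, attrs))
    original = {frozenset(t.items()) for t in rows}
    return {frozenset(t.items()) for t in joined} == original


supply = [
    {"S": "s1", "P": "p1", "J": "j2"},
    {"S": "s1", "P": "p2", "J": "j1"},
    {"S": "s2", "P": "p1", "J": "j1"},
    {"S": "s1", "P": "p1", "J": "j1"},
]
jd = [{"S", "P"}, {"S", "J"}, {"P", "J"}]
print(satisfies_jd(supply, jd))      # True
print(satisfies_jd(supply[:3], jd))  # False: the join adds the spurious (s1, p1, j1)
```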
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called project-join normal form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except one in which every component is a superkey.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form
(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence, 5NF is rarely used in practice.
Satisfies algorithm
Why it is used: to determine whether a relation state r satisfies a given functional dependency A → B.
How it works:
1. Sort the tuples of r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under the A attributes also have equal values under B.
3. If the check in step 2 succeeds, the algorithm outputs true; otherwise, it outputs false.
Relation state of TEACH
TEACH
TEACHER | COURSE          | TEXT
Smith   | Data Structures | Bartram
Smith   | Data Management | Martin
Hall    | Compilers       | Hoffmann
Brown   | OOAD            | Horowitz
TEACHER → COURSE
TEXT → COURSE
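The three steps of the Satisfies algorithm can be sketched in Python and run on the TEACH state above. The function name `satisfies` and the dict-per-tuple representation are assumptions. Comparing only adjacent tuples is enough after sorting, since equality on B then propagates along each run of equal A-values.

```python
def satisfies(rows, a_attrs, b_attrs):
    """Satisfies algorithm: does the relation state rows satisfy A -> B?"""
    def a_value(row):
        return tuple(row[a] for a in a_attrs)

    # Step 1: sort so tuples with equal A-values are adjacent.
    ordered = sorted(rows, key=a_value)
    # Step 2: adjacent tuples with equal A-values must have equal B-values.
    for prev, cur in zip(ordered, ordered[1:]):
        if a_value(prev) == a_value(cur) and any(prev[b] != cur[b] for b in b_attrs):
            return False
    # Step 3: the condition held for every adjacent pair.
    return True


teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]
print(satisfies(teach, ["TEACHER"], ["COURSE"]))  # False: Smith teaches two courses
print(satisfies(teach, ["TEXT"], ["COURSE"]))     # True for this state
```

A True result only says that this one state does not violate the FD; it cannot show that the FD holds for every legal state of the schema.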
Drawbacks of the Satisfies algorithm
Applying this algorithm is tedious and time-consuming.
Hence, inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred (deduced) from the FDs in F.
Example of Closure
If each department has one manager (DEPT_NO → MGR_SSN) and each manager has a unique phone number (MGR_SSN → MGR_PHONE), then these two dependencies together imply DEPT_NO → MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies in F, together with all dependencies that can be inferred from F, is called the closure of F and is denoted F+.
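Listing F+ in full can be exponential, so in practice membership in F+ is usually tested with the attribute closure X+ (the set of attributes determined by X): X → Y is in F+ exactly when Y ⊆ X+. This is a standard result assumed here, not stated on the slide. A minimal sketch, with hypothetical names `closure` and `implies`, applied to the department/manager example:

```python
def closure(attrs, fds):
    """Compute X+: keep applying FDs whose left side is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y is contained in X+ under F."""
    return rhs <= closure(lhs, fds)


F = [({"DEPT_NO"}, {"MGR_SSN"}), ({"MGR_SSN"}, {"MGR_PHONE"})]
print(sorted(closure({"DEPT_NO"}, F)))          # ['DEPT_NO', 'MGR_PHONE', 'MGR_SSN']
print(implies(F, {"DEPT_NO"}, {"MGR_PHONE"}))   # True
```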
Example
F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}. Some of the inferred functional dependencies are:
SSN → {DNAME, DMGRSSN}, SSN → SSN, DNUMBER → DNAME
To infer dependencies systematically, we need a set of inference rules that can derive new dependencies from a given set of dependencies. The statement that X → Y can be inferred from F is denoted F |= X → Y.
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X → Y.
IR2 (Augmentation): If X → Y, then XZ → YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X → Y and Y → Z, then X → Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that applying IR1 through IR3 repeatedly until no more dependencies can be inferred yields the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional useful inference rules:
Decomposition: If X → YZ, then X → Y and X → Z.
Union (additive): If X → Y and X → Z, then X → YZ.
Pseudotransitive: If X → Y and WY → Z, then WX → Z.
These three rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
Examples
1. Given the set F = {A → B, C → X, BX → Z}, derive AC → Z using the inference axioms.
2. Given F = {A → B, C → D} with C ⊆ B, show that F |= A → D.
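One possible derivation for each example, using only IR1 to IR3 (other orderings of the rules work too):

```latex
\begin{align*}
&\textbf{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\} \\
&(1)\ A \to B   && \text{given} \\
&(2)\ AC \to BC && \text{IR2 on (1), augmenting with } C \\
&(3)\ C \to X   && \text{given} \\
&(4)\ BC \to BX && \text{IR2 on (3), augmenting with } B \\
&(5)\ AC \to BX && \text{IR3 on (2) and (4)} \\
&(6)\ BX \to Z  && \text{given} \\
&(7)\ AC \to Z  && \text{IR3 on (5) and (6)} \\[6pt]
&\textbf{Example 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B \\
&(1)\ B \to C   && \text{IR1, since } C \subseteq B \\
&(2)\ A \to C   && \text{IR3 on } A \to B \text{ and (1)} \\
&(3)\ A \to D   && \text{IR3 on (2) and } C \to D
\end{align*}
```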
Redundant functional dependencies
Given a set F of FDs, an FD A → B of F is said to be redundant with respect to the FDs of F iff A → B can be derived from the set of FDs F − {A → B}.
Redundant FDs are unnecessary and can be safely removed from F.
Eliminating redundant FDs allows us to minimize the set of FDs.
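The redundancy test reduces to an attribute-closure computation: A → B is redundant iff B ⊆ A+ computed from F − {A → B}. A minimal sketch with hypothetical helper names; `remove_redundant` re-checks each FD against the FDs still kept, so two FDs that imply each other are not both dropped.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fd, fds):
    """A -> B is redundant if B is inside A+ computed from F - {A -> B}."""
    lhs, rhs = fd
    rest = list(fds)
    rest.remove(fd)  # drop one copy of this FD only
    return rhs <= closure(lhs, rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time, re-checking against what remains."""
    kept = list(fds)
    for fd in list(kept):
        if is_redundant(fd, kept):
            kept.remove(fd)
    return kept


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(is_redundant(({"A"}, {"C"}), F))  # True: A -> B and B -> C give A -> C
print(remove_redundant(F))              # [({'A'}, {'B'}), ({'B'}, {'C'})]
```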
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, we say that E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
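Covering and equivalence can be checked without ever building E+ or F+: F covers E iff, for every X → Y in E, Y ⊆ X+ computed under F. A minimal sketch under that standard assumption; `covers` and `equivalent` are hypothetical names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E if every X -> Y in E has Y inside X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent when each covers the other."""
    return covers(e, f) and covers(f, e)


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
E = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B"})]
print(equivalent(E, F))  # True: E is smaller but generates the same closure
print(equivalent(G, F))  # False: G cannot derive B -> C
```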
Minimal Functional Dependencies
A minimal cover, also called an irreducible set of F, is useful for eliminating unnecessary functional dependencies.
F is transformed so that each FD in it that has more than one attribute on the right-hand side (RHS) is replaced by a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X → A in F is the set F − {X → A} equivalent to F (no redundancies);
(c) for no X → A in F and proper subset Z of X is (F − {X → A}) ∪ {Z → A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing either extraneous left-hand attributes or extraneous right-hand attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 → B1B2 ∈ F. Then A1 is extraneous iff F ≡ (F − {A1A2 → B1B2}) ∪ {A2 → B1B2}.
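Each equivalence test above collapses to one closure computation. A left attribute A1 is extraneous iff the right-hand side is already inside (lhs − {A1})+ under F; a right attribute B1 is extraneous iff B1 is still reachable from the lhs after B1 is dropped from the FD. These reductions are the standard ones and are assumed here; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def extraneous_left(attr, fd, fds):
    """attr is extraneous in X -> Y if (X - {attr}) -> Y already follows from F."""
    lhs, rhs = fd
    return attr in lhs and rhs <= closure(lhs - {attr}, fds)


def extraneous_right(attr, fd, fds):
    """attr is extraneous in X -> Y if X -> attr still follows once attr is removed from Y."""
    lhs, rhs = fd
    if attr not in rhs:
        return False
    reduced = list(fds)
    reduced.remove(fd)
    reduced.append((lhs, rhs - {attr}))
    return attr in closure(lhs, reduced)


F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
print(extraneous_left("Z", F[1], F))  # True: X -> Z makes Z redundant in XZ -> R
print(extraneous_left("X", F[1], F))  # False

G = [({"X"}, {"Z"}), ({"X", "Y"}, {"Z", "W"})]
print(extraneous_right("Z", G[1], G))  # True: XY -> Z already follows from X -> Z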
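print(extraneous_right("W", G[1], G))  # False
```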
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
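Problem
Given the set F of FDs below, find a canonical cover for F.
F = {X → Z, XY → WP, XY → ZWQ, XZ → R}
1. Split the right-hand sides: FC = {X → Z, XY → W, XY → P, XY → Z, XY → W, XY → Q, XZ → R}
2. Remove the duplicate XY → W and the redundant XY → Z (implied by X → Z): FC = {X → Z, XY → W, XY → P, XY → Q, XZ → R}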
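The three canonical-cover conditions suggest a three-step procedure: split right-hand sides, left-reduce, then drop redundant FDs. The sketch below assumes single-letter attribute names (so "XY" means {X, Y}) and hypothetical function names. Run on this problem, it also left-reduces XZ → R to X → R, because X → Z already holds and so Z is extraneous on the left.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    """Split RHS, left-reduce, then remove FDs implied by the rest."""
    # 1. Single-attribute right-hand sides; exact duplicates are dropped.
    cover = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset(attr))
            if fd not in cover:
                cover.append(fd)
    # 2. Left-reduce: drop an LHS attribute if the FD still follows without it.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, cover):
                lhs = lhs - {attr}
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # reduction can create duplicates
    # 3. Remove FDs that are implied by the remaining ones.
    for fd in list(cover):
        rest = [other for other in cover if other != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


def show(cover):
    """Render FDs as sorted strings such as 'XY -> W'."""
    return sorted("".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in cover)


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
print(show(canonical_cover(F)))
# ['X -> R', 'X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W']
```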
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)
4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analyzing the given relations based on their FDs and primary keys to achieve the desirable properties of:
minimizing redundancy, and
minimizing anomalies.
It provides the database designer with:
a formal framework for analyzing relation schemas based on their keys and FDs, and
a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from decomposition.
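Dependency preservation is usually checked with a closure-based test that is assumed here rather than given on the slide: start with Z = X, and for each Ri add ((Z ∩ Ri)+ ∩ Ri) to Z until nothing changes; X → Y is preserved iff Y ⊆ Z. This also accepts an FD that is implied across several relations even when it is not stored verbatim in one of them. The name `preserves` is hypothetical; the example uses the lossless TEACH decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """Grow Z from X using only what each Ri can see; preserved iff Y ends up in Z."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
D = [{"instructor", "course"}, {"instructor", "student"}]
print(preserves(F[0], D, F))  # False: fd1 is lost
print(preserves(F[1], D, F))  # True: fd2 lives in {instructor, course}
```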
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
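When the constraints on R are given as a set F of FDs, the superkey condition can be tested through the attribute closure: S is a superkey exactly when S+ contains every attribute of R. This schema-level reading is an assumption layered on the slide's state-based definition. Since the superkey property is preserved when attributes are added, checking single-attribute removals is enough for the key test. Names and the EMP_PROJ-style FDs are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(s, schema, fds):
    """S is a superkey when S+ covers every attribute of the schema."""
    return closure(s, fds) >= schema


def is_key(k, schema, fds):
    """A key is a superkey from which no single attribute can be removed."""
    return is_superkey(k, schema, fds) and not any(
        is_superkey(k - {a}, schema, fds) for a in k
    )


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(is_key({"SSN", "PNUMBER"}, R, F))               # True
print(is_key({"SSN", "PNUMBER", "HOURS"}, R, F))      # False: superkey, but not minimal
```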
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
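Candidate keys, and with them the prime attributes, can be found by brute force for small schemas: try attribute sets from smallest to largest and keep each superkey that contains no key found so far. This enumeration is exponential in the number of attributes and is only a teaching sketch; the function names are hypothetical. On TEACH, it shows that every attribute is prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by increasing size."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(candidate_keys(R, F))        # keys {course, student} and {instructor, student}
print(R - prime_attributes(R, F))  # set(): TEACH has no nonprime attribute
```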
First Normal Form
1NF disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.
1NF is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF, there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there is a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
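Technique 1 can be sketched as a function that moves a multivalued attribute into its own relation, paired with the primary key, one tuple per value. The DEPARTMENT rows, the attribute names DLOCATIONS/DLOCATION, and the function name are all hypothetical.

```python
def split_multivalued(rows, key, mv_attr, new_attr):
    """Technique 1: remove mv_attr from the base relation and store each of its
    values, together with the primary key, in a separate relation."""
    base = [{a: v for a, v in row.items() if a != mv_attr} for row in rows]
    detail = [{key: row[key], new_attr: value}
              for row in rows for value in row[mv_attr]]
    return base, detail


department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Houston"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]
dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept)            # DEPARTMENT(DNUMBER, DNAME), all values atomic
print(dept_locations)  # DEPT_LOCATIONS(DNUMBER, DLOCATION), one tuple per location
```

Neither output relation repeats DNAME or needs NULL padding, which matches the slide's reason for preferring the first technique.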
Normalizing nested relations into 1NF
Additional problems from the Schaum series: pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and the primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y → Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
- {SSN, PNUMBER} → HOURS is a full FD, since neither SSN → HOURS nor PNUMBER → HOURS holds.
- {SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency), since SSN → ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
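Following the slide's primary-key-based definition, a 2NF violation is a non-prime attribute that is already determined by a proper subset of the primary key. The sketch below lists every such partial dependency using attribute closures; the function name and the EMP_PROJ-style FDs are illustrative. (The general 2NF definition considers every candidate key, not only the primary key.)

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """(subset of key, non-prime attribute) pairs where the subset determines the attribute."""
    nonprime = schema - primary_key
    found = set()
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            for attr in closure(set(subset), fds) & nonprime:
                found.add((subset, attr))
    return found


R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
K = {"SSN", "PNUMBER"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
print(sorted(partial_dependencies(R, K, F)))
# [(('PNUMBER',), 'PLOCATION'), (('PNUMBER',), 'PNAME'), (('SSN',), 'ENAME')]
```

HOURS does not appear, so it is fully functionally dependent on the key; the listed attributes are the ones 2NF normalization moves into separate relations.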
Normalizing into 2NF
Conversion to 2NF (figure: relation {A, B, C, D} decomposed into {A, B, C} and {A, D})
Additional problem on second normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor) with Prog_Pack_ID → Prog_Pac_name
1. What is the highest normal form of this relation?
2. Transform it into the next highest normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X → Z that can be derived from two FDs X → Y and Y → Z.
Examples:
- SSN → DMGRSSN is a transitive FD, since SSN → DNUMBER and DNUMBER → DMGRSSN hold.
- SSN → ENAME is non-transitive, since there is no set of attributes X where SSN → X and X → ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X → Y and Y → Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary). Here SSN → Emp → Salary, and Emp is a candidate key.
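The NOTE is captured by the general form of the 3NF test, swapped in here for the slide's primary-key wording: for every FD X → A in F with A not in X, either X is a superkey or A is prime. Checking the given F (rather than all of F+) is the usual shortcut for a relation on which F is defined. A brute-force sketch with hypothetical names: EMP_DEPT fails because of DNUMBER, while the NOTE's EMP passes because Emp is a candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by increasing size (exponential; small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """FDs X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for attr in sorted(rhs - lhs):
            if not closure(lhs, fds) >= schema and attr not in prime:
                bad.append((tuple(sorted(lhs)), attr))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F1 = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}), ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(violations_3nf(EMP_DEPT, F1))  # [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]

EMP = {"SSN", "Emp", "Salary"}
F2 = [({"SSN"}, {"Emp", "Salary"}), ({"Emp"}, {"SSN", "Salary"})]
print(violations_3nf(EMP, F2))       # []
```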
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X → A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one:
Every 2NF relation is in 1NF.
Every 3NF relation is in 2NF.
Every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
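The BCNF definition becomes a one-line filter once superkeys are tested with attribute closure: report every nontrivial X → Y in F whose X+ does not cover the schema. This sketch assumes F is given on the relation itself (for a projected relation, the projected FDs would be needed instead); the function name is hypothetical. On TEACH it reports fd2.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y in F whose left side X is not a superkey."""
    return [
        (tuple(sorted(lhs)), tuple(sorted(rhs)))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(bcnf_violations(TEACH, F))  # [(('instructor',), ('course',))]
```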
How is BCNF different from 3NF?
For an FD X → A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a candidate key.
For a relation to be in BCNF, 'X' must be a superkey (it must contain a candidate key).
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} → instructor
fd2: instructor → course
{student, course} is a candidate key for this relation. In fd2, instructor is not a superkey, but course is a prime attribute, so this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1, so we have to settle for sacrificing functional dependency preservation, but never the non-additive (lossless) join property.
Module 6 32042023
Relation state of TEACH
TEACH
TEACHER COURSE TEXT
Teacher Course Text
Smith Data Structures
Bartram
Smith Data Management
Martin
Hall Compilers Hoffmann
Brown ooad Horowitz
TEACHER -gt COURSE
TEXT -gt COURSE
Module 6 33042023
Drawbacks of Satifies algorithm
Using this algorithm is tedious and time consuming
So inference axioms are used
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form
The EMP relation has two MVDs: ENAME ->> PNAME and ENAME ->> DNAME. Decomposing EMP yields two 4NF relations: EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME).
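To see why the decomposition helps, the sketch below stores a hypothetical EMP state in which one employee has three projects and three dependents, so all nine combinations must be stored. The two projections need only six tuples, and their natural join gives the original state back. Attribute values are invented for illustration.

```python
from itertools import product


def project(rel, attrs):
    """Projection onto attrs; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Natural join on all shared attributes."""
    out = set()
    for t1, t2 in product(r, s):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            out.add(frozenset({**d1, **d2}.items()))
    return out


# Hypothetical EMP state: e1 has 3 projects and 3 dependents, so every pairing is stored
emp = {frozenset({"ENAME": "e1", "PNAME": p, "DNAME": d}.items())
       for p in ("p1", "p2", "p3") for d in ("d1", "d2", "d3")}

emp_projects = project(emp, {"ENAME", "PNAME"})
emp_dependents = project(emp, {"ENAME", "DNAME"})

print(len(emp), len(emp_projects) + len(emp_dependents))  # 9 6
print(natural_join(emp_projects, emp_dependents) == emp)  # True
```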
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For a relation R with subsets of the attributes of R denoted A, B, …, Z, R satisfies the join dependency JD(A, B, …, Z) if and only if every legal state of R is equal to the natural join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or project-join normal form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency except those whose components are all superkeys.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
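A join dependency can likewise be checked on a relation state by joining all of its projections. The sketch assumes SUPPLY has the attributes SNAME, PARTNAME, and PROJNAME, with R1 = {SNAME, PARTNAME}, R2 = {SNAME, PROJNAME}, and R3 = {PARTNAME, PROJNAME}; these names and the tuples are hypothetical. Joining all three projections reproduces the state, while joining only R1 and R2 adds spurious tuples.

```python
from functools import reduce
from itertools import product


def project(rel, attrs):
    """Projection onto attrs; the set removes duplicate tuples."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Natural join on all shared attributes."""
    out = set()
    for t1, t2 in product(r, s):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            out.add(frozenset({**d1, **d2}.items()))
    return out


def satisfies_jd(rel, components):
    """JD(R1, ..., Rn) holds on this state iff joining all the projections gives the state back."""
    return reduce(natural_join, [project(rel, c) for c in components]) == rel


def supply(s, p, j):
    return frozenset({"SNAME": s, "PARTNAME": p, "PROJNAME": j}.items())


# Hypothetical SUPPLY state
supply_state = {supply("s1", "p1", "j1"), supply("s1", "p2", "j2"), supply("s2", "p1", "j2"),
                supply("s3", "p2", "j3"), supply("s2", "p3", "j1"), supply("s2", "p1", "j1"),
                supply("s1", "p1", "j2")}
R1, R2, R3 = {"SNAME", "PARTNAME"}, {"SNAME", "PROJNAME"}, {"PARTNAME", "PROJNAME"}

print(satisfies_jd(supply_state, [R1, R2, R3]))  # True
print(satisfies_jd(supply_state, [R1, R2]))      # False
```

The two-component case is effectively an MVD test, which is why this state can be in 4NF while still needing the three-way 5NF decomposition.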
Fifth Normal Form (5NF)
A join dependency is a peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes. Hence, 5NF is rarely used in practice.
Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming, so inference axioms are used instead.
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R. Schema designers specify the most obvious FDs; the other dependencies can be inferred or deduced from the FDs in F.
Example of Closure
If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from a given set F. The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
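Because F+ can be very large, the practical tool is the attribute closure X+ (all attributes determined by X): X -> Y is in F+ exactly when Y ⊆ X+. A minimal sketch of the usual fixed-point algorithm, applied to the department example; representing each FD as a pair of attribute sets is an assumption of the sketch.

```python
def closure(attrs, fds):
    """Compute attrs+: repeatedly fire any FD whose LHS is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


F = [({"DEPT_NO"}, {"MGR_SSN"}),
     ({"MGR_SSN"}, {"MGR_PHONE"})]

print(sorted(closure({"DEPT_NO"}, F)))  # ['DEPT_NO', 'MGR_PHONE', 'MGR_SSN']
```

Since MGR_PHONE appears in DEPT_NO+, DEPT_NO -> MGR_PHONE is in F+ without ever listing F+ itself.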
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the functional dependencies that can be inferred from F are:
- SSN -> {DNAME, DMGRSSN}
- SSN -> SSN
- DNUMBER -> DNAME
To infer new dependencies systematically, we need a set of inference rules that derive new dependencies from a given set of dependencies. We write F |= X -> Y to denote that the FD X -> Y is inferred from F.
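The inferred FDs listed above can be confirmed with the membership test F |= X -> Y iff Y ⊆ X+. The sketch below (function names are illustrative) checks each listed FD and one that does not follow from F.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds (list of (lhs, rhs) attribute sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y is contained in X+ computed under F."""
    return set(rhs) <= closure(lhs, fds)


F = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
     ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]

checks = {
    "SSN -> {DNAME, DMGRSSN}": implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}),
    "SSN -> SSN": implies(F, {"SSN"}, {"SSN"}),
    "DNUMBER -> DNAME": implies(F, {"DNUMBER"}, {"DNAME"}),
    "DNUMBER -> ENAME": implies(F, {"DNUMBER"}, {"ENAME"}),  # not implied by F
}
for fd, ok in checks.items():
    print(f"{fd}: {ok}")
```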
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
- IR1 (Reflexive): If Y ⊆ X, then X -> Y.
- IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
- IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules. By sound, we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F; that is, if the axioms are correctly applied, they cannot derive false dependencies.
By complete, we mean that using IR1 through IR3 repeatedly until no more dependencies can be inferred results in the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
- Decomposition: If X -> YZ, then X -> Y and X -> Z.
- Union (additive): If X -> Y and X -> Z, then X -> YZ.
- Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three rules, as well as any other valid inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
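To illustrate the completeness claim, here is a short derivation of each additional rule from IR1 through IR3:

```latex
\begin{align*}
&\textbf{Decomposition:}\\
&1.\; X \to YZ && \text{given}\\
&2.\; YZ \to Y && \text{IR1, since } Y \subseteq YZ\\
&3.\; X \to Y && \text{IR3 on 1, 2 (likewise } X \to Z\text{)}\\[4pt]
&\textbf{Union:}\\
&1.\; X \to Y,\; X \to Z && \text{given}\\
&2.\; X \to XY && \text{IR2: augment } X \to Y \text{ with } X \text{, and } XX = X\\
&3.\; XY \to YZ && \text{IR2: augment } X \to Z \text{ with } Y\\
&4.\; X \to YZ && \text{IR3 on 2, 3}\\[4pt]
&\textbf{Pseudotransitivity:}\\
&1.\; X \to Y,\; WY \to Z && \text{given}\\
&2.\; WX \to WY && \text{IR2: augment } X \to Y \text{ with } W\\
&3.\; WX \to Z && \text{IR3 on 2 and } WY \to Z
\end{align*}
```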
Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
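One possible derivation for each exercise, reading the first as F = {A -> B, C -> X, BX -> Z} and the second as F = {A -> B, C -> D} with C ⊆ B:

```latex
\begin{align*}
&\textbf{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \vdash AC \to Z\\
&1.\; A \to B && \text{given}\\
&2.\; AC \to BC && \text{IR2 on 1 with } C\\
&3.\; C \to X && \text{given}\\
&4.\; BC \to BX && \text{IR2 on 3 with } B\\
&5.\; AC \to BX && \text{IR3 on 2, 4}\\
&6.\; BX \to Z && \text{given}\\
&7.\; AC \to Z && \text{IR3 on 5, 6}\\[4pt]
&\textbf{Example 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B \vdash A \to D\\
&1.\; B \to C && \text{IR1, since } C \subseteq B\\
&2.\; A \to C && \text{IR3 on } A \to B \text{ and 1}\\
&3.\; A \to D && \text{IR3 on 2 and } C \to D
\end{align*}
```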
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary and can be safely removed from F. Eliminating redundant FDs allows us to minimize the set of FDs.
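A redundancy test follows directly from the definition: X -> Y is redundant iff Y ⊆ X+ computed under F - {X -> Y}. The sketch below adds a hypothetical third FD, DEPT_NO -> MGR_PHONE, to the department example so that something is redundant. Each FD is tested against the original F; because two FDs can each be redundant only in the presence of the other, remove flagged FDs one at a time and re-test rather than deleting them all at once.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def redundant_fds(fds):
    """FDs X -> Y of F such that Y is in X+ computed under F minus that FD."""
    return [(lhs, rhs) for i, (lhs, rhs) in enumerate(fds)
            if rhs <= closure(lhs, fds[:i] + fds[i + 1:])]


F = [({"DEPT_NO"}, {"MGR_SSN"}),
     ({"MGR_SSN"}, {"MGR_PHONE"}),
     ({"DEPT_NO"}, {"MGR_PHONE"})]  # hypothetical extra FD

print(redundant_fds(F))  # [({'DEPT_NO'}, {'MGR_PHONE'})]
```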
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+; sets of FDs that satisfy this condition are equivalent to F.
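Equivalence reduces to two cover checks, each of which is one closure computation per FD. A minimal sketch with hypothetical sets E, F, and G over attributes A, B, and C:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E: every FD X -> Y in E has Y inside X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff E covers F and F covers E."""
    return covers(e, f) and covers(f, e)


E = [(set("A"), set("B")), (set("B"), set("C")), (set("A"), set("C"))]
F = [(set("A"), set("BC")), (set("B"), set("C"))]
G = [(set("A"), set("B")), (set("A"), set("C"))]

print(equivalent(E, F), equivalent(E, G))  # True False
```

E and G are not equivalent because G does not imply B -> C, even though E covers G.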
Minimal Functional Dependencies
A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. F is first transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) the RHS of every dependency in F is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
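Conditions (a) to (c) translate directly into a checker. The sketch below is a brute-force illustration (it tries every proper subset of each LHS, so it is exponential in LHS size) applied to hypothetical FD sets.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def equivalent(f, g):
    """F and G are equivalent iff each implies every FD of the other."""
    def implies(a, b):
        return all(rhs <= closure(lhs, a) for lhs, rhs in b)
    return implies(f, g) and implies(g, f)


def is_minimal(fds):
    # (a) every RHS is a single attribute
    if any(len(rhs) != 1 for _, rhs in fds):
        return False
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]
        # (b) dropping X -> A must change F+
        if equivalent(rest, fds):
            return False
        # (c) replacing X by any proper subset Z must change F+
        for k in range(len(lhs)):
            for z in combinations(sorted(lhs), k):
                if equivalent(rest + [(set(z), rhs)], fds):
                    return False
    return True


F1 = [(set("A"), set("B")), (set("B"), set("C")), (set("A"), set("C"))]  # A -> C is redundant
F2 = [(set("A"), set("B")), (set("B"), set("C"))]
F3 = [(set("AB"), set("C")), (set("A"), set("B"))]                       # B is extraneous in AB -> C
print(is_minimal(F1), is_minimal(F2), is_minimal(F3))  # False True False
```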
Extraneous Attributes
The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous in its left-hand side iff
F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}
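Because (F - {X -> Y}) ∪ {(X - {A}) -> Y} always implies F, attribute A is extraneous in X -> Y exactly when F implies (X - {A}) -> Y, that is, when Y ⊆ (X - {A})+ computed under F. A minimal sketch using the hypothetical set F = {X -> Z, XZ -> R}:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def extraneous_lhs(fds, i, attr):
    """Is attr extraneous in the LHS of fds[i]?  Test whether F implies (X - {attr}) -> Y."""
    lhs, rhs = fds[i]
    return rhs <= closure(lhs - {attr}, fds)


F = [(set("X"), set("Z")), (set("XZ"), set("R"))]
print(extraneous_lhs(F, 1, "Z"), extraneous_lhs(F, 1, "X"))  # True False
```

Z is extraneous in XZ -> R because X alone already determines Z, so XZ -> R can be replaced by X -> R.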
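CANONICAL COVER (FC)
A canonical cover FC of F satisfies the following:
1. Every FD of FC is simple: its RHS has exactly one attribute.
2. FC is left-reduced: no FD has an extraneous attribute on its LHS.
3. FC is nonredundant: no FD can be derived from the others.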
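Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Split every RHS into single attributes: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R, because X -> Z gives X -> XZ and hence X -> R. So FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}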
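A sketch of the usual canonical cover procedure (split RHSs, left-reduce, then drop redundant FDs) applied to this F. Left-reducing before removing redundant FDs is the order used in common textbook presentations of the minimal cover algorithm. With this input the sketch reports X -> R rather than XZ -> R, because X -> Z makes Z extraneous. Function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: split each RHS into single attributes (the set drops exact duplicates)
    f = sorted({(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in rhs},
               key=lambda fd: (sorted(fd[0]), sorted(fd[1])))
    # Step 2: left-reduce; attr is extraneous in X -> B if B is in (X - {attr})+ under f
    for i in range(len(f)):
        lhs, rhs = f[i]
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, f):
                lhs = lhs - {attr}
                f[i] = (lhs, rhs)
    f = list(dict.fromkeys(f))  # left-reduction may create duplicates
    # Step 3: drop FDs implied by the remaining ones, one at a time
    i = 0
    while i < len(f):
        rest = f[:i] + f[i + 1:]
        if f[i][1] <= closure(f[i][0], rest):
            f = rest
        else:
            i += 1
    return f


def show(fds):
    return ["".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in fds]


F = [(set("X"), set("Z")), (set("XY"), set("WP")),
     (set("XY"), set("ZWQ")), (set("XZ"), set("R"))]
print(show(canonical_cover(F)))  # ['X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W', 'X -> R']
```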
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
- 2NF, 3NF, and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
- 4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
- Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
- minimizing redundancy
- minimizing anomalies
It provides the database designer with:
- a formal framework for analyzing relation schemas based on their keys and FDs
- a series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
- Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
- Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
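Dependency preservation can be tested without computing F+ by growing a set Z from X using only closures restricted to each Ri; X -> Y is preserved iff Y ⊆ Z at the fixed point. This is a sketch of that standard restricted-closure test (not given on the slides), applied to the TEACH decomposition {instructor, course}, {instructor, student}. It confirms that fd2 is preserved and fd1 is lost.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """Is X -> Y implied by the union of the projections of F onto the Ri?"""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Only attributes of Ri reachable from Z within Ri may be added
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2
D3 = [{"instructor", "course"}, {"instructor", "student"}]

print([preserves(fd, D3, F) for fd in F])  # [False, True]
```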
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is an attribute that is a member of some candidate key.
A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
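Since K is a superkey iff K+ = R, candidate keys (and therefore prime attributes) can be found by closure. The brute-force sketch below enumerates attribute subsets by size, which is exponential and only suitable for small schemas. Applied to TEACH with fd1 and fd2, it finds two candidate keys, so every attribute of TEACH is prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal superkeys, enumerated by increasing size."""
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), k):
            s = set(combo)
            # Skip supersets of known keys: they are superkeys but not keys
            if not any(key <= s for key in keys) and closure(s, fds) >= attrs:
                keys.append(s)
    return keys


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2

keys = candidate_keys(TEACH, F)
prime = set().union(*keys)
print(sorted(sorted(k) for k in keys))  # [['course', 'student'], ['instructor', 'student']]
print(sorted(TEACH - prime))            # []
```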
First Normal Form
1NF disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.
1NF is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF, there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and is completely general, placing no limit on the maximum number of values.
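Technique 1 can be sketched as a transformation on rows. The relation DEPARTMENT(DNUMBER, DNAME, DLOCATIONS), its values, and the function name are hypothetical; DLOCATIONS holds a set of values per tuple, which is exactly what 1NF disallows.

```python
def to_1nf(rows, key, multivalued):
    """Technique 1: move the multivalued attribute into its own relation with the key."""
    base = [{a: v for a, v in r.items() if a != multivalued} for r in rows]
    extracted = sorted((r[key], value) for r in rows for value in r[multivalued])
    return base, extracted


# Hypothetical DEPARTMENT state; DLOCATIONS holds a set of values, which violates 1NF
department = [
    {"DNUMBER": 1, "DNAME": "d1", "DLOCATIONS": {"loc_a", "loc_b"}},
    {"DNUMBER": 2, "DNAME": "d2", "DLOCATIONS": {"loc_b"}},
]

dept, dept_locations = to_1nf(department, "DNUMBER", "DLOCATIONS")
print(dept)            # [{'DNUMBER': 1, 'DNAME': 'd1'}, {'DNUMBER': 2, 'DNAME': 'd2'}]
print(dept_locations)  # [(1, 'loc_a'), (1, 'loc_b'), (2, 'loc_b')]
```

The new relation of (DNUMBER, location) pairs has the composite key {DNUMBER, location}, and no department row is repeated.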
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
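Partial dependencies can be found by computing the closure of each proper subset of the primary key. The sketch below assumes EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION) with primary key {SSN, PNUMBER} and the FDs {SSN, PNUMBER} -> HOURS, SSN -> ENAME, and PNUMBER -> {PNAME, PLOCATION}; the last FD is an assumption not stated on this slide.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attrs, pk, fds):
    """Map each proper subset of the primary key to the nonprime attributes it determines."""
    nonprime = attrs - pk
    found = {}
    for k in range(1, len(pk)):
        for subset in combinations(sorted(pk), k):
            determined = closure(set(subset), fds) & nonprime
            if determined:
                found[subset] = determined
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
PK = {"SSN", "PNUMBER"}
F = [({"SSN", "PNUMBER"}, {"HOURS"}),
     ({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"})]  # assumed FD

for subset, dependents in partial_dependencies(EMP_PROJ, PK, F).items():
    print(subset, "->", sorted(dependents))
# ('PNUMBER',) -> ['PLOCATION', 'PNAME']
# ('SSN',) -> ['ENAME']
```

Each reported subset suggests one 2NF relation (for example, SSN with ENAME), with HOURS staying alongside the full key.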
Normalizing into 2NF
Conversion to 2NF
Figure: R(A, B, C, D) decomposed into R1(A, B, C) and R2(A, D)
Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form of Prog_task?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is nontransitive since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
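The 3NF condition can be tested FD by FD: X -> A (with A not in X) is acceptable iff X is a superkey or A is prime. This is the general, candidate-key-based form of the test; for EMP_DEPT, whose only candidate key is SSN, it agrees with the primary-key definition above. The sketch checks only the FDs given in F (here the F from the SSN example), which is sufficient when testing a whole schema rather than a projection of one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal superkeys, enumerated by size (exponential; fine for small schemas)."""
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), k):
            s = set(combo)
            if not any(key <= s for key in keys) and closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def violations_3nf(attrs, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # X is a superkey, so X -> A is allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
     ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]

print(violations_3nf(EMP_DEPT, F))  # [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]
```

Both violations come from the transitive path SSN -> DNUMBER -> {DNAME, DMGRSSN}, which 3NF normalization removes by splitting out (DNUMBER, DNAME, DMGRSSN).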
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey.
BCNF makes no such exception: X must be a superkey, that is, it must contain a candidate key.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. The relation is in 3NF because course, the RHS of fd2, is a prime attribute; it is not in BCNF because instructor, the LHS of fd2, is not a superkey.
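The difference between the two conditions shows up when each FD of TEACH is checked against both. A sketch (names illustrative) that reports, for each FD, whether the BCNF condition (LHS is a superkey) and the 3NF condition (LHS is a superkey or RHS is prime) hold:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal superkeys, enumerated by size."""
    keys = []
    for k in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), k):
            s = set(combo)
            if not any(key <= s for key in keys) and closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def check_fds(attrs, fds):
    """For each FD X -> A: (X, A, BCNF condition holds, 3NF condition holds)."""
    prime = set().union(*candidate_keys(attrs, fds))
    report = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= attrs
        for a in sorted(rhs - lhs):
            report.append((tuple(sorted(lhs)), a, superkey, superkey or a in prime))
    return report


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2

report = check_fds(TEACH, F)
for row in report:
    print(row)
# (('course', 'student'), 'instructor', True, True)
# (('instructor',), 'course', False, True)
print("3NF:", all(r[3] for r in report), "BCNF:", all(r[2] for r in report))  # 3NF: True BCNF: False
```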
A relation NOT in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
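A sketch of the usual BCNF decomposition algorithm: while some relation Ri has a violation X -> Y (X not a superkey of Ri), replace Ri with X ∪ Y and Ri - Y. Each split is lossless because the shared attributes X determine Y. Violations are found here by brute force over subsets of Ri, and the first one found is used, so other choices can yield different (still lossless) decompositions. On TEACH it produces {instructor, course} and {instructor, student}, the third of the candidate decompositions discussed for TEACH, and fd1 is not preserved.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(ri, fds):
    """Return (X, Y): X -> Y holds on ri, Y is nonempty and disjoint from X, X is not a superkey of ri."""
    for k in range(1, len(ri)):
        for combo in combinations(sorted(ri), k):
            x = set(combo)
            y = (closure(x, fds) & ri) - x
            if y and not (x | y) >= ri:
                return x, y
    return None


def bcnf_decompose(attrs, fds):
    result = [set(attrs)]
    while True:
        for i, ri in enumerate(result):
            violation = find_violation(ri, fds)
            if violation:
                x, y = violation
                result[i:i + 1] = [x | y, ri - y]  # split Ri on X -> Y
                break
        else:
            return result  # no relation has a violation


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2

print(sorted(sorted(r) for r in bcnf_decompose(TEACH, F)))  # [['course', 'instructor'], ['instructor', 'student']]
```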
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 34042023
Inference Rules for Functional Dependencies
F is the set of functional dependencies that are specified on relation schema R
Schema designers specifies the most obvious FDs
The other dependencies can be inferred or deduced from FDs in F
Module 6 35042023
Example of Closure Department has one manager (DEPT_NO -gt
MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two
dependencies together imply that (DEPT_NO-gtMGR_PHONE)
This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F
The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)
Module 6 36042023
Example
F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are
SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME
To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation is in 5NF if it has no nontrivial join dependency other than those implied by its keys.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
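Whether an instance satisfies a join dependency can be tested by projecting it onto each component and joining the projections back together. The sketch below uses a naive nested-loop natural join and assumes rows are Python dicts; the helper names and the S/P/J sample rows (a tiny supplier, part, project instance) are hypothetical, chosen so that the three-way join reproduces the relation exactly while a two-way join does not.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; returns a list of dicts."""
    distinct = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(left, right):
    """Nested-loop natural join on the attributes the two inputs share."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows, schema):
    return {tuple(r[a] for a in schema) for r in rows}


def satisfies_jd(rows, schema, components):
    """True if rows equal the join of their projections on the components."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return as_set(joined, schema) == as_set(rows, schema)


schema = ["S", "P", "J"]
supply = [dict(zip(schema, t)) for t in
          [("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1"), ("s1", "p1", "j1")]]
r1, r2, r3 = ["S", "P"], ["P", "J"], ["J", "S"]
print(satisfies_jd(supply, schema, [r1, r2, r3]))      # True: 3-way join is lossless
print(satisfies_jd(supply, schema, [r1, r2]))          # False: 2-way join adds (s2, p1, j2)
print(satisfies_jd(supply[:3], schema, [r1, r2, r3]))  # False: join adds (s1, p1, j1)
```

The last call shows how fragile a JD is: dropping a single tuple breaks it, which is part of why such constraints are hard to spot in practice.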
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence, 5NF is rarely used in practice.
Example of Closure
If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.
The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.
Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. The fact that X -> Y is inferred from F is denoted F |= X -> Y.
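A standard way to decide whether F |= X -> Y, without listing all of F+, is to compute the attribute closure X+ (every attribute determined by X under F) and check that Y ⊆ X+. This technique is not spelled out on the slide; the sketch below assumes FDs are (lhs, rhs) pairs of Python sets and uses hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure X+: repeatedly fire every FD whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y holds exactly when Y is contained in X+."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```

The loop only ever adds attributes, so it stops after at most one pass per attribute of the schema.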
Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.
IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we can infer from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that applying IR1 through IR3 repeatedly until no more dependencies can be inferred yields the complete set of all possible dependencies that can be inferred from F.
Inference Rules for FDs
Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.
These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).
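As a worked check of that claim, each additional rule can be derived in two or three steps from IR1 to IR3 alone (juxtaposition denotes union, so XX = X):

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\text{Decomposition: from } X \to YZ} \\
1. & YZ \to Y  & \text{IR1, since } Y \subseteq YZ \\
2. & X \to Y   & \text{IR3 on } X \to YZ \text{ and 1 (likewise } X \to Z\text{)} \\
\multicolumn{3}{l}{\text{Union: from } X \to Y \text{ and } X \to Z} \\
1. & X \to XY  & \text{IR2, augment } X \to Y \text{ by } X \\
2. & XY \to YZ & \text{IR2, augment } X \to Z \text{ by } Y \\
3. & X \to YZ  & \text{IR3 on 1, 2} \\
\multicolumn{3}{l}{\text{Pseudotransitive: from } X \to Y \text{ and } WY \to Z} \\
1. & WX \to WY & \text{IR2, augment } X \to Y \text{ by } W \\
2. & WX \to Z  & \text{IR3 on 1 and } WY \to Z
\end{array}
```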
Examples
1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2 Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
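One possible derivation for each example follows, reading the first set as F = {A -> B, C -> X, BX -> Z} (the arrows are lost in the original slide text, so this reading is an assumption):

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\text{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\}} \\
1. & A \to B   & \text{given} \\
2. & AC \to BC & \text{IR2 on 1, augment by } C \\
3. & C \to X   & \text{given} \\
4. & BC \to BX & \text{IR2 on 3, augment by } B \\
5. & AC \to BX & \text{IR3 on 2, 4} \\
6. & BX \to Z  & \text{given} \\
7. & AC \to Z  & \text{IR3 on 5, 6} \\
\multicolumn{3}{l}{\text{Example 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B} \\
1. & A \to B   & \text{given} \\
2. & B \to C   & \text{IR1, since } C \subseteq B \\
3. & A \to C   & \text{IR3 on 1, 2} \\
4. & C \to D   & \text{given} \\
5. & A \to D   & \text{IR3 on 3, 4}
\end{array}
```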
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
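With attribute closure, the redundancy test is mechanical: remove A -> B from F, compute A+ under what is left, and check whether B is still reached. A minimal sketch, assuming FDs are (lhs, rhs) pairs of Python sets; the function names and the sample set are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fd, fds):
    """An FD A -> B in F is redundant iff F - {A -> B} still implies it."""
    rest = [g for g in fds if g != fd]
    lhs, rhs = fd
    return rhs <= closure(lhs, rest)


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(is_redundant(F[2], F))  # True: A -> C follows from A -> B and B -> C
print(is_redundant(F[0], F))  # False
```

Redundancy is relative to the whole set: once A -> C is removed, neither remaining FD is redundant.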
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent to F.
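Checking "E covers F and F covers E" reduces to closure computations: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. A sketch, assuming FDs are (lhs, rhs) pairs of Python sets; the function names and the example sets E, F, G are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E if every FD in E can be inferred from F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    return covers(e, f) and covers(f, e)


E = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B", "C"})]
print(equivalent(E, F))  # True
print(equivalent(F, G))  # False: G does not imply B -> C
```

Note that covering is one-directional: F covers G here, but G does not cover F.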
Minimal Functional Dependencies
A minimal cover is useful in eliminating unnecessary functional dependencies; it is also called an irreducible set of F.
F is transformed so that each FD in it that has more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is a minimal cover if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
Extraneous Attributes
The FDs of F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
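Rather than testing that equivalence directly, a common shortcut is to check whether B1B2 ⊆ (A2)+ under F: the replacement FD A2 -> B1B2 is implied by F exactly then, and the original FD follows back from the replacement by augmentation. The sketch below assumes FDs are pairs of Python sets; the function name and sample FDs are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_extraneous_lhs(attr, fd, fds):
    """attr is extraneous in X -> Y iff Y is contained in (X - {attr})+ under F."""
    lhs, rhs = fd
    return attr in lhs and rhs <= closure(lhs - {attr}, fds)


F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
print(is_extraneous_lhs("Z", F[1], F))  # True: X -> Z already gives X -> R
print(is_extraneous_lhs("X", F[1], F))  # False
```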
CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has one attribute.
2 FC is left-reduced.
3 FC is nonredundant.
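Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1 FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R} (split every RHS into single attributes)
2 FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R} (drop the duplicate XY -> W and the redundant XY -> Z, which follows from X -> Z)
3 FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R} (Z is extraneous in XZ -> R, because X -> Z; this makes FC left-reduced)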
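The three steps can be automated. The sketch below uses one common ordering (split right-hand sides, left-reduce, then drop redundant FDs) and assumes FDs are pairs of Python sets with single-letter attribute names as in the problem. Covers are not unique in general, so a different processing order can yield a different but equivalent result; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: make every FD simple (one attribute on the RHS), dropping duplicates.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in cover:
                cover.append(fd)
    # Step 2: left-reduce by removing extraneous LHS attributes.
    for i, (lhs, rhs) in enumerate(cover):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, cover):
                lhs = lhs - {b}
        cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # reduced FDs may now coincide
    # Step 3: drop every FD that the remaining FDs already imply.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
# X -> Z
# XY -> P
# XY -> W
# XY -> Q
# X -> R
```

Here XY -> Z collapses to a copy of X -> Z during left-reduction and is removed as a duplicate, while XZ -> R shrinks to X -> R.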
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF, and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).
4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd.
Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
Provides the database designer with:
- a formal framework for analyzing relation schemas based on keys and FDs;
- a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
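Given the FDs of a schema, candidate keys and prime attributes can be found by brute force: try attribute sets from smallest to largest, keep those whose closure is the whole schema, and skip any set that already contains a key found earlier (it would not be minimal). This is exponential in the number of attributes, so it is only a sketch for small classroom schemas; FDs are assumed to be pairs of Python sets, and the helper names are hypothetical. The example uses the TEACH schema discussed later in this module, with {student, course} -> instructor and instructor -> course.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # already contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= set(schema):
                keys.append(candidate)
    return keys


teach = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
keys = candidate_keys(teach, F)
prime = set().union(*keys)
print(keys)   # the two keys {student, course} and {student, instructor}
print(prime)  # every attribute of TEACH is prime
```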
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.
Considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
To achieve 1NF, there are three techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3 If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
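Technique 1 can be sketched in a few lines: each value of the multivalued attribute becomes its own tuple in a new relation that carries the primary key. The DEPARTMENT rows, the DLOCATIONS attribute, and the helper name below are illustrative assumptions modelled on the COMPANY schema; rows are plain Python dicts.

```python
def split_multivalued(rows, key, attr):
    """Technique 1: move a multivalued attribute into its own relation with the key."""
    base = [{k: v for k, v in r.items() if k != attr} for r in rows]
    detail = [{key: r[key], attr: value} for r in rows for value in r[attr]]
    return base, detail


department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]
dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS")
print(dept)            # DEPARTMENT without the multivalued attribute
print(dept_locations)  # one (DNUMBER, DLOCATIONS) tuple per location
```

The new relation's key is the pair (DNUMBER, DLOCATIONS), and no location count is fixed in advance, which is why this technique is described as completely general.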
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: pg 178, 5.1
Second Normal Form
Uses the concepts of FDs and the primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
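Partial dependencies, which 2NF rules out, can be found with attribute closures: a non-prime attribute is partially dependent if some proper subset of the primary key already determines it. A sketch, assuming FDs are pairs of Python sets; the helper names are hypothetical, and the EMP_PROJ FDs (including PNUMBER -> {PNAME, PLOCATION}) are assumed from the COMPANY example rather than stated on this slide.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """Map each non-prime attribute to the proper key subsets that determine it."""
    found = {}
    nonprime = set(schema) - set(primary_key)
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            for a in sorted(nonprime & closure(subset, fds)):
                found.setdefault(a, []).append(set(subset))
    return found


emp_proj = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
partial = partial_dependencies(emp_proj, {"SSN", "PNUMBER"}, F)
print(partial)  # ENAME depends on {SSN}; PNAME and PLOCATION depend on {PNUMBER}
print("in 2NF" if not partial else "not in 2NF")  # not in 2NF
```

Each key subset that shows up in the result suggests a relation for 2NF normalization, for example (SSN, ENAME) and (PNUMBER, PNAME, PLOCATION), with HOURS left on the full key.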
Normalizing into 2NF
Conversion to 2NF: (A, B, C, D) is decomposed into (A, B, C) and (A, D)
Additional problem on second normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form of this relation?
2 Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
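Textbooks also give a general form of the 3NF test that does not single out the primary key: for every nontrivial FD X -> A, either X is a superkey or A is a prime attribute. This also captures the NOTE above, since a transitive step through a candidate key Y starts from a superkey. The sketch below implements that general form (a swapped-in formulation, not the slide's 2NF-plus-no-transitive-dependency wording), checking only the listed FDs, which is assumed to suffice when the FDs are stated on R itself. Helper names are hypothetical; EMP_DEPT and its FDs come from the closure example earlier in this module.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def prime_attributes(schema, fds):
    """Union of all candidate keys, found by brute force from the smallest sets up."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if not any(k <= set(combo) for k in keys) and is_superkey(combo, schema, fds):
                keys.append(set(combo))
    return set().union(*keys)


def is_3nf(schema, fds):
    prime = prime_attributes(schema, fds)
    # Only the nontrivial part (rhs - lhs) of each FD is checked.
    return all(is_superkey(lhs, schema, fds) or a in prime
               for lhs, rhs in fds for a in rhs - lhs)


emp_dept = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(is_3nf(emp_dept, F))  # False: DNUMBER is not a superkey and DNAME is not prime
ed1 = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER"}
ed2 = {"DNUMBER", "DNAME", "DMGRSSN"}
print(is_3nf(ed1, F[:1]), is_3nf(ed2, F[1:]))  # True True
```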
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a candidate key.
To pass the BCNF test, 'X' must be a superkey (for example, a candidate key), regardless of 'A'.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
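The difference shows up directly when both tests are run on the TEACH relation introduced next, with fd1 {student, course} -> instructor and fd2 instructor -> course. A sketch, assuming FDs are pairs of Python sets, using the general 3NF formulation (X is a superkey or A is prime) and checking only the listed FDs; function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def prime_attributes(schema, fds):
    """Union of all candidate keys, found by brute force from the smallest sets up."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if not any(k <= set(combo) for k in keys) and is_superkey(combo, schema, fds):
                keys.append(set(combo))
    return set().union(*keys)


def is_3nf(schema, fds):
    prime = prime_attributes(schema, fds)
    return all(is_superkey(lhs, schema, fds) or a in prime
               for lhs, rhs in fds for a in rhs - lhs)


def is_bcnf(schema, fds):
    """Every nontrivial FD must have a superkey on its left-hand side."""
    return all(is_superkey(lhs, schema, fds) for lhs, rhs in fds if rhs - lhs)


teach = {"student", "course", "instructor"}
fd1 = ({"student", "course"}, {"instructor"})
fd2 = ({"instructor"}, {"course"})
print(is_3nf(teach, [fd1, fd2]))                 # True: course is prime
print(is_bcnf(teach, [fd1, fd2]))                # False: instructor is not a superkey
print(is_bcnf({"instructor", "course"}, [fd2]))  # True
```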
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. Because instructor is not a superkey, fd2 violates BCNF; because course is a prime attribute, fd2 is still allowed in 3NF. So this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
Lossless or lossy decompositions: When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg 162
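For a decomposition into two relations R1 and R2 there is a well-known shortcut, the nonadditive join test for binary decompositions: the decomposition is lossless if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The sketch below applies it to the three TEACH decompositions, assuming FDs are pairs of Python sets; the function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """Lossless iff the shared attributes determine all of R1 - R2 or all of R2 - R1."""
    common = closure(set(r1) & set(r2), fds)
    return set(r1) - set(r2) <= common or set(r2) - set(r1) <= common


F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
options = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for i, (r1, r2) in enumerate(options, start=1):
    print(i, binary_lossless(r1, r2, F))  # only decomposition 3 prints True
```

Decomposition 3 passes because the shared attribute instructor determines course, which is exactly fd2.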
Testing for lossless joins
Lossless join algorithm: Example 5.12
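For decompositions into more than two relations, the usual lossless-join test is the matrix (chase) algorithm: build one row per Ri, put a distinguished symbol in the columns of Ri's attributes and a unique placeholder elsewhere, then repeatedly use each FD X -> Y to equate the Y entries of rows that agree on X, preferring the distinguished symbol. The decomposition is lossless if some row ends up entirely distinguished. This is a minimal sketch of that textbook algorithm, not necessarily the exact presentation in the cited example; the names are hypothetical and the EMP_PROJ decompositions below are assumed illustrations.

```python
def chase_lossless(schema, decomposition, fds):
    """Matrix (chase) test: True if the decomposition has the lossless join property."""
    attrs = sorted(schema)
    # ("a", A) is the distinguished symbol for column A; ("b", i, A) is a placeholder.
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in rhs:
                    values = {row[a] for row in group}
                    if len(values) > 1:
                        # Prefer the distinguished symbol; otherwise pick one placeholder.
                        target = ("a", a) if ("a", a) in values else min(values)
                        for row in rows:  # rename the symbol everywhere in column a
                            if row[a] in values:
                                row[a] = target
                        changed = True
    return any(all(row[a] == ("a", a) for a in attrs) for row in rows)


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(chase_lossless(R, good, F))  # True
print(chase_lossless(R, bad, F))   # False
```

Renaming a symbol throughout its column, rather than only in the rows that matched, keeps later FD applications consistent; the loop ends because every change reduces the number of distinct symbols.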
Module 6 37042023
Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold
whenever the FDs in F hold Armstrongs inference rules
IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ
(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z
IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer
from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]
By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
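The definition translates directly into a closure test: A -> B is redundant iff B ⊆ A+ computed under F - {A -> B}. A minimal Python sketch on a hypothetical set F (single-letter attributes):

```python
def closure(attrs, fds):
    """X+ under fds, given as (lhs, rhs) strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_redundant(fds, i):
    """True if fds[i] can be derived from the other FDs in fds."""
    lhs, rhs = fds[i]
    rest = fds[:i] + fds[i + 1:]
    return set(rhs) <= closure(lhs, rest)


F = [("A", "B"), ("B", "C"), ("A", "C")]  # hypothetical set of FDs
print([is_redundant(F, i) for i in range(len(F))])  # [False, False, True]
```

Redundant FDs should be removed one at a time with a re-test after each removal: two identical FDs are each redundant with respect to the other, but removing both would lose the dependency.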
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
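Checking "F covers E" never requires computing F+ explicitly: it is enough to verify that for each X -> Y in E, Y ⊆ X+ computed under F. A minimal Python sketch; the sets E, F, and G are hypothetical examples with single-letter attributes.

```python
def closure(attrs, fds):
    """X+ under fds, given as (lhs, rhs) strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """F covers E: every FD of E is in F+ (checked via attribute closure under F)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E+ = F+ exactly when each set covers the other."""
    return covers(e, f) and covers(f, e)


E = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
F = [("A", "CD"), ("E", "AH")]
G = [("A", "C"), ("E", "H")]

print(equivalent(E, F))  # True
print(equivalent(E, G))  # False: G cannot derive AC -> D
```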
Minimal Functional Dependencies
A minimal set of functional dependencies (minimal cover), also called an irreducible set of F, is useful in eliminating unnecessary functional dependencies. F is transformed so that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set of FDs F is minimal if:
(a) every dependency in F has a single attribute on its RHS;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
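The three conditions suggest the usual algorithm: split right-hand sides (a), drop extraneous left-hand attributes (c), then drop redundant FDs (b). A minimal Python sketch, assuming single-letter attribute names and a hypothetical input F = {B -> A, D -> A, AB -> D}. Minimal covers are not unique, and a different processing order can yield a different (equally minimal) result.

```python
def closure(attrs, fds):
    """X+ under fds, where each FD is a (lhs, rhs) pair of attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """One minimal cover of fds (attributes are single characters)."""
    # (a) Singleton right-hand sides, dropping exact duplicates.
    g = []
    for lhs, rhs in fds:
        for a in rhs:
            fd = (frozenset(lhs), frozenset(a))
            if fd not in g:
                g.append(fd)
    # (c) Remove extraneous LHS attributes: B is extraneous in X -> A
    #     if A is already in (X - {B})+ under the current set.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    # (b) Remove redundant FDs one at a time, re-testing after each removal.
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        if g[i][1] <= closure(g[i][0], rest):
            g = rest
        else:
            i += 1
    return g


F = [("B", "A"), ("D", "A"), ("AB", "D")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
# D -> A
# B -> D
```

Here A is extraneous in AB -> D because B -> A, which turns it into B -> D; B -> A then becomes redundant, since B -> D and D -> A imply it.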
Extraneous Attributes
The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let $A_1A_2 \to B_1B_2$ be an FD in F. Then $A_1$ is extraneous iff $F \equiv (F - \{A_1A_2 \to B_1B_2\}) \cup \{A_2 \to B_1B_2\}$.
CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has exactly one attribute.
2 FC is left-reduced: no FD has an extraneous attribute on its LHS.
3 FC is nonredundant.
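Problem
Given the set F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R} of FDs, find a canonical cover for F.
1 Split every RHS into single attributes: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2 Remove redundant FDs (XY -> Z follows from X -> Z, and XY -> W appears twice): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3 Left-reduce: X -> Z and XZ -> R give X -> R, so Z is extraneous in XZ -> R: FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}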
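The left-reduction step can be checked mechanically: an LHS attribute b of X -> A is extraneous if A ∈ (X - {b})+. The Python sketch below (single-letter attributes assumed) reports the extraneous attributes in the step-2 set and confirms that the step-3 set is left-reduced and still equivalent to F. It reports each candidate independently; after one removal the others should be re-tested.

```python
def closure(attrs, fds):
    """X+ under fds, given as (lhs, rhs) strings of single-letter attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_lhs(fds):
    """(lhs, rhs, attr) for every LHS attribute that is extraneous w.r.t. fds."""
    found = []
    for lhs, rhs in fds:
        for b in sorted(lhs):
            if len(lhs) > 1 and set(rhs) <= closure(set(lhs) - {b}, fds):
                found.append((lhs, rhs, b))
    return found


def covers(f, e):
    """Every FD of e is implied by f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
step2 = [("X", "Z"), ("XY", "W"), ("XY", "P"), ("XY", "Q"), ("XZ", "R")]
final = [("X", "Z"), ("X", "R"), ("XY", "W"), ("XY", "P"), ("XY", "Q")]

print(extraneous_lhs(step2))                  # [('XZ', 'R', 'Z')]
print(extraneous_lhs(final))                  # []
print(covers(F, final) and covers(final, F))  # True: final is equivalent to F
```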
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes Participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).
4NF: based on keys and multivalued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.
It provides the database designer with a formal framework for analyzing relation schemas based on keys and FDs, and a series of normal form tests.
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
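Dependency preservation can be tested without computing F+ by growing a closure that is only allowed to "see" one decomposed schema Ri at a time. This is a standard textbook algorithm, not one stated on these slides. The Python sketch applies it to the TEACH relation discussed later in this module, with fd1 and fd2 as given there and the decomposition {instructor, course}, {instructor, student}.

```python
def closure(attrs, fds):
    """X+ under fds, where each FD is a (lhs, rhs) pair of attribute-name sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(fd, decomposition, fds):
    """True if fd = (X, Y) follows from the FDs enforceable inside the Ri."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # Only attributes of Ri can be used, and only attributes of Ri are gained.
            t = closure(result & set(ri), fds) & set(ri)
            if not t <= result:
                result |= t
                changed = True
    return set(rhs) <= result


TEACH_FDS = [({"student", "course"}, {"instructor"}),  # fd1
             ({"instructor"}, {"course"})]             # fd2
decomp3 = [{"instructor", "course"}, {"instructor", "student"}]
print([preserved(fd, decomp3, TEACH_FDS) for fd in TEACH_FDS])  # [False, True]
```

The result agrees with the later slides: this decomposition keeps fd2 but loses fd1.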
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
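Given a schema and its FDs, the key definitions can be applied mechanically: a set S is a superkey when S+ contains every attribute, and a candidate key is a superkey with no smaller superkey inside it. The Python sketch below tries subsets smallest first (exponential, so only for small schemas). It uses the TEACH relation from later in this module, assuming fd1: {student, course} -> instructor and fd2: instructor -> course.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds, where each FD is a (lhs, rhs) pair of attribute-name sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal superkeys, found by trying attribute subsets smallest first."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def prime_attributes(attrs, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(attrs, fds))


TEACH = ["student", "course", "instructor"]
FDS = [({"student", "course"}, {"instructor"}),   # fd1
       ({"instructor"}, {"course"})]              # fd2
print(sorted(sorted(k) for k in candidate_keys(TEACH, FDS)))
# [['course', 'student'], ['instructor', 'student']]
print(sorted(set(TEACH) - prime_attributes(TEACH, FDS)))  # []: no nonprime attributes
```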
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF there are 3 techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3 If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
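Technique 1 can be sketched in a few lines of Python. The DEPARTMENT rows, the multivalued attribute Dlocations, and the new attribute name Dlocation are hypothetical examples; the helper moves each value of the multivalued attribute into its own tuple, paired with the key, in a new relation.

```python
def split_multivalued(rows, key, mv_attr, new_attr):
    """Technique 1: move mv_attr into its own relation together with the key."""
    base, detail = [], []
    for row in rows:
        # Keep every single-valued attribute in the original relation.
        base.append({a: v for a, v in row.items() if a != mv_attr})
        # One tuple (key, value) per value of the multivalued attribute.
        for value in row[mv_attr]:
            detail.append({key: row[key], new_attr: value})
    return base, detail


# Hypothetical non-1NF DEPARTMENT rows: Dlocations holds a list of values.
department = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
]
dept, dept_locations = split_multivalued(department, "Dnumber", "Dlocations", "Dlocation")
print(dept)  # [{'Dnumber': 5, 'Dname': 'Research'}, {'Dnumber': 4, 'Dname': 'Administration'}]
print(len(dept_locations))  # 4
```

In the new relation the key is the combination {Dnumber, Dlocation}, and no limit on the number of locations per department is needed.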
Normalizing nested relations into 1NF
Additional problems from the Schaum series: Pg 178 51
Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
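The 2NF test amounts to looking for partial dependencies: a non-prime attribute inside the closure of a proper subset of the primary key. The Python sketch below uses EMP_PROJ from the examples above. The FD PNUMBER -> {PNAME, PLOCATION} is an assumption added for illustration, and the code assumes the primary key is the only candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds, where each FD is a (lhs, rhs) pair of attribute-name sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(attrs, key, fds):
    """Map each non-prime attribute that depends on a proper subset of the
    primary key to the first (smallest) such subset found; {} means 2NF."""
    key = set(key)
    nonprime = set(attrs) - key  # assumes the primary key is the only candidate key
    found = {}
    for size in range(1, len(key)):
        for sub in combinations(sorted(key), size):
            for a in closure(set(sub), fds) & nonprime:
                found.setdefault(a, set(sub))
    return found


EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
FDS = [({"SSN", "PNUMBER"}, {"HOURS"}),
       ({"SSN"}, {"ENAME"}),
       ({"PNUMBER"}, {"PNAME", "PLOCATION"})]  # assumed FD for illustration
print(sorted(partial_dependencies(EMP_PROJ, ["SSN", "PNUMBER"], FDS)))
# ['ENAME', 'PLOCATION', 'PNAME']
```

Each group of partially dependent attributes goes into its own relation with the key subset it depends on; the remaining relation (SSN, PNUMBER, HOURS) then has no partial dependencies.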
Normalizing into 2NF
Conversion to 2NF: [figure] R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).
Additional problem on second normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with Prog_Pack_ID -> Prog_Pac_name
1 What is the highest normal form?
2 Transform it into the next highest form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
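The NOTE is what the general textbook form of 3NF captures: every nontrivial X -> A must have X a superkey or A a prime attribute. For a relation with a single candidate key this rules out both the partial dependencies of 2NF and the transitive ones above. The Python sketch checks only the FDs listed (enough for testing R itself) and brute-forces candidate keys. EMP_DEPT follows the SSN/DNUMBER examples above. For the NOTE's EMP, the FD Emp -> SSN is an assumption expressing "Emp is a candidate key".

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds, where each FD is a (lhs, rhs) pair of attribute-name sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal superkeys, trying attribute subsets smallest first."""
    attrs = sorted(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def violations_3nf(attrs, fds):
    """Nontrivial X -> A in fds where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(attrs)
        for a in sorted(set(rhs) - set(lhs)):  # ignore trivial parts
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad


EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
FDS = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
       ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(violations_3nf(EMP_DEPT, FDS))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]

# The NOTE's EMP(SSN, Emp, Salary); Emp -> SSN is assumed because Emp is a candidate key.
EMP = ["SSN", "Emp", "Salary"]
EMP_FDS = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN"}), ({"Emp"}, {"Salary"})]
print(violations_3nf(EMP, EMP_FDS))  # []
```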
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
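The BCNF condition is a single closure test per FD: the LHS of every nontrivial FD must determine all attributes. A minimal Python sketch on the TEACH relation (fd1, fd2 as given on the following slides). Checking only the listed FDs is enough for R itself; for a decomposed schema, the FDs must first be projected onto it (here only fd2 applies to (instructor, course)).

```python
def closure(attrs, fds):
    """X+ under fds, where each FD is a (lhs, rhs) pair of attribute-name sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attrs, fds):
    """Nontrivial FDs in fds whose LHS is not a superkey of the schema attrs."""
    bad = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= set(attrs):
            bad.append((sorted(lhs), sorted(rhs)))
    return bad


TEACH = ["student", "course", "instructor"]
FDS = [({"student", "course"}, {"instructor"}),  # fd1
       ({"instructor"}, {"course"})]             # fd2
print(bcnf_violations(TEACH, FDS))  # [(['instructor'], ['course'])]
print(bcnf_violations(["instructor", "course"], [({"instructor"}, {"course"})]))  # []
```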
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.
For a relation to be in BCNF, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 satisfies 3NF because course is a prime attribute, but it violates BCNF because instructor is not a superkey. So this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
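For a split into two relations there is a compact test (often called the NJB property, a standard result not stated on these slides): {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A Python sketch applying it to the three TEACH decompositions:

```python
def closure(attrs, fds):
    """X+ under fds, where each FD is a (lhs, rhs) pair of attribute-name sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    common = closure(set(r1) & set(r2), fds)
    return (set(r1) - set(r2)) <= common or (set(r2) - set(r1)) <= common


FDS = [({"student", "course"}, {"instructor"}),  # fd1
       ({"instructor"}, {"course"})]             # fd2
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([lossless_binary(r1, r2, FDS) for r1, r2 in decompositions])  # [False, False, True]
```

Only the third split shares an attribute (instructor) that determines one side, via fd2, which matches the claim above.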
Lossless or lossy decompositions: When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 511 pg 162
Testing for lossless joins
Lossless join algorithm Example 512
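The general lossless-join test for any number of relations is usually given as a matrix (tableau or "chase") algorithm: one row per Ri, an "a" symbol where Ri contains the attribute, a distinct "b" symbol elsewhere, then repeatedly equate symbols as the FDs demand. The decomposition is lossless iff some row becomes all "a". The Python sketch below is one such implementation under that description; it does not reproduce Example 512. The EMP_PROJ-style FDs and decomposition in the usage lines are illustrative assumptions.

```python
def chase_lossless(attrs, decomposition, fds):
    """Tableau (chase) test for a lossless-join decomposition."""
    attrs = list(attrs)
    col = {a: j for j, a in enumerate(attrs)}
    # Row i: ("a", j) where attribute j is in Ri, otherwise a distinct ("b", i, j).
    matrix = [[("a", j) if a in ri else ("b", i, j) for j, a in enumerate(attrs)]
              for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in matrix:  # rows that agree on every LHS column
                groups.setdefault(tuple(row[col[a]] for a in lhs), []).append(row)
            for rows in groups.values():
                for a in rhs:
                    c = col[a]
                    symbols = {row[c] for row in rows}
                    if len(symbols) > 1:
                        # Prefer the "a" symbol: ("a", j) sorts before any ("b", i, j).
                        target = min(symbols)
                        for row in matrix:  # rename the symbols throughout the column
                            if row[c] in symbols:
                                row[c] = target
                        changed = True
    # Lossless iff some row has become all "a" symbols.
    return any(all(s[0] == "a" for s in row) for row in matrix)


TEACH = ["student", "course", "instructor"]
TEACH_FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(chase_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], TEACH_FDS))  # True
print(chase_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], TEACH_FDS))     # False

EMP_PROJ = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
EP_FDS = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
          ({"SSN", "PNUMBER"}, {"HOURS"})]
EP_DECOMP = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(chase_lossless(EMP_PROJ, EP_DECOMP, EP_FDS))  # True
```

Renaming a symbol throughout its column (rather than only in the rows being compared) keeps the equalities made so far consistent, and the loop stops because every rename reduces the number of distinct symbols.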
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies (more precisely, for every nontrivial MVD X ->> Y that holds in R, X is a superkey of R).
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
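The "independent sets of values" in the MVD definition can be tested on an instance: whenever t1 and t2 agree on X, the tuple taking X and Y from t1 and the remaining attributes from t2 must also be present. A minimal Python sketch; the EMP rows are hypothetical, and the code assumes X and Y are disjoint.

```python
def satisfies_mvd(header, rows, x, y):
    """X ->> Y holds if, for any t1, t2 agreeing on X, the tuple taking X and Y
    from t1 and the remaining attributes Z from t2 is also in the relation.
    Assumes X and Y are disjoint lists of attribute names."""
    idx = {a: i for i, a in enumerate(header)}
    z = [a for a in header if a not in x and a not in y]

    def val(t, attrs):
        return tuple(t[idx[a]] for a in attrs)

    present = {(val(t, x), val(t, y), val(t, z)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if val(t1, x) == val(t2, x) and (val(t1, x), val(t1, y), val(t2, z)) not in present:
                return False
    return True


def is_trivial_mvd(x, y, header):
    """X ->> Y is trivial if Y is a subset of X or X ∪ Y is all of R."""
    return set(y) <= set(x) or set(x) | set(y) == set(header)


H = ("ENAME", "PNAME", "DNAME")
EMP = [("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")]  # hypothetical rows

print(satisfies_mvd(H, EMP, ["ENAME"], ["PNAME"]))      # True
print(satisfies_mvd(H, EMP[:3], ["ENAME"], ["PNAME"]))  # False: ("Smith", "Y", "John") missing
print(is_trivial_mvd(["ENAME"], ["PNAME"], H))          # False: a nontrivial MVD
```

Because the nontrivial MVD ENAME ->> PNAME holds and ENAME is not a superkey, EMP violates 4NF; projecting onto (ENAME, PNAME) and (ENAME, DNAME) gives the EMP_PROJECTS and EMP_DEPENDENTS relations.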
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 38042023
Inference Rules for FDs Some additional inference rules that are useful
Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z
The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 39042023
Examples
1 Given the set F=ABCX BXZ derive ACZ using the inference axioms
2 Given F=AB CD with C subset of B show that F|=AD
Module 6 40042023
Redundant functional dependencies Given a set F of FDs a FD AB of F is said to
be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB
Redundant FDs are extra and unnecessary and can be safely removed from the set F
Eliminating redundant FDs allows us to minimize the set of FDs
Module 6 41042023
Equivalence of Sets of Functional Dependencies
A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F
Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold
For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets
Module 6 42042023
Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary
functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it
that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS
Module 6 43042023
Minimal cover
(a) every RHS of each dependency is a single attribute
(b) for no X -gt A in F is the set F - X -gt A equivalent to F
(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F
no redundanc
ies
no dependencies may be replaced by a dependency
that involves a subset of the left hand side
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
Considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
To achieve 1NF there are 3 techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key of the original relation.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, having no limit placed on the maximum number of values.
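Technique 1 can be sketched on in-memory rows: the multivalued attribute is dropped from the original relation and moved, together with the primary key, into a relation of its own. The DEPARTMENT/DLOCATIONS rows below are hypothetical sample data chosen for illustration, and `split_multivalued` is an illustrative helper, not a library function.

```python
# Sketch of 1NF technique 1: move a multivalued attribute into its own relation.

def split_multivalued(rows, key, mv_attr):
    """Return (base, detail): base drops mv_attr; detail holds one (key, value) row per value."""
    base, detail = [], []
    for row in rows:
        base.append({a: v for a, v in row.items() if a != mv_attr})
        for value in row[mv_attr]:
            detail.append({key: row[key], mv_attr: value})
    return base, detail

# Hypothetical DEPARTMENT state in which DLOCATIONS is non-atomic (a list of values).
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]
dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS")
print(dept)            # DEPARTMENT(DNUMBER, DNAME): every value is atomic
print(dept_locations)  # DEPT_LOCATIONS(DNUMBER, DLOCATIONS): key {DNUMBER, DLOCATIONS}
```

DNAME is still stored once per department, and nothing limits how many locations a department may have, which is why this technique avoids both the redundancy of technique 2 and the NULLs of technique 3.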
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: pg. 178, 5.1
Second Normal Form
Uses the concepts of FDs and the primary key.
Definitions:
Prime attribute - an attribute that is a member of the primary key K.
Full functional dependency - an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
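Whether Y -> Z is a full FD can be checked mechanically: Z must be in Y+, and it must fall outside the closure of Y with any one attribute removed. This sketch assumes the EMP_PROJ FDs of the COMPANY example (PNUMBER -> {PNAME, PLOCATION} is assumed, not stated here); `is_full_fd` is a hypothetical helper name.

```python
# Sketch: test whether Y -> Z is a full functional dependency.

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("lhs -> rhs does not hold under the given FDs")
    # Dropping one attribute at a time is enough: any smaller determining subset
    # is contained in lhs - {a} for some a, and closure is monotone.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)

F = [  # EMP_PROJ FDs, assumed as in the COMPANY example
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, F))  # True: full FD
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, F))  # False: partial, since SSN -> ENAME
```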
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
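Following the 2NF definition above, and treating "prime" as "member of the primary key" as in the earlier definitions, the sketch below finds nonprime attributes that depend on only part of the primary key and moves each group into its own relation together with that key part. It assumes the same EMP_PROJ FDs as before; `partial_dependencies` and `decompose_2nf` are illustrative names, and for keys with more than two attributes the reported key part is not guaranteed to be the smallest one.

```python
# Sketch: detect partial dependencies on the primary key and split them off (2NF).

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(r, pk, fds):
    """Map each non-key attribute to a proper part of pk that already determines it."""
    pk = set(pk)
    found = {}
    for a in sorted(set(r) - pk):
        for b in sorted(pk):
            part = pk - {b}
            if a in closure(part, fds):
                found[a] = part
                break
    return found

def decompose_2nf(r, pk, fds):
    """pk with its fully dependent attributes, plus one relation per offending key part."""
    partial = partial_dependencies(r, pk, fds)
    groups = {}
    for a, part in partial.items():
        groups.setdefault(frozenset(part), set(part)).add(a)
    return [set(r) - set(partial)] + list(groups.values())

EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [  # assumed COMPANY FDs
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
print(partial_dependencies(EMP_PROJ, {"SSN", "PNUMBER"}, F))
print(decompose_2nf(EMP_PROJ, {"SSN", "PNUMBER"}, F))
```

For EMP_PROJ this yields {SSN, PNUMBER, HOURS}, {SSN, ENAME} and {PNUMBER, PNAME, PLOCATION}.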
Normalizing into 2NF
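Conversion to 2NF
[Figure: a relation with attributes A, B, C, D is converted into two relations, one with attributes A, B, C and one with attributes A, D]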
Additional problem on 2nd normal form:
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form?
2. Transform it into the next highest form.
Third Normal Form
Definition: Transitive functional dependency - an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME.
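Transitive dependencies on the primary key can be searched for by brute force: for each non-key attribute A, look for a set Y of other non-key attributes that is not a superkey and still determines A (because the primary key determines every attribute, pk -> Y holds automatically). Restricting Y to non-key attributes is a simplification for this sketch. It assumes the EMP_DEPT FDs of the COMPANY example, SSN -> {ENAME, BDATE, ADDRESS, DNUMBER} and DNUMBER -> {DNAME, DMGRSSN}; `transitive_dependencies` is an illustrative name.

```python
# Sketch: find non-key attributes that depend transitively on the primary key.
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_dependencies(r, pk, fds):
    """Map each non-key attribute A to a smallest non-superkey Y, with no key attributes
    and A not in Y, such that Y -> A. Since pk is a key, pk -> Y always holds."""
    r, pk = set(r), set(pk)
    found = {}
    for a in sorted(r - pk):
        others = sorted(r - pk - {a})
        for size in range(1, len(others) + 1):
            hits = [set(y) for y in combinations(others, size)
                    if a in closure(y, fds) and not closure(y, fds) >= r]
            if hits:
                found[a] = hits[0]
                break
    return found

EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
print(transitive_dependencies(EMP_DEPT, {"SSN"}, F))
# DMGRSSN and DNAME depend on SSN through DNUMBER; ENAME is not reported
```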
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
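The primary-key based test above can be generalized to all candidate keys: R is in 3NF if, for every nontrivial FD X -> A, either X is a superkey or A is prime. This is a swapped-in formulation (the general definition found in standard textbooks), not the wording on this slide, though it agrees with the slide on the examples below. The sketch checks only the given FDs, the usual shortcut when testing a whole schema; the EMP FDs encode the note's statement that Emp is a candidate key, and all names are illustrative.

```python
# Sketch: general 3NF test over all candidate keys (swapped-in formulation, see above).
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(r, fds):
    """Minimal superkeys of r by brute force (small schemas only)."""
    attrs, keys = sorted(r), []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(r):
                keys.append(s)
    return keys

def is_3nf(r, fds):
    """For every nontrivial X -> A in fds: X is a superkey of r, or A is prime."""
    prime = set().union(*candidate_keys(r, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(r):
            continue  # X is a superkey
        if any(a not in prime for a in rhs - lhs):
            return False
    return True

EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_DEPT = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
EMP = {"SSN", "Emp", "Salary"}
F_EMP = [fd("SSN", "Emp"), fd("Emp", "SSN"), fd("Emp", "Salary")]

print(is_3nf(EMP_DEPT, F_DEPT))  # False: DNUMBER -> DNAME and DNUMBER is not a key
print(is_3nf(EMP, F_EMP))        # True: Emp is a candidate key
```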
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.
BCNF does not allow this: for a relation to be in BCNF, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
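The difference comes down to the extra escape clause in 3NF. A sketch, under the same (LHS, RHS) frozenset representation, that runs both tests on TEACH with fd1 and fd2: fd2 (instructor -> course) has a non-superkey on the left, which BCNF rejects but 3NF tolerates because course is prime. The `check` helper is a hypothetical name, and it looks only at the given FDs, which is the usual shortcut when testing a whole schema.

```python
# Sketch: run the 3NF and BCNF conditions side by side on TEACH.
from itertools import combinations

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(r, fds):
    """Minimal superkeys of r by brute force (small schemas only)."""
    attrs, keys = sorted(r), []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(r):
                keys.append(s)
    return keys

def check(r, fds):
    """Return (in_3nf, in_bcnf) for schema r, looking only at the given FDs."""
    prime = set().union(*candidate_keys(r, fds))
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        extra = rhs - lhs  # ignore the trivial part of the FD
        if not extra or closure(lhs, fds) >= set(r):
            continue  # trivial, or X is a superkey: acceptable in both forms
        in_bcnf = False  # BCNF demands a superkey on the left
        if not extra <= prime:
            in_3nf = False  # 3NF tolerates it only when every such A is prime
    return in_3nf, in_bcnf

TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(check(TEACH, F))  # (True, False): in 3NF but not in BCNF
```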
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. Because the left-hand side of fd2 (instructor) is not a superkey, TEACH is not in BCNF; it is still in 3NF because the right-hand side of fd2 (course) is a prime attribute.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
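One standard way to obtain a lossless BCNF decomposition is to repeatedly split a relation Ri on a violating X: keep X+ (restricted to Ri) as one relation and Ri minus (X+ - X) as the other. The two parts share X, and X determines the first part, so each split is nonadditive. The sketch below is that textbook-style algorithm, not necessarily the exact procedure used in this course; closures are computed under the full F so that FDs implied on each projection are found, and all names are illustrative.

```python
# Sketch: BCNF decomposition by repeatedly splitting on a violating X.
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(ri, fds):
    """Return (X, X+ within ri) for a BCNF-violating X in ri, or None if ri is in BCNF."""
    ri = set(ri)
    for size in range(1, len(ri)):
        for combo in combinations(sorted(ri), size):
            x = set(combo)
            xplus = closure(x, fds) & ri
            # X determines something new in ri but is not a superkey of ri.
            if xplus != x and xplus != ri:
                return x, xplus
    return None

def bcnf_decompose(r, fds):
    """Split on each violation X into X+ and ri - (X+ - X) until every part is in BCNF."""
    result, todo = [], [set(r)]
    while todo:
        ri = todo.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            result.append(ri)
        else:
            x, xplus = violation
            todo += [xplus, ri - (xplus - x)]
    return result

TEACH = {"student", "course", "instructor"}
F = [fd("student course", "instructor"), fd("instructor", "course")]
print(bcnf_decompose(TEACH, F))  # {instructor, course} and {student, instructor}
```

For TEACH this produces the 3rd of the decompositions listed on the next slide, and fd1 is not preserved in either part.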
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions will lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
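For a decomposition into two relations there is a simple closure-based test (often called the NJB property): {R1, R2} is nonadditive iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A sketch, assuming fd1 and fd2 are the FDs of TEACH, that applies it to the three decompositions above; `binary_lossless` is an illustrative name.

```python
# Sketch: NJB test for binary decompositions of TEACH.

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def binary_lossless(r1, r2, fds):
    """{R1, R2} is lossless iff R1 ∩ R2 determines R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common

F = [fd("student course", "instructor"), fd("instructor", "course")]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for n, (r1, r2) in enumerate(decompositions, start=1):
    print(n, binary_lossless(r1, r2, F))  # 1 False, 2 False, 3 True
```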
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, then the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
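Losslessness can also be illustrated on a concrete state by projecting and re-joining, here for decompositions 1 and 3 of TEACH listed above. The TEACH rows are hypothetical sample data that satisfy fd1 and fd2; `project` and `natural_join` are small illustrative helpers, not a library API. A single state can only exhibit spurious tuples; it cannot prove losslessness for all legal states, which is what the FD-based tests are for.

```python
# Sketch: project TEACH two ways and re-join to look for spurious tuples.

def project(rows, attrs):
    """Projection with duplicate elimination; a tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two relations given as sets of frozenset tuples."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# Hypothetical TEACH state satisfying fd1 and fd2.
teach = [
    {"student": "Narayan", "course": "Database", "instructor": "Mark"},
    {"student": "Smith", "course": "Database", "instructor": "Navathe"},
    {"student": "Smith", "course": "Operating Systems", "instructor": "Ammar"},
    {"student": "Wallace", "course": "Database", "instructor": "Mark"},
]
original = project(teach, ["student", "course", "instructor"])

# Decomposition 1: {student, instructor} and {student, course}.
lossy = natural_join(project(teach, ["student", "instructor"]),
                     project(teach, ["student", "course"]))
# Decomposition 3: {instructor, course} and {instructor, student}.
lossless = natural_join(project(teach, ["instructor", "course"]),
                        project(teach, ["instructor", "student"]))

print(len(original), len(lossy), len(lossless))  # 4 6 4
for t in lossy - original:
    print(dict(t))  # spurious tuples, e.g. Smith paired with Navathe for Operating Systems
```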
Testing for lossless joins
Lossless join algorithm: Example 5.12
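The general test for a decomposition into n relations is usually stated as a matrix (or "chase") algorithm: one row per Ri, a distinguished symbol in the columns of Ri's attributes and a unique symbol elsewhere; FDs are then applied to equate symbols in rows that agree on the left-hand side until nothing changes, and the decomposition is lossless iff some row ends up all distinguished. The sketch below is one common formulation of that algorithm and is not claimed to follow Example 5.12 step by step; the EMP_PROJ FDs and decompositions are assumed from the COMPANY example, and all names are illustrative.

```python
# Sketch: matrix ("chase") test for the lossless join property.

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())

def is_lossless(r, decomposition, fds):
    """True iff the decomposition of r has the lossless (nonadditive) join property under fds."""
    attrs = sorted(r)
    # Row i, column A: distinguished symbol "a" if A is in R_i, else a unique ("b", i, A).
    table = [{a: "a" if a in ri else ("b", i, a) for a in attrs}
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    ri, rj = table[i], table[j]
                    if any(ri[x] != rj[x] for x in lhs):
                        continue  # rows must agree on the whole left-hand side
                    for y in rhs:
                        s, t = ri[y], rj[y]
                        if s == t:
                            continue
                        # Equate the two symbols throughout column y, preferring "a".
                        keep, lose = (t, s) if t == "a" else (s, t)
                        for row in table:
                            if row[y] == lose:
                                row[y] = keep
                        changed = True
    return any(all(v == "a" for v in row.values()) for row in table)

TEACH = {"student", "course", "instructor"}
F_TEACH = [fd("student course", "instructor"), fd("instructor", "course")]
print(is_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F_TEACH))  # True
print(is_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F_TEACH))    # False

# EMP_PROJ with the COMPANY FDs (assumed).
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F_EP = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION")]
D1 = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
D2 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(is_lossless(EMP_PROJ, D1, F_EP), is_lossless(EMP_PROJ, D2, F_EP))  # True False
```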
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further defined as being trivial or nontrivial.
An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A U B = R.
An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
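Whether X ->> Y holds can be checked on a relation state directly from the definition: whenever two tuples agree on X, the tuple that takes X and Y from the first and the remaining attributes from the second must also be present. The sketch uses a hypothetical state of the EMP(ENAME, PNAME, DNAME) relation discussed on the following slides; `mvd_holds` and `is_trivial_mvd` are illustrative names, and a check on one state can only refute an MVD, not prove that it holds on every legal state.

```python
# Sketch: test an MVD X ->> Y on a relation state, and classify it as trivial or not.

def is_trivial_mvd(x, y, r):
    """X ->> Y is trivial if Y is a subset of X or X U Y = R."""
    x, y = set(x), set(y)
    return y <= x or x | y == set(r)

def mvd_holds(rows, r, x, y):
    """For t1, t2 agreeing on X, the tuple with X, Y from t1 and the rest from t2 must exist."""
    z = set(r) - set(x) - set(y)
    present = {frozenset(row.items()) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                t3 = {a: t1[a] for a in set(x) | set(y)}
                t3.update({a: t2[a] for a in z})
                if frozenset(t3.items()) not in present:
                    return False
    return True

EMP = {"ENAME", "PNAME", "DNAME"}
emp = [  # hypothetical state: projects and dependents are independent of each other
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(emp, EMP, {"ENAME"}, {"PNAME"}))      # True
print(mvd_holds(emp[:3], EMP, {"ENAME"}, {"PNAME"}))  # False: (Smith, Y, John) is missing
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, EMP))      # False: a nontrivial MVD
```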
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than those whose components are all superkeys.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
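A join dependency can be checked on a given state by joining its projections and comparing the result with the original. The sketch assumes SUPPLY has attributes SNAME, PART_NAME and PROJ_NAME, with R1 = {SNAME, PART_NAME}, R2 = {SNAME, PROJ_NAME} and R3 = {PART_NAME, PROJ_NAME}; the tuples are hypothetical, and `project`, `natural_join` and `jd_holds` are illustrative helpers rather than a library API.

```python
# Sketch: test JD(R1, R2, R3) on a SUPPLY state by joining its projections.
from functools import reduce

def project(rows, attrs):
    """Projection with duplicate elimination; a tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join of two relations given as sets of frozenset tuples."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def jd_holds(rows, components):
    """A JD holds on this state iff the join of the projections equals the state itself."""
    original = {frozenset(row.items()) for row in rows}
    joined = reduce(natural_join, [project(rows, c) for c in components])
    return joined == original

def supply(*triples):
    """Build hypothetical SUPPLY rows from (SNAME, PART_NAME, PROJ_NAME) triples."""
    return [{"SNAME": s, "PART_NAME": p, "PROJ_NAME": j} for s, p, j in triples]

JD = [("SNAME", "PART_NAME"), ("SNAME", "PROJ_NAME"), ("PART_NAME", "PROJ_NAME")]  # R1, R2, R3
state = supply(("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1"), ("s1", "p1", "j1"))
print(jd_holds(state, JD))      # True
print(jd_holds(state[:3], JD))  # False: the join adds the spurious tuple (s1, p1, j1)
```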
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
Eliminating redundant FDs allows us to minimize the set of FDs.
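The definition translates directly into a closure test: A -> B is redundant iff B ⊆ A+ computed under F - {A -> B}. A minimal sketch on a small hypothetical set F = {A -> B, B -> C, A -> C}, with single-letter attribute names; `is_redundant` is an illustrative name.

```python
# Sketch: redundancy test for an FD via attribute closure.

def fd(lhs, rhs):
    """Single-letter attributes: fd("AB", "C") means AB -> C."""
    return frozenset(lhs), frozenset(rhs)

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_redundant(f, fds):
    """A -> B in F is redundant iff B is contained in A+ computed under F - {A -> B}."""
    lhs, rhs = f
    rest = [g for g in fds if g != f]
    return rhs <= closure(lhs, rest)

F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print([is_redundant(f, F) for f in F])  # [False, False, True]: A -> C follows by transitivity
```

Redundant FDs should be removed one at a time, re-testing after each removal, because two FDs can each be redundant only because of the other.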
Equivalence of Sets of Functional Dependencies
A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is said to be covered by F.
Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
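Cover and equivalence reduce to closure computations: F covers E iff, for every X -> Y in E, Y ⊆ X+ under F, and E and F are equivalent iff each covers the other. The sketch uses small hypothetical FD sets with single-letter attributes; `covers` and `equivalent` are illustrative names.

```python
# Sketch: cover and equivalence tests for sets of FDs.

def fd(lhs, rhs):
    """Single-letter attributes: fd("AC", "D") means AC -> D."""
    return frozenset(lhs), frozenset(rhs)

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, e):
    """F covers E if every X -> Y in E satisfies Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """E and F are equivalent iff E covers F and F covers E."""
    return covers(e, f) and covers(f, e)

F = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
G = [fd("A", "CD"), fd("E", "AH")]
H = [fd("A", "C"), fd("E", "H")]
print(covers(F, G), covers(G, F), equivalent(F, G))  # True True True
print(covers(F, H), covers(H, F))                    # True False: H cannot derive AC -> D
```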
Minimal Functional Dependencies
A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F. F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.
Minimal cover
A set F of FDs is minimal if:
(a) every RHS of each dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
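Conditions (a) to (c) can be checked directly with attribute closures. For (c), replacing X -> A by Z -> A keeps F equivalent exactly when A ∈ Z+ under F, and it is enough to try removing one attribute at a time from X. The FD sets below are hypothetical examples with single-letter attributes; `is_minimal` is an illustrative name.

```python
# Sketch: check conditions (a), (b) and (c) of a minimal set of FDs.

def fd(lhs, rhs):
    """Single-letter attributes: fd("AB", "D") means AB -> D."""
    return frozenset(lhs), frozenset(rhs)

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_minimal(fds):
    if any(len(rhs) != 1 for _, rhs in fds):
        return False  # (a) some RHS is not a single attribute
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]
        if rhs <= closure(lhs, rest):
            return False  # (b) X -> A is implied by the other FDs
        for b in lhs:
            if rhs <= closure(lhs - {b}, fds):
                return False  # (c) a proper subset of X already determines A
    return True

print(is_minimal([fd("B", "D"), fd("D", "A")]))                 # True
print(is_minimal([fd("B", "A"), fd("D", "A"), fd("AB", "D")]))  # False: B -> D can replace AB -> D
print(is_minimal([fd("A", "BC")]))                              # False: RHS has two attributes
```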
Extraneous Attributes
The size of the FDs of F can be further reduced by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let A1A2 -> B1B2 be in F.
A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) U {A2 -> B1B2}.
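Both kinds of extraneous attribute can be detected with closures: B is extraneous on the left of X -> Y if Y ⊆ (X - {B})+ under F (which is exactly when the replacement above leaves F equivalent), and A is extraneous on the right if A is still in X+ after X -> Y is replaced by X -> (Y - {A}). A sketch with small hypothetical FD sets and single-letter attributes; the function names are illustrative.

```python
# Sketch: extraneous left- and right-hand-side attributes via closures.

def fd(lhs, rhs):
    """Single-letter attributes: fd("XZ", "R") means XZ -> R."""
    return frozenset(lhs), frozenset(rhs)

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lhs_extraneous(b, f, fds):
    """B is extraneous in X -> Y iff Y ⊆ (X - {B})+ under F."""
    lhs, rhs = f
    return b in lhs and rhs <= closure(lhs - {b}, fds)

def rhs_extraneous(a, f, fds):
    """A is extraneous in X -> Y iff A ∈ X+ under F with X -> Y replaced by X -> (Y - {A})."""
    lhs, rhs = f
    reduced = [g for g in fds if g != f] + [(lhs, rhs - {a})]
    return a in rhs and a in closure(lhs, reduced)

F1 = [fd("X", "Z"), fd("XZ", "R")]
print(lhs_extraneous("Z", fd("XZ", "R"), F1))  # True: X -> Z, so X alone determines R
print(lhs_extraneous("X", fd("XZ", "R"), F1))  # False
F2 = [fd("A", "BC"), fd("B", "C")]
print(rhs_extraneous("C", fd("A", "BC"), F2))  # True: A -> B and B -> C already give C
print(rhs_extraneous("B", fd("A", "BC"), F2))  # False
```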
CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.
Problem
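Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Make every RHS a single attribute: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R because X -> Z, so XZ -> R becomes X -> R: FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}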
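The steps of this problem (make every FD simple, left-reduce, drop redundant FDs) can be automated. This sketch follows the usual minimal cover algorithm, in the order split, left-reduce, then remove redundant FDs, and checks the worked answer for F; minimal covers are not unique in general, and the result can depend on the order in which FDs are examined. Attributes are single letters, and all names are illustrative.

```python
# Sketch: minimal (canonical) cover, checked against the worked problem.

def fd(lhs, rhs):
    """Single-letter attributes: fd("XY", "WP") means XY -> WP."""
    return frozenset(lhs), frozenset(rhs)

def closure(attrs, fds):
    """Return attrs+: every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: one attribute on every right-hand side, duplicates dropped.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            simple = (lhs, frozenset(a))
            if simple not in g:
                g.append(simple)
    # Step 2: left-reduce; drop B from X when (X - {B}) -> A already follows.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction can create duplicates
    # Step 3: remove each FD that the remaining ones already imply.
    for f in list(g):
        rest = [h for h in g if h != f]
        if f[1] <= closure(f[0], rest):
            g = rest
    return g

def show(fds):
    """Render FDs as sorted strings such as 'XY -> W'."""
    return sorted("".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in fds)

F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
print(show(minimal_cover(F)))  # ['X -> R', 'X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W']
```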
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).
4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd.
Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancies and minimizing anomalies.
Minimal Functional Dependencies
A minimal cover of F (also called an irreducible set of F) is useful for eliminating unnecessary functional dependencies. F is transformed so that each FD with more than one attribute on the right-hand side (RHS) is reduced to a set of FDs that each have only one attribute on the RHS.
Minimal cover
A set of FDs F is a minimal cover if:
(a) the RHS of every dependency in F is a single attribute;
(b) for no $X \to A$ in F is the set $F - \{X \to A\}$ equivalent to F (no redundancies);
(c) for no $X \to A$ in F and proper subset $Z \subset X$ is $(F - \{X \to A\}) \cup \{Z \to A\}$ equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
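Each condition can be checked mechanically with attribute closures, because two FD sets are equivalent exactly when each one implies every FD of the other. The Python below is a minimal sketch under stated assumptions, not the course's own algorithm: an FD is modelled as a pair of frozensets, and `fd`, `closure`, `implies`, `equivalent` and `is_minimal` are hypothetical names.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: fd("XY", "W") builds XY -> W from single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure attrs+ under fds: keep applying X -> Y while X is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, dep):
    """fds implies X -> Y iff Y is contained in X+."""
    lhs, rhs = dep
    return rhs <= closure(lhs, fds)


def equivalent(f, g):
    """Two FD sets are equivalent when each implies every FD of the other."""
    return all(implies(f, d) for d in g) and all(implies(g, d) for d in f)


def is_minimal(fds):
    fds = list(dict.fromkeys(fds))  # F is a set, so ignore exact duplicates
    # (a) every right-hand side is a single attribute
    if any(len(rhs) != 1 for _, rhs in fds):
        return False
    for dep in fds:
        lhs, rhs = dep
        rest = [d for d in fds if d != dep]
        # (b) F - {X -> A} must not be equivalent to F (no redundant FD)
        if equivalent(rest, fds):
            return False
        # (c) no proper subset Z of X may replace X in X -> A
        for size in range(len(lhs)):
            for z in combinations(sorted(lhs), size):
                if equivalent(rest + [(frozenset(z), rhs)], fds):
                    return False
    return True


minimal = [fd("X", "Z"), fd("XY", "W"), fd("XY", "P"), fd("XY", "Q"), fd("X", "R")]
print(is_minimal(minimal))                        # True
print(is_minimal(minimal[:4] + [fd("XZ", "R")]))  # False: (c) fails, XZ -> R can shrink to X -> R
```

The checks follow the definition literally and recompute closures many times, so this is only practical for small classroom examples.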
Extraneous Attributes
The FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
Let F be a set of FDs over schema R and let $A_1A_2 \to B_1B_2 \in F$. Then $A_1$ is extraneous iff $F \equiv (F - \{A_1A_2 \to B_1B_2\}) \cup \{A_2 \to B_1B_2\}$.
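The definition can be tested literally by building the modified set and checking equivalence. A well-known shortcut, not stated on the slide, is that $A_1$ is extraneous iff $B_1B_2 \subseteq (A_2)^+$ computed under the original F. The sketch below implements both. The helper names are hypothetical. It applies them to the FD set from step 2 of the canonical cover problem in this module.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def equivalent(f, g):
    """f and g are equivalent when each implies every FD of the other."""
    def implies_all(a, b):
        return all(rhs <= closure(lhs, a) for lhs, rhs in b)
    return implies_all(f, g) and implies_all(g, f)


def is_extraneous_lhs(fds, dep, attr):
    """Literal test: F is equivalent to (F - {X -> Y}) union {(X - attr) -> Y}."""
    lhs, rhs = dep
    reduced = [d for d in fds if d != dep] + [(lhs - {attr}, rhs)]
    return equivalent(fds, reduced)


def is_extraneous_lhs_fast(fds, dep, attr):
    """Shortcut: Y is contained in (X - attr)+, computed under the original F."""
    lhs, rhs = dep
    return rhs <= closure(lhs - {attr}, fds)


def fd(lhs, rhs):
    """Hypothetical helper for single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


F = [fd("X", "Z"), fd("XY", "W"), fd("XY", "P"), fd("XY", "Q"), fd("XZ", "R")]
print(is_extraneous_lhs(F, fd("XZ", "R"), "Z"))  # True, because X -> Z
print(is_extraneous_lhs(F, fd("XY", "W"), "Y"))  # False, W is not in X+
```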
CANONICAL COVER (FC)
1 Every FD of FC is simple: its RHS has one attribute
2 FC is left-reduced: no FD in FC has an extraneous left-hand-side attribute
3 FC is nonredundant: no FD in FC is implied by the others
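Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1 Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2 Drop the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3 Left-reduce: X+ = {X, Z, R}, so Z is extraneous in XZ -> R, giving FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}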
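The three properties suggest a procedure. First split the right-hand sides, then left-reduce, then drop redundant FDs. This is a hedged sketch with hypothetical names. It uses the common textbook order, removing extraneous attributes before redundant FDs. Applied to F it returns {X -> Z, XY -> P, XY -> W, XY -> Q, X -> R}. This is the step-2 result with XZ -> R left-reduced to X -> R, because Z is extraneous once X -> Z holds.

```python
def fd(lhs, rhs):
    """Hypothetical helper for single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    # 1. Simple FDs: split each RHS into single attributes, dropping exact duplicates.
    cover = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            dep = (frozenset(lhs), frozenset({attr}))
            if dep not in cover:
                cover.append(dep)
    # 2. Left-reduce: attr is extraneous in X -> A if A is in (X - attr)+.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, cover):
                lhs = lhs - {attr}
                cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # left-reduction can create duplicates
    # 3. Nonredundant: drop any FD implied by the remaining ones.
    for dep in list(cover):
        rest = [d for d in cover if d != dep]
        if dep[1] <= closure(dep[0], rest):
            cover = rest
    return cover


def show(dep):
    return "".join(sorted(dep[0])) + " -> " + "".join(sorted(dep[1]))


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
print([show(d) for d in canonical_cover(F)])
# ['X -> Z', 'XY -> P', 'XY -> W', 'XY -> Q', 'X -> R']
```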
Normal Forms Based on Primary Keys
1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)
4NF: based on keys and multivalued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation
Normalization of Relations
Proposed by Codd. Normalization: the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
Minimizing redundancy
Minimizing insertion, deletion, and update anomalies
Provides the database designer with:
A formal framework for analyzing relation schemas based on their keys and FDs
A series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
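Dependency preservation can be tested without computing F+. For each X -> Y in F, start from X and grow the set using only closures restricted to each decomposed schema Ri. Then check whether Y is reached. The sketch below uses hypothetical names. It applies the test to the third TEACH decomposition discussed later in this module, {instructor, course} and {instructor, student}.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lost_dependencies(fds, schemas):
    """Return the FDs of F that the decomposition `schemas` does not preserve."""
    lost = []
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in schemas:
                # Close what is known inside Ri under F, then keep only Ri's attributes.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            lost.append((lhs, rhs))
    return lost


# TEACH FDs and its third decomposition, used here as sample input.
fd1 = (frozenset({"student", "course"}), frozenset({"instructor"}))
fd2 = (frozenset({"instructor"}), frozenset({"course"}))
decomposition = [frozenset({"instructor", "course"}), frozenset({"instructor", "student"})]
print(lost_dependencies([fd1, fd2], decomposition) == [fd1])  # True: fd1 is lost, fd2 survives
```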
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema $R = \{A_1, A_2, \ldots, A_n\}$ is a set of attributes $S \subseteq R$ with the property that no two tuples $t_1$ and $t_2$ in any legal relation state r of R will have $t_1[S] = t_2[S]$.
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
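Once the FDs are known, both definitions reduce to attribute closures. S is a superkey iff S+ contains every attribute of R. A superkey is a key iff dropping any single attribute breaks this. Checking one attribute at a time is enough because every superset of a superkey is again a superkey. The sketch below uses the EMP_PROJ schema shown earlier in this module. It takes SSN -> ENAME and {SSN, PNUMBER} -> HOURS from the 2NF examples. The FD PNUMBER -> {PNAME, PLOCATION} is an assumption added for illustration. The function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(s, schema, fds):
    return closure(s, fds) >= schema


def is_key(k, schema, fds):
    """A superkey from which no single attribute can be removed."""
    return is_superkey(k, schema, fds) and not any(
        is_superkey(k - {a}, schema, fds) for a in k)


EMP_PROJ = frozenset({"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"})
FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),  # assumed for illustration
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
print(is_key(frozenset({"SSN", "PNUMBER"}), EMP_PROJ, FDS))           # True
print(is_key(frozenset({"SSN", "PNUMBER", "HOURS"}), EMP_PROJ, FDS))  # False: superkey, not minimal
```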
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
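The sketch below enumerates candidate keys by increasing size and derives the prime attributes from them. It is brute force and exponential in the number of attributes, so it only suits small schemas. As sample input it uses the TEACH relation and its two FDs from the BCNF discussion later in this module. The function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    return frozenset().union(*candidate_keys(schema, fds))


TEACH = frozenset({"student", "course", "instructor"})
FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(sorted(sorted(k) for k in candidate_keys(TEACH, FDS)))
# [['course', 'student'], ['instructor', 'student']]
```

Every TEACH attribute is therefore prime. This is the fact that lets fd2 pass the 3NF test.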
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
Considered to be part of the definition of a relation.
Normalization into 1NF
To achieve 1NF, there are three techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3 If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
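Technique 1 can be sketched on an in-memory table. The DEPARTMENT rows, the multivalued DLOCATIONS attribute, the new DLOCATION attribute and the function name are all hypothetical sample choices, not taken from the slides.

```python
def split_multivalued(rows, key, multivalued, new_attr):
    """Technique 1: move `multivalued` into a separate relation together with `key`."""
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    separate = [{key: row[key], new_attr: value}
                for row in rows for value in row[multivalued]]
    return base, separate


# Hypothetical unnormalized rows: DLOCATIONS holds several values per tuple (not 1NF).
department = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]
base, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(base)            # DEPARTMENT without DLOCATIONS
print(dept_locations)  # one (DNUMBER, DLOCATION) tuple per location
```

The key of the new relation is the pair (DNUMBER, DLOCATION). Each location value becomes its own tuple, so the design needs neither repeated DNAME values nor NULL padding.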
Normalizing nested relations into 1NF
Additional problems from the Schaum's series: pg 178, 5.1
Second Normal Form
Uses the concepts of FDs and the primary key.
Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
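Fullness can be checked with closures alone. Y -> Z is full iff Z is contained in Y+ but not in (Y - {a})+ for any a in Y. Removing one attribute at a time is enough, since shrinking Y further can only shrink its closure. The sketch below uses the EMP_PROJ FDs. As before, PNUMBER -> {PNAME, PLOCATION} is assumed for illustration, and `is_full_fd` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and fails once any single attribute is removed from Y."""
    if not rhs <= closure(lhs, fds):
        return False  # Y -> Z does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),  # assumed for illustration
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
key = frozenset({"SSN", "PNUMBER"})
print(is_full_fd(key, frozenset({"HOURS"}), FDS))  # True: a full FD
print(is_full_fd(key, frozenset({"ENAME"}), FDS))  # False: partial, since SSN -> ENAME
```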
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
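2NF normalization can be sketched by grouping each nonprime attribute with the smallest part of the primary key that determines it. This simplified sketch assumes the primary key is the only candidate key. It ignores transitive dependencies, which are a 3NF concern. It reuses the assumed EMP_PROJ FDs, and `normalize_2nf` is a hypothetical name. For EMP_PROJ it yields {SSN, ENAME}, {PNUMBER, PNAME, PLOCATION} and {SSN, PNUMBER, HOURS}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def normalize_2nf(schema, pk, fds):
    """Group each nonprime attribute with the smallest part of the key that determines it."""
    groups = {pk: set()}  # the primary key keeps its own relation
    for attr in sorted(schema - pk):  # assumes pk is the only candidate key
        for size in range(1, len(pk) + 1):
            determinants = [frozenset(c) for c in combinations(sorted(pk), size)
                            if attr in closure(c, fds)]
            if determinants:
                # a proper subset of pk means a partial dependency; pk itself means a full one
                groups.setdefault(determinants[0], set()).add(attr)
                break
    return {frozenset(s | attrs) for s, attrs in groups.items()}


EMP_PROJ = frozenset({"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"})
FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),  # assumed for illustration
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
result = normalize_2nf(EMP_PROJ, frozenset({"SSN", "PNUMBER"}), FDS)
print(sorted(sorted(r) for r in result))
# [['ENAME', 'SSN'], ['HOURS', 'PNUMBER', 'SSN'], ['PLOCATION', 'PNAME', 'PNUMBER']]
```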
Normalizing into 2NF
Conversion to 2NF (figure): a relation with attributes (A, B, C, D) is converted to relations with attributes (A, B, C) and (A, D)
Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> prog_Pac_name
1 What is the highest normal form?
2 Transform it into the next highest form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
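The NOTE agrees with the general form of the 3NF test. For every nontrivial FD X -> A, X must be a superkey or A must be a prime attribute. For a single schema it suffices to check the FDs of F, one right-hand-side attribute at a time, rather than all of F+. This hedged sketch uses hypothetical names. It assumes EMP_DEPT has SSN -> {ENAME, BDATE, ADDRESS, DNUMBER} and DNUMBER -> {DNAME, DMGRSSN}. It also uses the EMP(SSN, Emp, Salary) example from the NOTE.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def third_nf_violations(schema, fds):
    """Pairs (X, A), A not in X, where X is not a superkey and A is not prime."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= schema
        for attr in sorted(rhs - lhs):
            if not lhs_is_superkey and attr not in prime:
                bad.append((lhs, attr))
    return bad


EMP_DEPT = frozenset({"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"})
EMP_DEPT_FDS = [  # assumed FDs for illustration
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
EMP = frozenset({"SSN", "Emp", "Salary"})
EMP_FDS = [  # Emp -> SSN is assumed because the NOTE says Emp is a candidate key
    (frozenset({"SSN"}), frozenset({"Emp"})),
    (frozenset({"Emp"}), frozenset({"SSN", "Salary"})),
]
print([a for _, a in third_nf_violations(EMP_DEPT, EMP_DEPT_FDS)])  # ['DMGRSSN', 'DNAME']
print(third_nf_violations(EMP, EMP_FDS))                            # []
```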
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
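For a single schema, BCNF can likewise be tested on the FDs of F alone. R is in BCNF iff the left-hand side of every nontrivial FD in F is a superkey. The sketch below uses hypothetical names. It reuses the EMP(SSN, Emp, Salary) example, assuming Emp -> SSN since Emp is a candidate key. It also reuses the assumed EMP_DEPT FDs.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs in F whose left-hand side is not a superkey of the schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


EMP = frozenset({"SSN", "Emp", "Salary"})
EMP_FDS = [  # Emp -> SSN assumed, since Emp is a candidate key
    (frozenset({"SSN"}), frozenset({"Emp"})),
    (frozenset({"Emp"}), frozenset({"SSN", "Salary"})),
]
EMP_DEPT = frozenset({"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"})
EMP_DEPT_FDS = [  # assumed FDs for illustration
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(bcnf_violations(EMP, EMP_FDS))                                         # []
print([sorted(lhs) for lhs, _ in bcnf_violations(EMP_DEPT, EMP_DEPT_FDS)])  # [['DNUMBER']]
```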
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation even when 'X' is not a superkey, provided 'A' is a prime attribute (a member of some candidate key).
For a relation to be in BCNF, 'X' must be a superkey, with no such exception.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation, so course is a prime attribute and 3NF permits fd2. However, instructor is not a superkey, so fd2 violates BCNF. Hence this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
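Running the 3NF and BCNF tests side by side on TEACH shows the gap: fd2 instructor -> course passes the 3NF test only because course is prime. A sketch with hypothetical names:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def is_3nf(schema, fds):
    """Every FD: LHS is a superkey, or every nontrivial RHS attribute is prime."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    return all(closure(lhs, fds) >= schema or (rhs - lhs) <= prime for lhs, rhs in fds)


def is_bcnf(schema, fds):
    """Every nontrivial FD has a superkey on its left-hand side."""
    return all(closure(lhs, fds) >= schema or rhs <= lhs for lhs, rhs in fds)


TEACH = frozenset({"student", "course", "instructor"})
FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(is_3nf(TEACH, FDS), is_bcnf(TEACH, FDS))  # True False
```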
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the above three, only the third decomposition does not generate spurious tuples after a join (and hence has the non-additive join property).
Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg 162
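For a decomposition into exactly two relations there is a standard test, a well-known property not spelled out on these slides. {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The sketch applies this test to the three TEACH decompositions. Its result agrees with the claim that only the third is non-additive. The function name is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """Lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([binary_lossless(frozenset(a), frozenset(b), FDS) for a, b in decompositions])
# [False, False, True]
```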
Testing for lossless joins
Lossless join algorithm: Example 5.12
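The general lossless-join test for a decomposition into n relations is the matrix (chase) algorithm. It works as follows:
1 Build one row per Ri, with a distinguished symbol in Ri's columns.
2 Repeatedly equate symbols in rows that agree on the left-hand side of some FD.
3 Report lossless if some row becomes entirely distinguished.

This is a hedged sketch of that textbook algorithm, not necessarily the exact procedure of the referenced example. The symbol encoding and names are assumptions. The EMP_PROJ FDs include the assumed PNUMBER -> {PNAME, PLOCATION}.

```python
def chase_lossless(schema, decomposition, fds):
    """Matrix test: True iff the decomposition of `schema` has the lossless join property."""
    attrs = sorted(schema)
    # Row i holds ("a", A) where A is in Ri (distinguished) and ("b", i, A) elsewhere.
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    if any(rows[i][x] != rows[j][x] for x in lhs):
                        continue
                    for y in rhs:
                        u, v = rows[i][y], rows[j][y]
                        if u == v:
                            continue
                        # Keep a distinguished symbol if there is one, else either "b" symbol.
                        keep = u if u[0] == "a" else v if v[0] == "a" else min(u, v)
                        drop = v if keep == u else u
                        for row in rows:  # rename the dropped symbol throughout column y
                            if row[y] == drop:
                                row[y] = keep
                        changed = True
    return any(all(row[a] == ("a", a) for a in attrs) for row in rows)


TEACH = frozenset({"student", "course", "instructor"})
TEACH_FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
teach_decomps = [
    [frozenset({"student", "instructor"}), frozenset({"student", "course"})],
    [frozenset({"course", "instructor"}), frozenset({"course", "student"})],
    [frozenset({"instructor", "course"}), frozenset({"instructor", "student"})],
]
print([chase_lossless(TEACH, d, TEACH_FDS) for d in teach_decomps])  # [False, False, True]

EMP_PROJ = frozenset({"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"})
EMP_PROJ_FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),  # assumed
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
three_way = [frozenset({"SSN", "ENAME"}), frozenset({"PNUMBER", "PNAME", "PLOCATION"}),
             frozenset({"SSN", "PNUMBER", "HOURS"})]
print(chase_lossless(EMP_PROJ, three_way, EMP_PROJ_FDS))  # True
```

Each rename removes one distinct symbol from a column, so the loop always terminates.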
Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multivalued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
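An MVD X ->> Y can be checked against one relation state. Such a check can refute the MVD, but it cannot prove that the MVD holds for all legal states. Whenever two tuples agree on X, the tuple that takes X ∪ Y from the first and the remaining attributes from the second must also be present. The EMP rows and function names below are hypothetical sample choices.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on one relation state given as a list of dicts."""
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 agrees with t1 on X and Y, and with t2 on the remaining attributes
                t3 = tuple(t1[a] if a in x or a in y else t2[a] for a in attrs)
                if t3 not in present:
                    return False
    return True


def is_trivial_mvd(x, y, schema):
    """X ->> Y is trivial if Y is a subset of X or X union Y is the whole schema."""
    return y <= x or (x | y) == schema


ATTRS = ("ENAME", "PNAME", "DNAME")
emp = [dict(zip(ATTRS, t)) for t in [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]]
print(mvd_holds(emp, ATTRS, {"ENAME"}, {"PNAME"}))      # True
print(mvd_holds(emp[:3], ATTRS, {"ENAME"}, {"PNAME"}))  # False: (Smith, Y, John) is missing
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, set(ATTRS)))          # False in EMP
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, {"ENAME", "PNAME"}))  # True in a two-attribute relation
```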
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
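Project EMP onto EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME), then join the two projections back together. Doing so shows why the decomposition is non-additive when ENAME ->> PNAME holds. In this sketch, tuples are modelled as frozensets of (attribute, value) pairs. The data and helper names are hypothetical.

```python
def relation(attrs, *tuples):
    """Build a relation state: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows, attrs):
    """Projection with duplicate elimination."""
    out = set()
    for t in rows:
        values = dict(t)
        out.add(frozenset((a, values[a]) for a in attrs))
    return out


def natural_join(left, right):
    """Natural join on all attributes the two relation states share."""
    out = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


EMP = relation(("ENAME", "PNAME", "DNAME"),
               ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
               ("Smith", "X", "Anna"), ("Smith", "Y", "John"))
EMP_PROJECTS = project(EMP, ("ENAME", "PNAME"))
EMP_DEPENDENTS = project(EMP, ("ENAME", "DNAME"))
print(len(EMP_PROJECTS), len(EMP_DEPENDENTS))             # 2 2
print(natural_join(EMP_PROJECTS, EMP_DEPENDENTS) == EMP)  # True
```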
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except those in which every component Ri is a superkey.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
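A join dependency JD(R1, R2, R3) can be checked on a relation state by joining all of the projections. The SUPPLY state below is hypothetical: its attribute names and values were invented for illustration, and it is not the state from the figure. It satisfies JD(R1, R2, R3) with R1 = (SNAME, PARTNAME), R2 = (SNAME, PROJNAME) and R3 = (PARTNAME, PROJNAME). Joining only two projections produces a spurious tuple. Joining all three gives back exactly the original tuples.

```python
from functools import reduce


def relation(attrs, *tuples):
    """Build a relation state: a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows, attrs):
    """Projection with duplicate elimination."""
    out = set()
    for t in rows:
        values = dict(t)
        out.add(frozenset((a, values[a]) for a in attrs))
    return out


def natural_join(left, right):
    """Natural join on all attributes the two relation states share."""
    out = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def jd_holds(rows, components):
    """JD(R1, ..., Rn) holds in this state iff joining all projections gives back rows."""
    return reduce(natural_join, [project(rows, c) for c in components]) == rows


SUPPLY = relation(("SNAME", "PARTNAME", "PROJNAME"),
                  ("Smith", "Bolt", "ProjY"), ("Smith", "Nut", "ProjX"),
                  ("Adamsky", "Bolt", "ProjX"), ("Smith", "Bolt", "ProjX"))
R1, R2, R3 = ("SNAME", "PARTNAME"), ("SNAME", "PROJNAME"), ("PARTNAME", "PROJNAME")
print(jd_holds(SUPPLY, [R1, R2, R3]))  # True
print(jd_holds(SUPPLY, [R1, R2]))      # False
spurious = natural_join(project(SUPPLY, R1), project(SUPPLY, R2)) - SUPPLY
print(len(spurious))                   # 1: (Smith, Nut, ProjY)
```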
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 44042023
Extraneous Attributes
Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F
F be a set of FDs over schema R and let A1A2B1B2
A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2
Module 6 46042023
CANONICAL COVER (FC)
1 Every FD of FC is simple RHS has one attribute
2 FC is left-reduced
3 FC is nonredudant
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
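The transitive-dependency check, including the NOTE's exception for a candidate key Y, can be sketched as below. This is an illustrative Python sketch under stated assumptions: FDs are (lhs, rhs) attribute-set pairs, the caller supplies the prime attributes, only the FDs as listed are inspected (each listed LHS plays the role of Y), and the EMP_DEPT FDs are assumed from the SSN/DNUMBER examples above.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs (both sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_violations(attrs, fds, prime):
    """Return (Y, A) pairs for listed FDs Y -> A where Y is not a superkey and
    A is a non-prime attribute outside Y, i.e. key -> Y -> A is a transitive
    dependency that 3NF rules out."""
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # Y is a superkey (e.g. a candidate key): no problem
        for a in sorted(rhs - lhs - prime):
            violations.append((frozenset(lhs), a))
    return violations


# EMP_DEPT with primary key SSN (FDs assumed from the examples above).
EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_DEPT = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(transitive_violations(EMP_DEPT, F_DEPT, prime={"SSN"}))

# The NOTE's EMP(SSN, Emp, Salary): Emp is a candidate key, so nothing is reported.
EMP = {"SSN", "Emp", "Salary"}
F_EMP = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(transitive_violations(EMP, F_EMP, prime={"SSN", "Emp"}))  # []
```

For EMP_DEPT the sketch reports DNAME and DMGRSSN via DNUMBER, which is why the usual 3NF decomposition moves them into a separate relation keyed by DNUMBER.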
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation.
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.
BCNF makes no such exception: for every nontrivial FD X -> A, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
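The difference between the two tests can be sketched in Python on TEACH. Assumptions: FDs are (lhs, rhs) attribute-set pairs, the function names are hypothetical, and the sketch uses the usual shortcut of checking only the FDs listed in F (adequate for the original schema, not for projections of it). By closure, TEACH's candidate keys are {student, course} and {student, instructor}, so all three attributes are prime.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs (both sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(attrs, fds):
    """BCNF: every nontrivial X -> A in F has a superkey X."""
    return all(rhs <= lhs or closure(lhs, fds) >= attrs for lhs, rhs in fds)


def is_3nf(attrs, fds, prime):
    """3NF: for every X -> A in F, X is a superkey or every A outside X is prime."""
    return all(closure(lhs, fds) >= attrs or (rhs - lhs) <= prime
               for lhs, rhs in fds)


TEACH = {"student", "course", "instructor"}
F = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
PRIME = {"student", "course", "instructor"}

print(is_3nf(TEACH, F, PRIME))  # True: course is prime
print(is_bcnf(TEACH, F))        # False: instructor is not a superkey
```

Only fd2 separates the two answers: 3NF accepts it because its right-hand side is prime, while BCNF rejects it because its left-hand side is not a superkey.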
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 does not violate 3NF because course is a prime attribute, but it violates BCNF because instructor is not a superkey. So this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
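The spurious-tuple claim can be checked on a small instance by projecting and rejoining. In this Python sketch the TEACH state (s1, c1, i1, ...) is hypothetical, chosen only so that it satisfies fd1 and fd2. Each tuple is stored as a frozenset of (attribute, value) pairs so that the natural join can match on attribute names.

```python
def project(rel, attrs):
    """Projection; duplicate tuples vanish because relations are sets."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1, r2):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def row(student, course, instructor):
    return frozenset({"student": student, "course": course,
                      "instructor": instructor}.items())


# Hypothetical TEACH state that satisfies fd1 and fd2.
TEACH = {row("s1", "c1", "i1"), row("s2", "c1", "i2"), row("s1", "c2", "i3")}

DECOMPOSITIONS = {
    1: ({"student", "instructor"}, {"student", "course"}),
    2: ({"course", "instructor"}, {"course", "student"}),
    3: ({"instructor", "course"}, {"instructor", "student"}),
}

for n, (r1, r2) in DECOMPOSITIONS.items():
    rejoined = natural_join(project(TEACH, r1), project(TEACH, r2))
    print(n, len(rejoined - TEACH), "spurious tuple(s)")
# 1 2 spurious tuple(s)
# 2 2 spurious tuple(s)
# 3 0 spurious tuple(s)
```

One clean instance does not prove a decomposition lossless in general; it only illustrates the behaviour. The general guarantee comes from the FDs, since instructor -> course makes the shared attribute of decomposition 3 a key of {instructor, course}.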
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover exactly the original relation by joining the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
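The matrix (chase) test for a non-additive join can be sketched as follows. This Python version is illustrative and is not Example 5.12 itself. FDs are assumed to be (lhs, rhs) attribute-set pairs. When two symbols are equated, the sketch renames them throughout the column, a standard chase variant that guarantees the loop terminates.

```python
def is_lossless(attrs, decomposition, fds):
    """Matrix (chase) test for the non-additive join property of a decomposition."""
    # One row per Ri, one column per attribute of R:
    # ("a", col) is the distinguished symbol, ("b", i, col) a row-specific one.
    rows = [
        {col: ("a", col) if col in ri else ("b", i, col) for col in attrs}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every LHS column.
            groups = {}
            for row in rows:
                key = tuple(row[col] for col in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for group in groups.values():
                for col in rhs:
                    symbols = {row[col] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Prefer the "a" symbol; otherwise keep one of the b symbols.
                    target = ("a", col) if ("a", col) in symbols else min(symbols)
                    # Rename the equated symbols everywhere in this column.
                    for other in rows:
                        if other[col] in symbols:
                            other[col] = target
                    changed = True
    # Lossless iff some row now consists entirely of "a" symbols.
    return any(all(row[col] == ("a", col) for col in attrs) for row in rows)


TEACH = ["student", "course", "instructor"]
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(is_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F))  # False
print(is_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F))  # True
```

Unlike a test on two relations only, the matrix test also handles decompositions into three or more relations. For example, it accepts EMP_PROJ split into {SSN, ENAME}, {PNUMBER, PNAME, PLOCATION} and {SSN, PNUMBER, HOURS} under the FDs assumed in the 2NF discussion.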
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
It is used for removing multi-valued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
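The MVD definition, the triviality rule and the 4NF split of EMP can be sketched on a small instance. In this Python sketch the EMP state (Smith, projects X and Y, dependents John and Anna) is hypothetical, and the function names are illustrative.

```python
def holds_mvd(rel, cols, lhs, rhs):
    """X ->> Y holds if, for any t1, t2 agreeing on X, the tuple that takes X and Y
    from t1 and every remaining attribute from t2 is also in rel."""
    idx = {c: i for i, c in enumerate(cols)}
    for t1 in rel:
        for t2 in rel:
            if all(t1[idx[c]] == t2[idx[c]] for c in lhs):
                t3 = tuple(t1[idx[c]] if c in lhs or c in rhs else t2[idx[c]]
                           for c in cols)
                if t3 not in rel:
                    return False
    return True


def is_trivial_mvd(lhs, rhs, attrs):
    """A ->> B is trivial if B is a subset of A or A U B = R."""
    return rhs <= lhs or (lhs | rhs) == attrs


def project(rel, cols, keep):
    idx = [cols.index(c) for c in keep]
    return {tuple(t[i] for i in idx) for t in rel}


# Hypothetical EMP state: Smith works on X and Y and has dependents John and Anna.
COLS = ("ENAME", "PNAME", "DNAME")
EMP = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}

print(holds_mvd(EMP, COLS, {"ENAME"}, {"PNAME"}))       # True
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, set(COLS)))  # False: nontrivial

# 4NF decomposition, then rejoin on ENAME.
EMP_PROJECTS = project(EMP, COLS, ("ENAME", "PNAME"))
EMP_DEPENDENTS = project(EMP, COLS, ("ENAME", "DNAME"))
rejoined = {(e, p, d) for e, p in EMP_PROJECTS for e2, d in EMP_DEPENDENTS if e == e2}
print(len(EMP_PROJECTS), len(EMP_DEPENDENTS), rejoined == EMP)  # 2 2 True
```

The single-relation EMP must store every project/dependent combination (4 tuples here). The two 4NF relations store each fact once and still rejoin to the original state.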
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, R has no nontrivial join dependency except those in which every component Ri is a superkey.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
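A join dependency JD(R1, R2, R3) can be checked on a given state by joining all three projections and comparing the result with the original. In this Python sketch the SUPPLY state (s1, p1, j1, ...) and the attribute names Sname, Part_name, Proj_name are assumptions. The state is built so that each pair of projections rejoins with a spurious tuple, but the three-way join does not.

```python
def project(rel, attrs):
    """Projection of a set of tuples (each a frozenset of (attribute, value) pairs)."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all shared attributes."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def row(sname, part, proj):
    return frozenset({"Sname": sname, "Part_name": part, "Proj_name": proj}.items())


# Hypothetical SUPPLY state obeying the constraint behind JD(R1, R2, R3).
SUPPLY = {row("s1", "p1", "j2"), row("s1", "p2", "j1"),
          row("s2", "p1", "j1"), row("s1", "p1", "j1")}
R1 = {"Sname", "Part_name"}
R2 = {"Sname", "Proj_name"}
R3 = {"Part_name", "Proj_name"}

two_way = natural_join(project(SUPPLY, R1), project(SUPPLY, R2))
three_way = natural_join(two_way, project(SUPPLY, R3))
print(len(two_way - SUPPLY))  # 1 spurious tuple: (s1, p2, j2)
print(three_way == SUPPLY)    # True
```

This is why SUPPLY can only be decomposed losslessly into three relations, not two. It also shows why the constraint is hard to spot: it only appears when all three projections are joined together.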
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
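CANONICAL COVER (FC)
1. Every FD of FC is simple: its RHS has exactly one attribute.
2. FC is left-reduced: no attribute can be removed from the LHS of an FD without changing F+.
3. FC is nonredundant: no FD can be removed from FC without changing F+.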
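Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
1. Make every RHS simple: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove redundant FDs (XY -> Z follows from X -> Z; XY -> W appears twice): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce (Z is extraneous in XZ -> R, because X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}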
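The three canonical-cover criteria can be applied mechanically. This Python sketch is an illustration, not the slide's own procedure. It uses single-letter attributes (so set("XY") is {X, Y}), and it left-reduces before removing redundant FDs, which is the usual order. Because it enforces left-reduction, it turns XZ -> R into X -> R. Note that a minimal cover is not unique in general.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a collection of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # 1. Simple RHS: X -> A1...An becomes X -> Ai (the set drops duplicates).
    cover = {(frozenset(lhs), frozenset({a})) for lhs, rhs in fds for a in rhs}
    # 2. Left-reduce: drop B from X when (X - {B})+ already contains A.
    reduced = set()
    for lhs, rhs in cover:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, cover):
                lhs.discard(b)
        reduced.add((frozenset(lhs), rhs))
    # 3. Nonredundant: drop X -> A if A is still in X+ without that FD.
    result = set(reduced)
    for fd in sorted(reduced, key=lambda f: (sorted(f[0]), sorted(f[1]))):
        rest = result - {fd}
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return result


F = [(set("X"), set("Z")), (set("XY"), set("WP")),
     (set("XY"), set("ZWQ")), (set("XZ"), set("R"))]
for lhs, rhs in sorted(minimal_cover(F), key=lambda f: (len(f[0]), sorted(f[0]), sorted(f[1]))):
    print("".join(sorted(lhs)), "->", "".join(rhs))
# X -> R, X -> Z, XY -> P, XY -> Q, XY -> W
```

XY -> Z disappears during left-reduction, because X -> Z already holds. XY -> W, XY -> P and XY -> Q keep both attributes, since neither X+ nor Y+ contains W, P or Q.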
Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form
Normalization of Relations
2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).
4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of:
- Minimizing redundancy
- Minimizing anomalies
Provides the database designer with:
- A formal framework for analyzing relation schemas based on keys and FDs
- A series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (or nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
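Dependency preservation can be tested without computing the projected FD sets explicitly. The Python sketch below follows a commonly used closure-based test and applies it to the TEACH decompositions discussed in the BCNF section. FDs are assumed to be (lhs, rhs) attribute-set pairs, and the helper names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs (both sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, decomposition, fds):
    """Test whether X -> Y follows from the FDs projected onto R1..Rn,
    without computing those projections explicitly."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            # What Ri alone can derive from the attributes reached so far.
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


fd1 = ({"student", "course"}, {"instructor"})
fd2 = ({"instructor"}, {"course"})
F = [fd1, fd2]
D3 = [{"instructor", "course"}, {"instructor", "student"}]
print(preserves(fd1, D3, F))  # False: fd1 is lost
print(preserves(fd2, D3, F))  # True: fd2 lives inside {instructor, course}
```

The same test reports fd1 as lost for all three TEACH decompositions. That is the trade-off accepted when a non-additive BCNF decomposition is chosen over one that preserves dependencies.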
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
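The superkey, key and prime-attribute definitions can be turned into a brute-force search. This Python sketch assumes FDs are (lhs, rhs) attribute-set pairs and uses hypothetical helper names. It is exponential in the number of attributes, so it only suits small classroom schemas. Subsets are tried smallest first, which ensures that every reported superkey is minimal, that is, a key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs (both sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # supersets of a key are superkeys, but not keys
            if closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def prime_attributes(attrs, fds):
    """Prime attributes: members of some candidate key."""
    return set().union(*candidate_keys(attrs, fds))


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(candidate_keys(TEACH, F))            # two keys: {course, student} and {instructor, student}
print(sorted(prime_attributes(TEACH, F)))  # ['course', 'instructor', 'student']
```

For TEACH the search finds a second candidate key, {student, instructor}, besides the one named on the BCNF slides. This makes course prime, and that is exactly why fd2 is acceptable in 3NF.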
First Normal Form
Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
To achieve 1NF there are 3 techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values of an attribute is known, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and is completely general, placing no limit on the maximum number of values.
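Techniques 1 and 2 can be compared on a small example. In this Python sketch the DEPARTMENT rows, the multivalued attribute DLOCATIONS and the output name DLOCATION are hypothetical, and the helper names are illustrative.

```python
# Hypothetical non-1NF DEPARTMENT rows: DLOCATIONS is multivalued.
DEPARTMENT = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]


def technique_1(rows, key, multivalued):
    """Move the multivalued attribute into its own relation along with the key."""
    base = [{k: v for k, v in r.items() if k != multivalued} for r in rows]
    side = [(r[key], value) for r in rows for value in r[multivalued]]
    return base, side


def technique_2(rows, multivalued, single_name):
    """Expand the key: one tuple per value, repeating the other attributes."""
    return [
        {**{k: v for k, v in r.items() if k != multivalued}, single_name: value}
        for r in rows
        for value in r[multivalued]
    ]


base, dept_locations = technique_1(DEPARTMENT, "DNUMBER", "DLOCATIONS")
print(base)            # DEPARTMENT(DNUMBER, DNAME): one tuple per department
print(dept_locations)  # DEPT_LOCATIONS(DNUMBER, DLOCATION): one tuple per location
flat = technique_2(DEPARTMENT, "DLOCATIONS", "DLOCATION")
print(len(flat))       # 4 tuples; "Research" is stored three times
```

Technique 1 stores each department name once. Technique 2 repeats DNAME for every location, which is the redundancy mentioned above.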
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 47042023
Problem
Given a set F of FDs find a cononical cover for F
FC = XZ XYWP XYZWQ XZR
1 FC= XZ XYW XYP XYZ XYW XYQ XZR
2 FC = XZ XYW XYP XYQ XZR
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 48042023
Normal Forms Based on Primary Keys 1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form
Module 6 49042023
Normalization of Relations
2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)
4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)
Additional properties may be needed to ensure a good relational design lossless join and dependency preservation
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
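2NF normalization moves each partially dependent attribute into a relation keyed by the part of the primary key it depends on. A minimal sketch under the same assumed EMP_PROJ FDs (it repeats the closure helper so it runs on its own; decompose_2nf is a hypothetical name). It handles partial dependencies only; transitive dependencies are left for 3NF.

```python
from itertools import combinations


def fd(lhs, rhs):
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    while True:
        extra = set().union(*(rhs for lhs, rhs in fds if lhs <= result)) - result
        if not extra:
            return result
        result |= extra


def decompose_2nf(attrs, key, fds):
    """Place each non-prime attribute with the smallest part of `key` that determines it."""
    key = tuple(sorted(key))
    groups = {key: []}  # always keep a relation containing the whole key
    for attr in sorted(set(attrs) - set(key)):
        for size in range(1, len(key) + 1):
            parts = [p for p in combinations(key, size) if attr in closure(p, fds)]
            if parts:
                groups.setdefault(parts[0], []).append(attr)
                break
    return [sorted(set(part) | set(found)) for part, found in groups.items()]


EMP_PROJ = "SSN PNUMBER HOURS ENAME PNAME PLOCATION".split()
EMP_PROJ_FDS = [
    fd("SSN PNUMBER", "HOURS"),
    fd("SSN", "ENAME"),
    fd("PNUMBER", "PNAME PLOCATION"),
]

for relation in decompose_2nf(EMP_PROJ, ["SSN", "PNUMBER"], EMP_PROJ_FDS):
    print(relation)
```

It prints ['HOURS', 'PNUMBER', 'SSN'], ['ENAME', 'SSN'] and ['PLOCATION', 'PNAME', 'PNUMBER']: one relation for the full key, one for SSN, and one for PNUMBER.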
Normalizing into 2NF
Conversion to 2NF (figure): R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).
Additional problem on 2nd normal form:
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> prog_Pac_name
1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
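One pass of 3NF normalization can be sketched for EMP_DEPT, assuming SSN is its only candidate key and assuming the FDs SSN -> {ENAME, BDATE, ADDRESS, DNUMBER} and DNUMBER -> {DNAME, DMGRSSN} (the second underlies the SSN -> DMGRSSN transitive example). decompose_3nf is a hypothetical helper; a general algorithm would also consider every candidate key and repeat until no violation remains.

```python
def fd(lhs, rhs):
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    while True:
        extra = set().union(*(rhs for lhs, rhs in fds if lhs <= result)) - result
        if not extra:
            return result
        result |= extra


def decompose_3nf(attrs, key, fds):
    """One pass of 3NF normalization for a relation whose only candidate key is `key`.

    For every listed FD X -> Y whose left side X is not a superkey, the non-prime
    attributes of Y are moved, together with X, into a new relation.
    """
    attrs, key = set(attrs), set(key)
    remaining, new_relations = set(attrs), []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attrs:
            continue  # X is a superkey, so this FD causes no 3NF violation
        moved = rhs - lhs - key
        if moved:
            new_relations.append(sorted(lhs | moved))
            remaining -= moved
    return [sorted(remaining)] + new_relations


EMP_DEPT = "ENAME SSN BDATE ADDRESS DNUMBER DNAME DMGRSSN".split()
EMP_DEPT_FDS = [
    fd("SSN", "ENAME BDATE ADDRESS DNUMBER"),
    fd("DNUMBER", "DNAME DMGRSSN"),
]

for relation in decompose_3nf(EMP_DEPT, ["SSN"], EMP_DEPT_FDS):
    print(relation)
```

It prints ['ADDRESS', 'BDATE', 'DNUMBER', 'ENAME', 'SSN'] and ['DMGRSSN', 'DNAME', 'DNUMBER']; DNUMBER stays in the first relation as a foreign key.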
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: p. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a candidate key.
To be in BCNF, 'X' must be a superkey.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 does not violate 3NF because course is a prime attribute, but it violates BCNF because instructor is not a superkey. So this relation is in 3NF but not in BCNF.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
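The claim that TEACH is in 3NF but not in BCNF can be checked with attribute closure. A minimal Python sketch; all helper names are hypothetical. It tests only the listed FDs, which is sufficient when testing a relation against its own FD set; testing a decomposed relation would require the FDs projected from F+. Besides {student, course}, it also finds {student, instructor} as a candidate key, because instructor -> course.

```python
from itertools import combinations


def fd(lhs, rhs):
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    while True:
        extra = set().union(*(rhs for lhs, rhs in fds if lhs <= result)) - result
        if not extra:
            return result
        result |= extra


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if closure(combo, fds) >= attrs and not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys


def is_bcnf(attrs, fds):
    """Every nontrivial listed FD X -> Y must have a superkey X."""
    return all(closure(lhs, fds) >= attrs for lhs, rhs in fds if not rhs <= lhs)


def is_3nf(attrs, fds):
    """Each nontrivial listed FD needs a superkey X, or only prime attributes in Y - X."""
    prime = set().union(*candidate_keys(attrs, fds))
    return all(
        closure(lhs, fds) >= attrs or (rhs - lhs) <= prime
        for lhs, rhs in fds
        if not rhs <= lhs
    )


TEACH = {"student", "course", "instructor"}
TEACH_FDS = [fd("student course", "instructor"), fd("instructor", "course")]

print(candidate_keys(TEACH, TEACH_FDS))
print(is_3nf(TEACH, TEACH_FDS), is_bcnf(TEACH, TEACH_FDS))  # True False
```

fd2 passes the 3NF test because course is prime, but fails the BCNF test because the closure of {instructor} is only {instructor, course}.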
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Of the three, only the third decomposition does not generate spurious tuples after a join (and hence has the non-additivity property): its common attribute, instructor, determines course by fd2 and is therefore a key of {instructor, course}.
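For a decomposition into two relations, a standard test is that {R1, R2} is lossless if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is implied by the FDs. A minimal sketch applying it to the three TEACH decompositions (binary_lossless is a hypothetical name).

```python
def fd(lhs, rhs):
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    while True:
        extra = set().union(*(rhs for lhs, rhs in fds if lhs <= result)) - result
        if not extra:
            return result
        result |= extra


def binary_lossless(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1)."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


TEACH_FDS = [fd("student course", "instructor"), fd("instructor", "course")]
DECOMPOSITIONS = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

for number, (r1, r2) in enumerate(DECOMPOSITIONS, start=1):
    print(number, binary_lossless(r1, r2, TEACH_FDS))
```

It prints 1 False, 2 False and 3 True, matching the slide.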
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation (by a natural join) from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, p. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
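The lossless-join test for any number of relations is usually presented as a matrix (tableau) algorithm: one row per Ri, a distinguished symbol wherever Ri contains the attribute, and FDs used to equate symbols until some row becomes fully distinguished. A minimal Python sketch of that formulation; Example 5.12 is not reproduced, and the EMP_PROJ attributes, FDs and the two decompositions tested are illustrative assumptions.

```python
def fd(lhs, rhs):
    return frozenset(lhs.split()), frozenset(rhs.split())


def is_lossless(attrs, decomposition, fds):
    """Matrix (chase) test for the lossless-join property of a decomposition.

    Row i stands for R_i. A cell holds the distinguished symbol ("a", A) when
    R_i contains A, and a row-specific symbol ("b", i, A) otherwise.
    """
    rows = [{a: (("a", a) if a in ri else ("b", i, a)) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every attribute of the FD's left side.
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    if len(symbols) > 1:
                        # Equate the symbols, preferring the distinguished "a" symbol.
                        target = ("a", a) if ("a", a) in symbols else min(symbols)
                        for row in group:
                            row[a] = target
                        changed = True
    # Lossless if some row has become all distinguished symbols.
    return any(all(row[a] == ("a", a) for a in attrs) for row in rows)


ATTRS = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
FDS = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]

lossy = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
lossless = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(is_lossless(ATTRS, lossy, FDS), is_lossless(ATTRS, lossless, FDS))  # False True
```

The first split shares only PLOCATION, so no FD can equate symbols; in the second, SSN -> ENAME and PNUMBER -> {PNAME, PLOCATION} complete the third row.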
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A, or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
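Both the trivial/nontrivial classification and whether an MVD holds on a given instance can be checked directly. The instance check uses the tuple-swapping definition of X ->> Y: for any two tuples that agree on X, the tuple taking X and Y from the first and the remaining attributes from the second must also be present. A minimal sketch on a hypothetical EMP instance; the helper names are assumptions.

```python
def is_trivial_mvd(lhs, rhs, attrs):
    """A ->> B is trivial if B is a subset of A, or A ∪ B is all of R."""
    lhs, rhs = set(lhs), set(rhs)
    return rhs <= lhs or (lhs | rhs) == set(attrs)


def mvd_holds(rows, attrs, lhs, rhs):
    """Check X ->> Y on one relation instance (rows are tuples ordered like attrs).

    For any two tuples agreeing on X, the tuple taking X and Y from the first
    and all remaining attributes from the second must also be present.
    """
    rest = [i for i, a in enumerate(attrs) if a not in lhs and a not in rhs]
    x_pos = [i for i, a in enumerate(attrs) if a in lhs]
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in x_pos):
                t3 = tuple(t2[i] if i in rest else t1[i] for i in range(len(attrs)))
                if t3 not in present:
                    return False
    return True


# Hypothetical EMP instance: Smith works on projects X and Y and has dependents John and Anna.
EMP_ATTRS = ["ENAME", "PNAME", "DNAME"]
EMP_ROWS = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]

print(is_trivial_mvd({"ENAME"}, {"PNAME"}, EMP_ATTRS))      # False: nontrivial
print(mvd_holds(EMP_ROWS, EMP_ATTRS, {"ENAME"}, {"PNAME"}))  # True on this instance
```

Dropping the last row makes mvd_holds return False, because ('Smith', 'Y', 'John') would be required but is missing. An MVD that holds on one instance is not thereby a constraint; it must hold in every legal state.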
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies (more precisely, for every nontrivial MVD X ->> Y that holds in R, X is a superkey of R).
It is used for removing multivalued dependencies.
In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
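The 4NF decomposition is lossless: projecting EMP onto (ENAME, PNAME) and (ENAME, DNAME) and joining back on ENAME reproduces EMP. A minimal sketch on a hypothetical instance (Smith with projects X, Y and dependents John, Anna); project and join_on_ename are hypothetical helpers.

```python
def project(rows, attrs, keep):
    """Projection onto the attributes in `keep`, with duplicate tuples removed."""
    positions = [attrs.index(a) for a in keep]
    return sorted({tuple(row[i] for i in positions) for row in rows})


def join_on_ename(projects, dependents):
    """Natural join of EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME)."""
    return sorted(
        (e1, pname, dname)
        for e1, pname in projects
        for e2, dname in dependents
        if e1 == e2
    )


EMP_ATTRS = ["ENAME", "PNAME", "DNAME"]
EMP_ROWS = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]

emp_projects = project(EMP_ROWS, EMP_ATTRS, ["ENAME", "PNAME"])
emp_dependents = project(EMP_ROWS, EMP_ATTRS, ["ENAME", "DNAME"])
print(emp_projects)    # [('Smith', 'X'), ('Smith', 'Y')]
print(emp_dependents)  # [('Smith', 'Anna'), ('Smith', 'John')]
print(join_on_ename(emp_projects, emp_dependents) == sorted(EMP_ROWS))  # True
```

In the decomposed form, adding a third project for Smith inserts one EMP_PROJECTS tuple instead of one EMP tuple per dependent.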
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except those whose components are all superkeys.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
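A join dependency can be checked on an instance by projecting onto each component and joining the projections back. A minimal sketch with hypothetical SUPPLY rows and hypothetical attribute names SNAME, PART_NAME, PROJ_NAME; the helpers are hypothetical too. On this instance the three-way join reproduces SUPPLY exactly, while joining any two of the projections adds spurious tuples.

```python
def project(rows, attrs):
    """Projection onto `attrs`; each result row is a sorted tuple of (attribute, value) pairs."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}


def natural_join(left, right):
    """Natural join of two projections produced by `project`."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(tuple(sorted({**ld, **rd}.items())))
    return out


def satisfies_jd(rows, components):
    """True if `rows` equals the natural join of its projections on `components`."""
    joined = project(rows, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(rows, attrs))
    return joined == project(rows, rows[0].keys())


# Hypothetical SUPPLY instance: (supplier, part, project) triples.
SUPPLY = [
    {"SNAME": s, "PART_NAME": p, "PROJ_NAME": j}
    for s, p, j in [
        ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
        ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
        ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
        ("Smith", "Bolt", "ProjY"),
    ]
]
R1, R2, R3 = ["SNAME", "PART_NAME"], ["SNAME", "PROJ_NAME"], ["PART_NAME", "PROJ_NAME"]

print(satisfies_jd(SUPPLY, [R1, R2, R3]))  # True: the three-way JD holds here
print(satisfies_jd(SUPPLY, [R1, R2]))      # False: a two-way split adds spurious tuples
```

A check like this only shows that the JD holds for one relation state; whether it is a constraint of the schema is a semantic question.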
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence, 5NF is rarely used in practice.
Normalization of Relations
- 2NF, 3NF, BCNF: based on keys and FDs of a relation schema.
- 4NF: based on keys and multi-valued dependencies (MVDs).
- 5NF: based on keys and join dependencies (JDs).
Normalizing a schema by decomposition against these normal forms is called relational design by analysis (as opposed to relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.
Normalization of Relations
Proposed by Codd. Normalization: analyzing the given relations based on their FDs and primary keys to achieve the desirable properties of:
- Minimizing redundancy
- Minimizing anomalies
Provides the database designer with:
- A formal framework for analyzing relation schemas based on keys and FDs
- A series of normal form tests
The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.
Normalization of Relations
Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
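Dependency preservation can be tested without computing the projections of F+ explicitly: for each X -> Y, grow a set Z from X by repeatedly adding closure(Z ∩ Ri) ∩ Ri for every Ri, and check whether Y ⊆ Z. A minimal Python sketch of that standard test, applied to the TEACH decomposition {instructor, course} and {instructor, student} (lost_dependencies is a hypothetical name).

```python
def fd(lhs, rhs):
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    while True:
        extra = set().union(*(rhs for lhs, rhs in fds if lhs <= result)) - result
        if not extra:
            return result
        result |= extra


def lost_dependencies(decomposition, fds):
    """Return the FDs of `fds` that the decomposition does not preserve.

    For X -> Y, grow Z from X by repeatedly adding closure(Z ∩ Ri) ∩ Ri for each
    relation Ri; the FD is preserved iff Y ends up inside Z.
    """
    lost = []
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            lost.append((lhs, rhs))
    return lost


TEACH_FDS = [fd("student course", "instructor"), fd("instructor", "course")]
third = [{"instructor", "course"}, {"instructor", "student"}]
for lhs, rhs in lost_dependencies(third, TEACH_FDS):
    print(sorted(lhs), "->", sorted(rhs))
```

It prints ['course', 'student'] -> ['instructor']: fd1 is lost, while fd2 survives in {instructor, course}.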
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
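Assuming the listed FDs are the relation's constraints, S is a superkey exactly when its attribute closure contains every attribute of R, and a key is a superkey from which no single attribute can be removed (checking single removals is enough, since any superset of a superkey is also a superkey). A minimal Python sketch with assumed EMP_PROJ FDs; is_superkey and is_key are hypothetical names.

```python
def fd(lhs, rhs):
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """All attributes functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    while True:
        extra = set().union(*(rhs for lhs, rhs in fds if lhs <= result)) - result
        if not extra:
            return result
        result |= extra


def is_superkey(s, attrs, fds):
    """S is a superkey iff its closure contains every attribute of R."""
    return closure(s, fds) >= set(attrs)


def is_key(k, attrs, fds):
    """A key is a superkey from which no single attribute can be removed."""
    return is_superkey(k, attrs, fds) and not any(
        is_superkey(set(k) - {a}, attrs, fds) for a in k
    )


EMP_PROJ = "SSN PNUMBER HOURS ENAME PNAME PLOCATION".split()
FDS = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION")]

print(is_superkey({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, FDS))  # True
print(is_key({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, FDS))       # False: HOURS can be dropped
print(is_key({"SSN", "PNUMBER"}, EMP_PROJ, FDS))                # True
```

{SSN, PNUMBER, HOURS} is a superkey but not a key, because removing HOURS still leaves a superkey.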
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 50042023
Normalization of Relations
Proposed by Codd Normalizationanalysing the given relation based on their FDs and
primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies
Provides the database designer with Formal framework for analyzing relation schemas based on keys
and FD Series of normal form tests
Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 51042023
Normalization of Relations
Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition
Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition
Module 6 52042023
Practical Use of Normal Forms Normalization is carried out in practice so that
the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)
Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R
Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF
- Every 3NF relation is in 2NF
- Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
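BCNF only asks whether each determinant is a superkey, so attribute closure is enough to test it. This is a minimal sketch. It assumes FDs are given as (lhs, rhs) attribute sets and uses COMPANY relations with assumed FDs as inputs. Checking only the listed FDs is enough when testing the full schema R. It is not enough when testing a projection of R after decomposition:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Listed FDs X -> Y with a nontrivial part whose left side X is not a superkey."""
    violations = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # a trivial FD never violates BCNF
        if extra and not closure(lhs, fds) >= set(relation):
            violations.append((sorted(lhs), sorted(extra)))
    return violations


DEPARTMENT = {"DNUMBER", "DNAME", "DMGRSSN"}
DEPARTMENT_FDS = [({"DNUMBER"}, {"DNAME", "DMGRSSN"})]

EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
EMP_DEPT_FDS = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(bcnf_violations(DEPARTMENT, DEPARTMENT_FDS))  # []
print(bcnf_violations(EMP_DEPT, EMP_DEPT_FDS))      # [(['DNUMBER'], ['DMGRSSN', 'DNAME'])]
```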
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey
In BCNF, X must be a superkey, whatever A is
So a relation in BCNF is always in 3NF, but not the other way around
A relation TEACH that is in 3NF but not in BCNF
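Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 is allowed in 3NF because course is a prime attribute, but it violates BCNF because instructor is not a superkey. So this relation is in 3NF but not in BCNF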
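The TEACH argument can be replayed mechanically. Find the candidate keys, then apply the 3NF rule (superkey or prime right-hand side) and the BCNF rule (superkey only) to fd1 and fd2. This is a minimal sketch, assuming fd1 and fd2 are the only FDs. The helper names are hypothetical:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    return closure(attrs, fds) >= set(relation)


def candidate_keys(relation, fds):
    """Minimal superkeys, found by brute force (fine for small schemas only)."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            s = set(combo)
            if not any(k <= s for k in keys) and is_superkey(s, relation, fds):
                keys.append(s)
    return keys


def is_3nf(relation, fds):
    """Each X -> A: X is a superkey or every attribute of A outside X is prime."""
    prime = set().union(*candidate_keys(relation, fds))
    return all(is_superkey(lhs, relation, fds) or (rhs - lhs) <= prime
               for lhs, rhs in fds)


def is_bcnf(relation, fds):
    """Each nontrivial X -> A: X is a superkey."""
    return all(is_superkey(lhs, relation, fds) or rhs <= lhs for lhs, rhs in fds)


TEACH = {"student", "course", "instructor"}
TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

print(candidate_keys(TEACH, TEACH_FDS))
print(is_3nf(TEACH, TEACH_FDS), is_bcnf(TEACH, TEACH_FDS))  # True False
```

The two candidate keys are {course, student} and {instructor, student}, so every attribute of TEACH is prime. That is why fd2 passes the 3NF rule but fails the BCNF rule.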
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition
Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property)
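For a decomposition into exactly two relations there is a shortcut test. {R1, R2} is lossless if the common attributes functionally determine R1 - R2 or R2 - R1. This minimal sketch applies the test to the three TEACH decompositions, assuming only fd1 and fd2 hold. `is_lossless_binary` is a hypothetical name:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """(R1 & R2) -> (R1 - R2) or (R1 & R2) -> (R2 - R1) must follow from fds."""
    common = closure(set(r1) & set(r2), fds)
    return set(r1) - set(r2) <= common or set(r2) - set(r1) <= common


TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
DECOMPOSITIONS = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

print([is_lossless_binary(r1, r2, TEACH_FDS) for r1, r2 in DECOMPOSITIONS])  # [False, False, True]
```

Only the third decomposition passes. Its shared attribute, instructor, determines course through fd2.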
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation, then the decomposition is lossless; otherwise it is lossy
Example 5.11, pg. 162
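The same idea can be checked on data. Project a relation state onto the decomposed schemas, join the projections back, and compare the result with the original. This sketch uses a small hypothetical TEACH state whose names and courses are invented for illustration. `project` and `natural_join` are hypothetical helpers:

```python
def project(rows, attrs):
    """Duplicate-free projection; each tuple becomes a frozenset of (attribute, value) pairs."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Natural join of two relations built by project()."""
    out = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


# Hypothetical TEACH state that satisfies fd1 and fd2
TEACH_ROWS = [
    {"student": "Smith", "course": "Database", "instructor": "Navathe"},
    {"student": "Smith", "course": "OS", "instructor": "Ammar"},
    {"student": "Wallace", "course": "Database", "instructor": "Mark"},
    {"student": "Wallace", "course": "OS", "instructor": "Ahamad"},
]
original = {frozenset(r.items()) for r in TEACH_ROWS}

# decomposition 1: {student, instructor} and {student, course}
lossy = natural_join(project(TEACH_ROWS, ["student", "instructor"]),
                     project(TEACH_ROWS, ["student", "course"]))
# decomposition 3: {instructor, course} and {instructor, student}
lossless = natural_join(project(TEACH_ROWS, ["instructor", "course"]),
                        project(TEACH_ROWS, ["instructor", "student"]))

print(len(lossy - original))  # 4 spurious tuples
print(lossless == original)   # True
```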
Testing for lossless joins
Lossless join algorithm: Example 5.12
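For decompositions into more than two relations, the lossless join test is usually run as a matrix (chase) algorithm. The matrix has one row per Ri, with a distinguished symbol wherever Ri contains the attribute. FDs are then used to equate symbols until a row of distinguished symbols appears or nothing changes. This minimal Python sketch implements that standard algorithm. It is not a transcription of Example 5.12, whose details are not reproduced here, and the relations below are illustrative:

```python
def is_lossless(relation, decomposition, fds):
    """Matrix (chase) test for splitting `relation` into `decomposition` under `fds`."""
    attrs = sorted(relation)
    # one row per Ri: ('a', A) where Ri contains A, a distinct ('b', i, A) otherwise
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # rows that agree on every attribute of lhs fall into one group
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    if len(symbols) < 2:
                        continue
                    # equate the symbols: prefer the 'a' symbol, else keep one 'b' symbol
                    target = ("a", a) if ("a", a) in symbols else min(symbols)
                    for row in rows:  # rename the symbol throughout the column
                        if row[a] in symbols:
                            row[a] = target
                    changed = True
    # lossless iff some row now consists only of 'a' symbols
    return any(all(row[a] == ("a", a) for a in attrs) for row in rows)


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
D1 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
D2 = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]

print(is_lossless(R, D1, FDS))  # False
print(is_lossless(R, D2, FDS))  # True
```

Renaming a symbol everywhere in its column, not just inside the group, keeps previously equated rows consistent. Each rename also removes a symbol, so the loop always terminates.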
Fourth Normal Form (4NF)
Multi-valued dependency (MVD)
Represents a dependency between attributes (for example A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R
An MVD is defined as being nontrivial if neither of the above two conditions is satisfied
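Both parts of the MVD definition can be checked mechanically. The sketch below decides triviality from the schema alone. It tests whether X ->> Y holds in a given relation state by swapping the remaining attributes Z = R - X - Y between tuples that agree on X. The EMP state and the helper names are hypothetical:

```python
def is_trivial_mvd(x, y, relation):
    """X ->> Y is trivial if Y is a subset of X or X union Y is all of R."""
    return set(y) <= set(x) or set(x) | set(y) == set(relation)


def mvd_holds(rows, x, y, relation):
    """For tuples t1, t2 agreeing on X, the tuple taking X, Y from t1
    and Z = R - X - Y from t2 must already be present."""
    attrs = sorted(relation)
    z = set(relation) - set(x) - set(y)
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                t3 = tuple(t2[a] if a in z else t1[a] for a in attrs)
                if t3 not in present:
                    return False
    return True


EMP_SCHEMA = {"ENAME", "PNAME", "DNAME"}
EMP_ROWS = [dict(ENAME=e, PNAME=p, DNAME=d) for e, p, d in [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]]

print(is_trivial_mvd({"ENAME"}, {"PNAME"}, EMP_SCHEMA))          # False
print(mvd_holds(EMP_ROWS, {"ENAME"}, {"PNAME"}, EMP_SCHEMA))      # True
print(mvd_holds(EMP_ROWS[:3], {"ENAME"}, {"PNAME"}, EMP_SCHEMA))  # False
```

Dropping the tuple (Smith, Y, John) breaks the MVD. Once that tuple is gone, Smith's projects and dependents no longer combine freely.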
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies, other than those whose left-hand side is a superkey
It is used for removing multivalued dependencies
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS
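The benefit of the EMP decomposition shows up on insertion. This hedged sketch uses a hypothetical EMP state. Joining EMP_PROJECTS and EMP_DEPENDENTS on ENAME rebuilds EMP exactly. Adding one project for an employee costs one tuple in EMP_PROJECTS, but one tuple per dependent in the undecomposed EMP. `rebuild_emp` is a hypothetical helper:

```python
def rebuild_emp(projects, dependents):
    """Natural join of EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME)."""
    return {(e, p, d) for (e, p) in projects for (e2, d) in dependents if e == e2}


# Hypothetical EMP state: Smith works on X and Y and has dependents John and Anna
EMP = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}
EMP_PROJECTS = {(e, p) for (e, p, d) in EMP}
EMP_DEPENDENTS = {(e, d) for (e, p, d) in EMP}

assert rebuild_emp(EMP_PROJECTS, EMP_DEPENDENTS) == EMP  # the decomposition is lossless

# Smith joins project Z: one new tuple in EMP_PROJECTS ...
EMP_PROJECTS.add(("Smith", "Z"))
# ... but EMP would need one new tuple per dependent to keep ENAME ->> PNAME
print(len(rebuild_emp(EMP_PROJECTS, EMP_DEPENDENTS) - EMP))  # 2
```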
Fifth Normal Form (5NF)
Join dependency
Describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z
Lossless-join dependency
A property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation
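The join dependency definition becomes a direct test on a relation state. Project onto each component, join the projections back, and compare with the state. This is a minimal sketch with hypothetical helper names. An MVD X ->> Y is the binary join dependency JD(X ∪ Y, X ∪ (R - X - Y)), so a hypothetical EMP state from the 4NF example serves as input:

```python
from functools import reduce


def project(rows, attrs):
    """Duplicate-free projection; each tuple becomes a frozenset of (attribute, value) pairs."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Natural join of two relations built by project()."""
    out = set()
    for lt in left:
        ld = dict(lt)
        for rt in right:
            rd = dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def satisfies_jd(rows, components):
    """True if the state equals the natural join of its projections on the components."""
    original = {frozenset(r.items()) for r in rows}
    joined = reduce(natural_join, (project(rows, c) for c in components))
    return joined == original


EMP_ROWS = [dict(ENAME=e, PNAME=p, DNAME=d) for e, p, d in [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]]
COMPONENTS = [["ENAME", "PNAME"], ["ENAME", "DNAME"]]

print(satisfies_jd(EMP_ROWS, COMPONENTS))      # True
print(satisfies_jd(EMP_ROWS[:3], COMPONENTS))  # False
```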
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R
In other words, a relation is in 5NF if it has no nontrivial join dependency other than those whose components are all superkeys
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3
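SUPPLY is special because no two-way decomposition is lossless, while the three-way one is. This sketch uses a hypothetical SUPPLY state. The attribute names Sname, Part_name and Proj_name are assumptions, and the rows were chosen so that JD(R1, R2, R3) holds:

```python
# Hypothetical SUPPLY(Sname, Part_name, Proj_name) state satisfying JD(R1, R2, R3)
SUPPLY = {
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}
R1 = {(s, p) for s, p, j in SUPPLY}  # (Sname, Part_name)
R2 = {(s, j) for s, p, j in SUPPLY}  # (Sname, Proj_name)
R3 = {(p, j) for s, p, j in SUPPLY}  # (Part_name, Proj_name)

# Every binary join produces spurious tuples ...
r1_r2 = {(s, p, j) for s, p in R1 for s2, j in R2 if s == s2}
r1_r3 = {(s, p, j) for s, p in R1 for p2, j in R3 if p == p2}
r2_r3 = {(s, p, j) for s, j in R2 for p, j2 in R3 if j == j2}
print([len(r) for r in (r1_r2, r1_r3, r2_r3)], len(SUPPLY))  # [9, 9, 9] 7

# ... but joining the third projection as well reproduces SUPPLY exactly
r1_r2_r3 = {t for t in r1_r2 if (t[1], t[2]) in R3}
print(r1_r2_r3 == SUPPLY)  # True
```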
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF)
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
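When the legal states are characterized by a set of FDs, attribute closure decides the superkey and key conditions without inspecting any states. This sketch works under that assumption and reuses assumed EMP_PROJ FDs from the COMPANY example. The function names are hypothetical:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(s, relation, fds):
    """S is a superkey if S determines every attribute of R."""
    return closure(s, fds) >= set(relation)


def is_key(k, relation, fds):
    """A key is a superkey from which no single attribute can be removed."""
    return is_superkey(k, relation, fds) and not any(
        is_superkey(set(k) - {a}, relation, fds) for a in k)


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
EMP_PROJ_FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(is_superkey({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, EMP_PROJ_FDS))  # True
print(is_key({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, EMP_PROJ_FDS))       # False
print(is_key({"SSN", "PNUMBER"}, EMP_PROJ, EMP_PROJ_FDS))                # True
```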
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys
A prime attribute is an attribute that is a member of some candidate key
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key
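For small schemas, brute force can enumerate the candidate keys, and with them the prime and nonprime attributes. This sketch uses EMP(SSN, Emp, Salary), assuming that SSN and Emp determine each other and that Emp -> Salary. The search is exponential in the number of attributes, so it suits only textbook-sized relations. Function names are hypothetical:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal superkeys, smallest first; supersets of a found key are skipped."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(relation):
                keys.append(s)
    return keys


def prime_attributes(relation, fds):
    """Return (prime, nonprime) attribute sets."""
    prime = set().union(*candidate_keys(relation, fds))
    return prime, set(relation) - prime


EMP = {"SSN", "Emp", "Salary"}
EMP_FDS = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN"}), ({"Emp"}, {"Salary"})]

print(candidate_keys(EMP, EMP_FDS))  # [{'Emp'}, {'SSN'}]
prime, nonprime = prime_attributes(EMP, EMP_FDS)
print(sorted(prime), sorted(nonprime))  # ['Emp', 'SSN'] ['Salary']
```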
First Normal Form
Disallows composite attributes, multivalued attributes and nested relations: attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations, or relations as attribute values within tuples
Considered to be part of the definition of a relation
Normalization into 1NF
Normalization into 1NF
To achieve 1NF there are 3 techniques:
1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key
2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy
3 If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values
The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values
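The three techniques can be compared on a DEPARTMENT relation with a multivalued DLOCATIONS attribute. This is a hedged sketch. The department rows and locations are hypothetical sample data:

```python
# Hypothetical non-1NF state: DLOCATIONS holds a list of values
DEPARTMENTS = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]

# Technique 1: move the multivalued attribute to its own relation with the key
DEPARTMENT = [{"DNUMBER": d["DNUMBER"], "DNAME": d["DNAME"]} for d in DEPARTMENTS]
DEPT_LOCATIONS = [{"DNUMBER": d["DNUMBER"], "DLOCATION": loc}
                  for d in DEPARTMENTS for loc in d["DLOCATIONS"]]

# Technique 2: expand the key to {DNUMBER, DLOCATION}; DNAME repeats per location
EXPANDED = [{"DNUMBER": d["DNUMBER"], "DNAME": d["DNAME"], "DLOCATION": loc}
            for d in DEPARTMENTS for loc in d["DLOCATIONS"]]

# Technique 3: a fixed number of location columns, padded with None (NULL)
MAX_LOCS = max(len(d["DLOCATIONS"]) for d in DEPARTMENTS)
WIDE = [{"DNUMBER": d["DNUMBER"], "DNAME": d["DNAME"],
         **{f"DLOCATION{i + 1}": d["DLOCATIONS"][i] if i < len(d["DLOCATIONS"]) else None
            for i in range(MAX_LOCS)}}
        for d in DEPARTMENTS]

print(len(DEPT_LOCATIONS))                                   # 4
print(sum(row["DNAME"] == "Research" for row in EXPANDED))   # 3 copies of the same DNAME
print(sum(v is None for row in WIDE for v in row.values()))  # 2 NULLs
```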
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 53042023
Definitions of Keys and Attributes Participating in Keys
A superkey of a relation schema R = A1 A2
An is a set of attributes S subset-of R with the
property that no two tuples t1 and t2 in any legal
relation state r of R will have t1[S] = t2[S]
A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 54042023
Definitions of Keys and Attributes Participating in Keys
If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys
A Prime attribute must be a member of some candidate key
A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key
Module 6 55042023
First Normal Form
Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic
Hence 1NF disallows relations within relations or relations as attribute values within tuples
Considered to be part of the definition of relation
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 55042023
First Normal Form
1NF disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
It is considered to be part of the definition of a relation.
Normalization into 1NF
Normalization into 1NF
There are 3 techniques to achieve 1NF:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there is a separate tuple in the original relation for each value of the attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
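Technique 1, and technique 2 for contrast, can be sketched in Python on a DEPARTMENT relation whose DLOCATIONS attribute holds several values. The attribute names follow the COMPANY example. The rows and the helper functions `separate_relation` and `expand_key` are hypothetical, introduced only for illustration.

```python
# Illustrative sketch; the rows and helper names are hypothetical.

# A DEPARTMENT state that violates 1NF: DLOCATIONS holds a list of values.
unnormalized = [
    {"DNUMBER": 5, "DNAME": "Research", "DMGRSSN": "333445555",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DMGRSSN": "987654321",
     "DLOCATIONS": ["Stafford"]},
]


def separate_relation(rows, multi_attr, key, new_attr):
    """Technique 1: move the multivalued attribute to its own relation with the key."""
    base = [{a: v for a, v in r.items() if a != multi_attr} for r in rows]
    detail = [{key: r[key], new_attr: v} for r in rows for v in r[multi_attr]]
    return base, detail


def expand_key(rows, multi_attr, new_attr):
    """Technique 2: one tuple per value, repeating all other attributes."""
    flat = []
    for r in rows:
        rest = {a: v for a, v in r.items() if a != multi_attr}
        flat.extend({**rest, new_attr: v} for v in r[multi_attr])
    return flat


department, dept_locations = separate_relation(
    unnormalized, "DLOCATIONS", "DNUMBER", "DLOCATION")
flat = expand_key(unnormalized, "DLOCATIONS", "DLOCATION")

print(len(department), len(dept_locations))  # 2 4
print(len(flat))  # 4, with DNAME and DMGRSSN repeated for every location
```

Technique 1 stores each department's name and manager once. Technique 2 repeats them for every location, which is the redundancy noted above.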
Normalizing nested relations into 1NF
Additional problems: Schaum's series, p. 178, 5.1
Second Normal Form
Uses the concepts of FDs and the primary key.
Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD no longer holds.
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
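Whether an FD is full can be decided with the attribute-closure algorithm, where X+ is the set of all attributes that X determines. The following is a minimal Python sketch using the EMP_PROJ FDs from the COMPANY example. It also assumes PNUMBER -> {PNAME, PLOCATION}, which the text above does not state. The names `closure`, `fd`, and `is_full_fd` are hypothetical.

Removing one attribute at a time is enough. If some smaller subset S of Y determined Z, then Y with any attribute outside S removed would determine Z too.

```python
# Illustrative sketch; the helper names below are hypothetical.


def closure(attrs, fds):
    """Attribute closure X+: everything X determines under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and fails once any single attribute is dropped from Y."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # Y -> Z does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# Assumed FDs for EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION);
# PNUMBER -> PNAME PLOCATION is not stated in the text above.
F = [
    fd("SSN PNUMBER", "HOURS"),
    fd("SSN", "ENAME"),
    fd("PNUMBER", "PNAME PLOCATION"),
]

print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, F))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, F))  # False: partial, SSN -> ENAME
```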
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
Normalizing into 2NF
Conversion to 2NF: R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).
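The conversion can be sketched in Python under some assumptions, since the figure is not reproduced. The sketch assumes, hypothetically, that:
- {A, B} is the primary key of R(A, B, C, D);
- AB -> C is a full dependency;
- A -> D is a partial dependency.

The simplified `normalize_2nf` below groups each non-prime attribute with the smallest part of the key that determines it. It is an illustration, not a complete normalization algorithm.

```python
from itertools import combinations

# Illustrative sketch; the helper names below are hypothetical.


def closure(attrs, fds):
    """Attribute closure under FDs given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def normalize_2nf(schema, key, fds):
    """Group each non-prime attribute with the smallest part of the key that determines it."""
    key = frozenset(key)
    groups = {}  # determinant (a subset of the key) -> attributes it determines
    for attr in schema:
        if attr in key:
            continue
        # Assumes the key determines every attribute, so a determinant exists.
        det = next(
            frozenset(sub)
            for size in range(1, len(key) + 1)
            for sub in combinations(sorted(key), size)
            if attr in closure(sub, fds)
        )
        groups.setdefault(det, []).append(attr)
    relations = [sorted(det) + attrs for det, attrs in groups.items()]
    if key not in groups:
        relations.append(sorted(key))  # keep a relation for the key itself
    return relations


# Hypothetical FDs for R(A, B, C, D) with primary key {A, B}:
# AB -> C is full, but A -> D is a partial dependency.
F = [(frozenset({"A", "B"}), frozenset({"C"})), (frozenset({"A"}), frozenset({"D"}))]

print(normalize_2nf(["A", "B", "C", "D"], {"A", "B"}, F))
# [['A', 'B', 'C'], ['A', 'D']]
```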
Additional problem on second normal form (2NF):
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form of Prog_task?
2. Transform it into the next highest normal form.
Third Normal Form
Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
Consider X -> Y and Y -> Z, with X as the primary key. We consider this a problem only if Y is not a candidate key; when Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
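The following is a minimal Python sketch of a 3NF check. It uses the general form of the test: every FD X -> A must have X a superkey or A a prime attribute. This flags a transitive dependency through a non-key Y and, as the NOTE says, accepts it when Y is a candidate key.

The EMP_DEPT FDs follow the COMPANY example. Encoding "Emp is a candidate key" as the FD Emp -> SSN is an assumption. Candidate keys are found by brute force, which is only practical for small schemas, and all helper names are hypothetical.

```python
from itertools import combinations

# Illustrative sketch; the helper names below are hypothetical.


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    return frozenset(lhs.split()), frozenset(rhs.split())


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """FDs X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # X is a superkey
        bad.extend((sorted(lhs), a) for a in sorted(rhs - lhs) if a not in prime)
    return bad


# Assumed FDs for EMP_DEPT from the COMPANY example.
EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F_DEPT = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
print(violations_3nf(EMP_DEPT, F_DEPT))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]

# The NOTE's case: "Emp is a candidate key" is encoded as Emp -> SSN.
EMP = ["SSN", "Emp", "Salary"]
F_EMP = [fd("SSN", "Emp"), fd("Emp", "SSN"), fd("Emp", "Salary")]
print(violations_3nf(EMP, F_EMP))  # []
```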
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: p. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey.
To satisfy BCNF, X must be a superkey, that is, it must contain a candidate key.
So a relation in BCNF is always in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. The relation is in 3NF, because the right-hand side of fd2 (course) is a prime attribute. It is not in BCNF, because instructor is not a superkey.
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
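Applying closure-based checks to TEACH, with fd1 and fd2 as given, confirms this classification. The Python sketch below finds candidate keys by brute force, and its helper names are hypothetical. It reports two candidate keys, {student, course} and {instructor, student}. It then shows that fd2 passes the 3NF test, because course is prime, but fails the BCNF test, because instructor is not a superkey.

```python
from itertools import combinations

# Illustrative sketch; the helper names below are hypothetical.


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if closure(s, fds) >= set(schema) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def is_3nf(schema, fds):
    """Each FD X -> A needs X to be a superkey or A to be prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(is_superkey(lhs, schema, fds) or rhs - lhs <= prime for lhs, rhs in fds)


def is_bcnf(schema, fds):
    """Each nontrivial FD X -> A needs X to be a superkey."""
    return all(is_superkey(lhs, schema, fds) or rhs <= lhs for lhs, rhs in fds)


TEACH = ["student", "course", "instructor"]
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

print(sorted(sorted(k) for k in candidate_keys(TEACH, F)))
# [['course', 'student'], ['instructor', 'student']]
print(is_3nf(TEACH, F), is_bcnf(TEACH, F))  # True False
```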
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additive join property after decomposition.
Of the three, only the 3rd decomposition does not generate spurious tuples after a join (and hence has the non-additive join property).
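For a decomposition of R into exactly two relations R1 and R2, there is a standard shortcut. The decomposition is lossless if and only if the FDs imply either (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1).

Below is a minimal Python sketch applying this test to the three TEACH decompositions, with fd1 and fd2 as above. The function names are hypothetical.

```python
# Illustrative sketch; the helper names below are hypothetical.


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 & R2) determines R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([binary_lossless(a, b, F) for a, b in decompositions])  # [False, False, True]
```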
Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, p. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
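The general lossless-join test works on a matrix with one row per relation Ri and one column per attribute. It repeatedly equates symbols using the FDs until nothing changes. The decomposition is lossless if some row ends up made entirely of distinguished 'a' symbols.

The Python sketch below illustrates that chase-style procedure under stated assumptions. Equated symbols are replaced throughout their column, and the function name is hypothetical. The EMP_PROJ schemas and FDs in the usage lines are borrowed from the COMPANY example for illustration.

```python
# Illustrative sketch; the function name is hypothetical.


def is_lossless(schema, decomposition, fds):
    """Matrix (tableau) test for the lossless, non-additive join property."""
    # One row per Ri: the distinguished symbol ("a", col) where Ri has the
    # attribute, otherwise a symbol ("b", i, col) unique to that row.
    rows = [
        {c: ("a", c) if c in ri else ("b", i, c) for c in schema}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every attribute of lhs.
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[c] for c in sorted(lhs)), []).append(row)
            for group in groups.values():
                for c in rhs:
                    symbols = {row[c] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols, preferring the distinguished 'a' symbol.
                    target = ("a", c) if ("a", c) in symbols else min(symbols)
                    for row in rows:  # replace them everywhere in the column
                        if row[c] in symbols:
                            row[c] = target
                    changed = True
    # Lossless iff some row has become all 'a' symbols.
    return any(all(row[c] == ("a", c) for c in schema) for row in rows)


R = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(is_lossless(R, good, F))  # True
print(is_lossless(R, bad, F))   # False
```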
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
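Both the trivial/nontrivial classification and the MVD condition itself can be checked mechanically. In this Python sketch, `mvd_holds` tests one relation state. Take every pair of tuples t1, t2 that agree on A. The tuple that takes A and B from t1 and the remaining attributes from t2 must also be present. The COURSE/TEACHER/BOOK relation, its values, and the helper names are hypothetical.

```python
# Illustrative sketch; the relation, values, and helper names are hypothetical.


def is_trivial_mvd(a, b, schema):
    """A ->> B is trivial if B is a subset of A, or A ∪ B = R."""
    a, b = set(a), set(b)
    return b <= a or (a | b) == set(schema)


def mvd_holds(rows, a, b, schema):
    """Check A ->> B on one relation state (rows are dicts over schema)."""
    rest = set(schema) - set(a) - set(b)
    present = {tuple(r[c] for c in schema) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[c] == t2[c] for c in a):
                # t3 takes A and B from t1 and the remaining attributes from t2.
                t3 = tuple(t2[c] if c in rest else t1[c] for c in schema)
                if t3 not in present:
                    return False
    return True


SCHEMA = ("COURSE", "TEACHER", "BOOK")
full = [dict(zip(SCHEMA, ("DB", t, bk))) for t in ("T1", "T2") for bk in ("B1", "B2")]
partial = full[:-1]  # drops ("DB", "T2", "B2")

print(is_trivial_mvd({"COURSE"}, {"TEACHER"}, SCHEMA))          # False (nontrivial)
print(is_trivial_mvd({"COURSE"}, {"TEACHER", "BOOK"}, SCHEMA))  # True: A ∪ B = R
print(mvd_holds(full, {"COURSE"}, {"TEACHER"}, SCHEMA))         # True
print(mvd_holds(partial, {"COURSE"}, {"TEACHER"}, SCHEMA))      # False
```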
Fourth Normal Form (4NF)
A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 56042023
Normalization into 1NF
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 57042023
Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in
a separate relation along with the primary key2 Expand the key so that there will be a separate tuple
in the original relation It has disadvantage of introducing redundancy
3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values
1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 58042023
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 59042023
Normalization nested relations into 1NF
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, but since Emp is a candidate key, EMP is still in 3NF.
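A commonly used general form of the 3NF test also covers the NOTE above. It states that for every nontrivial FD X -> A, either X is a superkey or A is a prime attribute (a member of some candidate key). The following is a brute-force sketch under that definition, suitable only for small schemas. Helper names are hypothetical, and only the FDs passed in are checked.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(relation, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(s, fds) >= set(relation):
                keys.append(s)
    return keys


def is_3nf(relation, fds):
    """For every X -> A with A not in X: X is a superkey or A is prime."""
    prime = set().union(*candidate_keys(relation, fds))
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= set(relation)
        for a in set(rhs) - set(lhs):
            if not superkey and a not in prime:
                return False
    return True


EMP = ["SSN", "Emp", "Salary"]
EMP_FDS = [({"SSN"}, {"Emp", "Salary"}), ({"Emp"}, {"SSN", "Salary"})]
EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
EMP_DEPT_FDS = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(is_3nf(EMP, EMP_FDS), is_3nf(EMP_DEPT, EMP_DEPT_FDS))
```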
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems: pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
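The BCNF condition translates into a short check: every nontrivial FD must have a superkey on its left-hand side. The sketch below assumes the FDs are given directly on the schema being tested. For a projected sub-schema, the FDs would first have to be projected onto it. Names are hypothetical. EMP_DEPT fails the check because DNUMBER is not a superkey, while a department relation keyed by DNUMBER passes.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(relation, fds):
    """Return the nontrivial FDs X -> A whose left side X is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)
        if extra and not closure(lhs, fds) >= set(relation):
            bad.append((set(lhs), extra))
    return bad


EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
EMP_DEPT_FDS = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
DEPT = ["DNUMBER", "DNAME", "DMGRSSN"]
DEPT_FDS = [({"DNUMBER"}, {"DNAME", "DMGRSSN"})]

print(bcnf_violations(EMP_DEPT, EMP_DEPT_FDS))
print(bcnf_violations(DEPT, DEPT_FDS))
```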
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation even when X is not a superkey, provided A is a prime attribute (a member of some candidate key).
BCNF allows it only when X is a superkey, whether or not A is prime.
So a relation in BCNF will definitely be in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. In fd2, instructor is not a superkey, but course is a prime attribute, so the relation is in 3NF but not in BCNF.
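This claim can be checked by brute force. Computing every candidate key shows that {student, instructor} is also a key. Course is therefore prime, and 3NF allows fd2. BCNF rejects fd2 because instructor is not a superkey. The following is a self-contained sketch, and its function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    return closure(attrs, fds) >= set(relation)


def candidate_keys(relation, fds):
    """Brute force: minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(relation, size):
            s = set(combo)
            if not any(k <= s for k in keys) and is_superkey(s, relation, fds):
                keys.append(s)
    return keys


def is_3nf(relation, fds):
    """Each nontrivial FD X -> A needs X a superkey or every such A prime."""
    prime = set().union(*candidate_keys(relation, fds))
    return all(is_superkey(lhs, relation, fds) or set(rhs) - set(lhs) <= prime
               for lhs, rhs in fds)


def is_bcnf(relation, fds):
    """Each nontrivial FD X -> A needs X a superkey."""
    return all(is_superkey(lhs, relation, fds) or set(rhs) <= set(lhs)
               for lhs, rhs in fds)


TEACH = ["student", "course", "instructor"]
TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
print(candidate_keys(TEACH, TEACH_FDS))
print("3NF:", is_3nf(TEACH, TEACH_FDS), "BCNF:", is_bcnf(TEACH, TEACH_FDS))
```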
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
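A common way to carry this out is the textbook BCNF decomposition algorithm. While some schema Q has an FD X -> Y with X not a superkey of Q, replace Q by (X ∪ Y) and (Q - Y). Each split is lossless because the shared attributes X determine X ∪ Y. The following is a brute-force sketch; it tries every attribute subset, so it is only suitable for small schemas. Names are hypothetical. On TEACH it splits on fd2, and none of the resulting schemas keeps fd1.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, Y): X -> Y holds on schema, Y is non-empty and disjoint from X,
    and X is not a superkey of schema. Return None if schema is in BCNF."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            y = (closure(x, fds) & set(schema)) - x
            if y and x | y != set(schema):
                return x, y
    return None


def bcnf_decompose(relation, fds):
    """Split schemas on BCNF violations until every piece is in BCNF."""
    pending, done = [set(relation)], []
    while pending:
        schema = pending.pop()
        violation = find_violation(schema, fds)
        if violation is None:
            done.append(schema)
        else:
            x, y = violation
            pending.append(x | y)       # X with everything it determines
            pending.append(schema - y)  # the rest, joined back on X
    return done


TEACH = ["student", "course", "instructor"]
TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
result = bcnf_decompose(TEACH, TEACH_FDS)
print(sorted(sorted(s) for s in result))
```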
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additive join property after decomposition.
Of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additive join property).
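For a split into two relations, the binary non-additive join test provides a well-known shortcut. The decomposition {R1, R2} is lossless if and only if the FDs imply (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1). The following sketch applies this test to the three TEACH decompositions listed above. Names are hypothetical. It reports decompositions 1 and 2 as lossy and 3 as lossless, which agrees with the text.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1)."""
    r1, r2 = set(r1), set(r2)
    reach = closure(r1 & r2, fds)
    return (r1 - r2) <= reach or (r2 - r1) <= reach


TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
DECOMPOSITIONS = {
    1: ({"student", "instructor"}, {"student", "course"}),
    2: ({"course", "instructor"}, {"course", "student"}),
    3: ({"instructor", "course"}, {"instructor", "student"}),
}
results = {n: binary_lossless(r1, r2, TEACH_FDS)
           for n, (r1, r2) in DECOMPOSITIONS.items()}
for n, ok in results.items():
    print(f"decomposition {n}: {'lossless' if ok else 'lossy'}")
```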
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If the natural join of the new relations gives back exactly the original relation, the decomposition is lossless; otherwise (the join contains spurious tuples) it is lossy.
Example 5.11, pg. 162
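The definition can also be tried on data: project an instance onto the new schemas, join the projections back, and compare the result with the original. The sketch below uses a small made-up TEACH instance that satisfies fd1 and fd2; the students, courses, and instructors are hypothetical. Tuples are stored as frozensets of (attribute, value) pairs, so a natural join simply merges compatible tuples. Decompositions 1 and 2 produce spurious tuples, while decomposition 3 returns exactly the original. A single instance can only expose a lossy decomposition. Showing that a decomposition is lossless for all legal instances requires a test on the FDs.

```python
def as_rows(header, tuples):
    """Store each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(header, t)) for t in tuples}


def project(rows, attrs):
    """Relational projection (duplicates vanish because rows is a set)."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on their common attributes."""
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(l | r)
    return out


TEACH = as_rows(("student", "course", "instructor"), [
    ("Ann", "DB", "Lee"),
    ("Bob", "DB", "Kim"),
    ("Bob", "OS", "Ray"),
    ("Cal", "DB", "Lee"),
])
DECOMPOSITIONS = {
    1: ({"student", "instructor"}, {"student", "course"}),
    2: ({"course", "instructor"}, {"course", "student"}),
    3: ({"instructor", "course"}, {"instructor", "student"}),
}

spurious = {}
for n, (r1, r2) in DECOMPOSITIONS.items():
    joined = natural_join(project(TEACH, r1), project(TEACH, r2))
    spurious[n] = len(joined - TEACH)
    print(f"decomposition {n}: {spurious[n]} spurious tuple(s)")
```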
Testing for lossless joins
Lossless join algorithm: Example 5.12
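The general lossless-join test for a decomposition into any number of relations is the tableau ("chase") algorithm, which works in four steps:
1. Build one row per Ri.
2. In column Aj, put a distinguished symbol a_j when Aj is in Ri, and a unique symbol b_ij otherwise.
3. Repeatedly apply each FD X -> Y by equating the Y symbols of rows that agree on X, preferring an a symbol.
4. The decomposition is lossless if some row becomes all a symbols.

The following is a sketch of that procedure. Names are hypothetical, and symbols are modelled as tuples. The checks use TEACH and a three-way split of EMP_PROJ under its usual FDs.

```python
def chase_lossless(relation, decomposition, fds):
    """Tableau test for the lossless (non-additive) join property."""
    attrs = list(relation)
    # Distinguished ("a", A) if A is in R_i, otherwise a unique ("b", i, A).
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    r, s = rows[i], rows[j]
                    if any(r[x] != s[x] for x in lhs):
                        continue  # rows disagree on X, so the FD says nothing
                    for y in rhs:
                        if r[y] == s[y]:
                            continue
                        # Keep an "a" symbol if either row has one, else r's.
                        keep, old = (s[y], r[y]) if s[y][0] == "a" else (r[y], s[y])
                        for row in rows:  # rename old -> keep in column y
                            if row[y] == old:
                                row[y] = keep
                        changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(row[a] == ("a", a) for a in attrs) for row in rows)


TEACH = ["student", "course", "instructor"]
TEACH_FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
EMP_PROJ = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
EMP_PROJ_FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(chase_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], TEACH_FDS))
print(chase_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], TEACH_FDS))
print(chase_lossless(EMP_PROJ, [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"},
                                {"SSN", "PNUMBER", "HOURS"}], EMP_PROJ_FDS))
```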
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
An MVD is nontrivial if neither of the above two conditions is satisfied.
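The two triviality conditions are simple set tests. The following sketch uses the EMP(ENAME, PNAME, DNAME) schema from the 4NF example. The function name is hypothetical.

```python
def is_trivial_mvd(lhs, rhs, relation):
    """A ->> B is trivial in R if B is a subset of A, or A ∪ B = R."""
    lhs, rhs, relation = set(lhs), set(rhs), set(relation)
    return rhs <= lhs or lhs | rhs == relation


R = {"ENAME", "PNAME", "DNAME"}
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, R))           # False: nontrivial
print(is_trivial_mvd({"ENAME"}, {"PNAME", "DNAME"}, R))  # True: A ∪ B = R
print(is_trivial_mvd({"ENAME", "PNAME"}, {"PNAME"}, R))  # True: B ⊆ A
```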
Fourth Normal Form (4NF)
A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no nontrivial multi-valued dependency X ->> Y other than those in which X is a superkey.
4NF is used for removing multi-valued dependencies.
In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
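The following sketch shows what ENAME ->> PNAME means on data. Within each ENAME group, every project must appear with every dependent, forming a full cross product. When that holds, the projections EMP_PROJECTS and EMP_DEPENDENTS join back to EMP without spurious tuples. The instance and the helper names are hypothetical.

```python
def mvd_holds(rows, header, x, y):
    """Test X ->> Y on an instance: in each X-group, the observed (Y, Z)
    combinations must be the full cross product of its Y- and Z-values,
    where Z is every attribute outside X and Y."""
    pos = {a: i for i, a in enumerate(header)}
    z = [a for a in header if a not in x and a not in y]
    groups = {}
    for t in rows:
        key = tuple(t[pos[a]] for a in x)
        y_val = tuple(t[pos[a]] for a in y)
        z_val = tuple(t[pos[a]] for a in z)
        groups.setdefault(key, set()).add((y_val, z_val))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if len(pairs) != len(ys) * len(zs):
            return False
    return True


HEADER = ("ENAME", "PNAME", "DNAME")
EMP = {
    ("Smith", "P1", "Ann"),
    ("Smith", "P2", "Bob"),
    ("Smith", "P1", "Bob"),
    ("Smith", "P2", "Ann"),
}
print(mvd_holds(EMP, HEADER, ["ENAME"], ["PNAME"]))

# 4NF decomposition: one relation per independent multi-valued fact.
emp_projects = {(e, p) for e, p, _ in EMP}
emp_dependents = {(e, d) for e, _, d in EMP}
rejoined = {(e, p, d) for e, p in emp_projects
            for e2, d in emp_dependents if e == e2}
print(rejoined == EMP)

# Removing one tuple breaks the cross product, so the MVD no longer holds.
print(mvd_holds(EMP - {("Smith", "P2", "Ann")}, HEADER, ["ENAME"], ["PNAME"]))
```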
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
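The definition suggests a direct, if brute-force, test on an instance. Project onto each component, natural-join the projections, and compare the result with the original. The following sketch uses hypothetical helper names. As a check, note that an MVD X ->> Y is the two-component join dependency JD(XY, X(R - Y)). The small EMP instance in the code therefore satisfies JD({ENAME, PNAME}, {ENAME, DNAME}) only while it contains every project-dependent combination.

```python
from functools import reduce


def as_rows(header, tuples):
    """Store each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(header, t)) for t in tuples}


def project(rows, attrs):
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on their common attributes."""
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(l | r)
    return out


def satisfies_jd(rows, components):
    """JD(R1, ..., Rn) holds on rows iff joining the projections gives back rows."""
    joined = reduce(natural_join, (project(rows, c) for c in components))
    return joined == rows


HEADER = ("ENAME", "PNAME", "DNAME")
EMP = as_rows(HEADER, [("Smith", "P1", "Ann"), ("Smith", "P2", "Bob"),
                       ("Smith", "P1", "Bob"), ("Smith", "P2", "Ann")])
COMPONENTS = [{"ENAME", "PNAME"}, {"ENAME", "DNAME"}]
broken = EMP - as_rows(HEADER, [("Smith", "P2", "Ann")])

print(satisfies_jd(EMP, COMPONENTS), satisfies_jd(broken, COMPONENTS))
```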
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency with a component that is not a superkey.
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
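The SUPPLY figure is not reproduced in these notes. The sketch below therefore uses a hypothetical four-tuple instance, and it assumes R1, R2, R3 are the three two-attribute projections {Sname, Part_name}, {Sname, Proj_name}, and {Part_name, Proj_name}. Joining all three projections gives back SUPPLY exactly, so JD(R1, R2, R3) holds on this instance. However, every two-way join adds a spurious tuple, which is why no split into just two relations works here.

```python
from functools import reduce
from itertools import combinations


def as_rows(header, tuples):
    """Store each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(header, t)) for t in tuples}


def project(rows, attrs):
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on their common attributes."""
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(l | r)
    return out


HEADER = ("Sname", "Part_name", "Proj_name")
SUPPLY = as_rows(HEADER, [  # hypothetical instance
    ("Smith", "Bolt", "ProjY"),
    ("Smith", "Nut", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjX"),
])
R1 = {"Sname", "Part_name"}
R2 = {"Sname", "Proj_name"}
R3 = {"Part_name", "Proj_name"}

three_way = reduce(natural_join, (project(SUPPLY, c) for c in (R1, R2, R3)))
print("JD(R1, R2, R3) holds:", three_way == SUPPLY)

spurious_counts = []
for a, b in combinations((R1, R2, R3), 2):
    two_way = natural_join(project(SUPPLY, a), project(SUPPLY, b))
    spurious_counts.append(len(two_way - SUPPLY))
    print(sorted(a), "join", sorted(b), "->", spurious_counts[-1], "spurious")
```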
Fifth Normal Form (5NF)
A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence 5NF is rarely used in practice.
Additional problems from schaum series Pg 178 51
Module 6 60042023
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 61042023
Second Normal Form Uses the concepts of FDs primary key
Definitions Prime attribute - attribute that is member of the
primary key K Full functional dependency - a FD Y -gt Z
where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold
- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 62042023
Second Normal Form
A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
R can be decomposed into 2NF relations via the process of 2NF normalization
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 63042023
Normalizing into 2NF
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Conversion to 2NF
A A A
B B D
C C
D
Module 6 64042023
Convert to
Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form?
2. Transform the relation into the next highest form.
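A partial dependency is a non-prime attribute that depends on only part of the key, and it is what keeps a relation out of 2NF. The Python sketch below finds such dependencies using attribute closure. Because the slide does not state the key, it makes two hypothetical assumptions: (prog_ID, Prog_Pack_ID) is the only candidate key of Prog_task, and the full key determines Tot-Hours-wor. It also spells the name attribute prog_Pac_name throughout.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """(key subset, attribute) pairs where a non-prime attribute depends on part of the key.

    Assumes `key` is the only candidate key, so its attributes are the only prime ones.
    """
    nonprime = {a for a in schema if a not in key}
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in sorted(closure(subset, fds) & nonprime):
                found.append((subset, attr))
    return found


schema = ("prog_ID", "Prog_Pack_ID", "prog_Pac_name", "Tot-Hours-wor")
key = ("prog_ID", "Prog_Pack_ID")  # assumed primary key
fds = [
    (key, ("Tot-Hours-wor",)),                # assumed: hours depend on the full key
    (("Prog_Pack_ID",), ("prog_Pac_name",)),  # the FD given on the slide
]
print(partial_dependencies(schema, key, fds))
# [(('Prog_Pack_ID',), 'prog_Pac_name')]
```

Under these assumptions, the usual 2NF fix is to move (Prog_Pack_ID, prog_Pac_name) into a relation of its own and keep (prog_ID, Prog_Pack_ID, Tot-Hours-wor) in the other.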
Third Normal Form
Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.
Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
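Attribute closure lets you check the transitive-dependency definition mechanically: X -> Z is transitive through Y when Y is in X+ and Z is in Y+. The Python sketch below searches only single-attribute intermediates Y. It assumes the usual EMP_DEPT FDs, which the examples imply but do not list in full: SSN determines the employee attributes and DNUMBER, and DNUMBER determines DNAME and DMGRSSN.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_witnesses(x, z, schema, fds):
    """Single attributes Y (not in X, not Z) with X -> Y and Y -> Z, making X -> Z transitive."""
    x_plus = closure(x, fds)
    return [y for y in schema
            if y not in x and y != z and y in x_plus and z in closure([y], fds)]


EMP_DEPT = ("ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN")
fds = [
    (("SSN",), ("ENAME", "BDATE", "ADDRESS", "DNUMBER")),
    (("DNUMBER",), ("DNAME", "DMGRSSN")),
]
print(transitive_witnesses(["SSN"], "DMGRSSN", EMP_DEPT, fds))  # ['DNUMBER']
print(transitive_witnesses(["SSN"], "ENAME", EMP_DEPT, fds))    # []
```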
Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.
NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
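The NOTE is easiest to see with the general form of the 3NF test, which is used here in place of the primary-key wording above. Under that rule, every FD X -> A with A not in X must have X a superkey or A a prime attribute. So EMP's chain SSN -> Emp -> Salary is fine because Emp is a candidate key, while DNUMBER -> DNAME in EMP_DEPT is not. The Python sketch below finds candidate keys by brute force, which is workable only for small schemas, and checks only the listed FDs. Both FD sets are assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys, by brute force over attribute subsets (small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            is_superkey = closure(combo, fds) == set(schema)
            if is_superkey and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys


def violates_3nf(schema, fds):
    """(X, A) pairs from the listed FDs where X is not a superkey and A is not prime."""
    prime = {a for k in candidate_keys(schema, fds) for a in k}
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(schema):
            continue  # X is a superkey, so X -> A is allowed
        for a in rhs:
            if a not in lhs and a not in prime:
                bad.append((tuple(lhs), a))
    return bad


EMP = ("SSN", "Emp", "Salary")
emp_fds = [(("SSN",), ("Emp",)), (("Emp",), ("SSN", "Salary"))]  # Emp is a candidate key

EMP_DEPT = ("ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN")
emp_dept_fds = [
    (("SSN",), ("ENAME", "BDATE", "ADDRESS", "DNUMBER")),
    (("DNUMBER",), ("DNAME", "DMGRSSN")),
]

print(violates_3nf(EMP, emp_fds))            # []
print(violates_3nf(EMP_DEPT, emp_dept_fds))  # [(('DNUMBER',), 'DNAME'), (('DNUMBER',), 'DMGRSSN')]
```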
Normalization into 3NF
Normalizing into 2NF and 3NF
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg. 186, 5.13
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey.
In BCNF, X must always be a superkey.
So a relation in BCNF is always in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
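Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but 3NF allows it because course is a prime attribute. So this relation is in 3NF but not in BCNF.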
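Running both tests on TEACH shows the difference concretely. fd2 (instructor -> course) fails the BCNF test because instructor is not a superkey, yet it passes the 3NF test because course belongs to the candidate key {student, course}. The Python sketch below finds candidate keys by brute force and reports, for each listed FD, whether it satisfies BCNF and 3NF. It is a small illustration, not a complete normal-form checker.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys, by brute force over attribute subsets (small schemas only)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            is_superkey = closure(combo, fds) == set(schema)
            if is_superkey and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys


def check_fds(schema, fds):
    """For each listed FD, report (lhs, rhs, satisfies BCNF, satisfies 3NF)."""
    prime = {a for k in candidate_keys(schema, fds) for a in k}
    report = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) == set(schema)
        rest = [a for a in rhs if a not in lhs]  # ignore trivial parts of the FD
        bcnf = superkey or not rest
        third = bcnf or all(a in prime for a in rest)
        report.append((lhs, rhs, bcnf, third))
    return report


TEACH = ("student", "course", "instructor")
teach_fds = [
    (("student", "course"), ("instructor",)),  # fd1
    (("instructor",), ("course",)),            # fd2
]
for lhs, rhs, bcnf, third in check_fds(TEACH, teach_fds):
    print(lhs, "->", rhs, "BCNF:", bcnf, "3NF:", third)
```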
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
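The standard textbook BCNF decomposition algorithm is one way to obtain such a decomposition. While some relation Ri has an FD X -> A that violates BCNF, replace Ri by (X+ ∩ Ri) and by Ri minus (X+ − X). Each split is lossless, although FDs may be lost, which is exactly the trade-off described above. The Python sketch below projects FDs implicitly by computing the closure of every attribute subset of Ri. That makes it exponential, so it is meant only for small examples. For TEACH it produces the third decomposition discussed on the next slide.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(r, fds):
    """Return (X, X+ restricted to r) for some X that violates BCNF inside r, or None."""
    for size in range(1, len(r)):
        for combo in combinations(sorted(r), size):
            x = set(combo)
            x_plus = closure(x, fds) & r
            if x_plus != x and x_plus != r:  # X determines more of r but is not a superkey
                return x, x_plus
    return None


def bcnf_decompose(schema, fds):
    """Split relations until none has a BCNF violation; each split is lossless."""
    result = [frozenset(schema)]
    changed = True
    while changed:
        changed = False
        for i, r in enumerate(result):
            violation = find_violation(r, fds)
            if violation:
                x, x_plus = violation
                result[i:i + 1] = [frozenset(x_plus), frozenset(r - (x_plus - x))]
                changed = True
                break
    return result


TEACH = ("student", "course", "instructor")
teach_fds = [
    (("student", "course"), ("instructor",)),  # fd1
    (("instructor",), ("course",)),            # fd2
]
for rel in bcnf_decompose(TEACH, teach_fds):
    print(sorted(rel))
# ['course', 'instructor']
# ['instructor', 'student']
```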
Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
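For a decomposition into two relations, the non-additive join property can be tested directly. {R1, R2} is non-additive if and only if (R1 ∩ R2) -> (R1 − R2) or (R1 ∩ R2) -> (R2 − R1) is in F+. The Python sketch below applies this binary test to the three TEACH decompositions, using attribute closure to decide membership in F+.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) functional dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def nonadditive_binary(r1, r2, fds):
    """Binary test: the shared attributes must determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


teach_fds = [
    (("student", "course"), ("instructor",)),  # fd1
    (("instructor",), ("course",)),            # fd2
]
decompositions = [
    (("student", "instructor"), ("student", "course")),
    (("course", "instructor"), ("course", "student")),
    (("instructor", "course"), ("instructor", "student")),
]
print([nonadditive_binary(d1, d2, teach_fds) for d1, d2 in decompositions])
# [False, False, True]
```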
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise, it is lossy.
Example 5.11, pg. 162
Testing for lossless joins
Lossless join algorithm: Example 5.12
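A common general test for a lossless (non-additive) join into any number of relations is the matrix, or chase, test. This sketch assumes that is the lossless join algorithm referred to here. The test proceeds in four steps:

1. Start with one row per Ri.
2. Put a distinguished symbol a_j wherever Ri contains attribute Aj, and a unique b_ij elsewhere.
3. Repeatedly apply each FD X -> Y by equating the Y symbols of rows that agree on X, preferring an a symbol.
4. If some row ends up all a symbols, the decomposition is lossless.

The Python sketch below follows that procedure and renames an equated symbol throughout its column, as in the usual chase. Symbols are encoded as tuples, an illustrative choice.

```python
def nonadditive_join_test(schema, decomposition, fds):
    """Matrix (chase) test: True if the decomposition has the non-additive join property."""
    cols = list(schema)
    # Row i: distinguished symbol ("a", j) where R_i has attribute j, else unique ("b", i, j).
    rows = [[("a", j) if attr in ri else ("b", i, j) for j, attr in enumerate(cols)]
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lhs_idx = [cols.index(attr) for attr in lhs]
            groups = {}
            for row in rows:  # group rows that agree on every X column
                groups.setdefault(tuple(row[j] for j in lhs_idx), []).append(row)
            for group in groups.values():
                for attr in rhs:
                    j = cols.index(attr)
                    symbols = {row[j] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Prefer the distinguished "a" symbol; otherwise pick one "b" symbol.
                    target = ("a", j) if ("a", j) in symbols else min(symbols)
                    for row in rows:  # rename the equated symbols throughout the column
                        if row[j] in symbols:
                            row[j] = target
                    changed = True
    return any(all(sym[0] == "a" for sym in row) for row in rows)


TEACH = ("student", "course", "instructor")
teach_fds = [(("student", "course"), ("instructor",)), (("instructor",), ("course",))]
print(nonadditive_join_test(TEACH, [("instructor", "course"), ("instructor", "student")], teach_fds))  # True
print(nonadditive_join_test(TEACH, [("student", "instructor"), ("student", "course")], teach_fds))    # False
```

Unlike the binary test, this procedure also handles decompositions into three or more relations.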
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further defined as being trivial or nontrivial.
An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.
An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
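The two triviality conditions translate directly into set operations. The Python sketch below is illustrative and reuses the EMP schemas from the 4NF example in this module. It shows why decomposing EMP helps: ENAME ->> PNAME is nontrivial in EMP but trivial in EMP_PROJECTS, where ENAME ∪ PNAME is the whole schema.

```python
def mvd_is_trivial(schema, lhs, rhs):
    """A ->> B in R is trivial if B is a subset of A or A union B equals R."""
    a, b = set(lhs), set(rhs)
    return b <= a or (a | b) == set(schema)


EMP = ("ENAME", "PNAME", "DNAME")
EMP_PROJECTS = ("ENAME", "PNAME")

print(mvd_is_trivial(EMP, ["ENAME"], ["PNAME"]))           # False: nontrivial in EMP
print(mvd_is_trivial(EMP_PROJECTS, ["ENAME"], ["PNAME"]))  # True: A U B covers EMP_PROJECTS
```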
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID
prog_Pac_name Tot-Hours-wor)
Prog_Pack_IDProg_Pac_name
1 What is the highest normal form
2 Transform into next highest form
Module 6 65042023
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 66042023
Third Normal Form
Definition Transitive functional dependency - a FD X -gt
Z that can be derived from two FDs X -gt Y and Y -gt Z Examples
- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 67042023
Third Normal Form A relation schema R is in third normal form (3NF) if it is
in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE
In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 68042023
Normalization into 3NF
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 69042023
Normalizing into 2NF and 3NF
Module 6 70042023
SUMMARY
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME).
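The benefit of the 4NF decomposition can be sketched in Python on a hypothetical EMP instance, in which Smith works on projects X and Y and has dependents John and Anna. These are invented rows, not the figure's data. The code projects EMP onto EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME) and checks that the natural join on ENAME gives EMP back. It then counts how many rows a single new project costs in each design. The helper emp_rows_for_new_project is hypothetical.

```python
# Hypothetical EMP instance: PNAME and DNAME are independent, so every project
# of an employee is paired with every dependent of that employee.
EMP = {("Smith", "X", "John"), ("Smith", "X", "Anna"),
       ("Smith", "Y", "John"), ("Smith", "Y", "Anna")}

# 4NF decomposition: one relation per independent multi-valued fact.
EMP_PROJECTS = {(ename, pname) for ename, pname, _ in EMP}
EMP_DEPENDENTS = {(ename, dname) for ename, _, dname in EMP}

# Natural join on ENAME recovers EMP exactly, so this decomposition is lossless.
rejoined = {(e, p, d)
            for e, p in EMP_PROJECTS
            for e2, d in EMP_DEPENDENTS if e == e2}
assert rejoined == EMP

def emp_rows_for_new_project(emp, ename, pname):
    """Rows EMP must gain so that ENAME ->> PNAME still holds after adding one project."""
    dependents = {d for e, _, d in emp if e == ename}
    return {(ename, pname, d) for d in dependents}

print(len(EMP), len(EMP_PROJECTS), len(EMP_DEPENDENTS))  # 4 2 2
print(len(emp_rows_for_new_project(EMP, "Smith", "Z")))  # 2 (EMP_PROJECTS needs only 1)
```

In EMP, a new project must be repeated once per dependent to keep the MVD satisfied. After decomposition it is a single row in EMP_PROJECTS.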
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For a relation R with subsets of its attributes denoted A, B, ..., Z, R satisfies the join dependency JD(A, B, ..., Z) if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency in which some component is not a superkey.
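The 5NF condition can be tested mechanically for a given list of join dependencies. Each nontrivial JD must have only superkeys as components, and a component is a superkey when its attribute closure contains all of R. The Python sketch below checks only the JDs it is given, not all of F+, so it illustrates the definition rather than providing a complete 5NF test. The SUPPLY attribute names (Sname, Part_name, Proj_name) and the choice of R1, R2, R3 as its three two-attribute projections are our assumptions, since the figure is not reproduced here.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def non_5nf_jds(schema, fds, jds):
    """Return the listed JDs that break the 5NF condition.

    A JD is trivial if one of its components is all of R. Otherwise every
    component must be a superkey (its closure must contain every attribute of R).
    Only the JDs passed in are checked, not all of F+.
    """
    bad = []
    for jd in jds:
        if any(set(c) == set(schema) for c in jd):
            continue  # trivial JD
        if not all(closure(c, fds) >= set(schema) for c in jd):
            bad.append(jd)
    return bad

# Assumed SUPPLY schema with no FDs; R1, R2, R3 are its two-attribute projections.
SUPPLY = ("Sname", "Part_name", "Proj_name")
R1, R2, R3 = ("Sname", "Part_name"), ("Sname", "Proj_name"), ("Part_name", "Proj_name")
print(non_5nf_jds(SUPPLY, [], [(R1, R2, R3)]))  # the JD is reported: no Ri is a superkey

# By contrast, if A were a key of R(A, B, C), JD((A, B), (A, C)) would be allowed in 5NF.
print(non_5nf_jds(("A", "B", "C"), [(("A",), ("B", "C"))], [(("A", "B"), ("A", "C"))]))  # []
```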
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
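The point of the figure can be reproduced on a small hypothetical SUPPLY instance. We assume that R1, R2, R3 are the projections on (Sname, Part_name), (Sname, Proj_name) and (Part_name, Proj_name). The Python sketch below joins every pair of projections and then all three. The data are invented so that the three-way join holds while no two-way decomposition is lossless. The helpers project, natural_join, and spurious are illustrative names.

```python
from functools import reduce
from itertools import combinations

SCHEMA = ("Sname", "Part_name", "Proj_name")
# Hypothetical SUPPLY instance; each tuple is stored as a frozenset of (attribute, value) pairs.
SUPPLY = {frozenset(zip(SCHEMA, t)) for t in [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]}
R1, R2, R3 = ("Sname", "Part_name"), ("Sname", "Proj_name"), ("Part_name", "Proj_name")

def project(rel, attrs):
    """Projection onto attrs; the result is again a set of frozen (attribute, value) tuples."""
    return {frozenset((a, dict(t)[a]) for a in attrs) for t in rel}

def natural_join(rel1, rel2):
    """Natural join on the attributes the two relations share."""
    out = set()
    for t1 in rel1:
        d1 = dict(t1)
        for t2 in rel2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def spurious(components):
    """Tuples in the join of SUPPLY's projections on components that are not in SUPPLY."""
    joined = reduce(natural_join, (project(SUPPLY, c) for c in components))
    return joined - SUPPLY

for pair in combinations((R1, R2, R3), 2):
    print(pair, len(spurious(pair)))                 # 2 spurious tuples for each pair
print("R1, R2, R3:", len(spurious((R1, R2, R3))))  # 0: the JD(R1, R2, R3) holds
```

Every pairwise join produces 2 spurious tuples on this instance, while the three-way join reproduces SUPPLY exactly. This is why the decomposition in (d) needs all three relations.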
Fifth Normal Form (5NF)
A join dependency is a peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence, 5NF is rarely used in practice.
SUMMARY
Normalize the following relation
Normalization into 2NF
Normalization into 3NF
Additional problems
Pg 186513
Boyce-Codd normal form
BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or at least in 3NF).
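The BCNF condition can be checked with attribute closure: a left-hand side X is a superkey exactly when X+ contains every attribute of R. The Python sketch below applies this to the EMP_DEPT schema from earlier in the module. The FDs SSN -> {ENAME, BDATE, ADDRESS, DNUMBER} and DNUMBER -> {DNAME, DMGRSSN} are our assumption about that schema's meaning, and closure and bcnf_violations are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Given FDs X -> Y that are nontrivial and whose X is not a superkey of schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)                # ignore trivial FDs
            and not closure(lhs, fds) >= set(schema)]  # X+ must cover all of R

EMP_DEPT = ("ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN")
FDS = [(("SSN",), ("ENAME", "BDATE", "ADDRESS", "DNUMBER")),  # assumed
       (("DNUMBER",), ("DNAME", "DMGRSSN"))]                  # assumed
print(bcnf_violations(EMP_DEPT, FDS))  # [(('DNUMBER',), ('DNAME', 'DMGRSSN'))]
```

Checking only the FDs in the given set is enough when testing the original schema R. After a decomposition, dependencies implied by F can still cause violations in the new relations, so the projected FDs must be considered there.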
How is BCNF different from 3NF?
For an FD X -> A, 3NF allows the dependency in a relation even if X is not a superkey, provided that A is a prime attribute (a member of some candidate key).
BCNF allows no such exception: X must be a superkey.
So a relation in BCNF is definitely in 3NF, but not the other way around.
A relation TEACH that is in 3NF but not in BCNF
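Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course
{student, course} is a candidate key for this relation. The relation is in 3NF but not in BCNF: fd2 violates BCNF because instructor is not a superkey, but 3NF permits it because course is a prime attribute (it belongs to the candidate key {student, course}).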
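The TEACH analysis can be reproduced mechanically. First find the candidate keys and mark the prime attributes. Then test each FD against the BCNF rule (X is a superkey) and the weaker 3NF rule (X is a superkey or A is prime). The Python sketch below brute-forces candidate keys, which is only practical for small schemas. It checks only the FDs it is given, and the function names are our own.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under FDs given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(schema, size):
            if closure(combo, fds) == set(schema) and not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys

def check_3nf_bcnf(schema, fds):
    """Check each given nontrivial FD X -> A against the 3NF and BCNF conditions."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) == set(schema)
        for a in rhs:
            if a in lhs:
                continue  # trivial part of the FD
            in_bcnf = in_bcnf and superkey
            in_3nf = in_3nf and (superkey or a in prime)
    return keys, in_3nf, in_bcnf

TEACH = ("student", "course", "instructor")
FDS = [(("student", "course"), ("instructor",)),  # fd1
       (("instructor",), ("course",))]            # fd2
keys, in_3nf, in_bcnf = check_3nf_bcnf(TEACH, FDS)
print(sorted(sorted(k) for k in keys))  # [['course', 'student'], ['instructor', 'student']]
print(in_3nf, in_bcnf)                  # True False
```

Besides {student, course}, the sketch finds a second candidate key, {student, instructor}, since instructor -> course. It reports TEACH as in 3NF but not in BCNF, which matches the analysis above.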
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 71042023
Normalize the following relation
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 73042023
Normalization into 2NF
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 75042023
Normalization into 3NF
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Additional problems
Pg 186513
Module 6 76042023
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 78042023
Boyce-Codd normal form
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH
student instructor and student course course instructor and course student instructor course and instructor student
All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency
preservation But we cannot sacrifice the non-additivity property after decomposition
Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
Lossless or lossy decompositions When we decompose a relation we need to
make sure that we can recover the original relation from the new relations that have replaced it
If we can recover the original relation then the decomposition is lossless else it is lossy
Example 511 pg 162
Module 6 86042023
Testing for lossless joins
Lossless join algorithm Example 512
Module 6 87042023
Module 6 89042023
Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for
example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other
A multi-valued dependency can be further defined as being trivial or nontrivial
A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R
A MVD is defined as being nontrivial if neither of the above two conditions is satisfied
Module 6 90042023
Fourth Normal Form (4NF) Fourth normal form (4NF)
A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies
It is used for removing multivalued dependency
In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key
Module 6 91042023
Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and
ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations
EMP_PROJECTS and EMP_DEPENDENTS
Module 6 92042023
Module 6 93042023
Fifth Normal Form (5NF) Join dependency
Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z
Lossless-join dependency A property of decomposition which ensures that
no spurious tuples are generated when relations are reunited through a natural join operation
Module 6 94042023
Fifth Normal Form (5NF)
Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join
dependency
Module 6 95042023
Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has
the JD(R1 R2 R3)
(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3
Module 6 96042023
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
Hence 5NF is rarely used in practice
Module 6 79042023
BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal
Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R
Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF
There exist relations that are in 3NF but not in BCNF
The goal is to have each relation in BCNF (or 3NF)
Module 6 80042023
How is BCNF different from 3NF
For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key
To test whether a relation is in BCNF lsquoXrsquo must be a candidate key
So relation in BCNF will definitely be in 3NF but not the other way around
Module 6 81042023
A relation TEACH that is in 3NF but not in BCNF
Module 6 82042023
Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH
fd1 student course -gt instructor fd2 instructor -gt course
student course is a candidate key for this relation So this relation is in 3NF but not in BCNF
A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
Module 6 83042023
Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
All three decompositions lose fd1, so we have to settle for sacrificing functional dependency preservation. But we cannot sacrifice the non-additive join property after decomposition.
Of the three, only the third decomposition does not generate spurious tuples after the join (and hence has the non-additive join property): its common attribute, instructor, determines course (fd2), so instructor is a key of {instructor, course}.
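Both claims about the three TEACH decompositions can be checked mechanically. This Python sketch, using the same (lhs, rhs) FD encoding, applies the standard binary test for non-additive joins ((R1 ∩ R2)+ must contain R1 - R2 or R2 - R1) and a closure-based test for whether an FD is preserved by the projections. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary test: (R1 ∩ R2)+ must contain R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


def preserves(fd, parts, fds):
    """Is fd implied by the FDs that hold on the individual parts?"""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # What the attributes of `result` inside this part can determine,
            # restricted to the attributes of the part itself.
            gained = closure(result & part, fds) & part
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


fd1 = ({"student", "course"}, {"instructor"})
fd2 = ({"instructor"}, {"course"})
F = [fd1, fd2]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for n, (r1, r2) in enumerate(decompositions, start=1):
    print(n, is_lossless_binary(r1, r2, F),
          preserves(fd1, (r1, r2), F), preserves(fd2, (r1, r2), F))
# 1 False False False
# 2 False False True
# 3 True False True
```

Only decomposition 3 passes the join test, and none of them preserves fd1.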
Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg. 162
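Lossiness can also be observed on data. The Python sketch below represents a relation as a (schema, set of rows) pair, projects a made-up TEACH instance onto each of the three TEACH decompositions, rejoins the pieces with a natural join, and compares the result with the original. Extra rows are the spurious tuples. The instance and helper names are hypothetical.

```python
def project(rel, attrs):
    """pi_attrs(rel): keep the listed columns and drop duplicate rows."""
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}


def natural_join(r, s):
    """Combine every pair of rows that agree on all shared attributes."""
    r_schema, r_rows = r
    s_schema, s_rows = s
    common = [a for a in r_schema if a in s_schema]
    extra = [a for a in s_schema if a not in r_schema]
    out = set()
    for x in r_rows:
        for y in s_rows:
            if all(x[r_schema.index(a)] == y[s_schema.index(a)] for a in common):
                out.add(x + tuple(y[s_schema.index(a)] for a in extra))
    return r_schema + tuple(extra), out


def rejoin(rel, parts):
    """Project rel onto each part, join the pieces, restore column order."""
    joined = project(rel, parts[0])
    for part in parts[1:]:
        joined = natural_join(joined, project(rel, part))
    return project(joined, rel[0])[1]


# Hypothetical TEACH instance that satisfies fd1 and fd2.
TEACH = (("student", "course", "instructor"), {
    ("Narayan", "Database", "Mark"),
    ("Smith", "Database", "Navathe"),
    ("Smith", "Operating Systems", "Ammar"),
    ("Smith", "Theory", "Schulman"),
    ("Wallace", "Database", "Mark"),
    ("Wallace", "Operating Systems", "Ahamad"),
    ("Wong", "Database", "Omiecinski"),
    ("Zelaya", "Database", "Navathe"),
})
decompositions = [
    (("student", "instructor"), ("student", "course")),
    (("course", "instructor"), ("course", "student")),
    (("instructor", "course"), ("instructor", "student")),
]
for parts in decompositions:
    result = rejoin(TEACH, parts)
    print(len(TEACH[1]), len(result), result == TEACH[1])
# 8 16 False
# 8 20 False
# 8 8 True
```

A single instance can only demonstrate that a decomposition is lossy. Showing that it is lossless for every legal instance requires a dependency-based test.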
Testing for lossless joins
Lossless join algorithm: Example 5.12
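For decompositions into more than two relations, the usual test is a matrix (chase) algorithm. The Python sketch below is one way to code it, assuming the same (lhs, rhs) FD encoding. It builds one row per Ri with a distinguished symbol in Ri's columns, repeatedly equates symbols forced by the FDs, and reports lossless if some row becomes entirely distinguished. It is not necessarily identical to the textbook example cited above. The schema is the EMP_PROJ relation from earlier in this module; its FDs and the "bad" decomposition are assumptions for illustration.

```python
def chase_lossless(schema, parts, fds):
    """Matrix (chase) test: is the decomposition `parts` of `schema` lossless?

    schema: list of attribute names; parts: list of attribute sets R1..Rn;
    fds: list of (lhs, rhs) pairs of attribute sets.
    """
    col = {a: j for j, a in enumerate(schema)}
    DIST = ("a",)  # the distinguished symbol
    # Row i holds DIST in the columns of R_i and a row-specific ("b", i) elsewhere.
    rows = [[DIST if a in part else ("b", i) for a in schema]
            for i, part in enumerate(parts)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that currently agree on every lhs column.
            groups = {}
            for row in rows:
                key = tuple(row[col[a]] for a in lhs)
                groups.setdefault(key, []).append(row)
            for group in groups.values():
                for a in rhs:
                    j = col[a]
                    symbols = {row[j] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols (preferring DIST) and rename them
                    # everywhere in column j, as the chase treats them as variables.
                    target = DIST if DIST in symbols else min(symbols)
                    for row in rows:
                        if row[j] in symbols:
                            row[j] = target
                    changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(cell == DIST for cell in row) for row in rows)


schema = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]
good = [{"SSN", "ENAME"},
        {"PNUMBER", "PNAME", "PLOCATION"},
        {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"},
       {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(chase_lossless(schema, good, F))  # True
print(chase_lossless(schema, bad, F))   # False
```

In the bad split, the only shared attribute, PLOCATION, determines nothing, so no FD ever fires and no row fills up with distinguished symbols.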
Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further classified as trivial or nontrivial.
An MVD A ->> B in relation R is trivial if (a) B is a subset of A, or (b) A ∪ B = R.
An MVD is nontrivial if neither of these two conditions is satisfied.
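A small Python sketch of both definitions, assuming a relation is a list of tuples plus a separate tuple of attribute names. `is_trivial_mvd` encodes the two triviality conditions. `mvd_holds` checks an MVD against one instance using the usual tuple-swapping characterization: if t1 and t2 agree on X, the tuple taking X ∪ Y from t1 and everything else from t2 must also be present. The EMP(ENAME, PNAME, DNAME) data is hypothetical, and passing on one instance does not prove the MVD is a constraint of the schema.

```python
def is_trivial_mvd(lhs, rhs, schema):
    """A ->> B is trivial if B is a subset of A or A ∪ B = R."""
    return rhs <= lhs or (lhs | rhs) == schema


def mvd_holds(schema, rows, lhs, rhs):
    """Check X ->> Y on one instance via the tuple-swapping condition."""
    idx = {a: i for i, a in enumerate(schema)}
    xy = lhs | rhs
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[idx[a]] == t2[idx[a]] for a in lhs):
                # Take X ∪ Y from t1 and the remaining attributes from t2.
                t3 = tuple(t1[idx[a]] if a in xy else t2[idx[a]] for a in schema)
                if t3 not in present:
                    return False
    return True


# Hypothetical EMP instance: Smith works on projects X and Y and has
# dependents John and Anna, stored as every combination.
SCHEMA = ("ENAME", "PNAME", "DNAME")
EMP = [("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")]

print(is_trivial_mvd({"ENAME"}, {"PNAME"}, set(SCHEMA)))  # False
print(mvd_holds(SCHEMA, EMP, {"ENAME"}, {"PNAME"}))       # True
print(mvd_holds(SCHEMA, EMP[:3], {"ENAME"}, {"PNAME"}))   # False
```

Dropping the row (Smith, Y, John) breaks ENAME ->> PNAME: the pairing of project Y with dependent John is then missing.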
Fourth Normal Form (4NF)
Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
It is used to remove multi-valued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.
Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
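The EMP decomposition can be sketched directly with Python sets. The sketch assumes a hypothetical instance in which Smith works on projects X and Y and has dependents John and Anna. It projects EMP onto (ENAME, PNAME) and (ENAME, DNAME), joins them back on ENAME, and contrasts the cost of recording one new project in each design. `add_project_to_emp` is a made-up helper.

```python
# Hypothetical EMP instance: because projects and dependents are independent,
# every (project, dependent) pair for Smith must be stored.
EMP = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}

# 4NF decomposition: one relation per independent multi-valued fact.
EMP_PROJECTS = {(ename, pname) for ename, pname, _ in EMP}
EMP_DEPENDENTS = {(ename, dname) for ename, _, dname in EMP}

# A natural join on ENAME rebuilds EMP exactly, so no spurious tuples appear.
rejoined = {(e1, pname, dname)
            for e1, pname in EMP_PROJECTS
            for e2, dname in EMP_DEPENDENTS if e1 == e2}
print(len(EMP), len(EMP_PROJECTS), len(EMP_DEPENDENTS), rejoined == EMP)
# 4 2 2 True


def add_project_to_emp(rows, ename, pname):
    """Unnormalized EMP: a new project must be repeated for every dependent."""
    dependents = {d for e, _, d in rows if e == ename}
    return rows | {(ename, pname, d) for d in dependents}


print(len(add_project_to_emp(EMP, "Smith", "Z")) - len(EMP))     # 2
print(len(EMP_PROJECTS | {("Smith", "Z")}) - len(EMP_PROJECTS))  # 1
```

In EMP, one new project costs one row per dependent. In EMP_PROJECTS it costs a single row.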
Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, the relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.
Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
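The join dependency definition translates almost literally into code. The Python sketch below checks whether one instance equals the natural join of its projections onto the listed components. It is demonstrated on a hypothetical EMP instance, because a binary JD such as JD({ENAME, PNAME}, {ENAME, DNAME}) holds exactly when the MVD ENAME ->> PNAME holds. Helper names are illustrative.

```python
from functools import reduce


def project(schema, rows, attrs):
    """Projection onto attrs, returned as (attrs, set of rows)."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}


def natural_join(r, s):
    """Natural join of two (schema, rows) pairs on their shared attributes."""
    (rs, r_rows), (ss, s_rows) = r, s
    common = [a for a in rs if a in ss]
    extra = [a for a in ss if a not in rs]
    out = {x + tuple(y[ss.index(a)] for a in extra)
           for x in r_rows for y in s_rows
           if all(x[rs.index(a)] == y[ss.index(a)] for a in common)}
    return rs + tuple(extra), out


def satisfies_jd(schema, rows, components):
    """R satisfies JD(R1, ..., Rn) iff R equals the join of its projections."""
    js, j_rows = reduce(natural_join,
                        (project(schema, rows, c) for c in components))
    idx = [js.index(a) for a in schema]  # restore the original column order
    return {tuple(row[i] for i in idx) for row in j_rows} == set(rows)


SCHEMA = ("ENAME", "PNAME", "DNAME")
EMP = [("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")]
jd = [("ENAME", "PNAME"), ("ENAME", "DNAME")]
print(satisfies_jd(SCHEMA, EMP, jd))      # True
print(satisfies_jd(SCHEMA, EMP[:3], jd))  # False
```

With one row removed, the join of the two projections still yields all four combinations, so the instance no longer equals the join and the JD fails.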
Fifth Normal Form (5NF)
Definition: A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than those in which every component Ri is a superkey.
Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
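On a sample instance the gap between 4NF and 5NF becomes visible. The Python sketch below assumes attribute names SNAME, PART_NAME, and PROJ_NAME, takes R1 = (SNAME, PART_NAME), R2 = (SNAME, PROJ_NAME), R3 = (PART_NAME, PROJ_NAME), and uses a small hypothetical SUPPLY instance. It rejoins every pair of projections and then all three. Each pair yields spurious rows, so none of these binary splits is lossless here, while the three-way join returns exactly the original rows.

```python
from functools import reduce
from itertools import combinations


def project(schema, rows, attrs):
    """Projection onto attrs, returned as (attrs, set of rows)."""
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}


def natural_join(r, s):
    """Natural join of two (schema, rows) pairs on their shared attributes."""
    (rs, r_rows), (ss, s_rows) = r, s
    common = [a for a in rs if a in ss]
    extra = [a for a in ss if a not in rs]
    out = {x + tuple(y[ss.index(a)] for a in extra)
           for x in r_rows for y in s_rows
           if all(x[rs.index(a)] == y[ss.index(a)] for a in common)}
    return rs + tuple(extra), out


def rejoin(schema, rows, components):
    """Join the projections onto the components, in the original column order."""
    js, j_rows = reduce(natural_join,
                        (project(schema, rows, c) for c in components))
    idx = [js.index(a) for a in schema]
    return {tuple(row[i] for i in idx) for row in j_rows}


SCHEMA = ("SNAME", "PART_NAME", "PROJ_NAME")
SUPPLY = [("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
          ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
          ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
          ("Smith", "Bolt", "ProjY")]
R1 = ("SNAME", "PART_NAME")
R2 = ("SNAME", "PROJ_NAME")
R3 = ("PART_NAME", "PROJ_NAME")
original = set(SUPPLY)

for pair in combinations([R1, R2, R3], 2):
    joined = rejoin(SCHEMA, SUPPLY, pair)
    print(len(joined), len(joined - original))  # 9 2 for every pair

full = rejoin(SCHEMA, SUPPLY, [R1, R2, R3])
print(len(full), full == original)  # 7 True
```

For example, joining R1 and R2 on SNAME invents (Smith, Nut, ProjX). The third projection R3 filters it out because Nut is never used in ProjX.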
Fifth Normal Form (5NF)
Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.
Hence, 5NF is rarely used in practice.