Module 6 Relational Database Design

Jan 03, 2016


benedict-kane


Module 6 2042023

Topics to be covered

- Pitfalls in relational database design
- Functional dependencies
- Armstrong axioms
- Decomposition
- Desirable properties of decomposition
- Boyce-Codd normal form
- 3rd and 4th normal form
- Mention of other normal forms

Evaluating relation schemas

Two levels of relation schemas:
- The logical or conceptual view: how users interpret the relation schemas and the meaning of their attributes.
- The implementation or storage view: how the tuples in the base relations are stored and updated.

Informal Design Guidelines for Relational Databases

Four informal measures of quality for relation schema design are:
1. Imparting clear semantics to attributes in relations
2. Reducing the redundant values in tuples
3. Reducing the null values in tuples
4. Disallowing the possibility of generating spurious tuples

1. Semantics of the Relation Attributes

GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (This applies to individual relations and their attributes.)
- Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
- Only foreign keys should be used to refer to other entities.
- Entity and relationship attributes should be kept apart as much as possible.

Bottom line: design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.

A Simplified COMPANY relational database schema

Two relation schemas suffering from update anomalies

EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

Two relation schemas suffering from update anomalies

Although there is nothing logically wrong with these two relations, they are considered poor designs because they violate Guideline 1 by mixing attributes from distinct real-world entities.
- EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees, projects, and the WORKS_ON relationship.
- They may be used as views, but they cause problems when used as base relations.

2. Redundant Information in Tuples and Update Anomalies

One goal of schema design is to minimize the storage space used by the base relations. When information is stored redundantly:
- it wastes storage, and
- it causes update anomalies: insertion anomalies, deletion anomalies, and modification anomalies.


EXAMPLE OF AN INSERT ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Insert anomaly: we cannot insert a project unless an employee is assigned to it. Conversely, we cannot insert an employee unless he/she is assigned to a project.

EXAMPLE OF A DELETE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Delete anomaly: deleting a project results in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project.

EXAMPLE OF AN UPDATE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" may require this update to be made for all 100 employees working on project P1.


Guideline to Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies.
- If any anomalies are present, note them so that applications can be made to take them into account.
- In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.

Problems with Nulls

If many attributes are grouped together as a "fat" relation, it gives rise to many nulls in the tuples. This causes:
- wasted storage,
- problems in understanding the meaning of the attributes, and
- difficulty when nulls appear in aggregate operators like COUNT or SUM.

3. Null Values in Tuples

Interpretations of nulls:
- The attribute does not apply to this tuple (not applicable or invalid).
- The attribute value is unknown (it may exist).
- The value is known to exist but is unavailable.

GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).

Example of Spurious Tuples

Generation of spurious tuples

Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.
- The problem is that a natural join of these two relations produces more tuples than the original set of tuples in EMP_PROJ.
- These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious or wrong information that is not valid.
- This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
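To make the spurious-tuple problem concrete, here is a minimal Python sketch. It assumes a tiny hypothetical EMP_PROJ state (the SSNs, names, projects, and locations are invented) and models each relation as a list of dicts; `project` and `natural_join` are illustrative helpers, not a library API. Projecting onto EMP_PROJ1 and EMP_LOCS and rejoining on PLOCATION doubles the tuple count:

```python
from typing import Dict, List, Sequence

Row = Dict[str, object]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Projection with duplicate elimination (relations are sets)."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Join on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]


# Hypothetical EMP_PROJ state: two employees on different projects
# that happen to be located in the same city.
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Ann",
     "PNAME": "P-one", "PLOCATION": "Houston"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 20, "ENAME": "Bob",
     "PNAME": "P-two", "PLOCATION": "Houston"},
]

emp_locs = project(emp_proj, ["ENAME", "PLOCATION"])
emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

rejoined = natural_join(emp_proj1, emp_locs)
original = {tuple(sorted(t.items())) for t in emp_proj}
spurious = [t for t in rejoined if tuple(sorted(t.items())) not in original]

print(len(emp_proj), len(rejoined), len(spurious))  # 2 4 2
```

Because both projects share the location Houston, every employee there is paired with every project there; the two extra tuples wrongly state that Ann works on P-two and Bob on P-one.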

Example of Spurious Tuples (contd.)

4. Spurious Tuples

Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:
(a) Non-additive (lossless) join of the corresponding relations
(b) Preservation of the functional dependencies

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:
- Anomalies cause redundant work to be done during insertion, modification, and deletion.
- Nulls waste storage space and make it difficult to perform aggregation operations and joins.
- Invalid and spurious data are generated during joins on improperly related base relations.

Functional dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition

FDs are used to specify formal measures of the "goodness" of relational designs. FDs and keys are used to define normal forms for relations. FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
- X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.
- The values of the X component functionally determine the values of the Y component.
- FDs are derived from the real-world constraints on the attributes.
- The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name         Continent    Area    Length
Caspian Sea  Asia-Europe  143244  760
Superior     NA           31700   350
Victoria     Africa       26828   250
Aral Sea     Asia         24904   280
Huron        NA           23000   206
Michigan     NA           22300   307
Tanganyika   Africa       12700   420

Continent -> Name does not hold (NA appears with Superior, Huron, and Michigan); Name -> Length holds in this state.

Graphical representation of Functional Dependencies

Examples of FD constraints

- Social security number uniquely determines employee name: SSN -> ENAME
- Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
- Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints (contd.)

- An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
- It must be defined explicitly by someone who knows the semantics of the attributes of R.
- The constraint must hold on every relation instance r(R).
- If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.

How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.

Relation state of TEACH

TEACH
Teacher  Course           Text
Smith    Data Structures  Bartram
Smith    Data Management  Martin
Hall     Compilers        Hoffmann
Brown    OOAD             Horowitz

TEACHER -> COURSE does not hold (Smith teaches two different courses).
TEXT -> COURSE holds in this state.
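The Satisfies algorithm can be sketched in Python and run on the TEACH state and on the lakes table shown earlier. This is an illustrative sketch, assuming tuples are dicts keyed by attribute name; `satisfies` is a hypothetical function name:

```python
from itertools import groupby
from typing import Dict, List, Sequence

Row = Dict[str, object]


def satisfies(r: List[Row], a: Sequence[str], b: Sequence[str]) -> bool:
    """Return True if relation state r satisfies the FD A -> B."""
    def a_values(t: Row) -> tuple:
        return tuple(t[x] for x in a)

    # Step 1: sort on A so that tuples with equal A-values are adjacent.
    ordered = sorted(r, key=a_values)
    # Step 2: within each run of equal A-values, the B-values must all agree.
    for _, run in groupby(ordered, key=a_values):
        if len({tuple(t[y] for y in b) for t in run}) > 1:
            return False  # two tuples agree on A but differ on B
    # Step 3: the condition held for every run.
    return True


teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]
lakes = [
    dict(zip(("Name", "Continent", "Area", "Length"), row))
    for row in [
        ("Caspian Sea", "Asia-Europe", 143244, 760),
        ("Superior", "NA", 31700, 350),
        ("Victoria", "Africa", 26828, 250),
        ("Aral Sea", "Asia", 24904, 280),
        ("Huron", "NA", 23000, 206),
        ("Michigan", "NA", 22300, 307),
        ("Tanganyika", "Africa", 12700, 420),
    ]
]

print(satisfies(teach, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(teach, ["TEXT"], ["COURSE"]))     # True
print(satisfies(lakes, ["Continent"], ["Name"]))  # False
print(satisfies(lakes, ["Name"], ["Length"]))     # True
```

A True result only shows that this particular state does not violate the FD; whether the FD holds for the schema still has to come from the semantics of the attributes.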

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time consuming, so inference axioms are used instead.

Inference Rules for Functional Dependencies

- F is the set of functional dependencies that are specified on relation schema R.
- Schema designers specify the most obvious FDs.
- The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The fact that X -> Y is inferred from F is denoted F |= X -> Y.
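The slides do not give a specific procedure at this point. One standard systematic method (a common textbook technique, not taken from these slides) is to compute the attribute closure X+ of X under F: every attribute that X determines through repeated application of the inference rules. Then F |= X -> Y exactly when Y ⊆ X+. A minimal Python sketch, assuming each FD is a pair of frozensets; `fd`, `closure`, and `implies` are hypothetical helper names:

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    return frozenset(lhs), frozenset(rhs)


def closure(x: Iterable[str], f: List[FD]) -> Set[str]:
    """Attribute closure X+ of X under the set of FDs f."""
    result = set(x)  # X -> X by reflexivity (IR1)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in f:
            # Once the LHS is determined, the RHS is determined as well.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(f: List[FD], x: Iterable[str], y: Iterable[str]) -> bool:
    """F |= X -> Y iff Y is contained in X+."""
    return set(y) <= closure(x, f)


F = [
    fd({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    fd({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```

Because SSN+ contains every attribute, the sketch also shows that SSN is a key of this universal relation.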

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.

Armstrong's inference rules:
- IR1 (Reflexive): if Y ⊆ X, then X -> Y
- IR2 (Augmentation): if X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
- IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z

IR1, IR2, and IR3 form a sound and complete set of inference rules.
- By sound we mean that any dependency we infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F. (In other words, if the axioms are correctly applied, they cannot derive false dependencies.)
- By complete we mean that using IR1 through IR3 repeatedly until no more dependencies can be inferred results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs (contd.)

Some additional inference rules that are useful:
- Decomposition: if X -> YZ, then X -> Y and X -> Z
- Union (additive): if X -> Y and X -> Z, then X -> YZ
- Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z

These last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
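One possible derivation for each example, using only IR1 to IR3. It assumes that the garbled set in Example 1 reads F = {A -> B, C -> X, BX -> Z}:

```latex
\begin{align*}
&\text{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \\
&1.\; A \to B && \text{given} \\
&2.\; AC \to BC && \text{IR2 on (1), augmenting by } C \\
&3.\; C \to X && \text{given} \\
&4.\; BC \to BX && \text{IR2 on (3), augmenting by } B \\
&5.\; AC \to BX && \text{IR3 on (2), (4)} \\
&6.\; BX \to Z && \text{given} \\
&7.\; AC \to Z && \text{IR3 on (5), (6)} \\[4pt]
&\text{Example 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B \\
&1.\; A \to B && \text{given} \\
&2.\; B \to C && \text{IR1, since } C \subseteq B \\
&3.\; A \to C && \text{IR3 on (1), (2)} \\
&4.\; C \to D && \text{given} \\
&5.\; A \to D && \text{IR3 on (3), (4)}
\end{align*}
```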

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
- Redundant FDs are extra and unnecessary, and can be safely removed from the set F.
- Eliminating redundant FDs allows us to minimize the set of FDs.

Equivalence of Sets of Functional Dependencies

- A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, E is covered by F.
- Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.
- For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
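Covering and equivalence can both be tested with attribute closures: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. The same idea tests whether a single FD is redundant, as defined on the previous slide. A minimal sketch, assuming single-letter attribute names and hypothetical helpers `closure`, `covers`, `equivalent`, and `is_redundant`:

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    # Single-letter attributes: fd("AB", "C") means {A, B} -> {C}.
    return frozenset(lhs), frozenset(rhs)


def closure(x: Iterable[str], f: List[FD]) -> Set[str]:
    """Attribute closure X+ under f."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in f:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f: List[FD], e: List[FD]) -> bool:
    """F covers E if every X -> Y in E has Y inside X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e: List[FD], f: List[FD]) -> bool:
    return covers(e, f) and covers(f, e)


def is_redundant(f: List[FD], i: int) -> bool:
    """f[i] is redundant if it can be derived from F - {f[i]}."""
    lhs, rhs = f[i]
    return rhs <= closure(lhs, f[:i] + f[i + 1:])


E = [fd("A", "B"), fd("B", "C")]
F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
G = [fd("A", "B"), fd("A", "C")]

print(equivalent(E, F))            # True
print(is_redundant(F, 2))          # True: A -> C follows from A -> B, B -> C
print(covers(E, G), covers(G, E))  # True False
```

E and G are not equivalent: E covers G, but B -> C cannot be inferred from G.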

Minimal Functional Dependencies

A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F.

F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.

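Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Make every RHS a single attribute: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the redundant FDs (XY -> Z follows from X -> Z, and XY -> W appears twice): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R, since X -> Z and XZ -> R give X -> R. Hence FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}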
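A minimal Python sketch of the cover computation, assuming single-letter attribute names; `minimal_cover` and `show` are hypothetical names. It follows the usual order: split right-hand sides, remove extraneous left-hand attributes, then drop redundant FDs (in general the order of the last two steps matters). Run on the problem above, it also replaces XZ -> R by X -> R, because X -> Z makes Z extraneous:

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    """Single-letter attributes: fd("XY", "WP") stands for XY -> WP."""
    return frozenset(lhs), frozenset(rhs)


def closure(x: Iterable[str], f: List[FD]) -> Set[str]:
    """Attribute closure X+ under f."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in f:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(f: List[FD]) -> List[FD]:
    # Step 1: one attribute per right-hand side, without duplicates.
    g: List[FD] = []
    for lhs, rhs in f:
        for a in sorted(rhs):
            dep = (lhs, frozenset({a}))
            if dep not in g:
                g.append(dep)
    # Step 2: left-reduce. B is extraneous in X -> A if A is in (X - B)+.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction can create duplicates
    # Step 3: drop X -> A if A is still in X+ without it.
    for dep in list(g):
        rest = [d for d in g if d != dep]
        if dep[1] <= closure(dep[0], rest):
            g = rest
    return g


def show(f: List[FD]) -> List[str]:
    return sorted("".join(sorted(lhs)) + " -> " + "".join(sorted(rhs))
                  for lhs, rhs in f)


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
print(show(minimal_cover(F)))
# ['X -> R', 'X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W']
```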

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

- 2NF, 3NF, and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
- 4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).
- Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
- minimizing redundancy, and
- minimizing the insertion, deletion, and update anomalies.

It provides the database designer with:
- a formal framework for analyzing relation schemas based on their keys and FDs, and
- a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

- Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
- Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

- Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
- The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
- Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
- Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys (contd.)

- If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
- A prime attribute must be a member of some candidate key.
- A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
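Candidate keys, and hence prime attributes, can be found by checking which attribute sets have a closure equal to all of R. The brute-force sketch below is exponential in the number of attributes, so it only suits small schemas. It assumes the TEACH FDs used later in this module, fd1: {student, course} -> instructor and fd2: instructor -> course; `candidate_keys` is a hypothetical name:

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(x: Iterable[str], f: List[FD]) -> Set[str]:
    """Attribute closure X+ under f."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in f:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(r: Iterable[str], f: List[FD]) -> List[FrozenSet[str]]:
    """All minimal superkeys of R, smallest attribute sets first."""
    attrs = sorted(r)
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # a superkey, but not minimal: it contains a key
            if closure(s, f) >= set(attrs):
                keys.append(s)
    return keys


R = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

keys = candidate_keys(R, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['course', 'student'], ['instructor', 'student']]
print(sorted(R - prime))          # []: every attribute of TEACH is prime
```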

First Normal Form

- Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.
- Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.
- It is considered to be part of the definition of a relation.

Normalization into 1NF

Normalization into 1NF

To achieve 1NF, there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the offending attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.

Normalizing nested relations into 1NF

Additional problems from the Schaum's series: pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
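A sketch of the 2NF test and the resulting decomposition, assuming the EMP_PROJ schema and FDs from the earlier examples (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS) with primary key {SSN, PNUMBER}. Following the slide's 2NF definition, "prime" here means a member of the primary key. The function names are hypothetical:

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(x: Iterable[str], f: List[FD]) -> Set[str]:
    """Attribute closure X+ under f."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in f:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(r: Set[str], f: List[FD],
                         key: Tuple[str, ...]) -> Dict[str, Tuple[str, ...]]:
    """Map each nonprime attribute that depends on a proper subset of the
    primary key to the first such subset found (smallest subsets first)."""
    nonprime = set(r) - set(key)
    found: Dict[str, Tuple[str, ...]] = {}
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            for a in sorted(nonprime & closure(part, f)):
                found.setdefault(a, part)
    return found


def decompose_2nf(r: Set[str], partial: Dict[str, Tuple[str, ...]]) -> List[Set[str]]:
    """Keep fully dependent attributes with the whole key; move each partially
    dependent group into a relation keyed by the key part it depends on."""
    groups: Dict[Tuple[str, ...], Set[str]] = {}
    for a, part in partial.items():
        groups.setdefault(part, set(part)).add(a)
    return [set(r) - set(partial)] + list(groups.values())


R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
KEY = ("SSN", "PNUMBER")

partial = partial_dependencies(R, F, KEY)
rels = decompose_2nf(R, partial)
for rel in rels:
    print(sorted(rel))
# ['HOURS', 'PNUMBER', 'SSN']
# ['PLOCATION', 'PNAME', 'PNUMBER']
# ['ENAME', 'SSN']
```

HOURS stays with the full key because {SSN, PNUMBER} -> HOURS is a full FD, while ENAME, PNAME, and PLOCATION move out with the key attribute they partially depend on.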

Normalizing into 2NF

Conversion to 2NF

Additional problem on second normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of Prog_task?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: in X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.
- Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
- There exist relations that are in 3NF but not in BCNF.
- The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

- For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a superkey.
- For a relation to be in BCNF, X must be a superkey (it must contain a candidate key).
- So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. The relation is in 3NF (course is a prime attribute) but not in BCNF (instructor is not a superkey).

A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
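The 3NF and BCNF conditions can be checked mechanically with closures. The sketch below assumes the two TEACH FDs above. It tests only the given FDs, which is enough when testing R against its own F; testing a relation produced by decomposition would need the dependencies projected onto it. The helper names are hypothetical:

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(x: Iterable[str], f: List[FD]) -> Set[str]:
    """Attribute closure X+ under f."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in f:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(x: Iterable[str], r: Set[str], f: List[FD]) -> bool:
    return closure(x, f) >= set(r)


def prime_attributes(r: Set[str], f: List[FD]) -> Set[str]:
    """Union of all candidate keys (brute force over attribute subsets)."""
    attrs = sorted(r)
    keys: List[Set[str]] = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and is_superkey(s, r, f):
                keys.append(s)
    return set().union(*keys)


def bcnf_violations(r: Set[str], f: List[FD]) -> List[FD]:
    """Nontrivial FDs X -> Y in F whose left side X is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in f
            if not rhs <= lhs and not is_superkey(lhs, r, f)]


def is_3nf(r: Set[str], f: List[FD]) -> bool:
    """Each nontrivial X -> A needs X to be a superkey or A to be prime."""
    prime = prime_attributes(r, f)
    return all(is_superkey(lhs, r, f) or a in prime
               for lhs, rhs in f for a in rhs - lhs)


TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

print(is_3nf(TEACH, F))                     # True: course is prime
print(bcnf_violations(TEACH, F) == [F[1]])  # True: fd2 violates BCNF
```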

Achieving the BCNF by Decomposition (contd.)

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions will lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it. If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
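For a decomposition of R into exactly two relations R1 and R2 there is a well-known shortcut (not the general algorithm referred to above): the decomposition is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A minimal sketch applying it to the three TEACH decompositions, assuming fd1 and fd2 as given for TEACH and a hypothetical `lossless_binary` helper:

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(x: Iterable[str], f: List[FD]) -> Set[str]:
    """Attribute closure X+ under f."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in f:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1: Iterable[str], r2: Iterable[str], f: List[FD]) -> bool:
    """Lossless iff the shared attributes determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    shared_plus = closure(r1 & r2, f)
    return (r1 - r2) <= shared_plus or (r2 - r1) <= shared_plus


F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

for a, b in decompositions:
    print(sorted(a), sorted(b), lossless_binary(a, b, F))
# only the third pair prints True
```

Only the third decomposition passes, in agreement with the earlier slide: instructor -> course makes the shared attribute instructor a key of {instructor, course}.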

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial.
- An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.
- An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
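An MVD can be checked on a particular relation state by looking for the "swapped" tuple that the definition requires: whenever t1 and t2 agree on X, the tuple taking X and Y from t1 and the remaining attributes Z from t2 must also be present. This sketch uses an illustrative EMP(ENAME, PNAME, DNAME) state in which Smith works on projects X and Y and has dependents John and Anna; the data and function names are assumptions:

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def satisfies_mvd(r: List[Row], x: Sequence[str], y: Sequence[str]) -> bool:
    """Check X ->> Y on relation state r, with Z = all remaining attributes."""
    if not r:
        return True
    attrs = list(r[0])
    z = [a for a in attrs if a not in x and a not in y]
    present = {tuple(t[a] for a in attrs) for t in r}
    for t1 in r:
        for t2 in r:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1 and Z from t2; it must be in r.
                t3 = {**t1, **{a: t2[a] for a in z}}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


def is_trivial_mvd(x: Sequence[str], y: Sequence[str], r_attrs: Sequence[str]) -> bool:
    """Trivial if Y is a subset of X, or X union Y is all of R."""
    return set(y) <= set(x) or set(x) | set(y) == set(r_attrs)


emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
broken = emp[:3]  # drops (Smith, Y, John)

print(satisfies_mvd(emp, ["ENAME"], ["PNAME"]))     # True
print(satisfies_mvd(emp, ["ENAME"], ["DNAME"]))     # True
print(satisfies_mvd(broken, ["ENAME"], ["PNAME"]))  # False
print(is_trivial_mvd(["ENAME"], ["PNAME"], ["ENAME", "PNAME", "DNAME"]))  # False
```

Because every project of Smith must be paired with every dependent, storing both facts in one relation forces all combinations; this nontrivial MVD is what the 4NF decomposition removes.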

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
- It is used for removing multivalued dependencies.
- In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
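A join dependency can likewise be checked on a given state by joining its projections and comparing the result with the original. The sketch uses a small hypothetical SUPPLY(SNAME, PARTNAME, PROJNAME) state (the supplier, part, and project codes are invented) and JD(R1, R2, R3) with R1 = {SNAME, PARTNAME}, R2 = {PARTNAME, PROJNAME}, R3 = {PROJNAME, SNAME}. The helpers `project`, `natural_join`, and `satisfies_jd` are illustrative names:

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    """Projection with duplicate elimination."""
    seen, out = set(), []
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out


def natural_join(r: List[Row], s: List[Row]) -> List[Row]:
    """Join on every attribute name the two relations share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]


def as_set(rows: List[Row]) -> set:
    return {tuple(sorted(t.items())) for t in rows}


def satisfies_jd(r: List[Row], components: List[Sequence[str]]) -> bool:
    """r satisfies JD(R1, ..., Rn) iff joining its projections gives back r."""
    joined = project(r, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(r, comp))
    return as_set(joined) == as_set(r)


supply = [
    {"SNAME": "s1", "PARTNAME": "p1", "PROJNAME": "j2"},
    {"SNAME": "s1", "PARTNAME": "p2", "PROJNAME": "j1"},
    {"SNAME": "s2", "PARTNAME": "p1", "PROJNAME": "j1"},
    {"SNAME": "s1", "PARTNAME": "p1", "PROJNAME": "j1"},
]
jd = [["SNAME", "PARTNAME"], ["PARTNAME", "PROJNAME"], ["PROJNAME", "SNAME"]]

print(satisfies_jd(supply, jd))      # True
print(satisfies_jd(supply[:3], jd))  # False
```

Dropping (s1, p1, j1) leaves all three projections unchanged, so the join regenerates that tuple as a spurious one and the JD no longer holds.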

Fifth Normal Form (5NF)

Definition: a relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, the only nontrivial join dependencies the relation satisfies are those whose components are all superkeys.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes. Hence, 5NF is rarely used in practice.

Page 2: Module 6

Module 6 2042023

Topics to be covered

Pitfalls in relational database design Functional dependencies Armstrong Axioms Decomposition Desirable properties of decomposition Boyce-code normal form 3rd and 4th normal form Mention of other normal forms

Module 6 3042023

Evaluating relation schemas

Two levels of relation schemasThe logical or conceptual view

How users interpret the relation schemas and the meaning of their attributes

Implementation or storage view How the tuples in the base relation are stored

and updated

Module 6 4042023

Informal Design Guidelines for Relational Databases Four informal measures of quality for

relation schema design are

1 Imparting clear semantics to attributes in Relations

2 Reducing the redundant values in tuples

3 Reducing the null values in tuples

4 Disallowing the possibility of generating spurious tuples

Module 6 5042023

1Semantics of the Relation Attributes GUIDELINE 1 Informally each tuple in a relation

should represent one entity or relationship instance (Applies to individual relations and their attributes) Attributes of different entities (EMPLOYEEs

DEPARTMENTs PROJECTs) should not be mixed in the same relation

Only foreign keys should be used to refer to other entities

Entity and relationship attributes should be kept apart as much as possible

Bottom Line Design a schema that can be explained easily relation by relation The semantics of attributes should be easy to interpret

Module 6 6042023

A Simplified COMPANY relational database schema

Module 6 7042023

Two relation schemas suffering from update anomalies

ENAME SSN BDATEADDRES

SDNUMBE

RDNAME

DMGRSSN

PLOCATION

SSNPNUMBE

RHOURS ENAME PNAME

EMP_PROJ

EMP_DEPT

Module 6 8042023

Two relation schemas suffering from update anomalies Although there is nothing wrong logically with

these 2 relations they are considered poor designs because they violate guideline 1 by mixing attributes from distinct real world entities

EMP_DEPT mixes attributes of employee and department and EMP_PROJ mixes attributes of employees amp projects and the WORKS_ON relationship

They may be used as views but they cause problems when used as base relations

Module 6 9042023

2Redundant Information in Tuples and Update Anomalies Goal of schema design is to minimize the

storage space used by the base relations Information is stored redundantly Wastes storage

Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies

Module 6 10042023

Two relation schemas suffering from update anomalies

ENAME SSN BDATEADDRES

SDNUMBE

RDNAME

DMGRSSN

PLOCATION

SSNPNUMBE

RHOURS ENAME PNAME

EMP_DEPT

EMP_PROJ

Module 6 11042023

EXAMPLE OF AN INSERT ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Insert Anomaly Cannot insert a project unless an employee is

assigned to it Conversely

Cannot insert an employee unless an heshe is assigned to a project

Module 6 12042023

EXAMPLE OF AN DELETE ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Delete Anomaly When a project is deleted it will result in deleting

all the employees who work on that project Alternately if an employee is the sole employee

on a project deleting that employee would result in deleting the corresponding project

Module 6 13042023

EXAMPLE OF AN UPDATE ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Update AnomalyChanging the name of project number P1

from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1

Module 6 14042023

Module 6 15042023

Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2

Design a schema that does not suffer from the insertion deletion and update anomalies

If there are any anomalies present then note them so that applications can be made to take them into account

In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries

Module 6 16042023

Problems with Nulls If many attributes are grouped together

as a fat relation it gives rise to many nulls in the tuples

Waste storage Problems in understanding the

meaning of the attributes Difficult while using Nulls in aggregate

operators like count or sum

Module 6 17042023

3 Null Values in Tuples Interpretations of nulls

Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable

GUIDELINE 3 Relations should be designed such that their

tuples will have as few NULL values as possible Attributes that are NULL frequently could be

placed in separate relations (with the primary key) Example-

if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation

Better create a new relation emp_offices(essn office_number)

Module 6 18042023

Example of Spurious Tuples

Generation of spurious tuples: using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.

The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.

These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious, or wrong, information that is not valid.

This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
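A small Python sketch can make the spurious-tuple problem concrete. The EMP_PROJ tuples below are made up for illustration, and the attribute lists EMP_LOCS(ENAME, PLOCATION) and EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION) are assumptions, since the original figure is not reproduced here. Joining the two projections on PLOCATION brings back tuples that were never in EMP_PROJ.

```python
# Hypothetical EMP_PROJ state; attribute lists are assumed, values are made up.
EMP_PROJ = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith", "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith", "PNAME": "ProductY", "PLOCATION": "Sugarland"},
    {"SSN": "222", "PNUMBER": 3, "HOURS": 40.0, "ENAME": "Narayan", "PNAME": "ProductZ", "PLOCATION": "Houston"},
    {"SSN": "333", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "English", "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "333", "PNUMBER": 2, "HOURS": 20.0, "ENAME": "English", "PNAME": "ProductY", "PLOCATION": "Sugarland"},
]


def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(r, s):
    """Naive natural join: pair up tuples that agree on all shared attributes."""
    result = []
    for t1 in r:
        for t2 in s:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                t = {**t1, **t2}
                if t not in result:
                    result.append(t)
    return result


EMP_LOCS = project(EMP_PROJ, ["ENAME", "PLOCATION"])
EMP_PROJ1 = project(EMP_PROJ, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

joined = natural_join(EMP_PROJ1, EMP_LOCS)
spurious = [t for t in joined if t not in EMP_PROJ]
print(len(EMP_PROJ), len(joined), len(spurious))  # 5 9 4
```

Each spurious tuple pairs an employee with a project only because both are associated with the same location (for example SSN 111 with ENAME English), which is exactly the invalid information the slide describes.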

Example of Spurious Tuples (contd.)

4 Spurious Tuples

Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:

(a) Non-additivity, or losslessness, of the corresponding join

(b) Preservation of the functional dependencies

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:

Anomalies that cause redundant work to be done during insertion, modification and deletion

Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Functional dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition

FDs are used to specify formal measures of the goodness of relational designs, keys that are used to define normal forms for relations, and constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.

The values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of an FD is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name          Continent     Area (sq mi)   Length (mi)
Caspian Sea   Asia-Europe   143244         760
Superior      NA            31700          350
Victoria      Africa        26828          250
Aral Sea      Asia          24904          280
Huron         NA            23000          206
Michigan      NA            22300          307
Tanganyika    Africa        12700          420

Continent -/-> Name (does not hold: NA appears with Superior, Huron and Michigan)

Name -> Length (holds: each lake name appears only once)

Graphical representation of Functional Dependencies

Examples of FD constraints

Social security number uniquely determines employee name: SSN -> ENAME

Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}

Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies, or does not satisfy, a given functional dependency A -> B.

How it works:

1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.

2. Check that tuples with equal values under attributes A also have equal values under B.

3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.

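Relation state of TEACH

TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz

TEACHER -> COURSE does not hold in this state (Smith teaches two different courses)

TEXT -> COURSE is satisfied by this state (each text is used for only one course)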
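The Satisfies algorithm can be sketched in Python and run on the TEACH state above. Representing tuples as dictionaries is an assumption of this sketch. A True result only means this particular state does not violate the FD; as noted earlier, an FD is a property of the schema and cannot be proven from a single state.

```python
from itertools import groupby


def satisfies(r, A, B):
    """Sort-based Satisfies test: does relation state r satisfy A -> B?

    r is a list of dicts (one per tuple); A and B are lists of attribute names.
    """
    def a_value(t):
        return tuple(t[a] for a in A)

    ordered = sorted(r, key=a_value)            # step 1: equal A-values become adjacent
    for _, group in groupby(ordered, key=a_value):
        b_values = {tuple(t[b] for b in B) for t in group}
        if len(b_values) > 1:                   # step 2: same A-value, different B-values
            return False
    return True                                 # step 3: no violating group found


TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]

print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
```

Sorting is what makes the check a single linear pass over adjacent groups; a dictionary keyed on the A-values would work equally well.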

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time consuming.

So inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred, or deduced, from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This defines a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:

SSN -> {DNAME, DMGRSSN}

SSN -> SSN

DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. That X -> Y can be inferred from F is denoted by F |= X -> Y.
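In practice F+ is rarely listed in full. A standard shortcut, sketched below in Python, computes the closure X+ of an attribute set X under F: X -> Y is in F+ exactly when Y ⊆ X+. Representing F as a list of (lhs, rhs) pairs of sets is an assumption of this sketch; the example uses the F from this slide.

```python
def attribute_closure(X, F):
    """Return X+, the set of attributes functionally determined by X under F.

    F is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    closure = set(X)
    changed = True
    while changed:                      # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(F, X, Y):
    """F |= X -> Y  iff  Y is contained in X+."""
    return set(Y) <= attribute_closure(X, F)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(attribute_closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```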

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:

IR1 (Reflexive): If Y ⊆ X, then X -> Y

IR2 (Augmentation): If X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)

IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:

Decomposition: If X -> YZ, then X -> Y and X -> Z

Union (additive): If X -> Y and X -> Z, then X -> YZ

Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
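One possible derivation for each exercise, using only IR1 to IR3. The first set is read as F = {A -> B, C -> X, BX -> Z} and the second as F = {A -> B, C -> D} with C ⊆ B; the arrows appear to have been lost from the slide text, so this reading is an assumption.

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\text{Exercise 1: } F = \{A \to B,\ C \to X,\ BX \to Z\}} \\
1. & A \to B   & \text{given} \\
2. & AC \to BC & \text{IR2 on (1), augmenting with } C \\
3. & C \to X   & \text{given} \\
4. & BC \to BX & \text{IR2 on (3), augmenting with } B \\
5. & AC \to BX & \text{IR3 on (2) and (4)} \\
6. & AC \to Z  & \text{IR3 on (5) and } BX \to Z \\[6pt]
\multicolumn{3}{l}{\text{Exercise 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B} \\
1. & A \to B & \text{given} \\
2. & B \to C & \text{IR1, since } C \subseteq B \\
3. & A \to C & \text{IR3 on (1) and (2)} \\
4. & C \to D & \text{given} \\
5. & A \to D & \text{IR3 on (3) and (4)}
\end{array}
```

Steps 2 to 5 of the first derivation are exactly the pseudotransitive rule applied twice, which is why that rule is a convenient shortcut.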

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
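A minimal Python sketch of the redundancy test, reusing the DEPT_NO / MGR_SSN / MGR_PHONE example from the closure slide: the third FD is redundant because it follows from the first two by transitivity. The helper fd() and the list-of-pairs representation are assumptions of this sketch.

```python
def attribute_closure(X, F):
    """X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names (helper for this sketch)."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def is_redundant(dep, F):
    """dep is redundant iff it can be derived from F - {dep}."""
    rest = [g for g in F if g != dep]
    lhs, rhs = dep
    return rhs <= attribute_closure(lhs, rest)


F = [
    fd("DEPT_NO", "MGR_SSN"),
    fd("MGR_SSN", "MGR_PHONE"),
    fd("DEPT_NO", "MGR_PHONE"),
]
print([is_redundant(dep, F) for dep in F])  # [False, False, True]
```

Redundancy is relative to the current set, so after removing one FD the others should be re-checked: two FDs can each be redundant while the other is present without both being removable.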

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Such sets are equivalent to F.
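Both cover conditions can be checked without ever building E+ or F+: F covers E iff, for every X -> Y in E, Y ⊆ X+ computed under F. The Python sketch below uses small illustrative sets over single-letter attributes, chosen for this example and not taken from the slides.

```python
def attribute_closure(X, F):
    """X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def covers(F, E):
    """True if F covers E, i.e. every FD in E is in F+."""
    return all(set(rhs) <= attribute_closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E and F are equivalent (E+ = F+) iff each one covers the other."""
    return covers(E, F) and covers(F, E)


# Illustrative FD sets over single-letter attributes (not taken from the slides).
E = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
F = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
G = [({"A"}, {"C"}), ({"E"}, {"H"})]

print(equivalent(E, F))            # True
print(covers(E, G), covers(G, E))  # True False
```

Here F has two FDs and E has four, yet they generate the same closure, which is the kind of smaller equivalent set the slide is after.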

Minimal Functional Dependencies

A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F.

F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:

(a) every dependency in F has a single attribute on its RHS;

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 be in F.

A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has a single attribute.

2. FC is left-reduced: no FD in FC has an extraneous attribute on its LHS.

3. FC is nonredundant.

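Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (it follows from X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: since X -> Z holds, X+ = {X, Z, R}, so Z is extraneous in XZ -> R, which is replaced by X -> R: FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}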
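The procedure can be automated. The Python sketch below follows the usual minimal-cover steps (split right-hand sides, left-reduce, then drop redundant FDs) and is run on this problem's F. Its output contains X -> R rather than XZ -> R: because X -> Z holds, Z is extraneous on the left of XZ -> R. Single-letter attribute names and the list-of-pairs representation are assumptions of the sketch.

```python
def attribute_closure(X, F):
    """X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def canonical_cover(F):
    """Minimal (canonical) cover: simple RHS, left-reduced, nonredundant."""
    # 1. Split every RHS into single attributes; the set drops duplicates.
    split = {(frozenset(lhs), a) for lhs, rhs in F for a in rhs}
    G = [(lhs, frozenset({a}))
         for lhs, a in sorted(split, key=lambda d: (sorted(d[0]), d[1]))]
    # 2. Left-reduce: B is extraneous in X -> A if A is in (X - {B})+.
    reduced = []
    for lhs, rhs in G:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= attribute_closure(lhs - {b}, G):
                lhs.discard(b)
        reduced.append((frozenset(lhs), rhs))
    reduced = list(dict.fromkeys(reduced))   # drop duplicates, keep order
    # 3. Remove FDs that follow from the remaining ones.
    result = list(reduced)
    for dep in reduced:
        rest = [g for g in result if g != dep]
        if dep[1] <= attribute_closure(dep[0], rest):
            result = rest
    return result


def show(F):
    return sorted("".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in F)


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
print(show(canonical_cover(F)))
# ['X -> R', 'X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W']
```

Left-reduction is checked against the split set G throughout; this is safe because each replacement keeps the set equivalent to F.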

Normal Forms Based on Primary Keys

1. Normalization of Relations

2. Practical Use of Normal Forms

3. Definitions of Keys and Attributes Participating in Keys

4. First Normal Form

5. Second Normal Form

6. Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Normalization, proposed by Codd, is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:

Minimizing redundancy

Minimizing anomalies

It provides the database designer with:

A formal framework for analyzing relation schemas based on keys and FDs

A series of normal form tests

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
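Candidate keys, and hence prime attributes, can be found with attribute closure: a set S is a superkey when S+ contains all of R. The Python sketch below uses the TEACH relation with fd1 ({student, course} -> instructor) and fd2 (instructor -> course) from the BCNF discussion later in this module; the brute-force search is chosen for clarity, not efficiency, and is only practical for small schemas.

```python
from itertools import combinations


def attribute_closure(X, F):
    """X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(R, F):
    """All candidate keys of R under F, by brute force over attribute subsets.

    Subsets are tried smallest first, so a superkey that does not contain an
    already-found key is minimal. Exponential in |R|.
    """
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            S = set(combo)
            if any(k <= S for k in keys):
                continue                  # contains a smaller key: superkey, not a key
            if attribute_closure(S, F) >= set(R):
                keys.append(S)
    return keys


R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2

keys = candidate_keys(R, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])   # [['course', 'student'], ['instructor', 'student']]
print(sorted(R - prime))           # [] (every attribute is prime)
```

TEACH therefore has two candidate keys, so all three attributes are prime; this is what lets it satisfy 3NF while failing BCNF.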

First Normal Form

1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF, there are three techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3. If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
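The first (preferred) technique can be sketched in a few lines of Python. The DEPARTMENT relation, its multivalued DLOCATIONS attribute and the sample values are hypothetical, chosen only to illustrate the split; the primary key DNUMBER is copied into the new relation so that the two relations can be joined back.

```python
# Hypothetical DEPARTMENT state that violates 1NF: DLOCATIONS is multivalued.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]


def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: move a multivalued attribute into a separate relation.

    Returns (base, detail): base keeps every attribute except attr; detail holds
    one (key, value) tuple per value of attr, so every value is atomic.
    """
    base = [{a: v for a, v in row.items() if a != attr} for row in rows]
    detail = [{key: row[key], new_attr: value} for row in rows for value in row[attr]]
    return base, detail


DEPT, DEPT_LOCATIONS = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(DEPT[0])               # {'DNUMBER': 5, 'DNAME': 'Research'}
print(len(DEPT_LOCATIONS))   # 5
```

No value is repeated across the two results and no NULLs are introduced, which is why this technique avoids the drawbacks of techniques 2 and 3.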

Normalizing nested relations into 1NF

Additional problems from the Schaum series: Pg 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:

Prime attribute: an attribute that is a member of the primary key K.

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:

{SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.

{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
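The 2NF test can be sketched in Python with attribute closure: a non-prime attribute violates 2NF when some proper subset of the primary key already determines it. The example uses the EMP_PROJ attributes and the FDs from the earlier examples ({SSN, PNUMBER} -> HOURS, SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}). Like the slide, it treats "prime" as "member of the primary key" and assumes a single key; the one-step decomposition helper is a simplification for illustration.

```python
from itertools import combinations


def attribute_closure(X, F):
    """X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def partial_dependencies(key, attrs, F):
    """Map each non-prime attribute that depends on a proper subset of the
    primary key to the first such subset found."""
    nonprime = set(attrs) - set(key)
    found = {}
    for size in range(1, len(key)):                  # proper subsets only
        for sub in combinations(sorted(key), size):
            for a in sorted(nonprime & attribute_closure(sub, F)):
                found.setdefault(a, set(sub))
    return found


def decompose_2nf(key, attrs, F):
    """One 2NF step: move each partially dependent attribute into a relation
    with the part of the key it depends on."""
    partial = partial_dependencies(key, attrs, F)
    groups = {}
    for a, det in partial.items():
        groups.setdefault(frozenset(det), set(det)).add(a)
    remaining = set(attrs) - set(partial)
    return [remaining] + list(groups.values())


ATTRS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
KEY = ["SSN", "PNUMBER"]
F = [({"SSN", "PNUMBER"}, {"HOURS"}),
     ({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"})]

print(partial_dependencies(KEY, ATTRS, F))
for rel in decompose_2nf(KEY, ATTRS, F):
    print(sorted(rel))
# ['HOURS', 'PNUMBER', 'SSN']
# ['PLOCATION', 'PNAME', 'PNUMBER']
# ['ENAME', 'SSN']
```

HOURS stays with the full key because it is fully functionally dependent on {SSN, PNUMBER}; ENAME, PNAME and PLOCATION are the partial dependencies that break 2NF.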

Normalizing into 2NF

Conversion to 2NF

[Figure: a relation with attributes A, B, C, D is decomposed into one relation with attributes A, B, C and another with attributes A, D]

Additional problem on second normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of this relation?

2. Transform it into the next highest normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:

SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.

SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
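A Python sketch of a 3NF check. It uses the general form of the condition (for every nontrivial X -> A in F, X is a superkey or A is prime) rather than "transitively dependent on the primary key"; the two agree on these examples. It checks only the FDs as listed in F, with their right-hand sides split. The EMP_DEPT FDs are the ones from the closure example; for the NOTE's EMP(SSN, Emp, Salary), Emp -> SSN is assumed, since that is what makes Emp a candidate key.

```python
from itertools import combinations


def attribute_closure(X, F):
    """X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(R, F):
    """Brute-force candidate keys (smallest subsets first); small schemas only."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            S = set(combo)
            if not any(k <= S for k in keys) and attribute_closure(S, F) >= set(R):
                keys.append(S)
    return keys


def violations_3nf(R, F):
    """FDs X -> A from F that break 3NF: X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(R, F))
    bad = []
    for lhs, rhs in F:
        if attribute_closure(lhs, F) >= set(R):
            continue                               # X is a superkey: allowed
        for a in sorted(set(rhs) - set(lhs)):     # ignore trivial parts
            if a not in prime:
                bad.append((sorted(lhs), a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F1 = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
      ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(violations_3nf(EMP_DEPT, F1))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]

# EMP(SSN, Emp, Salary), assuming Emp -> SSN so that Emp is a candidate key.
EMP = {"SSN", "Emp", "Salary"}
F2 = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN"}), ({"Emp"}, {"Salary"})]
print(violations_3nf(EMP, F2))  # []
```

The second call reflects the NOTE: SSN -> Emp -> Salary is harmless because Emp is itself a key.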

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: Pg 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a superkey.

BCNF has no such exception: X must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor

fd2: instructor -> course

{student, course} is a candidate key for this relation. The relation is in 3NF, because course, the RHS of fd2, is a prime attribute; but it is not in BCNF, because instructor, the LHS of fd2, is not a superkey.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
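A short Python sketch of the BCNF test on TEACH: fd2 is reported because instructor+ = {instructor, course} does not cover the whole relation, and splitting on it gives {instructor, course} and {instructor, student}. Checking only the FDs listed in F is enough to detect a violation in R itself, but not in the projections produced later; a full BCNF decomposition would repeat the step with projected FDs, which this sketch does not attempt.

```python
def attribute_closure(X, F):
    """X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def bcnf_violations(R, F):
    """FDs in F whose left side is not a superkey of R (nontrivial part only)."""
    bad = []
    for lhs, rhs in F:
        extra = set(rhs) - set(lhs)
        if extra and not attribute_closure(lhs, F) >= set(R):
            bad.append((set(lhs), extra))
    return bad


def split_on(R, violation):
    """Decompose R on a violating FD X -> A into (X ∪ A) and (R - A)."""
    X, A = violation
    return set(X) | A, set(R) - A


R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2

violations = bcnf_violations(R, F)
print(violations)                    # [({'instructor'}, {'course'})]
R1, R2 = split_on(R, violations[0])
print(sorted(R1), sorted(R2))        # ['course', 'instructor'] ['instructor', 'student']
```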

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}

2. {course, instructor} and {course, student}

3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property), because its common attribute, instructor, functionally determines course (fd2).
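For a decomposition into exactly two relations there is a well-known shortcut: {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The Python sketch below applies it to the three TEACH decompositions listed above.

```python
def attribute_closure(X, F):
    """X+ under F, where F is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def lossless_binary(R1, R2, F):
    """Binary decomposition test: {R1, R2} is lossless iff the common attributes
    functionally determine R1 - R2 or R2 - R1."""
    common = attribute_closure(set(R1) & set(R2), F)
    return (set(R1) - set(R2)) <= common or (set(R2) - set(R1)) <= common


F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([lossless_binary(R1, R2, F) for R1, R2 in decompositions])  # [False, False, True]
```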

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
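For decompositions into any number of relations, the usual test is a matrix (tableau) method. The Python sketch below is one rendering of it and is not necessarily identical to the algorithm in the referenced example. It builds one row per Ri, with a distinguished symbol "a" in the columns of Ri's attributes and a unique "b" symbol elsewhere, then repeatedly applies each FD X -> Y: rows that agree on X are made to agree on Y, preferring "a". This variant equates symbols throughout a column (the classic chase). The decomposition is lossless iff some row ends up all "a".

```python
def lossless_chase(R, decomposition, F):
    """Matrix (tableau) test for a lossless join decomposition of R under F."""
    attrs = sorted(R)
    # Row i: ('a', A) if A is in Ri, otherwise a unique symbol ('b', i, A).
    rows = [{A: ("a", A) if A in Ri else ("b", i, A) for A in attrs}
            for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            groups = {}
            for row in rows:                  # rows agreeing on every X column
                groups.setdefault(tuple(row[A] for A in sorted(lhs)), []).append(row)
            for group in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols, preferring the distinguished 'a'.
                    target = ("a", A) if ("a", A) in symbols else min(symbols)
                    for row in rows:          # replace everywhere in column A
                        if row[A] in symbols:
                            row[A] = target
                    changed = True
    return any(all(row[A] == ("a", A) for A in attrs) for row in rows)


R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(lossless_chase(R, [{"instructor", "course"}, {"instructor", "student"}], F))  # True
print(lossless_chase(R, [{"student", "instructor"}, {"student", "course"}], F))     # False

# A three-way split of EMP_PROJ, with attributes and FDs from the earlier examples.
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F2 = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"}),
      ({"PNUMBER"}, {"PNAME", "PLOCATION"})]
parts = [{"SSN", "PNUMBER", "HOURS"}, {"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}]
print(lossless_chase(EMP_PROJ, parts, F2))  # True
```

Each change strictly reduces the number of distinct symbols in some column, so the loop always terminates.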

Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B ⊆ A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME

Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS
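A Python sketch of the formal MVD test on a relation state: X ->> Y holds if, for every two tuples t1, t2 agreeing on X, the tuple taking its X and Y values from t1 and its remaining values from t2 is also present. The EMP tuples are hypothetical sample data (employee Smith working on projects X and Y, with dependents John and Anna). Because every project is paired with every dependent, ENAME ->> PNAME holds, and deleting one tuple breaks it.

```python
def satisfies_mvd(r, X, Y, attrs):
    """Check the MVD X ->> Y on relation state r (a list of dicts over attrs).

    For every t1, t2 agreeing on X, the tuple t3 that takes its X and Y values
    from t1 and its remaining (Z) values from t2 must also be in r.
    """
    Z = [a for a in attrs if a not in X and a not in Y]
    present = {tuple(t[a] for a in attrs) for t in r}
    for t1 in r:
        for t2 in r:
            if all(t1[a] == t2[a] for a in X):
                t3 = {a: (t2[a] if a in Z else t1[a]) for a in attrs}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


ATTRS = ["ENAME", "PNAME", "DNAME"]
EMP = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(satisfies_mvd(EMP, ["ENAME"], ["PNAME"], ATTRS))      # True
print(satisfies_mvd(EMP[:3], ["ENAME"], ["PNAME"], ATTRS))  # False

# 4NF decomposition: the two projections join back to exactly EMP.
EMP_PROJECTS = {(t["ENAME"], t["PNAME"]) for t in EMP}
EMP_DEPENDENTS = {(t["ENAME"], t["DNAME"]) for t in EMP}
rejoined = {(e, p, d) for e, p in EMP_PROJECTS for e2, d in EMP_DEPENDENTS if e == e2}
print(rejoined == {tuple(t[a] for a in ATTRS) for t in EMP})  # True
```

The decomposed relations store 2 + 2 tuples instead of 4, and the saving grows with the number of projects and dependents per employee.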

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
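A join dependency can be checked on a given relation state by projecting and re-joining, as sketched below in Python. The SUPPLY tuples and the attribute names SNAME, PARTNAME and PROJNAME are hypothetical sample data. With them, joining only two of the projections produces spurious tuples, while the join of all three projections R1 = (SNAME, PARTNAME), R2 = (SNAME, PROJNAME), R3 = (PARTNAME, PROJNAME) gives back exactly the original state. As with FDs, a single state can only refute a JD, never prove that it holds for the schema.

```python
def project(r, attrs):
    """Projection with duplicate elimination; tuples are dicts."""
    result = []
    for t in r:
        p = {a: t[a] for a in attrs}
        if p not in result:
            result.append(p)
    return result


def natural_join(r, s):
    """Naive natural join on all shared attribute names."""
    result = []
    for t1 in r:
        for t2 in s:
            if all(t1[a] == t2[a] for a in t1.keys() & t2.keys()):
                t = {**t1, **t2}
                if t not in result:
                    result.append(t)
    return result


def join_of_projections(r, components):
    joined = project(r, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(r, comp))
    return joined


def satisfies_jd(r, components):
    """r satisfies JD(components) iff r equals the join of its projections."""
    def as_set(rel):
        return {frozenset(t.items()) for t in rel}
    return as_set(join_of_projections(r, components)) == as_set(r)


# Hypothetical SUPPLY state (SNAME, PARTNAME, PROJNAME).
SUPPLY = [dict(zip(("SNAME", "PARTNAME", "PROJNAME"), row)) for row in [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]]
R1, R2, R3 = ["SNAME", "PARTNAME"], ["SNAME", "PROJNAME"], ["PARTNAME", "PROJNAME"]

print(len(join_of_projections(SUPPLY, [R1, R2])))  # 9: two projections add spurious tuples
print(satisfies_jd(SUPPLY, [R1, R2, R3]))          # True
```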

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, every nontrivial join dependency that holds in R is implied by its keys.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 3: Module 6

Module 6 3042023

Evaluating relation schemas

Two levels of relation schemasThe logical or conceptual view

How users interpret the relation schemas and the meaning of their attributes

Implementation or storage view How the tuples in the base relation are stored

and updated

Module 6 4042023

Informal Design Guidelines for Relational Databases Four informal measures of quality for

relation schema design are

1 Imparting clear semantics to attributes in Relations

2 Reducing the redundant values in tuples

3 Reducing the null values in tuples

4 Disallowing the possibility of generating spurious tuples

Module 6 5042023

1Semantics of the Relation Attributes GUIDELINE 1 Informally each tuple in a relation

should represent one entity or relationship instance (Applies to individual relations and their attributes) Attributes of different entities (EMPLOYEEs

DEPARTMENTs PROJECTs) should not be mixed in the same relation

Only foreign keys should be used to refer to other entities

Entity and relationship attributes should be kept apart as much as possible

Bottom Line Design a schema that can be explained easily relation by relation The semantics of attributes should be easy to interpret

Module 6 6042023

A Simplified COMPANY relational database schema

Module 6 7042023

Two relation schemas suffering from update anomalies

ENAME SSN BDATEADDRES

SDNUMBE

RDNAME

DMGRSSN

PLOCATION

SSNPNUMBE

RHOURS ENAME PNAME

EMP_PROJ

EMP_DEPT

Module 6 8042023

Two relation schemas suffering from update anomalies Although there is nothing wrong logically with

these 2 relations they are considered poor designs because they violate guideline 1 by mixing attributes from distinct real world entities

EMP_DEPT mixes attributes of employee and department and EMP_PROJ mixes attributes of employees amp projects and the WORKS_ON relationship

They may be used as views but they cause problems when used as base relations

Module 6 9042023

2Redundant Information in Tuples and Update Anomalies Goal of schema design is to minimize the

storage space used by the base relations Information is stored redundantly Wastes storage

Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies

Module 6 10042023

Two relation schemas suffering from update anomalies

ENAME SSN BDATEADDRES

SDNUMBE

RDNAME

DMGRSSN

PLOCATION

SSNPNUMBE

RHOURS ENAME PNAME

EMP_DEPT

EMP_PROJ

Module 6 11042023

EXAMPLE OF AN INSERT ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Insert Anomaly Cannot insert a project unless an employee is

assigned to it Conversely

Cannot insert an employee unless an heshe is assigned to a project

Module 6 12042023

EXAMPLE OF AN DELETE ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Delete Anomaly When a project is deleted it will result in deleting

all the employees who work on that project Alternately if an employee is the sole employee

on a project deleting that employee would result in deleting the corresponding project

Module 6 13042023

EXAMPLE OF AN UPDATE ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Update AnomalyChanging the name of project number P1

from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1

Module 6 14042023

Module 6 15042023

Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2

Design a schema that does not suffer from the insertion deletion and update anomalies

If there are any anomalies present then note them so that applications can be made to take them into account

In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries

Module 6 16042023

Problems with Nulls If many attributes are grouped together

as a fat relation it gives rise to many nulls in the tuples

Waste storage Problems in understanding the

meaning of the attributes Difficult while using Nulls in aggregate

operators like count or sum

Module 6 17042023

3 Null Values in Tuples Interpretations of nulls

Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable

GUIDELINE 3 Relations should be designed such that their

tuples will have as few NULL values as possible Attributes that are NULL frequently could be

placed in separate relations (with the primary key) Example-

if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation

Better create a new relation emp_offices(essn office_number)

Module 6 18042023

Example of Spurious Tuples

Module 6 19042023

Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as

the base relations of EMP_PROJ is not a good schema design

Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ

These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid

This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1

Module 6 20042023

Example of Spurious Tuples contd

Module 6 21042023

4 Spurious Tuples Bad designs for a relational database may result

in erroneous results for certain JOIN operations The lossless join property is used to

guarantee meaningful results for join operations

GUIDELINE 4 Design relation schemas so that they can be

joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies

Note that Property (a) is extremely important and cannot be

sacrificed Property (b) is less stringent and may be sacrificed

Module 6 23042023

Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done

during Insertion Modification Deletion

Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Module 6 24042023

Functional dependencies Functional dependencies (FDs)

Is a constraint between two sets of attributes from the database

Assumption The entire database is a single universal

relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes

Module 6 25042023

Definition

FDs are used to specify formal measures of the

goodness of relational designs keys that are used to define normal forms for

relations constraints that are derived from the meaning and

interrelationships of the data attributes A set of attributes X functionally determines

a set of attributes Y if the value of X determines a unique value for Y

Module 6 26042023

Functional Dependencies

A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If

t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r

depend on or are determined by the values of the X component

The values of the X component functionally determines the values of Y component

FDs are derived from the real-world constraints on the attributes

The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:

IR1 (Reflexive): If Y subset-of X, then X -> Y.

IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X U Z.)

IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules. By sound we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F. In other words, if the axioms are correctly applied, they cannot derive false dependencies.

By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:

Decomposition: If X -> YZ, then X -> Y and X -> Z.

Union (additive): If X -> Y and X -> Z, then X -> YZ.

Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2 Given F = {A -> B, C -> D} with C a subset of B, show that F |= A -> D.
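One possible derivation for each exercise, using only IR1 to IR3, is sketched below. Other derivations exist; for instance, the second one could also use the pseudotransitive rule.

```latex
\begin{aligned}
&\textbf{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\} \models AC \to Z\\
&1.\ A \to B   && \text{given}\\
&2.\ AC \to BC && \text{IR2 on 1, augmenting with } C\\
&3.\ C \to X   && \text{given}\\
&4.\ BC \to BX && \text{IR2 on 3, augmenting with } B\\
&5.\ AC \to BX && \text{IR3 on 2 and 4}\\
&6.\ BX \to Z  && \text{given}\\
&7.\ AC \to Z  && \text{IR3 on 5 and 6}\\[6pt]
&\textbf{Example 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B \models A \to D\\
&1.\ B \to C   && \text{IR1, since } C \subseteq B\\
&2.\ A \to C   && \text{IR3 on } A \to B \text{ and 1}\\
&3.\ A \to D   && \text{IR3 on 2 and } C \to D
\end{aligned}
```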

Redundant Functional Dependencies

Given a set F of FDs, an FD A -> B in F is said to be redundant with respect to F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
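The definition above translates directly into a test. Remove X -> Y from F and check whether Y is still in the closure of X under the remaining FDs. The sketch below assumes the attribute-closure technique, which is not stated on these slides. The helper names are hypothetical. The example uses the department/manager FDs from the closure example.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fd, fds):
    """An FD is redundant iff it can be derived from fds minus that FD."""
    lhs, rhs = fd
    rest = [f for f in fds if f != fd]
    return rhs <= closure(lhs, rest)


F = [
    ({"DEPT_NO"}, {"MGR_SSN"}),
    ({"MGR_SSN"}, {"MGR_PHONE"}),
    ({"DEPT_NO"}, {"MGR_PHONE"}),  # implied by the first two (transitivity)
]

for fd in F:
    print(sorted(fd[0]), "->", sorted(fd[1]), "redundant:", is_redundant(fd, F))
```

When several FDs are removed, remove them one at a time and re-test against the reduced set. Two FDs can each be redundant only while the other one is still present.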

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E. E is equivalent to F if both conditions hold: E covers F, and F covers E.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find a set that contains fewer FDs than F and still generates all the FDs of F+. Such a set is equivalent to F.
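Both directions of the cover test can be checked with attribute closures. F covers E if, for every X -> Y in E, Y is contained in the closure of X computed under F. This sketch assumes that closure-based technique, which is not stated on the slide. The sets E, F and G and the helper names are hypothetical examples.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """True if F covers E: every X -> Y in E has Y within X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff E covers F and F covers E."""
    return covers(e, f) and covers(f, e)


E = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B", "C"})]

print(equivalent(E, F))  # True: A -> C follows from A -> B and B -> C
print(equivalent(F, G))  # False: G does not imply B -> C
```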

Minimal Functional Dependencies

A minimal set of FDs (minimal cover) is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F. F is transformed so that each FD in it with more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.

Minimal cover

A set F of FDs is minimal if:

(a) Every dependency in F has a single attribute on its RHS.

(b) For no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies).

(c) For no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F. That is, no dependency may be replaced by a dependency that involves a subset of its left-hand side.

Extraneous Attributes

The FDs of F can be reduced further by removing extraneous left-hand-side attributes or extraneous right-hand-side attributes with respect to F.

Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F.

A1 is extraneous iff F is equivalent to (F - {A1A2 -> B1B2}) U {A2 -> B1B2}.

CANONICAL COVER (FC)

1 Every FD of FC is simple: its RHS has one attribute.

2 FC is left-reduced: no FD has an extraneous LHS attribute.

3 FC is nonredundant.

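Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1 Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2 Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3 Left-reduce: because X -> Z holds, Z is extraneous in XZ -> R, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}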
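The canonical-cover steps can be carried out mechanically with three passes. First split the right-hand sides. Then remove extraneous LHS attributes. Finally remove redundant FDs. This order follows the usual textbook minimal-cover algorithm. The sketch below is a minimal illustration with hypothetical helper names, and it assumes single-letter attribute names as in the problem. Run on this F, it left-reduces XZ -> R to X -> R, because X -> Z makes Z extraneous.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Return a minimal (canonical) cover of fds."""
    # Step 1: split every RHS into single attributes, dropping duplicates.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset([a]))
            if fd not in g:
                g.append(fd)
    # Step 2: left-reduce by dropping extraneous LHS attributes.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and rhs <= closure(reduced, g):
                lhs = reduced
                g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction may create duplicates
    # Step 3: drop FDs that are derivable from the remaining ones.
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] <= closure(fd[0], rest):
            g = rest
    return g


def parse(text):
    """Parse 'X->Z, XY->WP' into FDs (assumes single-letter attribute names)."""
    fds = []
    for part in text.split(","):
        lhs, rhs = part.strip().split("->")
        fds.append((frozenset(lhs), frozenset(rhs)))
    return fds


def show(fds):
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in fds)


F = parse("X->Z, XY->WP, XY->ZWQ, XZ->R")
print(show(minimal_cover(F)))
# ['X->R', 'X->Z', 'XY->P', 'XY->Q', 'XY->W']
```

Minimal covers are not unique in general. For some sets of FDs, the result depends on the order in which the FDs are processed.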

Normal Forms Based on Primary Keys

1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Normalization, proposed by Codd, is the process of analysing the given relations based on their FDs and primary keys. Its aim is to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on keys and FDs, and with a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets. It therefore indicates the degree to which the relation has been normalized.

Normalization of Relations

Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
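A set S is a superkey exactly when its attribute closure S+ is all of R. On small schemas, this allows candidate keys and prime attributes to be found by brute force. The sketch below assumes the attribute-closure technique, which is not stated on these slides, and uses hypothetical helper names. It uses the FDs of the TEACH(student, course, instructor) relation from the BCNF discussion in this module. Trying every subset is exponential, so this is only meant for small examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if closure(k, fds) == set(schema):
                keys.append(k)
    return keys


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # fd1
    ({"INSTRUCTOR"}, {"COURSE"}),             # fd2
]

keys = candidate_keys(TEACH, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])
print("prime:", sorted(prime), "nonprime:", sorted(TEACH - prime))
```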

First Normal Form

Disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

Considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF there are three techniques:

1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3 If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy. It is also completely general, placing no limit on the maximum number of values.
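Technique 1 can be sketched as a small transformation. The other attributes stay in the base relation, and each value of the multivalued attribute becomes one (primary key, value) tuple in a new relation. The relation and attribute names below (DEPARTMENT, DNUMBER, DNAME, DLOCATIONS), the sample values and the helper name `split_multivalued` are all hypothetical and were chosen only for illustration.

```python
def split_multivalued(rows, key, attr):
    """1NF technique 1: move multivalued `attr` into its own relation with `key`.

    Returns (base, extra): base keeps every other attribute; extra holds one
    (key, value) tuple per value of the multivalued attribute.
    """
    base, extra = [], []
    for row in rows:
        base.append({a: v for a, v in row.items() if a != attr})
        for value in sorted(row[attr]):
            extra.append({key: row[key], attr: value})
    return base, extra


# Hypothetical non-1NF state: DLOCATIONS holds a set of values per department.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": {"Stafford"}},
]

dept, dept_locations = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS")
print(dept)
for t in dept_locations:
    print(t)
```

The primary key of the new relation is the combination of the department key and the location value, so no redundancy or NULLs are introduced.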

Normalizing Nested Relations into 1NF

Additional problems: Schaum's series, pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:

Prime attribute: an attribute that is a member of the primary key K.

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:

{SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.

{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
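A 2NF check follows directly from the definition. For each nonprime attribute, test whether some proper subset of the primary key already determines it. The sketch below assumes the attribute-closure technique and hypothetical helper names. It also assumes that the primary key is the only candidate key, so the nonprime attributes are supplied by hand. The FDs are the EMP_PROJ dependencies used in the full/partial FD examples: SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION} and {SSN, PNUMBER} -> HOURS.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, nonprime, fds):
    """List (proper subset of key, attribute) pairs that break 2NF."""
    violations = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(set(subset), fds)
            for a in sorted(nonprime & determined):
                violations.append((subset, a))
    return violations


KEY = {"SSN", "PNUMBER"}
NONPRIME = {"HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

violations = partial_dependencies(KEY, NONPRIME, F)
for subset, a in violations:
    print(subset, "->", a, "(partial dependency)")
print("in 2NF:", not violations)  # False
```

HOURS does not appear in the output because {SSN, PNUMBER} -> HOURS is a full FD. The partial dependencies point to the 2NF decomposition: one relation per key subset, holding the attributes that subset determines.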

Normalizing into 2NF

Conversion to 2NF

[Figure: a relation with attributes A, B, C, D converted to 2NF as two relations, one with attributes A, B, C and one with attributes A, D]

Additional problem on 2NF: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with the FD Prog_Pack_ID -> prog_Pac_name.

1 What is the highest normal form of this relation?

2 Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:

SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.

SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
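The sketch below uses the general form of the 3NF test from the standard textbook treatment rather than the primary-key version on this slide. A given FD X -> A is acceptable if A is in X, X is a superkey, or A is prime. An FD whose LHS is not a superkey and whose RHS is nonprime is exactly the transitive-dependency case described above. Checking only the listed FDs is assumed to be enough here, as it is when testing a whole schema against its own F. The helper names are hypothetical, and the prime attributes are supplied by hand. The example checks EMP_DEPT and the EMP relation from the NOTE.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(schema, fds, prime):
    """Return (LHS, attribute) pairs from `fds` that break 3NF."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == set(schema)
        for a in sorted(rhs - lhs):  # skip the trivial part of the FD
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F1 = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(third_nf_violations(EMP_DEPT, F1, prime={"SSN"}))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]

# The NOTE's example: Emp is a candidate key, so SSN -> Emp -> Salary is harmless.
EMP = {"SSN", "Emp", "Salary"}
F2 = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(third_nf_violations(EMP, F2, prime={"SSN", "Emp"}))  # []
```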

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a superkey.

BCNF does not allow this exception: to be in BCNF, X must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.
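The difference can be made concrete by classifying each given FD. An FD is fine for BCNF if its LHS is a superkey. It is allowed only by 3NF if the LHS is not a superkey but the RHS attribute is prime. Otherwise it violates both. The sketch assumes the attribute-closure technique and hypothetical helper names, and it relies on the standard result that testing a schema against its own listed FDs is sufficient. The example uses the TEACH FDs discussed in this module.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def classify(schema, fds, prime):
    """Label each nontrivial X -> A as 'BCNF ok', '3NF only' or 'violates 3NF'."""
    report = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) == set(schema)
        for a in sorted(rhs - lhs):
            if superkey:
                status = "BCNF ok"
            elif a in prime:
                status = "3NF only"  # allowed by 3NF because A is prime
            else:
                status = "violates 3NF"
            report.append((sorted(lhs), a, status))
    return report


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # fd1
    ({"INSTRUCTOR"}, {"COURSE"}),             # fd2
]
# Candidate keys are {STUDENT, COURSE} and {STUDENT, INSTRUCTOR}: all attributes are prime.
for row in classify(TEACH, F, prime=TEACH):
    print(row)
```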

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor

fd2: instructor -> course

{student, course} is a candidate key for this relation. In fd2 the LHS, instructor, is not a superkey, but the RHS, course, is a prime attribute. 3NF therefore allows fd2 while BCNF does not, so the relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:

1 {student, instructor} and {student, course}

2 {course, instructor} and {course, student}

3 {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive join property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join, and hence it has the nonadditive join property.
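For a decomposition into exactly two relations there is a well-known shortcut, often called the binary decomposition test (NJB). It is not stated on this slide. {R1, R2} is lossless with respect to F iff the common attributes functionally determine all of R1 - R2 or all of R2 - R1. The sketch below applies that test to the three TEACH decompositions, assuming the attribute-closure technique and hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Lossless iff (R1 intersect R2)+ contains R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # fd1
    ({"INSTRUCTOR"}, {"COURSE"}),             # fd2
]
decompositions = [
    ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
]
for i, (r1, r2) in enumerate(decompositions, start=1):
    print(i, lossless_binary(r1, r2, F))
# 1 False
# 2 False
# 3 True
```

Only the third decomposition passes, because its common attribute INSTRUCTOR determines COURSE through fd2.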

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
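The general lossless-join test, for a decomposition into any number of relations, is commonly done with a matrix (tableau) algorithm. The matrix has one row per Ri. Each row gets a distinguished symbol in every column whose attribute belongs to Ri. Each FD is then applied repeatedly to equate symbols in rows that agree on its LHS. The decomposition is lossless iff some row ends up entirely distinguished.

The sketch below is a minimal version of that procedure. It does not reproduce the referenced Example 5.12, and the helper names are hypothetical. The test data follows the spurious-tuple example from earlier in this module. EMP_LOCS and EMP_PROJ1 are assumed to be (ENAME, PLOCATION) and (SSN, PNUMBER, HOURS, PNAME, PLOCATION), as in the usual textbook version.

```python
def lossless(schema, decomposition, fds):
    """Matrix test for the nonadditive (lossless) join property.

    Row i holds the distinguished symbol ("a", A) in column A if A is in R_i,
    otherwise a row-specific symbol ("b", i, A).
    """
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in schema}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in rows:  # group rows that agree on every LHS column
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Prefer the distinguished symbol; otherwise pick one b symbol.
                    target = ("a", a) if ("a", a) in symbols else min(symbols)
                    for row in rows:  # equate the symbols throughout the column
                        if row[a] in symbols and row[a] != target:
                            row[a] = target
                            changed = True
    return any(all(row[a] == ("a", a) for a in schema) for row in rows)


SCHEMA = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
emp_locs_proj1 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
emp_proj_works = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(lossless(SCHEMA, emp_locs_proj1, F))  # False: joining on PLOCATION is lossy
print(lossless(SCHEMA, emp_proj_works, F))  # True
```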

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A U B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies. More precisely, for every nontrivial MVD X ->> Y that holds, X must be a superkey.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
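An MVD X ->> Y holds in a state r of R exactly when r equals the natural join of its projections on X U Y and X U (R - Y). This is a standard characterization, though it is not stated on the slide. The sketch below uses that test on a small hypothetical EMP state in which Smith works on projects X and Y and has dependents John and Anna. The two projections correspond to EMP_PROJECTS and EMP_DEPENDENTS. The helper names `project`, `natural_join` and `rel` are illustrative.

```python
def project(rel, attrs):
    """Projection: keep only `attrs` in each tuple; duplicates vanish in the set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Natural join of relations given as sets of frozenset((attr, value)) tuples."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


def rel(attrs, *rows):
    """Build a relation from positional rows."""
    return {frozenset(zip(attrs, row)) for row in rows}


# Hypothetical EMP state: Smith's projects and dependents are independent.
EMP = rel(("ENAME", "PNAME", "DNAME"),
          ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
          ("Smith", "X", "Anna"), ("Smith", "Y", "John"))

EMP_PROJECTS = project(EMP, {"ENAME", "PNAME"})
EMP_DEPENDENTS = project(EMP, {"ENAME", "DNAME"})
print(len(EMP_PROJECTS), len(EMP_DEPENDENTS))             # 2 2
print(natural_join(EMP_PROJECTS, EMP_DEPENDENTS) == EMP)  # True: ENAME ->> PNAME holds

# Remove one tuple and the MVD no longer holds in this state.
BROKEN = EMP - rel(("ENAME", "PNAME", "DNAME"), ("Smith", "Y", "John"))
print(natural_join(project(BROKEN, {"ENAME", "PNAME"}),
                   project(BROKEN, {"ENAME", "DNAME"})) == BROKEN)  # False
```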

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency other than those in which every component Ri is a superkey.

Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
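Whether a given state satisfies JD(R1, R2, R3) can be checked directly from the definition. Project onto each Ri, then test whether the natural join of all the projections gives back the original relation. The sketch below uses a small hypothetical SUPPLY state with attributes SNAME, PARTNAME and PROJNAME, where R1 = (SNAME, PARTNAME), R2 = (SNAME, PROJNAME) and R3 = (PARTNAME, PROJNAME). The attribute names, sample tuples and helper names are illustrative assumptions. As with FDs, one state can only show that the JD is consistent with the data; it cannot prove that the JD is a constraint.

```python
from functools import reduce


def project(rel, attrs):
    """Projection: keep only `attrs` in each tuple; duplicates vanish in the set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    """Natural join of relations given as sets of frozenset((attr, value)) tuples."""
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out


def rel(attrs, *rows):
    return {frozenset(zip(attrs, row)) for row in rows}


SUPPLY = rel(("SNAME", "PARTNAME", "PROJNAME"),
             ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
             ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
             ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
             ("Smith", "Bolt", "ProjY"))

R1 = project(SUPPLY, {"SNAME", "PARTNAME"})
R2 = project(SUPPLY, {"SNAME", "PROJNAME"})
R3 = project(SUPPLY, {"PARTNAME", "PROJNAME"})

print(len(SUPPLY), len(natural_join(R1, R2)))         # 7 9: two projections add spurious tuples
print(reduce(natural_join, [R1, R2, R3]) == SUPPLY)   # True: JD(R1, R2, R3) holds here
```

The binary join produces spurious tuples, so no decomposition into two of these projections is lossless for this state. Joining all three projections does recover SUPPLY, which is why the 5NF decomposition needs all three relations.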

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice


TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF

To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key
2. Expand the key so that there will be a separate tuple in the original relation for each value of the offending attribute. This has the disadvantage of introducing redundancy
3. If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values
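Technique 1 can be sketched as follows. The DEPARTMENT rows, the attribute names DNUMBER, DNAME and DLOCATION, and the helper `split_multivalued` are illustrative assumptions, not part of the slides.

```python
def split_multivalued(rows, key, attr):
    """1NF technique 1: move the multivalued attribute `attr` into its own relation.

    rows: list of dicts in which row[attr] is a collection of atomic values.
    Returns (base, detail): base keeps every other attribute, and detail holds
    one (key, attr) tuple per individual value, so every value is atomic.
    """
    base, detail = [], []
    for row in rows:
        base.append({name: value for name, value in row.items() if name != attr})
        for value in row[attr]:
            detail.append({key: row[key], attr: value})
    return base, detail


DEPARTMENT = [  # hypothetical rows; DLOCATION is multivalued (violates 1NF)
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATION": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATION": ["Stafford"]},
]

dept, dept_locations = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATION")
print(dept)            # DLOCATION removed from each row
print(dept_locations)  # one row per (DNUMBER, single location) pair
```

In the new relation the key is the combination {DNUMBER, DLOCATION}, since a department can have several locations.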


Normalizing nested relations into 1NF

Additional problems from the Schaum series: pg. 178, 5.1


Second Normal Form

Uses the concepts of FDs and the primary key

Definitions
- Prime attribute: an attribute that is a member of the primary key K
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more

Examples
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds


Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
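Following the slide's primary-key-based definition, a 2NF check looks for non-prime attributes that are already determined by a proper subset of the primary key. The sketch below assumes the EMP_PROJ dependencies from the FD examples; `fd`, `closure` and `partial_dependencies` are illustrative names.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """Pairs (proper subset of the key, non-prime attribute it determines).

    An empty result means every non-prime attribute is fully functionally
    dependent on the primary key, i.e. the 2NF condition holds.
    """
    key = frozenset(primary_key)
    nonprime = set(schema) - key
    found = set()
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in closure(subset, fds) & nonprime:
                found.add((frozenset(subset), attr))
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION")]
partials = partial_dependencies(EMP_PROJ, {"SSN", "PNUMBER"}, F)
for subset, attr in sorted(partials, key=lambda pair: pair[1]):
    print(sorted(subset), "->", attr)
```

HOURS is not reported because it needs the whole key; ENAME, PNAME and PLOCATION would be moved into separate relations during 2NF normalization.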


Normalizing into 2NF

Conversion to 2NF (figure): R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D)



Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?

2. Transform it into the next highest form


Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z

Examples
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME


Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key
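The NOTE is captured by the general form of the 3NF test: a nontrivial X -> A is acceptable if X is a superkey or A is a prime attribute (a member of some candidate key). The Python sketch below checks each FD as listed in F (not every FD in F+), finds candidate keys by brute force, and uses assumed dependencies for EMP_DEPT and for EMP(SSN, Emp, Salary), where Emp -> SSN is assumed so that Emp is a candidate key. All function names are illustrative.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def prime_attributes(schema, fds):
    """Union of all candidate keys (minimal superkeys), found by brute force."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and is_superkey(s, schema, fds):
                keys.append(s)
    return set().union(*keys)


def third_nf_violations(schema, fds):
    """Listed FDs X -> A (A not in X) with X not a superkey and A not prime."""
    prime = prime_attributes(schema, fds)
    violations = []
    for lhs, rhs in fds:
        if is_superkey(lhs, schema, fds):
            continue
        for attr in sorted(rhs - lhs - prime):
            violations.append((sorted(lhs), attr))
    return violations


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_DEPT = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
print(third_nf_violations(EMP_DEPT, F_DEPT))  # [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]

# EMP(SSN, Emp, Salary): Emp -> SSN is assumed, so Emp is a candidate key
EMP = {"SSN", "Emp", "Salary"}
F_EMP = [fd("SSN", "Emp"), fd("Emp", "SSN"), fd("Emp", "Salary")]
print(third_nf_violations(EMP, F_EMP))        # []
```

The DNUMBER dependencies are reported because SSN -> DNUMBER -> {DNAME, DMGRSSN} passes through a non-key, while EMP passes because Emp is a candidate key, matching the NOTE.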


Normalization into 3NF


Normalizing into 2NF and 3NF


SUMMARY


Normalize the following relation


Normalization into 2NF


Normalization into 3NF

Additional problems: pg. 186, 5.13


Boyce-Codd normal form


BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)


How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation even when X is not a superkey, provided that A is a prime attribute (a member of some candidate key)

BCNF has no such exception: X must be a superkey

So a relation in BCNF will definitely be in 3NF, but not the other way around
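A BCNF test only needs the superkey check, with no prime-attribute exception. This sketch (illustrative helper names; it checks the FDs as listed, which is sufficient when testing a whole schema) applies it to the TEACH relation of the following slides, assuming fd1: {student, course} -> instructor and fd2: instructor -> course.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def bcnf_violations(schema, fds):
    """Listed nontrivial FDs X -> Y whose left-hand side X is not a superkey."""
    return [
        (sorted(lhs), sorted(rhs - lhs))
        for lhs, rhs in fds
        if rhs - lhs and not is_superkey(lhs, schema, fds)
    ]


TEACH = {"student", "course", "instructor"}
F = [fd("student course", "instructor"), fd("instructor", "course")]
print(bcnf_violations(TEACH, F))  # [(['instructor'], ['course'])]
```

fd2 is flagged even though TEACH passes the 3NF test, because course belongs to the candidate key {student, course}.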


A relation TEACH that is in 3NF but not in BCNF


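Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but it does not violate 3NF because course is a prime attribute. So this relation is in 3NF but not in BCNF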

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations


Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions will lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition

Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property), because the common attribute instructor determines course (fd2) and is therefore a key of {instructor, course}
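For a decomposition of R into two relations R1 and R2 there is a simple test (the binary nonadditive-join property): the decomposition is lossless if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The Python sketch below applies it to the three candidate decompositions of TEACH using fd1 and fd2; the helper names are illustrative.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [fd("student course", "instructor"), fd("instructor", "course")]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for r1, r2 in decompositions:
    print(sorted(r1), sorted(r2), is_lossless_binary(r1, r2, F))
```

Only the third decomposition passes: its common attribute, instructor, determines course, the attribute that the decomposition separates out.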

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation, then the decomposition is lossless; otherwise it is lossy

Example 5.11, pg. 162


Testing for lossless joins

Lossless join algorithm: Example 5.12


Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied


Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependencies

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key


Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS
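One way to check ENAME ->> PNAME on a relation instance: for every ENAME value, the (PNAME, DNAME) pairs present must form the full cross product of that employee's projects and dependents. The sketch below assumes a small hypothetical EMP instance (employee Smith with projects X, Y and dependents John, Anna), and also checks that joining the two 4NF projections on ENAME gives back the original rows. The function name `mvd_holds` is illustrative.

```python
from collections import defaultdict


def mvd_holds(rows):
    """Check A ->> B on a set of (a, b, c) triples, where C is the remaining attribute.

    For each A-value, the (B, C) pairs present must be the full cross product of
    that A-value's B-values and C-values. The pairs are always a subset of that
    product, so comparing sizes is enough.
    """
    groups = defaultdict(set)
    for a, b, c in rows:
        groups[a].add((b, c))
    for pairs in groups.values():
        b_values = {b for b, _ in pairs}
        c_values = {c for _, c in pairs}
        if len(pairs) != len(b_values) * len(c_values):
            return False
    return True


EMP = {  # (ENAME, PNAME, DNAME), hypothetical rows
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}
print(mvd_holds(EMP))                             # True
print(mvd_holds(EMP - {("Smith", "Y", "John")}))  # False: the sets are no longer independent

# 4NF decomposition: EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME)
emp_projects = {(e, p) for e, p, _ in EMP}
emp_dependents = {(e, d) for e, _, d in EMP}
rejoined = {(e, p, d) for e, p in emp_projects for e2, d in emp_dependents if e == e2}
print(rejoined == EMP)                            # True: no spurious tuples
```

Since EMP has only three attributes, ENAME ->> PNAME also implies ENAME ->> DNAME, so one check covers both MVDs on the slide.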


Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation


Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, R has no nontrivial join dependency whose components are not all superkeys


Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3
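A join dependency can be tested on a given instance by projecting onto each component and joining the projections back together. The sketch below uses hypothetical SUPPLY rows over SNAME, PARTNAME and PROJNAME with R1 = (SNAME, PARTNAME), R2 = (PARTNAME, PROJNAME) and R3 = (SNAME, PROJNAME); the data and helper names are assumptions chosen for illustration.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts keyed by attribute name."""
    distinct = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in distinct]


def natural_join(left, right):
    """Join two relations (lists of dicts) on all attribute names they share."""
    result = []
    for left_row in left:
        for right_row in right:
            shared = left_row.keys() & right_row.keys()
            if all(left_row[a] == right_row[a] for a in shared):
                result.append({**left_row, **right_row})
    return result


def as_set(rows):
    return {frozenset(row.items()) for row in rows}


def satisfies_jd(rows, components):
    """JD(R1, ..., Rn) holds on this instance iff joining its projections gives it back."""
    joined = project(rows, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(rows, attrs))
    return as_set(joined) == as_set(rows)


def supply(*triples):
    return [dict(zip(("SNAME", "PARTNAME", "PROJNAME"), t)) for t in triples]


R1, R2, R3 = ("SNAME", "PARTNAME"), ("PARTNAME", "PROJNAME"), ("SNAME", "PROJNAME")
three = supply(("Smith", "Bolt", "ProjY"), ("Smith", "Nut", "ProjX"), ("Adamsky", "Bolt", "ProjX"))
four = three + supply(("Smith", "Bolt", "ProjX"))

print(satisfies_jd(three, [R1, R2, R3]))  # False: the join adds (Smith, Bolt, ProjX)
print(satisfies_jd(four, [R1, R2, R3]))   # True
```

None of the two-way joins (R1 with R2, R1 with R3, R2 with R3) reproduces `four`, which is why the 5NF decomposition needs all three projections.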


Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice


1. Semantics of the Relation Attributes

GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance (applies to individual relations and their attributes)

Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation

Only foreign keys should be used to refer to other entities

Entity and relationship attributes should be kept apart as much as possible

Bottom line: design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret


A Simplified COMPANY relational database schema


Two relation schemas suffering from update anomalies (figure):

EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)


Two relation schemas suffering from update anomalies

Although there is nothing logically wrong with these two relations, they are considered poor designs because they violate Guideline 1 by mixing attributes from distinct real-world entities

EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees, projects, and the WORKS_ON relationship

They may be used as views, but they cause problems when used as base relations


2. Redundant Information in Tuples and Update Anomalies

A goal of schema design is to minimize the storage space used by the base relations. When information is stored redundantly, it:
- wastes storage
- causes update anomalies: insertion anomalies, deletion anomalies, and modification anomalies


EXAMPLE OF AN INSERT ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours)

Insert anomaly: a project cannot be inserted unless an employee is assigned to it. Conversely, an employee cannot be inserted unless he/she is assigned to a project


EXAMPLE OF A DELETE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours)

Delete anomaly: when a project is deleted, it will result in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee would result in deleting the corresponding project


EXAMPLE OF AN UPDATE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours)

Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" requires this update to be made in the tuples of all 100 employees working on project P1; missing any of them leaves the relation inconsistent


Guideline to Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies

If any anomalies are present, note them so that applications can be made to take them into account

In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries


Problems with Nulls

If many attributes are grouped together as a fat relation, it gives rise to many nulls in the tuples. This causes:
- wasted storage
- problems in understanding the meaning of the attributes
- difficulties when using nulls in aggregate operators such as COUNT or SUM


3. Null Values in Tuples

Interpretations of nulls:
- attribute not applicable or invalid
- attribute value unknown (may exist)
- value known to exist but unavailable

GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key)

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number)


Example of Spurious Tuples


Generation of spurious tuples

Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design

The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ

These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid

This is because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1
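The effect can be reproduced on a small scale. The sketch below uses a simplified, hypothetical EMP_PROJ instance with only ENAME, PNAME and PLOCATION, projects it onto EMP_LOCS(ENAME, PLOCATION) and a reduced EMP_PROJ1(PNAME, PLOCATION), and joins them back on PLOCATION.

```python
EMP_PROJ = {  # (ENAME, PNAME, PLOCATION), hypothetical rows
    ("Smith", "ProductX", "Bellaire"),
    ("Wong", "ProductY", "Sugarland"),
    ("Wong", "ProductZ", "Bellaire"),
}

emp_locs = {(e, loc) for e, _, loc in EMP_PROJ}   # EMP_LOCS(ENAME, PLOCATION)
emp_proj1 = {(p, loc) for _, p, loc in EMP_PROJ}  # EMP_PROJ1(PNAME, PLOCATION), reduced

# Natural join on the shared attribute PLOCATION, which is not a key of either relation
joined = {(e, p, loc) for e, loc in emp_locs for p, loc2 in emp_proj1 if loc == loc2}

spurious = joined - EMP_PROJ
print(len(EMP_PROJ), len(joined))  # 3 5
print(sorted(spurious))
```

Every Bellaire employee gets paired with every Bellaire project, which produces tuples such as (Smith, ProductZ, Bellaire) that never existed in EMP_PROJ.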


Example of Spurious Tuples contd


4. Spurious Tuples

Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated


Spurious Tuples

There are two important properties of decompositions:
(a) non-additivity, or losslessness, of the corresponding join
(b) preservation of the functional dependencies

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed


Summary and Discussion of Design Guidelines

Problems pointed out:
- Anomalies cause redundant work to be done during insertion, modification, and deletion
- Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values
- Generation of invalid and spurious data during joins on improperly related base relations


Functional dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes


Definition

FDs are used to specify:
- formal measures of the goodness of relational designs
- keys that are used to define normal forms for relations
- constraints that are derived from the meaning and interrelationships of the data attributes

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y


Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component

The values of the X component functionally determine the values of the Y component

FDs are derived from the real-world constraints on the attributes

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times


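Lakes of the world

Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420

Name -> Length is satisfied by this state (each lake name appears only once)

Continent -> Name does not hold (NA is paired with Superior, Huron, and Michigan)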


Graphical representation of Functional Dependencies


Examples of FD constraints

Social security number uniquely determines employee name: SSN -> ENAME

Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}

Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS


Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])


Satisfies algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B

How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other
2. Check that tuples with equal values under attributes A also have equal values under B
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false
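The Satisfies algorithm translates almost line for line into Python. In this sketch (the function name `satisfies` and the dict-per-tuple representation are assumptions), `sorted` performs step 1 and `itertools.groupby` yields each run of tuples with equal A-values for step 2. It is applied to the TEACH state on the next slide.

```python
from itertools import groupby
from operator import itemgetter


def satisfies(rows, a, b):
    """Satisfies algorithm: does the relation state `rows` satisfy A -> B?

    rows: list of dicts; a, b: lists of attribute names.
    """
    get_a, get_b = itemgetter(*a), itemgetter(*b)
    # Step 1: sort on A so that tuples with equal A-values are adjacent
    for _, run in groupby(sorted(rows, key=get_a), key=get_a):
        # Step 2: every tuple in a run of equal A-values must agree on B
        if len({get_b(row) for row in run}) > 1:
            return False  # Step 3: condition violated
    return True


TEACH = [
    dict(zip(("TEACHER", "COURSE", "TEXT"), row))
    for row in [
        ("Smith", "Data Structures", "Bartram"),
        ("Smith", "Data Management", "Martin"),
        ("Hall", "Compilers", "Hoffmann"),
        ("Brown", "ooad", "Horowitz"),
    ]
]
print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False: Smith has two courses
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```

A True result only means this particular state does not violate the FD; whether the FD holds on every legal state still depends on the semantics of the attributes.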


Relation state of TEACH

TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     ooad              Horowitz

TEACHER -> COURSE does not hold (Smith teaches both Data Structures and Data Management)

TEXT -> COURSE may hold: this state does not violate it


Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time consuming, and it only tests one relation state, whereas an FD must hold on every legal state

So inference axioms are used


Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specify the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F


Example of Closure

If a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE

This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+


Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies in a systematic way, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The fact that X -> Y can be inferred from F is denoted by F |= X -> Y
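Computing F+ itself is impractical, since it can be exponentially large. The usual way to decide whether F |= X -> Y is to compute the attribute closure X+ (every attribute determined by X) and check whether Y ⊆ X+. A minimal Python sketch using the F above; `fd`, `closure` and `implies` are illustrative names, not a library API.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure X+: repeatedly apply every FD whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y is contained in X+ computed under F."""
    return set(rhs) <= closure(lhs, fds)


F = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]

print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```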


Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold

Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y
IR2 (Augmentation): If X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z

IR1, IR2, and IR3 form a sound and complete set of inference rules

By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (if the axioms are correctly applied, they cannot derive false dependencies)

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F


Inference Rules for FDs

Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union (additive): If X -> Y and X -> Z, then X -> YZ
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)


Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D
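Reading exercise 1 as F = {A -> B, C -> X, BX -> Z} with target AC -> Z, one derivation is: augment A -> B with C to get AC -> BC, augment C -> X with B to get BC -> BX, then apply transitivity twice with BX -> Z. For exercise 2, C ⊆ B gives B -> C by IR1, so A -> B and B -> C give A -> C, and with C -> D, A -> D. The attribute-closure check below confirms both results mechanically; for exercise 2 it assumes a concrete, hypothetical choice B = {B1, B2} and C = {B1} so that C ⊆ B.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# Exercise 1: F1 |= AC -> Z iff Z is in {A, C}+
F1 = [fd("A", "B"), fd("C", "X"), fd("B X", "Z")]
print("Z" in closure({"A", "C"}, F1))  # True

# Exercise 2 with B = {B1, B2} and C = {B1} (hypothetical attributes, so C ⊆ B)
F2 = [fd("A", "B1 B2"), fd("B1", "D")]
print("D" in closure({"A"}, F2))       # True
```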


Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F if and only if A -> B can be derived from the set of FDs F - {A -> B}

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs
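The redundancy definition can be tested with attribute closure: A -> B is redundant when B ⊆ A+ computed under F - {A -> B}. A minimal sketch with hypothetical single-letter FDs; `redundant_fds` is an illustrative name.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def redundant_fds(fds):
    """FDs X -> Y of F that can be derived from F - {X -> Y}."""
    found = []
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]
        if rhs <= closure(lhs, rest):
            found.append((lhs, rhs))
    return found


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(redundant_fds(F) == [fd("A", "C")])  # True: A -> C follows by transitivity
```

Each FD is tested against all the others, so two FDs can both be reported. In {A -> B, B -> A, A -> C, B -> C}, both A -> C and B -> C are redundant, but removing both would lose C; when minimizing, remove one and re-run the test.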


Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets
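Coverage can be checked without building E+ or F+: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. A minimal sketch; the sets E and F are an illustrative example chosen for this sketch, and the function names are hypothetical.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E when every X -> Y in E has Y contained in X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent (E+ = F+) when each covers the other."""
    return covers(e, f) and covers(f, e)


E = [fd("A", "C"), fd("A C", "D"), fd("E", "A D"), fd("E", "H")]
F = [fd("A", "C D"), fd("E", "A H")]
print(equivalent(E, F))               # True
print(equivalent(E, [fd("A", "C")]))  # False: {A -> C} cannot derive AC -> D
```

Here F has two FDs while E has four, so F is a smaller set that generates the same closure.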


Minimal Functional Dependencies

A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F

F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS


Minimal cover

A set of FDs F is minimal if it satisfies the following conditions:

(a) every RHS of each dependency is a single attribute

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies)

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side)


Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

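Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F

Then A1 is extraneous in the left-hand side if and only if

$$F \equiv \bigl(F - \{A_1A_2 \to B_1B_2\}\bigr) \cup \{A_2 \to B_1B_2\}$$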
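The extraneous-attribute condition reduces to an attribute-closure check. Removing A1 from A1A2 -> B1B2 produces A2 -> B1B2, which always implies the original FD, so the two sets are equivalent exactly when B1B2 ⊆ (A2)+ computed under the original F. A minimal sketch with hypothetical FDs; `is_left_extraneous` is an illustrative name.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_left_extraneous(attr, lhs, rhs, fds):
    """attr is extraneous in lhs -> rhs iff rhs ⊆ (lhs - {attr})+ computed under F."""
    return set(rhs) <= closure(set(lhs) - {attr}, fds)


F = [fd("X", "Z"), fd("X Z", "R")]
print(is_left_extraneous("Z", {"X", "Z"}, {"R"}, F))  # True: X+ = {X, Z, R}
print(is_left_extraneous("X", {"X", "Z"}, {"R"}, F))  # False: Z+ = {Z}
```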


CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute

2. FC is left-reduced

3. FC is nonredundant


Problem

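Given a set F of FDs, find a canonical cover for F

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: Z is extraneous in XZ -> R because X -> Z already holds, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}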
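The steps of the canonical cover problem can be automated. This is a sketch of the usual three-step minimal-cover procedure (split right-hand sides, remove extraneous left-hand-side attributes, then drop redundant FDs one at a time), written with illustrative helper names. Minimal covers are not unique in general, and the result can depend on the order in which FDs are examined.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """Attribute closure: every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Split right-hand sides, left-reduce, then drop redundant FDs."""
    # Step 1: single-attribute right-hand sides; trivial and duplicate FDs dropped
    g = []
    for lhs, rhs in fds:
        for attr in sorted(rhs - lhs):
            item = (lhs, frozenset({attr}))
            if item not in g:
                g.append(item)
    # Step 2: remove extraneous left-hand-side attributes
    for i, (lhs, rhs) in enumerate(g):
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, g):
                lhs = lhs - {attr}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction can create duplicates
    # Step 3: remove redundant FDs one at a time
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        if g[i][1] <= closure(g[i][0], rest):
            g = rest
        else:
            i += 1
    return g


F = [fd("X", "Z"), fd("X Y", "W P"), fd("X Y", "Z W Q"), fd("X Z", "R")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
```

For this F the sketch returns X -> Z, X -> R, XY -> W, XY -> P and XY -> Q: XY -> Z first shrinks to a duplicate of X -> Z, and XZ -> R is left-reduced to X -> R.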


Normal Forms Based on Primary Keys

1. Normalization of Relations

2. Practical Use of Normal Forms

3. Definitions of Keys and Attributes participating in Keys

4. First Normal Form

5. Second Normal Form

6. Third Normal Form


Normalization of Relations

2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)

4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation


Normalization of Relations

Proposed by Codd

Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of:
- minimizing redundancies
- minimizing anomalies

Provides the database designer with:
- a formal framework for analyzing relation schemas based on keys and FDs
- a series of normal form tests

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized


Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except those in which every component is a superkey.

Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form

(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes. Hence, 5NF is rarely used in practice.


A Simplified COMPANY relational database schema

Two relation schemas suffering from update anomalies

EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

Two relation schemas suffering from update anomalies

Although there is nothing logically wrong with these two relations, they are considered poor designs because they violate Guideline 1 by mixing attributes from distinct real-world entities.

EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees and projects with those of the WORKS_ON relationship.

They may be used as views, but they cause problems when used as base relations.

2. Redundant Information in Tuples and Update Anomalies

One goal of schema design is to minimize the storage space used by the base relations. When information is stored redundantly, it wastes storage and causes update anomalies: insertion anomalies, deletion anomalies and modification anomalies.


EXAMPLE OF AN INSERT ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Insert anomaly: we cannot insert a project unless an employee is assigned to it. Conversely, we cannot insert an employee unless he or she is assigned to a project.

EXAMPLE OF A DELETE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Delete anomaly: when a project is deleted, all the employees who work on that project are deleted as well. Conversely, if an employee is the sole employee on a project, deleting that employee results in deleting the corresponding project.

EXAMPLE OF AN UPDATE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" requires the update to be made in the tuples of all 100 employees working on project P1.
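The update anomaly can be reproduced with Python's built-in `sqlite3` module. The sketch below is illustrative only: the `emp_proj` table follows EMP_PROJ(Emp, Proj, Ename, Pname, No_hours), while the 100 employee rows and the separate `project` table are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Design with redundancy: the project name is repeated in every tuple.
con.execute("""CREATE TABLE emp_proj (
    emp TEXT, proj TEXT, ename TEXT, pname TEXT, no_hours REAL,
    PRIMARY KEY (emp, proj))""")
con.executemany(
    "INSERT INTO emp_proj VALUES (?, 'P1', ?, 'Billing', 10)",
    [(f"E{i}", f"Employee {i}") for i in range(100)],  # 100 hypothetical employees
)
cur = con.execute(
    "UPDATE emp_proj SET pname = 'Customer-Accounting' WHERE proj = 'P1'")
changed_redundant = cur.rowcount
print(changed_redundant)  # 100 tuples had to be rewritten

# Decomposed alternative: the project name is stored once per project.
con.execute("CREATE TABLE project (proj TEXT PRIMARY KEY, pname TEXT)")
con.execute("INSERT INTO project VALUES ('P1', 'Billing')")
cur = con.execute(
    "UPDATE project SET pname = 'Customer-Accounting' WHERE proj = 'P1'")
changed_decomposed = cur.rowcount
print(changed_decomposed)  # 1
```

If any of the 100 redundant tuples were missed, EMP_PROJ would hold two different names for P1; the decomposed design cannot reach that inconsistent state.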

Guideline to Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies.

If any anomalies are present, note them so that applications can be written to take them into account.

In general, it is advisable to use anomaly-free base relations and to specify views that include the joins, placing together the attributes frequently referenced in important queries.

Problems with Nulls

If many attributes are grouped together into a "fat" relation, many nulls arise in the tuples. This wastes storage, makes the meaning of the attributes harder to understand, and complicates the use of aggregate operators such as COUNT or SUM.

3. Null Values in Tuples

Interpretations of nulls: the attribute is not applicable or invalid; the attribute value is unknown (it may exist); or the value is known to exist but is unavailable.

GUIDELINE 3: Relations should be designed so that their tuples have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (together with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute of the employee relation. Instead, create a new relation emp_offices(essn, office_number).
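The aggregate problem is easy to observe: `COUNT(*)` counts every tuple, while `COUNT(column)` skips NULLs. The `sqlite3` sketch below assumes a hypothetical employee table in which 10 of 100 employees have an office, and then applies Guideline 3 by moving office_number into emp_offices(essn, office_number).

```python
import sqlite3

con = sqlite3.connect(":memory:")

# "Fat" design: office_number is NULL for 90 of the 100 hypothetical employees.
con.execute(
    "CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT, office_number TEXT)")
con.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(f"S{i}", f"Employee {i}", f"O{i}" if i < 10 else None) for i in range(100)],
)
total, with_office = con.execute(
    "SELECT COUNT(*), COUNT(office_number) FROM employee").fetchone()
print(total, with_office)  # 100 10: COUNT(column) ignores NULLs

# Guideline 3: keep the rarely present attribute in its own relation.
con.execute("""CREATE TABLE emp_offices (
    essn TEXT PRIMARY KEY REFERENCES employee(ssn),
    office_number TEXT NOT NULL)""")
con.execute("""INSERT INTO emp_offices (essn, office_number)
    SELECT ssn, office_number FROM employee WHERE office_number IS NOT NULL""")
offices = con.execute("SELECT COUNT(*) FROM emp_offices").fetchone()[0]
print(offices)  # 10 tuples, none of them containing NULL
```

In a full migration, office_number would then be dropped from employee; that step is omitted here.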

Example of Spurious Tuples

Generation of spurious tuples

Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.

The problem is that a natural join of these two relations produces more tuples than the original set of tuples in EMP_PROJ.

The additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious or wrong information that is not valid.

This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
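The effect can be reproduced on a small state. The Python sketch below projects a hypothetical EMP_PROJ state onto EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION) and EMP_LOCS(ENAME, PLOCATION), joins them on PLOCATION and counts the tuples that were not in the original; all data values and helper names are made up for illustration.

```python
def project(rows, attrs, cols):
    """Projection onto `cols` (duplicates removed)."""
    idx = [attrs.index(c) for c in cols]
    return {tuple(t[i] for i in idx) for t in rows}


def natural_join(r1, attrs1, r2, attrs2):
    """Nested-loop natural join; returns (tuples, attribute list)."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                out.add(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return out, attrs1 + extra


# Hypothetical EMP_PROJ state: two employees work on projects located in Bellaire.
EMP_PROJ_ATTRS = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
emp_proj = {
    ("123", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("453", 1, 20.0, "English", "ProductX", "Bellaire"),
    ("453", 2, 20.0, "English", "ProductY", "Sugarland"),
}

PROJ1_ATTRS = ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"]
LOCS_ATTRS = ["ENAME", "PLOCATION"]
emp_proj1 = project(emp_proj, EMP_PROJ_ATTRS, PROJ1_ATTRS)
emp_locs = project(emp_proj, EMP_PROJ_ATTRS, LOCS_ATTRS)

# Join on PLOCATION, the only common attribute (a key in neither relation).
joined, joined_attrs = natural_join(emp_proj1, PROJ1_ATTRS, emp_locs, LOCS_ATTRS)
order = [joined_attrs.index(a) for a in EMP_PROJ_ATTRS]
rebuilt = {tuple(t[i] for i in order) for t in joined}
spurious = rebuilt - emp_proj

print(len(emp_proj), len(rebuilt), len(spurious))  # 3 5 2
```

The two spurious tuples pair Smith with English's hours on ProductX and vice versa, simply because both employees share the location Bellaire.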

Example of Spurious Tuples (continued)

4. Spurious Tuples

Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:
(a) non-additivity (losslessness) of the corresponding join;
(b) preservation of the functional dependencies.

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:
Anomalies cause redundant work to be done during insertion, modification and deletion.
Nulls waste storage space and make aggregation operations and joins difficult to perform.
Invalid and spurious data are generated during joins on improperly related base relations.

Functional Dependencies

Functional dependencies (FDs): an FD is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, …, An}, where A1, A2, …, An are the attributes.

Definition

FDs are used to specify formal measures of the "goodness" of relational designs. FDs and keys are used to define normal forms for relations. FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name | Continent | Area | Length
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420

Continent -> Name does not hold (NA, for example, has three lakes); Name -> Length holds.
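The formal definition (if t1[X] = t2[X] then t1[Y] = t2[Y]) can be applied pair by pair to the lakes table above. This minimal Python sketch (the function name `fd_holds` is hypothetical) confirms both observations. A single state can only show that an FD is violated; whether an FD holds is decided by the semantics of the attributes, not by one table.

```python
from itertools import combinations


def fd_holds(rows, attrs, x, y):
    """Check X -> Y on one relation state by the definition:
    for all pairs t1, t2, if t1[X] = t2[X] then t1[Y] = t2[Y]."""
    ix = [attrs.index(a) for a in x]
    iy = [attrs.index(a) for a in y]
    for t1, t2 in combinations(rows, 2):
        if all(t1[i] == t2[i] for i in ix) and any(t1[j] != t2[j] for j in iy):
            return False  # t1 and t2 agree on X but disagree on Y
    return True


LAKES_ATTRS = ["Name", "Continent", "Area", "Length"]
lakes = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]
print(fd_holds(lakes, LAKES_ATTRS, ["Name"], ["Length"]))     # True
print(fd_holds(lakes, LAKES_ATTRS, ["Continent"], ["Name"]))  # False
```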

Graphical representation of Functional Dependencies

Examples of FD Constraints

A social security number uniquely determines an employee name: SSN -> ENAME.

A project number uniquely determines a project name and location: PNUMBER -> {PNAME, PLOCATION}.

An employee SSN and a project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS.

Examples of FD Constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.

How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.

Relation state of TEACH

TEACH
Teacher | Course | Text
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | OOAD | Horowitz

TEACHER -> COURSE does not hold (Smith teaches two courses); TEXT -> COURSE holds in this state.
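The Satisfies algorithm can be sketched in Python as follows, run on the TEACH state above. The function name `satisfies` is hypothetical. Sorting costs O(n log n), and comparing only adjacent tuples is enough because equality of B values is transitive within a run of equal A values.

```python
def satisfies(rows, attrs, a, b):
    """Satisfies algorithm: does relation state `rows` satisfy A -> B?"""
    ia = [attrs.index(c) for c in a]
    ib = [attrs.index(c) for c in b]

    def a_val(t):
        return tuple(t[i] for i in ia)

    def b_val(t):
        return tuple(t[i] for i in ib)

    # Step 1: sort on the A attributes so equal A values become adjacent.
    ordered = sorted(rows, key=a_val)
    # Step 2: neighbours with equal A values must also have equal B values.
    for prev, cur in zip(ordered, ordered[1:]):
        if a_val(prev) == a_val(cur) and b_val(prev) != b_val(cur):
            return False
    # Step 3: the condition held for every run of equal A values.
    return True


TEACH_ATTRS = ["TEACHER", "COURSE", "TEXT"]
teach = [
    ("Smith", "Data Structures", "Bartram"),
    ("Smith", "Data Management", "Martin"),
    ("Hall", "Compilers", "Hoffmann"),
    ("Brown", "OOAD", "Horowitz"),
]
print(satisfies(teach, TEACH_ATTRS, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(teach, TEACH_ATTRS, ["TEXT"], ["COURSE"]))     # True
```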

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming, so inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred, or deduced, from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That X -> Y is inferred from F is denoted F |= X -> Y.
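Enumerating F+ is rarely necessary. A standard technique, not spelled out on these slides, is to compute the attribute closure X+ (the set of all attributes determined by X); then F |= X -> Y holds exactly when Y ⊆ X+. The Python sketch below represents each FD as a pair of attribute sets; the names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """X+: all attributes functionally determined by `attrs` under `fds`.

    fds -- list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # X -> lhs (lhs is inside X+) and lhs -> rhs give X -> rhs (IR3).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is a subset of lhs+."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))   # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))          # False

# The DEPT_NO example from the closure slide.
G = [({"DEPT_NO"}, {"MGR_SSN"}), ({"MGR_SSN"}, {"MGR_PHONE"})]
print(implies(G, {"DEPT_NO"}, {"MGR_PHONE"}))      # True
```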

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.

Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules.
By sound, we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F; that is, if the axioms are correctly applied, they cannot derive false dependencies.
By complete, we mean that applying IR1 through IR3 repeatedly until no more dependencies can be inferred yields the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
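One possible derivation of each example using only IR1 to IR3, assuming F = {A -> B, C -> X, BX -> Z} for the first problem and F = {A -> B, C -> D} with C ⊆ B for the second (this reading of F is an assumption):

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\text{Problem 1: } F = \{A \to B,\; C \to X,\; BX \to Z\}} \\
1. & A \to B    & \text{given} \\
2. & AC \to BC  & \text{IR2 on 1, augmenting with } C \\
3. & C \to X    & \text{given} \\
4. & BC \to BX  & \text{IR2 on 3, augmenting with } B \\
5. & AC \to BX  & \text{IR3 on 2 and 4} \\
6. & BX \to Z   & \text{given} \\
7. & AC \to Z   & \text{IR3 on 5 and 6} \\[6pt]
\multicolumn{3}{l}{\text{Problem 2: } F = \{A \to B,\; C \to D\},\ C \subseteq B} \\
1. & A \to B    & \text{given} \\
2. & B \to C    & \text{IR1, since } C \subseteq B \\
3. & A \to C    & \text{IR3 on 1 and 2} \\
4. & C \to D    & \text{given} \\
5. & A \to D    & \text{IR3 on 3 and 4}
\end{array}
```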

Redundant Functional Dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F if and only if A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and they can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
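Redundancy can be tested mechanically: A -> B is redundant exactly when B is contained in the closure of A computed under F - {A -> B}. In the Python sketch below, the names `closure` and `redundant_fds` and the three-FD example are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under fds given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def redundant_fds(fds):
    """FDs A -> B of F that can be derived from F - {A -> B}."""
    found = []
    for i, (lhs, rhs) in enumerate(fds):
        rest = fds[:i] + fds[i + 1:]  # F without this FD
        if rhs <= closure(lhs, rest):
            found.append((lhs, rhs))
    return found


# Hypothetical F: A -> C follows from A -> B and B -> C by transitivity.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(redundant_fds(F))  # [({'A'}, {'C'})]
```

Each FD is tested against all the others of the original F. If two FDs are redundant only because of each other (for example, duplicates), both are reported, but removing both would lose information; remove one at a time and re-test.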

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
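Covering and equivalence can also be tested with attribute closures: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. The sketch below is a minimal Python illustration; the function names and both sets of FDs are hypothetical examples.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under fds given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """True if F covers E: every X -> Y in E has Y inside X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)


# Hypothetical example sets over attributes A, C, D, E, H.
F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
G = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
print(equivalent(F, G))  # True, although G has only two FDs
```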

Minimal Functional Dependencies

A minimal cover (also called an irreducible set) of F is useful in eliminating unnecessary functional dependencies. F is transformed so that each FD in it that has more than one attribute on the right-hand side (RHS) is reduced to a set of FDs that each have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The size of the FDs of F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous if and only if
F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.

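Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove redundant FDs (XY -> Z follows from X -> Z, and XY -> W appears twice): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R because X -> Z, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}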
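The minimal-cover conditions suggest a three-step procedure: split right-hand sides, remove extraneous left-hand attributes, then drop redundant FDs. The Python sketch below follows that order and is run on F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}. The function names and the single-letter attribute encoding are assumptions of the sketch; minimal covers are not unique in general.

```python
def closure(attrs, fds):
    """Attribute closure; each FD is (lhs_frozenset, single_rhs_attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def minimal_cover(fds):
    """fds: (lhs, rhs) pairs written as strings of one-letter attribute
    names, e.g. ("XY", "WP") for XY -> WP."""
    # Step 1: a single attribute on every right-hand side, duplicates dropped.
    g = []
    for lhs, rhs in fds:
        for a in rhs:
            if (frozenset(lhs), a) not in g:
                g.append((frozenset(lhs), a))

    # Step 2: left-reduce; B is extraneous in X -> A if A is in (X - {B})+.
    for i, (lhs, a) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, a)

    # Step 3: remove FDs implied by the others, re-testing after each removal.
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        if g[i][1] in closure(g[i][0], rest):
            g = rest
        else:
            i += 1
    return g


def show(fds):
    """Readable, sorted form such as 'XY -> W'."""
    return sorted("".join(sorted(lhs)) + " -> " + a for lhs, a in fds)


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
print(show(minimal_cover(F)))
# ['X -> R', 'X -> Z', 'XY -> P', 'XY -> Q', 'XY -> W']
```

Besides removing XY -> Z and the duplicate XY -> W, the left-reduction step replaces XZ -> R with X -> R, since X already determines Z.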

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join (or nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
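Candidate keys, and therefore prime attributes, can be found from a set of FDs by testing which attribute sets have the whole schema as their closure. The brute-force Python sketch below tries subsets from smallest to largest, so it is only suitable for small schemas; the function names are hypothetical, and the example uses the EMP_PROJ FDs from the earlier FD examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under fds given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if closure(s, fds) == set(schema):
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
print([sorted(k) for k in candidate_keys(EMP_PROJ, F)])   # [['PNUMBER', 'SSN']]
print(sorted(prime_attributes(EMP_PROJ, F)))               # ['PNUMBER', 'SSN']
```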

First Normal Form

1NF disallows composite attributes, multivalued attributes and nested relations: attributes whose values for an individual tuple are non-atomic.

Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.

1NF is considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF, there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace that attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best, because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
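Technique 1 can be sketched on a hypothetical DEPARTMENT state whose DLOCATIONS attribute holds a list of values. The Python below (all names and data are illustrative) builds the 1NF DEPARTMENT relation and a separate DEPT_LOCATIONS relation keyed by (DNUMBER, DLOCATION), and, for contrast, the technique-2 result, which repeats DNAME.

```python
# Hypothetical DEPARTMENT state; DLOCATIONS is multivalued, so it is not in 1NF.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]

# Technique 1: keep only the atomic attributes in DEPARTMENT ...
department_1nf = [(d["DNUMBER"], d["DNAME"]) for d in department]
# ... and move the multivalued attribute, with the primary key DNUMBER,
# into DEPT_LOCATIONS: one tuple per (department, location) pair.
dept_locations = [(d["DNUMBER"], loc) for d in department for loc in d["DLOCATIONS"]]

# Technique 2, for contrast: expand the key to (DNUMBER, DLOCATION) in a
# single relation, which repeats DNAME once per location.
department_expanded = [
    (d["DNUMBER"], d["DNAME"], loc) for d in department for loc in d["DLOCATIONS"]
]

print(department_1nf)
# [(5, 'Research'), (4, 'Administration'), (1, 'Headquarters')]
print(dept_locations)
# [(5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston'), (4, 'Stafford'), (1, 'Houston')]
print(sum(1 for t in department_expanded if t[1] == "Research"))  # 3 copies of 'Research'
```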

Normalization of nested relations into 1NF

Additional problems: Schaum's series, Pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary keys.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
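A 2NF check looks for non-prime attributes that are determined by a proper subset of the primary key. The Python sketch below uses attribute closures to list such partial dependencies; it follows the slide's definition of a prime attribute as a member of the primary key, and the function names are hypothetical. The example is EMP_PROJ with primary key {SSN, PNUMBER}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under fds given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """List (key subset, attribute) pairs where a non-prime attribute is
    determined by a proper subset of the primary key (a 2NF violation)."""
    nonprime = set(schema) - set(key)   # prime = member of the primary key
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for a in sorted(nonprime & closure(set(subset), fds)):
                found.append((subset, a))
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
print(partial_dependencies(EMP_PROJ, {"SSN", "PNUMBER"}, F))
# [(('PNUMBER',), 'PLOCATION'), (('PNUMBER',), 'PNAME'), (('SSN',), 'ENAME')]
```

Grouping the result by determinant suggests the 2NF decomposition EP1(SSN, PNUMBER, HOURS), EP2(SSN, ENAME) and EP3(PNUMBER, PNAME, PLOCATION).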

Normalizing into 2NF

Conversion to 2NF

[Figure: R(A, B, C, D) converted to 2NF as R1(A, B, C) and R2(A, D)]

Convert to

Additional problem on second normal form:
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?
2. Transform it into the next higher normal form.

Third Normal Form

Definition:
Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
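The slide's definition is stated relative to the primary key. A commonly used general formulation (not stated on the slide), which also covers the NOTE about candidate keys, is: R is in 3NF if, for every nontrivial FD X -> A, either X is a superkey or A is a prime attribute. Checking the FDs of F, with right-hand sides split, is enough for this test. The Python sketch below applies it with brute-force key discovery; the function names are hypothetical and the example is EMP_DEPT.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under fds given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(schema):
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(schema):
            continue  # X is a superkey: allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((sorted(lhs), a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(violations_3nf(EMP_DEPT, F))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]
```

The violations come from the transitive dependency SSN -> DNUMBER -> {DNAME, DMGRSSN}; moving DNUMBER, DNAME and DMGRSSN into a relation of their own removes them.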

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: Pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey.

To be in BCNF, X must be a superkey for every such nontrivial FD.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. The determinant of fd2 (instructor) is not a superkey, but its RHS (course) is a prime attribute, so the relation is in 3NF but not in BCNF.

A relation that is NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
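Both tests can be run side by side on TEACH with fd1 and fd2. In the Python sketch below (the function names are hypothetical), an FD whose left side is not a superkey breaks BCNF, and breaks 3NF only when its right side is also non-prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under fds given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(schema):
                keys.append(s)
    return keys


def normal_form_tests(schema, fds):
    """Return (in_3nf, in_bcnf) by checking each nontrivial FD X -> A of F."""
    prime = set().union(*candidate_keys(schema, fds))
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(schema):
            continue  # X is a superkey: fine for both normal forms
        for a in rhs - lhs:
            in_bcnf = False        # BCNF requires X to be a superkey
            if a not in prime:
                in_3nf = False     # 3NF also tolerates a prime attribute A
    return in_3nf, in_bcnf


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),   # fd1
    ({"INSTRUCTOR"}, {"COURSE"}),              # fd2
]
print(normal_form_tests(TEACH, F))  # (True, False)
```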

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
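For a decomposition into exactly two relations there is a simple test of the non-additive join property: {R1, R2} is lossless with respect to F if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The Python sketch below applies this test to the three TEACH decompositions; the function name `binary_lossless` is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under fds given as (lhs_set, rhs_set) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """Nonadditive-join test for a decomposition into exactly two relations:
    the common attributes must determine R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),   # fd1
    ({"INSTRUCTOR"}, {"COURSE"}),              # fd2
]
decompositions = {
    1: ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    2: ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    3: ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
}
for n, (r1, r2) in decompositions.items():
    print(n, binary_lossless(r1, r2, F))
# 1 False
# 2 False
# 3 True
```

Only the third decomposition passes, because its common attribute INSTRUCTOR determines COURSE through fd2.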

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
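For decompositions into any number of relations, a standard test is the matrix ("chase") method: build one row per Ri, apply the FDs until nothing changes, and check for a row made entirely of distinguished symbols. The Python sketch below is an assumption-labeled implementation of that general method, not necessarily the exact procedure of Example 5.12; the function name and the symbol encoding ("a" for distinguished, ("b", i, j) otherwise) are hypothetical. It is tested on the EMP_PROJ decompositions from the spurious-tuple slides.

```python
def lossless_chase(schema, decomposition, fds):
    """Matrix (chase) test for the nonadditive join property.

    schema        -- list of attribute names of R
    decomposition -- list of attribute sets R1, ..., Rm
    fds           -- list of (lhs_set, rhs_set) pairs
    """
    col = {a: j for j, a in enumerate(schema)}
    # Row i: distinguished symbol "a" in the columns of Ri, unique ("b", i, j) elsewhere.
    s = [["a" if a in ri else ("b", i, j) for j, a in enumerate(schema)]
         for i, ri in enumerate(decomposition)]

    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every LHS column.
            groups = {}
            for row in s:
                groups.setdefault(tuple(row[col[x]] for x in lhs), []).append(row)
            for rows in groups.values():
                for y in rhs:
                    j = col[y]
                    values = {row[j] for row in rows}
                    if len(values) < 2:
                        continue
                    # Equate the symbols, preferring the distinguished "a".
                    target = "a" if "a" in values else min(values)
                    for row in s:  # rename every occurrence in column j
                        if row[j] in values and row[j] != target:
                            row[j] = target
                            changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(v == "a" for v in row) for row in s)


SCHEMA = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
# EMP_LOCS and EMP_PROJ1: the design that produced spurious tuples.
d1 = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
# Hypothetical three-way alternative: employee, project and works-on relations.
d2 = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(lossless_chase(SCHEMA, d1, F))  # False
print(lossless_chase(SCHEMA, d2, F))  # True
```

Each renaming removes at least one distinct symbol, so the loop always terminates; for two-relation decompositions the result agrees with the simpler binary test.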

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 7: Module 6

Module 6 7042023

Two relation schemas suffering from update anomalies

ENAME SSN BDATEADDRES

SDNUMBE

RDNAME

DMGRSSN

PLOCATION

SSNPNUMBE

RHOURS ENAME PNAME

EMP_PROJ

EMP_DEPT

Module 6 8042023

Two relation schemas suffering from update anomalies Although there is nothing wrong logically with

these 2 relations they are considered poor designs because they violate guideline 1 by mixing attributes from distinct real world entities

EMP_DEPT mixes attributes of employee and department and EMP_PROJ mixes attributes of employees amp projects and the WORKS_ON relationship

They may be used as views but they cause problems when used as base relations

Module 6 9042023

2Redundant Information in Tuples and Update Anomalies Goal of schema design is to minimize the

storage space used by the base relations Information is stored redundantly Wastes storage

Causes problems with update anomalies Insertion anomalies Deletion anomalies Modification anomalies

Module 6 10042023

Two relation schemas suffering from update anomalies

ENAME SSN BDATEADDRES

SDNUMBE

RDNAME

DMGRSSN

PLOCATION

SSNPNUMBE

RHOURS ENAME PNAME

EMP_DEPT

EMP_PROJ

Module 6 11042023

EXAMPLE OF AN INSERT ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Insert Anomaly Cannot insert a project unless an employee is

assigned to it Conversely

Cannot insert an employee unless an heshe is assigned to a project

Module 6 12042023

EXAMPLE OF AN DELETE ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Delete Anomaly When a project is deleted it will result in deleting

all the employees who work on that project Alternately if an employee is the sole employee

on a project deleting that employee would result in deleting the corresponding project

Module 6 13042023

EXAMPLE OF AN UPDATE ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Update AnomalyChanging the name of project number P1

from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1

Module 6 14042023

Module 6 15042023

Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2

Design a schema that does not suffer from the insertion deletion and update anomalies

If there are any anomalies present then note them so that applications can be made to take them into account

In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries

Module 6 16042023

Problems with Nulls If many attributes are grouped together

as a fat relation it gives rise to many nulls in the tuples

Waste storage Problems in understanding the

meaning of the attributes Difficult while using Nulls in aggregate

operators like count or sum

Module 6 17042023

3 Null Values in Tuples Interpretations of nulls

Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable

GUIDELINE 3 Relations should be designed such that their

tuples will have as few NULL values as possible Attributes that are NULL frequently could be

placed in separate relations (with the primary key) Example-

if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation

Better create a new relation emp_offices(essn office_number)

Module 6 18042023

Example of Spurious Tuples

Module 6 19042023

Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as

the base relations of EMP_PROJ is not a good schema design

Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ

These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid

This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1

Module 6 20042023

Example of Spurious Tuples contd

Module 6 21042023

4 Spurious Tuples Bad designs for a relational database may result

in erroneous results for certain JOIN operations The lossless join property is used to

guarantee meaningful results for join operations

GUIDELINE 4 Design relation schemas so that they can be

joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies

Note that Property (a) is extremely important and cannot be

sacrificed Property (b) is less stringent and may be sacrificed

Module 6 23042023

Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done

during Insertion Modification Deletion

Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Module 6 24042023

Functional dependencies Functional dependencies (FDs)

Is a constraint between two sets of attributes from the database

Assumption The entire database is a single universal

relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes

Module 6 25042023

Definition

FDs are used to specify formal measures of the

goodness of relational designs keys that are used to define normal forms for

relations constraints that are derived from the meaning and

interrelationships of the data attributes A set of attributes X functionally determines

a set of attributes Y if the value of X determines a unique value for Y

Module 6 26042023

Functional Dependencies

A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If

t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r

depend on or are determined by the values of the X component

The values of the X component functionally determines the values of Y component

FDs are derived from the real-world constraints on the attributes

The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Examples of FD Constraints

A social security number uniquely determines an employee name: SSN -> ENAME

A project number uniquely determines a project name and location: PNUMBER -> {PNAME, PLOCATION}

An employee SSN and a project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD Constraints (continued)

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R, since we never have two distinct tuples with t1[K] = t2[K].

Satisfies Algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.

How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the check in step 2 succeeds, the output of the algorithm is true; otherwise it is false.
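A minimal Python sketch of the three steps above, assuming tuples are dicts and that A and B are tuples of attribute names (both are representation choices made here, not part of the slides). The usage rows are the TEACH state shown on the next slide.

```python
from itertools import groupby
from operator import itemgetter

def satisfies(r, A, B):
    """Return True if relation r (a list of dicts) satisfies the FD A -> B."""
    key_a = itemgetter(*A)
    key_b = itemgetter(*B)
    # Step 1: sort on the A attributes so tuples with equal A-values are adjacent.
    ordered = sorted(r, key=key_a)
    # Step 2: inside each run of equal A-values, all B-values must be equal.
    for _, group in groupby(ordered, key=key_a):
        if len({key_b(t) for t in group}) > 1:
            return False  # Step 3: a violating run was found
    return True

teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]
print(satisfies(teach, ("TEACHER",), ("COURSE",)))  # False
print(satisfies(teach, ("TEXT",), ("COURSE",)))     # True
```

Sorting follows the slide; a dictionary keyed on the A-values would give the same answer in a single pass without sorting.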

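Relation State of TEACH

TEACH

TEACHER | COURSE          | TEXT
Smith   | Data Structures | Bartram
Smith   | Data Management | Martin
Hall    | Compilers       | Hoffmann
Brown   | OOAD            | Horowitz

TEACHER -> COURSE does not hold in this state: Smith is paired with two different courses.

TEXT -> COURSE is not violated by this state: each text appears with only one course.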

Drawbacks of the Satisfies Algorithm

Using this algorithm is tedious and time consuming.

So inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure

If a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.
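Listing F+ explicitly can produce a very large number of FDs, so a common way to test whether X -> Y is in F+ is to compute the attribute closure X+ (the set of all attributes determined by X) and check whether Y is a subset of it. X+ is not named on the slide; this minimal Python sketch assumes each FD is a (lhs, rhs) pair of Python sets.

```python
def attribute_closure(X, F):
    """Return X+, the set of attributes functionally determined by X under F.

    F is an iterable of (lhs, rhs) pairs of attribute sets.
    """
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            # Once the closure contains an FD's LHS, it must also contain its RHS.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

F = [({"DEPT_NO"}, {"MGR_SSN"}), ({"MGR_SSN"}, {"MGR_PHONE"})]
print(sorted(attribute_closure({"DEPT_NO"}, F)))
# ['DEPT_NO', 'MGR_PHONE', 'MGR_SSN']
```

Since MGR_PHONE is in DEPT_NO+, the dependency DEPT_NO -> MGR_PHONE belongs to F+.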

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The notation F |= X -> Y denotes that the dependency X -> Y is inferred from the set of FDs F.
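The relation F |= X -> Y can be tested mechanically: it holds exactly when Y is a subset of X+ computed under F. A minimal Python sketch, assuming FDs are (lhs, rhs) pairs of sets; the helper name `implies` is chosen here for illustration.

```python
def attribute_closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def implies(F, X, Y):
    """F |= X -> Y holds exactly when Y is contained in X+ under F."""
    return set(Y) <= attribute_closure(X, F)

F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
for X, Y in [({"SSN"}, {"DNAME", "DMGRSSN"}),
             ({"SSN"}, {"SSN"}),
             ({"DNUMBER"}, {"DNAME"})]:
    print(sorted(X), "->", sorted(Y), implies(F, X, Y))  # each prints True
```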

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y is a subset of X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F. In other words, if the axioms are correctly applied, they cannot derive false dependencies.

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs (continued)

Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
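To make the last statement concrete, here is a minimal Python sketch that encodes IR1 to IR3 as functions and then builds decomposition, union and pseudotransitivity purely by composing them. FDs are assumed to be (lhs, rhs) pairs of frozensets, and all function names are chosen here for illustration.

```python
from typing import FrozenSet, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)

def fd(lhs: str, rhs: str) -> FD:
    """Build an FD from single-letter attribute names, e.g. fd("WX", "Z")."""
    return frozenset(lhs), frozenset(rhs)

def ir1(X: FrozenSet[str], Y: FrozenSet[str]) -> FD:
    """IR1 (reflexive): if Y is a subset of X, then X -> Y."""
    if not Y <= X:
        raise ValueError("IR1 needs Y to be a subset of X")
    return X, Y

def ir2(f: FD, Z: FrozenSet[str]) -> FD:
    """IR2 (augmentation): from X -> Y infer XZ -> YZ."""
    return f[0] | Z, f[1] | Z

def ir3(f: FD, g: FD) -> FD:
    """IR3 (transitive): from X -> Y and Y -> Z infer X -> Z."""
    if f[1] != g[0]:
        raise ValueError("IR3 needs the RHS of the first FD to equal the LHS of the second")
    return f[0], g[1]

# The additional rules, each derived only from IR1-IR3.
def decomposition(f: FD, Y: FrozenSet[str]) -> FD:
    """From X -> YZ infer X -> Y (IR1 gives YZ -> Y, then IR3)."""
    return ir3(f, ir1(f[1], Y))

def union(f: FD, g: FD) -> FD:
    """From X -> Y and X -> Z (same LHS) infer X -> YZ."""
    x_xy = ir2(f, f[0])      # X -> XY   (augment X -> Y with X)
    xy_yz = ir2(g, f[1])     # XY -> YZ  (augment X -> Z with Y)
    return ir3(x_xy, xy_yz)  # X -> YZ

def pseudotransitive(f: FD, g: FD, W: FrozenSet[str]) -> FD:
    """From X -> Y and WY -> Z infer WX -> Z."""
    wx_wy = ir2(f, W)        # WX -> WY  (augment X -> Y with W)
    return ir3(wx_wy, g)     # WX -> Z   (g's LHS must be exactly WY)

print(decomposition(fd("A", "BC"), frozenset("B")) == fd("A", "B"))          # True
print(union(fd("A", "B"), fd("A", "C")) == fd("A", "BC"))                     # True
print(pseudotransitive(fd("X", "Y"), fd("WY", "Z"), frozenset("W")) == fd("WX", "Z"))  # True
```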

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C a subset of B, show that F |= A -> D.
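One possible derivation for each exercise, written as a Python sketch so that every step is an explicit application of IR1, IR2 or IR3 (FDs are assumed to be pairs of frozensets; in exercise 2 the concrete attribute sets B = {P, Q} and C = {P} are a hypothetical choice that satisfies "C is a subset of B").

```python
def refl(X, Y):
    """IR1: if Y is a subset of X, then X -> Y."""
    assert Y <= X
    return X, Y

def aug(f, Z):
    """IR2: from X -> Y infer XZ -> YZ."""
    return f[0] | Z, f[1] | Z

def trans(f, g):
    """IR3: from X -> Y and Y -> Z infer X -> Z."""
    assert f[1] == g[0]
    return f[0], g[1]

S = frozenset  # shorthand: S("AC") is the attribute set {A, C}

# Exercise 1: F = {A -> B, C -> X, BX -> Z}; derive AC -> Z.
a_b, c_x, bx_z = (S("A"), S("B")), (S("C"), S("X")), (S("BX"), S("Z"))
step1 = aug(a_b, S("C"))     # AC -> BC
step2 = aug(c_x, S("B"))     # BC -> BX
step3 = trans(step1, step2)  # AC -> BX
ac_z = trans(step3, bx_z)    # AC -> Z
print(ac_z == (S("AC"), S("Z")))  # True

# Exercise 2: F = {A -> B, C -> D} with C a subset of B; show F |= A -> D.
def exercise2(A, B, C, D):
    b_c = refl(B, C)          # B -> C  (IR1, since C is a subset of B)
    a_c = trans((A, B), b_c)  # A -> C  (IR3)
    return trans(a_c, (C, D)) # A -> D  (IR3)

print(exercise2(S("A"), S("PQ"), S("P"), S("D")) == (S("A"), S("D")))  # True
```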

Redundant Functional Dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F if and only if A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and they can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
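A minimal Python sketch of this elimination, assuming FDs are (lhs, rhs) pairs of sets: each FD is tested against the FDs that remain, using the attribute closure, and dropped if it follows from them. Which FDs survive can depend on the order in which they are examined.

```python
def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def remove_redundant(F):
    """Remove, one at a time, each FD that can be derived from the remaining FDs."""
    F = list(F)
    i = 0
    while i < len(F):
        rest = F[:i] + F[i + 1:]
        lhs, rhs = F[i]
        if set(rhs) <= closure(lhs, rest):
            F = rest  # F[i] is redundant; the next FD moves into slot i
        else:
            i += 1
    return F

S = frozenset
F = [(S("A"), S("B")), (S("B"), S("C")), (S("A"), S("C"))]
print(remove_redundant(F))  # A -> C is dropped: it follows from A -> B and B -> C
```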

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is said to be covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find a set that contains fewer FDs than F and still generates all the FDs of F+; such a set is equivalent to F.
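The two "covers" conditions can be tested without building E+ or F+: F covers E when, for every X -> Y in E, Y is contained in X+ computed under F. A minimal Python sketch, assuming FDs are (lhs, rhs) pairs of sets; the example sets E, F and G are hypothetical.

```python
def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(F, E):
    """True if every FD in E can be inferred from F (E is covered by F)."""
    return all(set(rhs) <= closure(lhs, F) for lhs, rhs in E)

def equivalent(E, F):
    """E+ = F+ exactly when E covers F and F covers E."""
    return covers(E, F) and covers(F, E)

S = frozenset
F = [(S("A"), S("B")), (S("B"), S("C"))]
E = [(S("A"), S("BC")), (S("B"), S("C"))]
G = [(S("A"), S("B"))]
print(equivalent(E, F))            # True
print(covers(F, G), covers(G, F))  # True False
```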

Minimal Functional Dependencies

A minimal cover of F is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F.

F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal Cover

A set of FDs F is minimal if it satisfies the following conditions:

(a) The RHS of every dependency in F is a single attribute.

(b) No redundancies: for no X -> A in F is the set F - {X -> A} equivalent to F.

(c) For no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F; that is, no dependency may be replaced by a dependency that involves a subset of its left-hand side.
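The three conditions translate directly into a checker. A minimal Python sketch, assuming FDs are (lhs, rhs) pairs of frozensets; equivalence is tested with attribute closures, and the proper subsets Z of X (including the empty set) are enumerated with `itertools.combinations`.

```python
from itertools import combinations

def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(F, E):
    return all(set(rhs) <= closure(lhs, F) for lhs, rhs in E)

def equivalent(E, F):
    return covers(E, F) and covers(F, E)

def is_minimal(F):
    """Check conditions (a), (b) and (c) of a minimal cover."""
    F = list(F)
    if any(len(rhs) != 1 for _, rhs in F):          # (a) single-attribute RHS
        return False
    for i, (X, A) in enumerate(F):
        rest = F[:i] + F[i + 1:]
        if equivalent(rest, F):                      # (b) X -> A is redundant
            return False
        for k in range(len(X)):                      # (c) every proper subset Z of X
            for Z in combinations(sorted(X), k):
                if equivalent(rest + [(frozenset(Z), A)], F):
                    return False
    return True

S = frozenset
print(is_minimal([(S("A"), S("B")), (S("B"), S("C"))]))   # True
print(is_minimal([(S("A"), S("BC"))]))                    # False: (a) fails
print(is_minimal([(S("AB"), S("C")), (S("A"), S("C"))]))  # False: (b), AB -> C is redundant
print(is_minimal([(S("AB"), S("C")), (S("A"), S("B"))]))  # False: (c), A -> C can replace AB -> C
```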

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left-hand-side attributes or extraneous right-hand-side attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous if and only if F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
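The equivalence in the definition reduces to a single closure test: the new set always implies A1A2 -> B1B2 (by augmenting A2 -> B1B2 with A1), so the two sets are equivalent exactly when F already implies A2 -> B1B2, i.e. when B1B2 is contained in A2+ under F. A minimal Python sketch of that test, assuming FDs are (lhs, rhs) pairs of sets; the example F is hypothetical.

```python
def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_extraneous_lhs(attr, fd, F):
    """True if attr can be dropped from the LHS of fd = (lhs, rhs) without changing F+."""
    lhs, rhs = fd
    return set(rhs) <= closure(set(lhs) - {attr}, F)

S = frozenset
F = [(S("A"), S("C")), (S("AB"), S("C"))]
print(is_extraneous_lhs("B", F[1], F))  # True: A alone already determines C
print(is_extraneous_lhs("A", F[1], F))  # False: B+ = {B}
```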

Canonical Cover (FC)

1. Every FD of FC is simple: its RHS has a single attribute.

2. FC is left-reduced: no FD has an extraneous left-hand-side attribute.

3. FC is nonredundant.

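Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (it is implied by X -> Z): {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: Z is extraneous in XZ -> R, because X -> Z gives X -> XZ (augmentation) and then X -> R (transitivity). Hence FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}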
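A minimal Python sketch of one common way to compute a canonical cover (split right-hand sides, left-reduce, then drop redundant FDs), applied to the F of this problem. FDs are assumed to be (lhs, rhs) pairs of frozensets. Note that the sketch also left-reduces, so it replaces XZ -> R by X -> R, because X -> Z makes Z extraneous there.

```python
def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def canonical_cover(F):
    """Single-attribute RHS, left-reduced, nonredundant set equivalent to F."""
    # Step 1: split every RHS into single attributes, dropping exact duplicates.
    G = []
    for lhs, rhs in F:
        for a in sorted(rhs):
            f = (frozenset(lhs), frozenset({a}))
            if f not in G:
                G.append(f)
    # Step 2: left-reduce; an LHS attribute is extraneous if the remaining
    # LHS attributes still determine the RHS under G.
    for i, (lhs, rhs) in enumerate(G):
        for b in sorted(lhs):
            if rhs <= closure(lhs - {b}, G):
                lhs = lhs - {b}
        G[i] = (lhs, rhs)
    # Step 3: remove FDs implied by the others (this also removes duplicates).
    i = 0
    while i < len(G):
        rest = G[:i] + G[i + 1:]
        if G[i][1] <= closure(G[i][0], rest):
            G = rest
        else:
            i += 1
    return G

S = frozenset
F = [(S("X"), S("Z")), (S("XY"), S("WP")), (S("XY"), S("ZWQ")), (S("XZ"), S("R"))]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
```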

Normal Forms Based on Primary Keys

1. Normalization of Relations

2. Practical Use of Normal Forms

3. Definitions of Keys and Attributes Participating in Keys

4. First Normal Form

5. Second Normal Form

6. Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing the insertion, deletion and update anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and with a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
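Dependency preservation can be tested without computing the projections of F+ explicitly. The sketch below follows a well-known textbook algorithm (not given on these slides): grow a set of attributes from X using only what each Ri can contribute, and check whether Y is reached. FDs are assumed to be (lhs, rhs) pairs of sets; the example is the TEACH relation used for BCNF later in this module.

```python
def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_preserved(fd, decomposition, F):
    """True if fd = (X, Y) is implied by the FDs that hold on the individual Ri."""
    X, Y = fd
    result = set(X)
    changed = True
    while changed:
        changed = False
        for Ri in decomposition:
            # Attributes of Ri that Ri itself can derive from what we have so far.
            gained = closure(result & Ri, F) & Ri
            if not gained <= result:
                result |= gained
                changed = True
    return set(Y) <= result

def preserves_dependencies(F, decomposition):
    return all(is_preserved(f, decomposition, F) for f in F)

F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
D = [{"instructor", "course"}, {"instructor", "student"}]
print([is_preserved(f, D, F) for f in F])  # [False, True]: fd1 is lost
print(preserves_dependencies(F, D))        # False
```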

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S, a subset of R, with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys (continued)

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
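For small schemas, the candidate keys, and therefore the prime and nonprime attributes, can be found by brute force: a set K is a superkey when K+ = R, and a key when no smaller key is contained in it. A minimal Python sketch, assuming FDs are (lhs, rhs) pairs of sets; the example is TEACH(student, course, instructor), used for BCNF later in this module. The search is exponential in the number of attributes, so it only suits small examples.

```python
from itertools import combinations

def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, F):
    """All minimal attribute sets K with K+ = R, found by increasing size."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            K = set(combo)
            if any(k <= K for k in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if closure(K, F) >= R:
                keys.append(K)
    return keys

R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
keys = candidate_keys(R, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['course', 'student'], ['instructor', 'student']]
print(sorted(R - prime))          # [] : every attribute of TEACH is prime
```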

First Normal Form

1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

Normalization into 1NF

To achieve 1NF there are three techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3. If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best, because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
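Technique 1 can be sketched in a few lines of Python. The DEPARTMENT rows, the multivalued DLOCATIONS attribute and the output attribute name DLOCATION are hypothetical, chosen only for illustration.

```python
# Hypothetical unnormalized DEPARTMENT rows: DLOCATIONS is multivalued (non-atomic).
department = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]

def to_1nf(rows, key, multivalued, atomic_name):
    """Technique 1: move a multivalued attribute into its own relation,
    together with a copy of the primary key."""
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    split = [
        {key: row[key], atomic_name: value}
        for row in rows
        for value in row[multivalued]
    ]
    return base, split

dept, dept_locations = to_1nf(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept)
print(dept_locations)  # one tuple per (DNUMBER, DLOCATION) pair
```

The new relation's key is the combination of the copied primary key and the atomic value, so no redundancy or NULLs are introduced into the base relation.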

Normalizing Nested Relations into 1NF

Additional problems from the Schaum's series: p. 178, 5.1

Second Normal Form

Second normal form uses the concepts of FDs and the primary key.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
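A 2NF test amounts to looking for partial dependencies: a nonprime attribute that is already determined by a proper subset of the primary key. A minimal Python sketch, assuming FDs are (lhs, rhs) pairs of sets and using the EMP_PROJ FDs from the FD examples earlier in this module; the name EP1 for the decomposed relation is chosen for illustration. The function may list one attribute under several subsets when the key has three or more attributes.

```python
from itertools import combinations

def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(R, F, primary_key):
    """(key subset, attribute) pairs where a nonprime attribute depends on a
    proper subset of the primary key. An empty list means R is in 2NF.
    Following the slide, 'prime' means 'member of the primary key'."""
    nonprime = R - primary_key
    found = []
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            for attr in sorted(closure(subset, F) & nonprime):
                found.append((subset, attr))
    return found

F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
print(partial_dependencies(EMP_PROJ, F, {"SSN", "PNUMBER"}))
# After 2NF normalization, EP1(SSN, PNUMBER, HOURS) has no partial dependency:
print(partial_dependencies({"SSN", "PNUMBER", "HOURS"}, F, {"SSN", "PNUMBER"}))  # []
```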

Normalizing into 2NF

Conversion to 2NF: R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).

Convert to

Additional problem on second normal form:

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of this relation?

2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
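A common formulation of the 3NF test, equivalent to the definition above and consistent with the NOTE, says that for every nontrivial X -> A, either X is a superkey or A is prime. A minimal Python sketch that checks the FDs in F as given on R; FDs are assumed to be (lhs, rhs) pairs of sets. For EMP(SSN, Emp, Salary) it assumes Emp -> SSN so that Emp is a candidate key, as the NOTE states.

```python
from itertools import combinations

def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, F):
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            K = set(combo)
            if not any(k <= K for k in keys) and closure(K, F) >= R:
                keys.append(K)
    return keys

def violations_3nf(R, F):
    """Nontrivial X -> A where X is not a superkey and A is nonprime."""
    prime = set().union(*candidate_keys(R, F))
    bad = []
    for lhs, rhs in F:
        for a in sorted(set(rhs) - set(lhs)):
            if not closure(lhs, F) >= R and a not in prime:
                bad.append((sorted(lhs), a))
    return bad

EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F1 = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
      ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(violations_3nf(EMP_DEPT, F1))  # [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]

EMP = {"SSN", "Emp", "Salary"}
F2 = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(violations_3nf(EMP, F2))       # []: Emp is a candidate key
```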

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: p. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How Is BCNF Different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a superkey.

For the relation to be in BCNF, X must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation, so course is a prime attribute and fd2 does not violate 3NF; but instructor is not a superkey, so fd2 violates BCNF. Hence this relation is in 3NF but not in BCNF.

A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
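The reasoning above can be checked mechanically. A minimal Python sketch, assuming FDs are (lhs, rhs) pairs of sets; it checks only the FDs in the given F, which is sufficient when F is stated on R itself rather than on a projection of a larger schema.

```python
from itertools import combinations

def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(R, F):
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            K = set(combo)
            if not any(k <= K for k in keys) and closure(K, F) >= R:
                keys.append(K)
    return keys

def bcnf_and_3nf(R, F):
    """Return (in_bcnf, in_3nf) for schema R with the FDs F."""
    prime = set().union(*candidate_keys(R, F))
    in_bcnf = in_3nf = True
    for lhs, rhs in F:
        if closure(lhs, F) >= R:
            continue                 # X is a superkey: allowed by both forms
        for a in set(rhs) - set(lhs):
            in_bcnf = False          # nontrivial FD whose LHS is not a superkey
            if a not in prime:
                in_3nf = False       # ...and whose RHS attribute is nonprime
    return in_bcnf, in_3nf

R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(bcnf_and_3nf(R, F))  # (False, True)
```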

Achieving BCNF by Decomposition (continued)

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join, and hence it has the nonadditive join property.
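For a decomposition into two relations, a standard test (not spelled out on the slide) is: the join is nonadditive exactly when the common attributes R1 ∩ R2 functionally determine R1 - R2 or R2 - R1 under F+. A minimal Python sketch applying it to the three TEACH decompositions, assuming FDs are (lhs, rhs) pairs of sets.

```python
def closure(X, F):
    """Attribute closure X+ of the attribute set X under the FDs in F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless_binary(R1, R2, F):
    """Nonadditive iff (R1 & R2) -> (R1 - R2) or (R1 & R2) -> (R2 - R1) is in F+."""
    common = closure(R1 & R2, F)
    return (R1 - R2) <= common or (R2 - R1) <= common

F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for i, (R1, R2) in enumerate(decompositions, start=1):
    print(i, is_lossless_binary(R1, R2, F))
# 1 False
# 2 False
# 3 True
```

Only the third works, because instructor -> course makes the shared attribute instructor a key of {instructor, course}.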

Lossless or Lossy Decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, p. 162
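"Recovering the original relation" can be tried directly on a relation state: project onto the new schemas, take the natural join, and compare with the original. A minimal Python sketch on a hypothetical TEACH state that satisfies fd1 and fd2; tuples are represented as frozensets of (attribute, value) pairs so that they can be stored in sets.

```python
def project(r, attrs):
    """Projection with duplicate elimination."""
    return {frozenset((a, t[a]) for a in attrs) for t in r}

def natural_join(r1, r2):
    """Join two projected relations on all attribute names they share."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Hypothetical TEACH state satisfying fd1 and fd2.
teach = [
    {"student": "Narayan", "course": "Database", "instructor": "Mark"},
    {"student": "Smith", "course": "Database", "instructor": "Navathe"},
    {"student": "Smith", "course": "Operating Systems", "instructor": "Ammar"},
    {"student": "Wallace", "course": "Database", "instructor": "Mark"},
]
original = {frozenset(t.items()) for t in teach}

for R1, R2 in [({"student", "instructor"}, {"student", "course"}),
               ({"instructor", "course"}, {"instructor", "student"})]:
    joined = natural_join(project(teach, R1), project(teach, R2))
    print(sorted(R1), sorted(R2), "spurious:", len(joined - original))
# ['instructor', 'student'] ['course', 'student'] spurious: 2
# ['course', 'instructor'] ['instructor', 'student'] spurious: 0
```

A single state can only demonstrate lossiness; showing that a decomposition is lossless for every legal state needs an FD-based test.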

Testing for Lossless Joins

Lossless join algorithm: Example 5.12
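The general test for a decomposition into any number of relations is usually presented as a matrix (chase) algorithm. This is a minimal Python sketch of that standard algorithm; Example 5.12 may lay it out differently. Rows stand for the Ri and columns for the attributes of R; whenever two rows agree on the LHS of an FD, their RHS symbols are equated (preferring the distinguished "a" symbol), and the decomposition is lossless if some row ends up with only "a" symbols.

```python
def lossless_chase(R, decomposition, F):
    """Matrix (chase) test. R: list of attributes; decomposition: list of
    attribute sets; F: list of (lhs, rhs) attribute sets."""
    # ("a", A) is the distinguished symbol of column A; ("b", i, A) is private to row i.
    S = [{A: ("a", A) if A in Ri else ("b", i, A) for A in R}
         for i, Ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            groups = {}  # rows that agree on every LHS column
            for row in S:
                groups.setdefault(tuple(row[A] for A in sorted(lhs)), []).append(row)
            for rows in groups.values():
                for A in set(rhs) - set(lhs):
                    symbols = {row[A] for row in rows}
                    if len(symbols) < 2:
                        continue
                    target = ("a", A) if ("a", A) in symbols else min(symbols)
                    for row in S:  # equate the symbols everywhere in column A
                        if row[A] in symbols and row[A] != target:
                            row[A] = target
                            changed = True
    return any(all(row[A] == ("a", A) for A in R) for row in S)

R = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
D = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
F = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]
print(lossless_chase(R, D, F))  # True
```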

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
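The "independent sets of values" condition has a standard tuple-level form: whenever two tuples t1 and t2 agree on A, the tuple that takes A and B from t1 and the remaining attributes from t2 must also be present. A minimal Python sketch of that check on one relation state, plus the trivial/nontrivial test; the EMP rows are hypothetical.

```python
from itertools import product

def mvd_holds(r, R, A, B):
    """Check A ->> B on relation state r (a list of dicts over schema R)."""
    C = [x for x in R if x not in A and x not in B]  # the remaining attributes
    present = {tuple(t[x] for x in R) for t in r}
    for t1, t2 in product(r, repeat=2):
        if all(t1[x] == t2[x] for x in A):
            # t3 takes A and B from t1 and C from t2.
            t3 = {**{x: t1[x] for x in R if x not in C}, **{x: t2[x] for x in C}}
            if tuple(t3[x] for x in R) not in present:
                return False
    return True

def is_trivial_mvd(A, B, R):
    return set(B) <= set(A) or set(A) | set(B) == set(R)

R = ["ENAME", "PNAME", "DNAME"]
emp = [dict(zip(R, row)) for row in [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]]
print(mvd_holds(emp, R, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(emp[:3], R, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing
print(is_trivial_mvd(["ENAME"], ["PNAME"], R))      # False
```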

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation has two MVDs, ENAME ->> PNAME and ENAME ->> DNAME. It is decomposed into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
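The decomposition can be sketched on a small hypothetical EMP state in Python: each independent MVD gets its own relation, and the natural join on ENAME gives back exactly the original tuples.

```python
# Hypothetical EMP(ENAME, PNAME, DNAME) state: Smith works on projects X and Y
# and has dependents John and Anna, so every combination must be stored.
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

# 4NF decomposition: one relation per independent MVD.
emp_projects = {(ename, pname) for ename, pname, _ in emp}
emp_dependents = {(ename, dname) for ename, _, dname in emp}

# The natural join on ENAME recovers EMP exactly (no spurious tuples).
rejoined = {
    (e1, pname, dname)
    for e1, pname in emp_projects
    for e2, dname in emp_dependents
    if e1 == e2
}
print(sorted(emp_projects))    # [('Smith', 'X'), ('Smith', 'Y')]
print(sorted(emp_dependents))  # [('Smith', 'Anna'), ('Smith', 'John')]
print(rejoined == emp)         # True
```

Adding a new project for Smith then needs one tuple in EMP_PROJECTS instead of one tuple per dependent in EMP.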

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, the relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
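On a single relation state, a join dependency can be checked literally: join all the projections and compare with the original. A minimal Python sketch with hypothetical SUPPLY-style rows, where JD(R1, R2, R3) holds even though no two-way decomposition is lossless.

```python
def project(r, attrs):
    """Projection of r (a list of dicts) on attrs, with duplicates removed."""
    unique = {tuple(t[a] for a in attrs) for t in r}
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(r1, r2):
    """Join two lists of dict rows on all attribute names they share."""
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[a] == t2[a] for a in t1.keys() & t2.keys())
    ]

def as_set(rows):
    return {frozenset(t.items()) for t in rows}

def jd_holds(r, components):
    """r satisfies JD(R1, ..., Rn) iff r equals the join of its projections on the Ri."""
    joined = project(r, components[0])
    for attrs in components[1:]:
        joined = natural_join(joined, project(r, attrs))
    return as_set(joined) == as_set(r)

cols = ("SNAME", "PARTNAME", "PROJNAME")
supply = [dict(zip(cols, row)) for row in [
    ("Smith", "Bolt", "ProjY"),
    ("Smith", "Nut", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjX"),
]]
jd = [("SNAME", "PARTNAME"), ("SNAME", "PROJNAME"), ("PARTNAME", "PROJNAME")]
print(jd_holds(supply, jd))       # True
print(jd_holds(supply, jd[:2]))   # False: the two-way join adds (Smith, Nut, ProjY)
print(jd_holds(supply[:3], jd))   # False: the join adds (Smith, Bolt, ProjX)
```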

Fifth Normal Form (5NF)

Definition: a relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. Informally, it is a relation that has no nontrivial join dependency other than those whose components are all superkeys.

Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence 5NF is rarely used in practice.


Two Relation Schemas Suffering from Update Anomalies

Although there is nothing wrong logically with these two relations, they are considered poor designs because they violate Guideline 1 by mixing attributes from distinct real-world entities.

EMP_DEPT mixes attributes of employees and departments, and EMP_PROJ mixes attributes of employees and projects and the WORKS_ON relationship.

They may be used as views, but they cause problems when used as base relations.

2. Redundant Information in Tuples and Update Anomalies

A goal of schema design is to minimize the storage space used by the base relations. Information that is stored redundantly wastes storage and causes problems with update anomalies:
insertion anomalies
deletion anomalies
modification anomalies

Two Relation Schemas Suffering from Update Anomalies

EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

Example of an Insert Anomaly

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Insert anomaly: we cannot insert a project unless an employee is assigned to it. Conversely, we cannot insert an employee unless he or she is assigned to a project.

Example of a Delete Anomaly

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Delete anomaly: deleting a project results in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee results in deleting the corresponding project.

Example of an Update Anomaly

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" may require this update to be made for all 100 employees working on project P1.
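A small Python sketch of why this is an anomaly: Pname is repeated in every EMP_PROJ row for the project, so a partial update leaves one project with two names, which violates the FD Proj -> Pname. The rows and the helper functions are hypothetical.

```python
# Hypothetical EMP_PROJ rows; Pname is stored once per employee on the project.
emp_proj = [
    {"Emp": "E1", "Proj": "P1", "Ename": "Ann", "Pname": "Billing", "No_hours": 10},
    {"Emp": "E2", "Proj": "P1", "Ename": "Bob", "Pname": "Billing", "No_hours": 20},
    {"Emp": "E3", "Proj": "P2", "Ename": "Cal", "Pname": "Payroll", "No_hours": 5},
]

def names_per_project(rows):
    """Collect every Pname stored for each Proj; more than one means Proj -> Pname is violated."""
    names = {}
    for row in rows:
        names.setdefault(row["Proj"], set()).add(row["Pname"])
    return names

def rename_project(rows, proj, new_name):
    """The correct update must touch every row of the project."""
    touched = 0
    for row in rows:
        if row["Proj"] == proj:
            row["Pname"] = new_name
            touched += 1
    return touched

# A partial update (only the first row) leaves P1 with two different names.
emp_proj[0]["Pname"] = "Customer-Accounting"
print(len(names_per_project(emp_proj)["P1"]))  # 2

touched = rename_project(emp_proj, "P1", "Customer-Accounting")
print(touched)                                 # 2 rows had to change
print(names_per_project(emp_proj)["P1"])       # {'Customer-Accounting'}
```

With a separate PROJECT relation keyed on the project number, the rename would touch exactly one row.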

Guideline for Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies.

If any anomalies are present, note them so that applications can be made to take them into account.

In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.

Problems with Nulls

If many attributes are grouped together as a "fat" relation, it gives rise to many nulls in the tuples. This:
wastes storage
causes problems in understanding the meaning of the attributes
causes difficulty when nulls are used in aggregate operators such as COUNT or SUM

3. Null Values in Tuples

Interpretations of nulls:
the attribute is not applicable or invalid
the attribute value is unknown (it may exist)
the value is known to exist but is unavailable

GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).
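A minimal Python sketch of the suggested split, with hypothetical employee rows: the sparse office_number attribute moves to emp_offices, which stores a row only for employees that actually have an office, so neither relation contains NULLs for it.

```python
# Hypothetical employee rows in which office_number is mostly NULL (None).
employee = [
    {"ssn": "111", "ename": "Ann", "office_number": None},
    {"ssn": "222", "ename": "Bob", "office_number": "B-12"},
    {"ssn": "333", "ename": "Cal", "office_number": None},
]

# The employee relation no longer carries the sparse attribute ...
employee_core = [{k: v for k, v in e.items() if k != "office_number"} for e in employee]

# ... and emp_offices holds a tuple only for employees who have an office.
emp_offices = [
    {"essn": e["ssn"], "office_number": e["office_number"]}
    for e in employee
    if e["office_number"] is not None
]
print(emp_offices)  # [{'essn': '222', 'office_number': 'B-12'}]
```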

Example of Spurious Tuples

Generation of Spurious Tuples

Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.

The problem is that a natural join performed on these two relations produces more tuples than the original set of tuples in EMP_PROJ.

These additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious or wrong information that is not valid.

This happens because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
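A minimal Python sketch of this effect on hypothetical EMP_PROJ rows: projecting onto EMP_LOCS(ENAME, PLOCATION) and EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION) and joining back on PLOCATION pairs every employee at a location with every project row at that location.

```python
# Hypothetical EMP_PROJ rows: Smith and English both work at Bellaire.
emp_proj = [
    {"SSN": "123", "ENAME": "Smith", "PNUMBER": 1, "PNAME": "ProductX", "PLOCATION": "Bellaire", "HOURS": 32},
    {"SSN": "453", "ENAME": "English", "PNUMBER": 1, "PNAME": "ProductX", "PLOCATION": "Bellaire", "HOURS": 20},
    {"SSN": "333", "ENAME": "Wong", "PNUMBER": 3, "PNAME": "ProductZ", "PLOCATION": "Houston", "HOURS": 10},
]

def project(rows, attrs):
    """Projection with duplicate elimination, returned as a list of dicts."""
    return [dict(t) for t in {tuple((a, r[a]) for a in attrs) for r in rows}]

def natural_join(r1, r2):
    """Join on all attribute names the two relations share."""
    common = set(r1[0]) & set(r2[0]) if r1 and r2 else set()
    return [{**a, **b} for a in r1 for b in r2 if all(a[c] == b[c] for c in common)]

emp_locs = project(emp_proj, ["ENAME", "PLOCATION"])
emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])
joined = natural_join(emp_proj1, emp_locs)  # joins on PLOCATION only

original = {frozenset(r.items()) for r in emp_proj}
spurious = [r for r in joined if frozenset(r.items()) not in original]
print(len(joined), len(spurious))  # 5 2
```

The two spurious tuples pair SSN 123 with English and SSN 453 with Smith, which is exactly the wrong information described above.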

Example of Spurious Tuples (continued)

4. Spurious Tuples

Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:
(a) nonadditivity, or losslessness, of the corresponding join
(b) preservation of the functional dependencies

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:
Anomalies cause redundant work to be done during insertion, modification and deletion.
Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values.
Generation of invalid and spurious data during joins on improperly related base relations.


1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 9: Module 6

Module 6 9042023

2. Redundant Information in Tuples and Update Anomalies

One goal of schema design is to minimize the storage space used by the base relations. Storing information redundantly wastes storage and causes update anomalies:

- Insertion anomalies
- Deletion anomalies
- Modification anomalies

Two relation schemas suffering from update anomalies

EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

EXAMPLE OF AN INSERT ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Insert anomaly: a project cannot be inserted unless an employee is assigned to it. Conversely, an employee cannot be inserted unless he or she is assigned to a project.

EXAMPLE OF A DELETE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Delete anomaly: deleting a project deletes the tuples, and hence the information, of all the employees who work on that project. Alternatively, if an employee is the sole employee on a project, deleting that employee deletes the corresponding project.

EXAMPLE OF AN UPDATE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Update anomaly: changing the name of project P1 from "Billing" to "Customer-Accounting" requires the change to be made in the tuples of all 100 employees working on P1; missing any of them leaves the data inconsistent.

Guideline for Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design base relation schemas that do not suffer from insertion, deletion or update anomalies.

If any anomalies are present, note them clearly so that applications can be written to take them into account.

In general, it is advisable to use anomaly-free base relations and to specify views that include the joins, so that attributes frequently referenced together in important queries are placed together.

Problems with Nulls

If many attributes are grouped together in a "fat" relation, many tuples end up with nulls. This:

- wastes storage;
- makes the meaning of the attributes harder to understand;
- makes aggregate operators such as COUNT and SUM harder to use correctly.

3. Null Values in Tuples

Interpretations of nulls:

- The attribute does not apply to this tuple (not applicable or invalid).
- The attribute value is unknown (it may exist).
- The value is known to exist but is unavailable.

GUIDELINE 3: Design relations so that their tuples have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (together with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute of the employee relation. Instead, create a new relation emp_offices(essn, office_number).
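A minimal sketch of Guideline 3 using Python's built-in `sqlite3` module. It contrasts a hypothetical "fat" table `employee_fat` (mostly-NULL office_number) with a separate emp_offices(essn, office_number) relation; the table names other than emp_offices and all data values are invented for illustration. It also shows the aggregate pitfall: COUNT(*) counts every row, while COUNT(office_number) silently skips NULLs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical "fat" design: office_number is NULL for most employees.
cur.execute("CREATE TABLE employee_fat (essn TEXT PRIMARY KEY, ename TEXT, office_number TEXT)")
cur.executemany(
    "INSERT INTO employee_fat VALUES (?, ?, ?)",
    [("111", "Smith", "A-101"), ("222", "Wong", None), ("333", "Zelaya", None)],
)

# Guideline 3 design: offices live in their own relation, keyed by essn.
cur.execute("CREATE TABLE employee (essn TEXT PRIMARY KEY, ename TEXT)")
cur.execute(
    "CREATE TABLE emp_offices ("
    "essn TEXT PRIMARY KEY REFERENCES employee(essn), "
    "office_number TEXT NOT NULL)"
)
cur.executemany("INSERT INTO employee VALUES (?, ?)",
                [("111", "Smith"), ("222", "Wong"), ("333", "Zelaya")])
cur.execute("INSERT INTO emp_offices VALUES ('111', 'A-101')")

# COUNT(*) counts rows; COUNT(column) skips NULLs, which is easy to misread.
all_rows, with_office = cur.execute(
    "SELECT COUNT(*), COUNT(office_number) FROM employee_fat").fetchone()
office_rows = cur.execute("SELECT COUNT(*) FROM emp_offices").fetchone()[0]
print(all_rows, with_office, office_rows)  # 3 1 1
```

In the separate relation, only employees who actually have an office get a tuple, so no NULLs are stored at all.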

Example of Spurious Tuples

Generation of spurious tuples

Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations instead of EMP_PROJ is not a good schema design.

If a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.

The additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious, or wrong, information that is not valid.

This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
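The effect can be reproduced with a small Python sketch. It projects a few hypothetical EMP_PROJ tuples onto EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION) and EMP_LOCS(ENAME, PLOCATION), the attribute lists assumed from the usual textbook version of this example, then natural-joins them back on their only shared attribute, PLOCATION. All data values and the helper names `project` and `natural_join` are illustrative assumptions.

```python
from itertools import product

def project(rows, attrs):
    """Projection with set semantics: each result tuple is a tuple of (attr, value) pairs."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join: combine every pair of tuples that agree on all shared attributes."""
    result = set()
    for t1, t2 in product(r1, r2):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            result.add(tuple(sorted({**d1, **d2}.items())))
    return result

# Hypothetical EMP_PROJ state; two employees work at the same location.
emp_proj = [
    {"SSN": "123", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith",
     "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "453", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "English",
     "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "666", "PNUMBER": 3, "HOURS": 40.0, "ENAME": "Narayan",
     "PNAME": "ProductZ", "PLOCATION": "Houston"},
]

emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])
emp_locs = project(emp_proj, ["ENAME", "PLOCATION"])

original = {tuple(sorted(row.items())) for row in emp_proj}
joined = natural_join(emp_proj1, emp_locs)
spurious = joined - original
print(len(original), len(joined), len(spurious))  # 3 5 2
```

Every original tuple comes back, but the two Bellaire employees are also cross-matched with each other's projects, which is exactly the spurious information described above.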

Example of Spurious Tuples (continued)

4. Spurious Tuples

Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:

(a) Non-additivity, or losslessness, of the corresponding join.
(b) Preservation of the functional dependencies.

Property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:

- Anomalies cause redundant work to be done during insertion, modification and deletion.
- Nulls waste storage space and make aggregation operations and joins difficult to perform.
- Joins on improperly related base relations generate invalid and spurious data.

Functional Dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition

FDs are used to specify formal measures of the goodness of relational designs. Together with keys, they are used to define normal forms for relations. They are constraints derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). It means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

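Lakes of the world

Name | Continent | Area (sq mi) | Length (mi)
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420

In this state, Continent -> Name does not hold, because NA is paired with Superior, Huron and Michigan. Name -> Length does hold, because each name appears only once.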

Graphical representation of Functional Dependencies

Examples of FD constraints

- A social security number uniquely determines the employee name: SSN -> ENAME.
- A project number uniquely determines the project name and location: PNUMBER -> {PNAME, PLOCATION}.
- An employee's SSN and a project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS.

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R, since we never have two distinct tuples with t1[K] = t2[K].

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies, or does not satisfy, a given functional dependency A -> B.

How it works:

1. Sort the tuples of r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
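A minimal Python sketch of the Satisfies algorithm, assuming a relation state is represented as a list of dicts. Because sorting makes tuples with equal A-values adjacent, comparing neighbouring pairs is enough. The function name `satisfies` and the data layout are assumptions; the data are the Lakes of the world state and the TEACH state from these slides.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the relation state `rows` (a list of dicts) satisfies lhs -> rhs."""
    def a_value(row):
        return tuple(row[a] for a in lhs)

    ordered = sorted(rows, key=a_value)  # step 1: equal A-values become adjacent
    for prev, cur in zip(ordered, ordered[1:]):
        # step 2: adjacent tuples that agree on A must also agree on B
        if a_value(prev) == a_value(cur) and any(prev[b] != cur[b] for b in rhs):
            return False  # step 3: the condition fails
    return True

lakes = [
    {"Name": "Caspian Sea", "Continent": "Asia-Europe", "Area": 143244, "Length": 760},
    {"Name": "Superior", "Continent": "NA", "Area": 31700, "Length": 350},
    {"Name": "Victoria", "Continent": "Africa", "Area": 26828, "Length": 250},
    {"Name": "Aral Sea", "Continent": "Asia", "Area": 24904, "Length": 280},
    {"Name": "Huron", "Continent": "NA", "Area": 23000, "Length": 206},
    {"Name": "Michigan", "Continent": "NA", "Area": 22300, "Length": 307},
    {"Name": "Tanganyika", "Continent": "Africa", "Area": 12700, "Length": 420},
]
teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]

print(satisfies(lakes, ["Continent"], ["Name"]))  # False
print(satisfies(lakes, ["Name"], ["Length"]))     # True
print(satisfies(teach, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(teach, ["TEXT"], ["COURSE"]))     # True
```

A True result only means that this particular state does not violate the FD; since an FD is a property of the schema, one state can refute it but never prove it.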

Relation state of TEACH

TEACHER | COURSE | TEXT
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | OOAD | Horowitz

TEACHER -> COURSE does not hold in this state, because Smith teaches two different courses.

TEXT -> COURSE may hold: in this state each text is used for only one course.

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming, so inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred, or deduced, from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

The inferred functional dependencies include:

- SSN -> {DNAME, DMGRSSN}
- SSN -> SSN
- DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. That X -> Y can be inferred from F is denoted F |= X -> Y.

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules are:

- IR1 (Reflexive): if Y ⊆ X, then X -> Y.
- IR2 (Augmentation): if X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
- IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency inferred from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F; if the axioms are applied correctly, they cannot derive false dependencies.

By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, produces the complete set of all dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:

- Decomposition: if X -> YZ, then X -> Y and X -> Z.
- Union (additive): if X -> Y and X -> Z, then X -> YZ.
- Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z.

These last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
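Derivations like these can be checked mechanically with the standard attribute-closure algorithm, which computes X+ (every attribute implied by X) and is equivalent to applying IR1 to IR3 exhaustively: F |= X -> Y exactly when Y ⊆ X+. The Python sketch below assumes single-letter attribute names; the helper names are illustrative, and the concrete attributes used for Example 2 (B = {P, Q}, C = {P}) are a hypothetical instance of C ⊆ B.

```python
def closure(attrs, fds):
    """Attribute closure X+ of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Whenever a whole LHS is already in X+, its RHS must be too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names, e.g. fd("BX", "Z")."""
    return (frozenset(lhs), frozenset(rhs))

def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is contained in lhs+ computed under F."""
    return set(rhs) <= closure(lhs, fds)

# Example 1: F = {A -> B, C -> X, BX -> Z}
F1 = [fd("A", "B"), fd("C", "X"), fd("BX", "Z")]
print(sorted(closure("AC", F1)))  # ['A', 'B', 'C', 'X', 'Z']
print(implies(F1, "AC", "Z"))     # True

# Example 2 with the hypothetical choice B = {P, Q}, C = {P}: F = {A -> PQ, P -> D}
F2 = [fd("A", "PQ"), fd("P", "D")]
print(implies(F2, "A", "D"))      # True
```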

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to F iff A -> B can be derived from the set F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from F.

Eliminating redundant FDs allows us to minimize the set of FDs.
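The redundancy test translates directly into code: drop the FD, compute the closure of its LHS under the remaining FDs, and see whether the RHS is still reached. This is a minimal Python sketch with illustrative helper names and a hypothetical F; note that when several FDs are mutually redundant, which ones get removed depends on the order they are tried.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def is_redundant(fds, target):
    """target = A -> B is redundant iff A -> B follows from F - {A -> B}."""
    rest = [f for f in fds if f != target]
    lhs, rhs = target
    return rhs <= closure(lhs, rest)

def remove_redundant(fds):
    """Drop redundant FDs one at a time (result depends on the order tried)."""
    kept = list(dict.fromkeys(fds))  # remove exact duplicates first
    for f in list(kept):
        if is_redundant(kept, f):
            kept.remove(f)
    return kept

# Hypothetical F = {A -> B, B -> C, A -> C}: A -> C follows by transitivity.
F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(is_redundant(F, fd("A", "C")))  # True
print(is_redundant(F, fd("A", "B")))  # False
print(len(remove_redundant(F)))       # 2
```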

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent to F.
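Because X -> Y is in F+ exactly when Y ⊆ X+ under F, "covers" and "equivalent" can be tested without ever materializing F+. A minimal Python sketch follows; the sets E, F and G are hypothetical and the helper names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def covers(f, e):
    """F covers E if every FD in E can be inferred from F (is in F+)."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """E and F are equivalent iff each covers the other, i.e. E+ = F+."""
    return covers(e, f) and covers(f, e)

E = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
F = [fd("A", "CD"), fd("E", "AH")]
G = [fd("A", "C"), fd("E", "AH")]
print(equivalent(E, F))            # True
print(covers(F, G), covers(G, F))  # True False
```

F has only two FDs yet generates the same closure as the four FDs of E, which is the kind of smaller equivalent set the slide is after.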

Minimal Functional Dependencies

A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. F is transformed so that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:

(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The FDs of F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous iff

$$F \equiv \bigl(F - \{A_1A_2 \to B_1B_2\}\bigr) \cup \{A_2 \to B_1B_2\}$$
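In practice the equivalence test for an extraneous left attribute reduces to one closure computation: replacing A1A2 -> B1B2 by A2 -> B1B2 can only strengthen F (augmentation gives the old FD back), so the two sets are equivalent exactly when F already implies A2 -> B1B2. A minimal Python sketch, with illustrative helper names and a small hypothetical F:

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def lhs_extraneous(fds, target, attr):
    """Is `attr` extraneous in the LHS of `target`?

    True exactly when the RHS is still in (LHS - {attr})+ computed under the full F.
    """
    lhs, rhs = target
    return rhs <= closure(lhs - {attr}, fds)

F = [fd("X", "Z"), fd("XZ", "R")]
print(lhs_extraneous(F, fd("XZ", "R"), "Z"))  # True: X -> Z, so X alone determines R
print(lhs_extraneous(F, fd("XZ", "R"), "X"))  # False: Z+ = {Z}
```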

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced: no FD has an extraneous attribute on its LHS.
3. FC is nonredundant: no FD can be derived from the others.

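Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: since X -> Z, the attribute Z is extraneous in XZ -> R, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}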
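The whole procedure can be sketched in Python following the usual textbook order: split right-hand sides, remove extraneous LHS attributes, then remove redundant FDs. Attributes are assumed to be single letters and the helper names are illustrative. On this problem's F the sketch also left-reduces XZ -> R to X -> R, because X -> Z makes Z extraneous.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def canonical_cover(fds):
    # 1. Make every FD simple (one RHS attribute), skipping duplicates.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            simple = (frozenset(lhs), frozenset(a))
            if simple not in g:
                g.append(simple)
    # 2. Left-reduce: drop an LHS attribute when the rest still determines the RHS.
    for i, (lhs, rhs) in enumerate(g):
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, g):
                lhs = lhs - {attr}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction can create duplicates
    # 3. Remove FDs that are implied by the remaining ones.
    for f in list(g):
        rest = [h for h in g if h != f]
        if f[1] <= closure(f[0], rest):
            g = rest
    return g

def show(fds):
    """Render FDs as sorted strings such as 'XY->W'."""
    return sorted("".join(sorted(lhs)) + "->" + "".join(sorted(rhs)) for lhs, rhs in fds)

F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
print(show(canonical_cover(F)))  # ['X->R', 'X->Z', 'XY->P', 'XY->Q', 'XY->W']
```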

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

- 2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).
- 4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Normalization, proposed by Codd, is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides database designers with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
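These definitions can be turned into a brute-force sketch: a set S is a superkey when S+ covers the whole schema, and a key is a superkey that contains no smaller key. The Python below enumerates subsets by size, which is fine for small textbook schemas but exponential in general. It uses the TEACH relation and its FDs (fd1: {student, course} -> instructor, fd2: instructor -> course) from the BCNF slides; the helper names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys

def prime_attributes(schema, fds):
    return set().union(*candidate_keys(schema, fds))

teach = {"student", "course", "instructor"}
teach_fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(sorted(sorted(k) for k in candidate_keys(teach, teach_fds)))
# [['course', 'student'], ['instructor', 'student']]
print(sorted(prime_attributes(teach, teach_fds)))  # ['course', 'instructor', 'student']
```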

First Normal Form

1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

Normalization into 1NF

There are three techniques to achieve 1NF:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there is a separate tuple in the original relation for each value of the offending attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
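Technique 1 can be sketched in a few lines of Python. The DEPARTMENT relation with a multivalued DLOCATIONS attribute, the resulting DEPT_LOCATIONS pairs and all values are hypothetical illustrations (modelled on the usual textbook example), and `to_1nf` is an illustrative helper name.

```python
# Hypothetical non-1NF DEPARTMENT state: DLOCATIONS holds a set of values per tuple.
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DMGRSSN": "333445555",
     "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNAME": "Administration", "DNUMBER": 4, "DMGRSSN": "987654321",
     "DLOCATIONS": {"Stafford"}},
]

def to_1nf(rows, key, multivalued):
    """Technique 1: remove the multivalued attribute and move it, with the key, to a new relation."""
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    side = sorted((row[key], value) for row in rows for value in row[multivalued])
    return base, side

dept, dept_locations = to_1nf(department, "DNUMBER", "DLOCATIONS")
print(dept_locations)
# [(4, 'Stafford'), (5, 'Bellaire'), (5, 'Houston'), (5, 'Sugarland')]
```

Each department keeps a single tuple, so nothing is repeated, and a department may have any number of locations, which is why this technique is preferred over the other two.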

Normalizing nested relations into 1NF

Additional problems: Schaum series, p. 178, problem 5.1

Second Normal Form

2NF uses the concepts of FDs and primary keys.

Definitions:

- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:

- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
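A minimal Python sketch of the 2NF test, following the slide's definition in which "prime" means a member of the primary key. An attribute is fully dependent on the key if the key determines it and no proper subset of the key does. The EMP_PROJ FDs are the ones listed in the FD examples (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS); the helper names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def is_full_fd(lhs, attr, fds):
    """lhs -> attr holds, and no proper nonempty subset of lhs determines attr."""
    if attr not in closure(lhs, fds):
        return False
    return not any(attr in closure(set(sub), fds)
                   for size in range(1, len(lhs))
                   for sub in combinations(sorted(lhs), size))

def partial_dependencies(schema, key, fds):
    """Nonprime attributes (outside the primary key) that are not fully dependent on it."""
    nonprime = set(schema) - set(key)
    return sorted(a for a in nonprime if not is_full_fd(set(key), a, fds))

emp_proj = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
key = ["SSN", "PNUMBER"]
fds = [fd(["SSN"], ["ENAME"]),
       fd(["PNUMBER"], ["PNAME", "PLOCATION"]),
       fd(["SSN", "PNUMBER"], ["HOURS"])]

print(is_full_fd(set(key), "HOURS", fds))          # True
print(is_full_fd(set(key), "ENAME", fds))          # False
print(partial_dependencies(emp_proj, key, fds))    # ['ENAME', 'PLOCATION', 'PNAME']
```

A non-empty result means the relation is not in 2NF; here ENAME, PNAME and PLOCATION would be moved out with the part of the key they depend on.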

Normalizing into 2NF

Conversion to 2NF: R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).

Additional problem on second normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with the FD Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:

- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: in X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
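The sketch below swaps in the general 3NF test used in most textbooks: every nontrivial FD X -> A in F must have X a superkey or A prime (a member of some candidate key). This form automatically handles the NOTE above, since an FD whose LHS is a candidate key is never flagged. The EMP_DEPT FDs follow the transitive-dependency example; the FD Emp -> SSN for EMP is an assumption implied by Emp being a candidate key. Helper names are illustrative, and the key search is brute force.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def fd(lhs, rhs):
    """Build an FD from iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))

def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys

def third_nf_violations(schema, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):
            if not closure(lhs, fds) >= set(schema) and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad

emp_dept = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
emp_dept_fds = [fd(["SSN"], ["ENAME", "BDATE", "ADDRESS", "DNUMBER"]),
                fd(["DNUMBER"], ["DNAME", "DMGRSSN"])]
print(third_nf_violations(emp_dept, emp_dept_fds))
# [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]

emp = ["SSN", "Emp", "Salary"]
emp_fds = [fd(["SSN"], ["Emp"]), fd(["Emp"], ["SSN"]), fd(["Emp"], ["Salary"])]
print(third_nf_violations(emp, emp_fds))  # []
```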

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: p. 186, problem 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey.

BCNF makes no such exception: X must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.
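A minimal Python sketch of the BCNF test for a single schema R: report every nontrivial FD in F whose LHS is not a superkey. Checking only the given FDs is sufficient for R itself; checking a decomposed relation would require the FDs projected onto it, which this sketch does not compute. The TEACH FDs come from the decomposition slides; the helper names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y in F whose LHS X is not a superkey of the schema."""
    bad = []
    for lhs, rhs in fds:
        extra = rhs - lhs  # attributes of Y already in X are trivially determined
        if extra and not closure(lhs, fds) >= set(schema):
            bad.append((tuple(sorted(lhs)), tuple(sorted(extra))))
    return bad

teach = ["student", "course", "instructor"]
teach_fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(bcnf_violations(teach, teach_fds))  # [(('instructor',), ('course',))]
```

TEACH passes the 3NF test (course is prime) but fails here because of fd2, matching the slide's claim that it is in 3NF but not in BCNF.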

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:

- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. The relation is in 3NF, because course is a prime attribute, but not in BCNF, because instructor is not a superkey.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Achieving BCNF by Decomposition

Three possible decompositions of relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property of the decomposition.

Of the three, only the third decomposition does not generate spurious tuples after the join, and hence it has the non-additivity property.
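Which decomposition is non-additive can be checked with the standard test for binary decompositions: {R1, R2} is lossless iff the common attributes functionally determine R1 - R2 or R2 - R1. This is a named textbook technique swapped in for illustration; the helper names are assumptions.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def binary_lossless(r1, r2, fds):
    """Lossless iff (R1 intersect R2) -> (R1 - R2) or (R1 intersect R2) -> (R2 - R1)."""
    plus = closure(set(r1) & set(r2), fds)
    return set(r1) - set(r2) <= plus or set(r2) - set(r1) <= plus

teach_fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([binary_lossless(r1, r2, teach_fds) for r1, r2 in decompositions])
# [False, False, True]
```

Only in the third decomposition does the shared attribute (instructor) determine one whole side, via fd2.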

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, p. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
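The slide names a lossless join algorithm without stating it. The Python below sketches the classic tableau ("chase") test, which handles decompositions into any number of relations; it may differ in presentation from the cited example. Each row starts with a distinguished symbol in the columns of its relation and a unique symbol elsewhere; FDs repeatedly equate symbols, and the join is lossless iff some row becomes entirely distinguished. Helper names and the small ABC example are illustrative.

```python
def is_lossless(schema, decomposition, fds):
    """Chase (tableau) test: True iff the decomposition has a lossless join under fds."""
    attrs = sorted(schema)
    # Row i: distinguished symbol ("a", A) where A is in R_i, else a unique ("b", i, A).
    table = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in table:
                for r2 in table:
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue
                    for y in rhs:
                        if r1[y] != r2[y]:
                            # Equate the two symbols everywhere in column y, keeping the
                            # distinguished one ("a" sorts before "b").
                            keep, drop = min(r1[y], r2[y]), max(r1[y], r2[y])
                            for row in table:
                                if row[y] == drop:
                                    row[y] = keep
                            changed = True
    # Lossless iff some row now consists only of distinguished symbols.
    return any(all(row[a] == ("a", a) for a in attrs) for row in table)

teach = {"student", "course", "instructor"}
teach_fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(is_lossless(teach, [{"instructor", "course"}, {"instructor", "student"}], teach_fds))  # True
print(is_lossless(teach, [{"student", "instructor"}, {"student", "course"}], teach_fds))    # False
print(is_lossless({"A", "B", "C"}, [{"A", "B"}, {"B", "C"}], [(frozenset("B"), frozenset("C"))]))  # True
```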

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C, and the sets of values for B and C are independent of each other.

A multivalued dependency can be further classified as trivial or nontrivial.

An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.

An MVD is nontrivial if neither of these two conditions is satisfied.

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

It is used to remove multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation has two MVDs, ENAME ->> PNAME and ENAME ->> DNAME. Decomposing EMP gives two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
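A minimal Python sketch that checks an MVD on a relation state and then verifies the 4NF decomposition. X ->> Y holds when, for any two tuples agreeing on X, the tuple taking X and Y from the first and the remaining attributes Z from the second is also present. The EMP tuples are hypothetical illustrative values; the helper name `satisfies_mvd` is an assumption.

```python
def satisfies_mvd(rows, x, y):
    """Check X ->> Y on a relation state (list of dicts), with Z = R - X - Y."""
    if not rows:
        return True
    z = set(rows[0]) - set(x) - set(y)
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1 and Z from t2; it must also be in the state.
                t3 = {a: t1[a] for a in set(x) | set(y)}
                t3.update({a: t2[a] for a in z})
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True

# Hypothetical EMP(ENAME, PNAME, DNAME) state: projects and dependents are independent.
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(satisfies_mvd(emp, ["ENAME"], ["PNAME"]))  # True

# 4NF decomposition: the natural join of the two projections gives EMP back.
emp_projects = {(r["ENAME"], r["PNAME"]) for r in emp}
emp_dependents = {(r["ENAME"], r["DNAME"]) for r in emp}
rejoined = {(e, p, d) for e, p in emp_projects for e2, d in emp_dependents if e == e2}
print(rejoined == {(r["ENAME"], r["PNAME"], r["DNAME"]) for r in emp})  # True
```

Each projection stores Smith's two projects and two dependents once, instead of the four combined tuples needed in EMP.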

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
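A join dependency can be checked on an instance by joining the projections and comparing the result with the original relation. The sketch below assumes hypothetical SUPPLY-style data (supplier, part, project) chosen so that the three-way JD holds while a two-way split does not; the helper names are illustrative.

```python
def project(rows, attrs, cols):
    idx = [attrs.index(c) for c in cols]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(r1, cols1, r2, cols2):
    common = [c for c in cols1 if c in cols2]
    out_cols = cols1 + [c for c in cols2 if c not in cols1]
    out = set()
    for t1 in r1:
        for t2 in r2:
            if all(t1[cols1.index(c)] == t2[cols2.index(c)] for c in common):
                rec = dict(zip(cols1, t1))
                rec.update(zip(cols2, t2))
                out.add(tuple(rec[c] for c in out_cols))
    return out, out_cols


def satisfies_jd(rows, attrs, components):
    """True if rows equals the natural join of its projections on the components."""
    joined, cols = project(rows, attrs, components[0]), list(components[0])
    for comp in components[1:]:
        joined, cols = natural_join(joined, cols, project(rows, attrs, comp), list(comp))
    # Reorder joined tuples to the original attribute order before comparing.
    return {tuple(t[cols.index(a)] for a in attrs) for t in joined} == set(rows)


attrs = ["SNAME", "PART_NAME", "PROJ_NAME"]
SUPPLY = {("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1"), ("s1", "p1", "j1")}
R1, R2, R3 = ("SNAME", "PART_NAME"), ("PART_NAME", "PROJ_NAME"), ("PROJ_NAME", "SNAME")

print(satisfies_jd(SUPPLY, attrs, [R1, R2, R3]))  # True: the 3-way join is lossless
print(satisfies_jd(SUPPLY, attrs, [R1, R2]))      # False: adds ("s2", "p1", "j2")
print(satisfies_jd(SUPPLY - {("s1", "p1", "j1")}, attrs, [R1, R2, R3]))  # False
```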

Fifth Normal Form (5NF)

Definition: a relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except those in which every component is a superkey.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence, 5NF is rarely used in practice.

Two relation schemas suffering from update anomalies

EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)

EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

EXAMPLE OF AN INSERT ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Insert anomaly: we cannot insert a project unless an employee is assigned to it. Conversely, we cannot insert an employee unless he/she is assigned to a project.

EXAMPLE OF A DELETE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Delete anomaly: deleting a project results in deleting all the employees who work on that project. Alternately, if an employee is the sole employee on a project, deleting that employee results in deleting the corresponding project.

EXAMPLE OF AN UPDATE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" requires the update to be made in the tuples of all 100 employees working on project P1; missing any of them leaves the data inconsistent.


Guideline to Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies.

If any anomalies are present, note them so that applications can be written to take them into account.

In general, it is advisable to use anomaly-free base relations and to specify views that include the joins, placing together the attributes frequently referenced in important queries.
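As a minimal sketch of Guideline 2, assuming SQLite through Python's built-in sqlite3 module (table, column and data names are hypothetical), the EMP_PROJ facts can live in anomaly-free base tables while a view re-creates the joined shape for queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT);
CREATE TABLE project  (pnumber TEXT PRIMARY KEY, pname TEXT);
CREATE TABLE works_on (
    ssn     TEXT REFERENCES employee(ssn),
    pnumber TEXT REFERENCES project(pnumber),
    hours   REAL,
    PRIMARY KEY (ssn, pnumber)
);
-- The view re-creates the wide EMP_PROJ shape without storing it twice.
CREATE VIEW emp_proj AS
    SELECT w.ssn, w.pnumber, e.ename, p.pname, w.hours
    FROM works_on AS w
    JOIN employee AS e ON e.ssn = w.ssn
    JOIN project  AS p ON p.pnumber = w.pnumber;
""")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [("e1", "Smith"), ("e2", "Wong")])
conn.execute("INSERT INTO project VALUES ('P1', 'Billing')")
conn.executemany("INSERT INTO works_on VALUES (?, ?, ?)", [("e1", "P1", 10), ("e2", "P1", 20)])

# Renaming the project changes exactly one stored row ...
cur = conn.execute("UPDATE project SET pname = 'Customer-Accounting' WHERE pnumber = 'P1'")
print(cur.rowcount)  # 1
# ... and every row of the view sees the new name.
print(conn.execute("SELECT DISTINCT pname FROM emp_proj").fetchall())  # [('Customer-Accounting',)]

# A project can be stored before anyone is assigned to it (no insert anomaly).
conn.execute("INSERT INTO project VALUES ('P2', 'Payroll')")
```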

Problems with Nulls

If many attributes are grouped together into a "fat" relation, many tuples end up with nulls. This wastes storage, makes the meaning of the attributes harder to understand, and causes difficulties when nulls are used in aggregate operators such as COUNT or SUM.
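How nulls interact with aggregates can be seen in SQLite (used here only as an example engine; the table and data are hypothetical): COUNT(column), SUM and AVG skip NULLs, while COUNT(*) counts every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary INTEGER, office_number TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("e1", 30000, "B-101"),
    ("e2", 40000, None),   # no individual office
    ("e3", None, None),    # salary unknown
])
row = conn.execute("""
    SELECT COUNT(*), COUNT(office_number), COUNT(salary), SUM(salary), AVG(salary)
    FROM employee
""").fetchone()
print(row)  # (3, 1, 2, 70000, 35000.0)
```

Whether 35000.0 is the intended average depends on which interpretation of the null for e3 applies.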

3. Null Values in Tuples

Interpretations of nulls: the attribute does not apply (is invalid) for this tuple; the attribute value is unknown (it may exist); or the value is known to exist but is unavailable.

GUIDELINE 3: Relations should be designed so that their tuples have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (together with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute of the employee relation.

Instead, create a new relation emp_offices(essn, office_number).


Example of Spurious Tuples

Generation of spurious tuples

Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.

The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.

These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious or wrong information that is not valid.

This happens because PLOCATION, the attribute used for the join, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
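The effect can be reproduced with a few hypothetical tuples. The sketch keeps only SSN, ENAME, PNUMBER and PLOCATION of EMP_PROJ, projects onto EMP_LOCS-like and EMP_PROJ1-like relations, and joins them back on PLOCATION.

```python
# Hypothetical EMP_PROJ rows: (SSN, ENAME, PNUMBER, PLOCATION)
EMP_PROJ = {("111", "Smith", "P1", "Bellaire"),
            ("222", "Wong", "P2", "Bellaire"),
            ("222", "Wong", "P3", "Houston")}

EMP_LOCS = {(ename, ploc) for _, ename, _, ploc in EMP_PROJ}        # (ENAME, PLOCATION)
EMP_PROJ1 = {(ssn, pno, ploc) for ssn, _, pno, ploc in EMP_PROJ}    # (SSN, PNUMBER, PLOCATION)

# Natural join on PLOCATION, which is not a key in either relation
joined = {(ssn, ename, pno, ploc)
          for ssn, pno, ploc in EMP_PROJ1
          for ename, ploc2 in EMP_LOCS if ploc == ploc2}
spurious = joined - EMP_PROJ

print(len(EMP_PROJ), len(joined))  # 3 5
print(sorted(spurious))  # [('111', 'Wong', 'P1', 'Bellaire'), ('222', 'Smith', 'P2', 'Bellaire')]
```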

Example of Spurious Tuples (contd.)

4. Spurious Tuples

Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:
(a) non-additivity, or losslessness, of the corresponding join;
(b) preservation of the functional dependencies.

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:

Anomalies that cause redundant work to be done during insertion, modification and deletion.

Waste of storage space due to nulls, and the difficulty of performing aggregation operations and joins due to null values.

Generation of invalid and spurious data during joins on improperly related base relations.

Functional dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, …, An}, where A1, A2, …, An are the attributes.

Definition

FDs are used to specify:
formal measures of the "goodness" of relational designs;
keys, which are used to define normal forms for relations;
constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X → Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].

X → Y in R specifies a constraint on all relation instances r(R). It means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.

The values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.

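Lakes of the world

Name | Continent | Area | Length
--- | --- | --- | ---
Caspian Sea | Asia-Europe | 143244 | 760
Superior | NA | 31700 | 350
Victoria | Africa | 26828 | 250
Aral Sea | Asia | 24904 | 280
Huron | NA | 23000 | 206
Michigan | NA | 22300 | 307
Tanganyika | Africa | 12700 | 420

In this state, Name → Length holds, whereas Continent → Name does not: NA is paired with Superior, Huron and Michigan.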
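Both candidate FDs can be tested against this relation state directly from the definition (no two tuples may agree on X but differ on Y). A state can only refute an FD: a True result means this state does not violate it, not that the FD holds for every legal state. The function name is illustrative.

```python
def fd_holds(rows, attrs, X, Y):
    """X -> Y holds in this state if no two tuples agree on X but differ on Y."""
    pos = {a: i for i, a in enumerate(attrs)}
    seen = {}  # X-value -> first Y-value observed for it
    for t in rows:
        x = tuple(t[pos[a]] for a in X)
        y = tuple(t[pos[a]] for a in Y)
        if seen.setdefault(x, y) != y:
            return False
    return True


attrs = ["Name", "Continent", "Area", "Length"]
LAKES = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]
print(fd_holds(LAKES, attrs, ["Continent"], ["Name"]))  # False
print(fd_holds(LAKES, attrs, ["Name"], ["Length"]))     # True
```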


Graphical representation of Functional Dependencies

Examples of FD constraints

Social security number uniquely determines employee name: SSN → ENAME

Project number uniquely determines project name and location: PNUMBER → {PNAME, PLOCATION}

Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} → HOURS

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A → B.

How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.
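A minimal Python sketch of the Satisfies algorithm, following the three steps above, applied to the TEACH state shown next (rows transcribed from that slide). The function name and row representation are assumptions.

```python
def satisfies(rows, attrs, A, B):
    """Sort-based check of the FD A -> B on one relation state."""
    pos = {a: i for i, a in enumerate(attrs)}

    def a_val(t):
        return tuple(t[pos[x]] for x in A)

    def b_val(t):
        return tuple(t[pos[x]] for x in B)

    ordered = sorted(rows, key=a_val)            # step 1: equal A values become adjacent
    for prev, cur in zip(ordered, ordered[1:]):  # step 2: compare neighbours only
        if a_val(prev) == a_val(cur) and b_val(prev) != b_val(cur):
            return False                         # step 3: condition 2 fails
    return True


attrs = ["TEACHER", "COURSE", "TEXT"]
TEACH = [("Smith", "Data Structures", "Bartram"),
         ("Smith", "Data Management", "Martin"),
         ("Hall", "Compilers", "Hoffmann"),
         ("Brown", "OOAD", "Horowitz")]
print(satisfies(TEACH, attrs, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, attrs, ["TEXT"], ["COURSE"]))     # True
```

Comparing only neighbours is enough: after sorting, if every adjacent pair inside a run of equal A values agrees on B, the whole run agrees.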

Relation state of TEACH

TEACH

Teacher | Course | Text
--- | --- | ---
Smith | Data Structures | Bartram
Smith | Data Management | Martin
Hall | Compilers | Hoffmann
Brown | OOAD | Horowitz

In this state TEXT → COURSE is a possible FD, but TEACHER → COURSE is ruled out, since Smith teaches both Data Structures and Data Management.

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming.

So inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred, or deduced, from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO → MGR_SSN) and each manager has a unique phone number (MGR_SSN → MGR_PHONE), then these two dependencies together imply DEPT_NO → MGR_PHONE.

This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Example

F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}. Some of the inferred functional dependencies are:

SSN → {DNAME, DMGRSSN}
SSN → SSN
DNUMBER → DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That X → Y is inferred from F is denoted by F ⊨ X → Y.
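In practice, F ⊨ X → Y is usually decided without listing F+: compute the attribute closure X+ (every attribute determined by X under F) and check whether Y ⊆ X+. A minimal sketch using the F of this example (function names are illustrative):

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, X, Y):
    """F |= X -> Y  iff  Y is contained in X+ computed under F."""
    return set(Y) <= closure(X, fds)


F = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
     ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}), implies(F, {"DNUMBER"}, {"DNAME"}))  # True True
print(implies(F, {"DNUMBER"}, {"ENAME"}))  # False
```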

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.

Armstrong's inference rules:
IR1 (Reflexive): if Y ⊆ X, then X → Y.
IR2 (Augmentation): if X → Y, then XZ → YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): if X → Y and Y → Z, then X → Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency we can infer from F by using IR1 through IR3 holds in every relation state that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:
Decomposition: if X → YZ, then X → Y and X → Z.
Union (additive): if X → Y and X → Z, then X → YZ.
Pseudotransitive: if X → Y and WY → Z, then WX → Z.

These last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
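That the three additional rules follow from IR1 to IR3 can be checked with short derivations (a sketch; juxtaposition such as XY denotes X ∪ Y, as above):

```latex
\begin{aligned}
&\textbf{Decomposition:} && X \to YZ \text{ (given)},\; YZ \to Y \text{ (IR1)} \;\Rightarrow\; X \to Y \text{ (IR3)}; \text{ similarly } X \to Z.\\
&\textbf{Union:} && X \to Y \;\Rightarrow\; X \to XY \text{ (IR2, augment by } X\text{, using } XX = X),\\
& && X \to Z \;\Rightarrow\; XY \to YZ \text{ (IR2, augment by } Y),\\
& && X \to XY \text{ and } XY \to YZ \;\Rightarrow\; X \to YZ \text{ (IR3)}.\\
&\textbf{Pseudotransitivity:} && X \to Y \;\Rightarrow\; WX \to WY \text{ (IR2, augment by } W),\\
& && WX \to WY \text{ and } WY \to Z \;\Rightarrow\; WX \to Z \text{ (IR3)}.
\end{aligned}
```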

Examples

1. Given the set F = {A → B, C → X, BX → Z}, derive AC → Z using the inference axioms.

2. Given F = {A → B, C → D} with C ⊆ B, show that F ⊨ A → D.
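Reading the exercises as F = {A → B, C → X, BX → Z} for Example 1 and F = {A → B, C → D} with C ⊆ B for Example 2, one possible derivation of each using only IR1 to IR3 is:

```latex
\begin{aligned}
&\textbf{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\}\\
&1.\; A \to B && \text{given}\\
&2.\; AC \to BC && \text{IR2 on (1), augment by } C\\
&3.\; C \to X && \text{given}\\
&4.\; BC \to BX && \text{IR2 on (3), augment by } B\\
&5.\; AC \to BX && \text{IR3 on (2), (4)}\\
&6.\; BX \to Z && \text{given}\\
&7.\; AC \to Z && \text{IR3 on (5), (6)}\\[4pt]
&\textbf{Example 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B\\
&1.\; A \to B && \text{given}\\
&2.\; B \to C && \text{IR1, since } C \subseteq B\\
&3.\; A \to C && \text{IR3 on (1), (2)}\\
&4.\; C \to D && \text{given}\\
&5.\; A \to D && \text{IR3 on (3), (4)}
\end{aligned}
```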

Redundant functional dependencies

Given a set F of FDs, an FD A → B of F is said to be redundant with respect to the FDs of F iff A → B can be derived from the set of FDs F − {A → B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
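Redundancy can be tested with the attribute closure: A → B is redundant iff B ⊆ A+ computed under F − {A → B}. A minimal sketch using the manager example from the closure slide (function names are illustrative):

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fd, fds):
    """fd = (lhs, rhs) is redundant if F - {fd} still implies it."""
    lhs, rhs = fd
    rest = [g for g in fds if g != fd]
    return rhs <= closure(lhs, rest)


F = [({"DEPT_NO"}, {"MGR_SSN"}),
     ({"MGR_SSN"}, {"MGR_PHONE"}),
     ({"DEPT_NO"}, {"MGR_PHONE"})]
print([is_redundant(f, F) for f in F])  # [False, False, True]
```

When several FDs test as redundant, remove them one at a time and re-test, since removing one can make another necessary.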

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
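Cover and equivalence can be tested the same way: F covers E if, for each X → Y in E, Y ⊆ X+ computed under F. The example sets below use single-letter attributes and are illustrative, not from the slides.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(F, E):
    """F covers E if every X -> Y in E has Y inside X+ computed under F."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    return covers(E, F) and covers(F, E)


def fd(lhs, rhs):
    """Single-letter attribute names, e.g. fd("AC", "D") for AC -> D."""
    return (set(lhs), set(rhs))


F = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
E = [fd("A", "CD"), fd("E", "AH")]
print(equivalent(E, F))               # True
print(equivalent([fd("A", "C")], F))  # False: AC -> D cannot be inferred from {A -> C}
```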

Minimal Functional Dependencies

A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F.

F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) every RHS of each dependency is a single attribute;
(b) for no X → A in F is the set F − {X → A} equivalent to F (no redundancies);
(c) for no X → A in F and proper subset Z of X is (F − {X → A}) ∪ {Z → A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The FDs of F can be reduced further in size by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 → B1B2 ∈ F. Then A1 is extraneous iff F ≡ (F − {A1A2 → B1B2}) ∪ {A2 → B1B2}.

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.

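Problem

Given a set F of FDs, find a canonical cover for F:
F = {X → Z, XY → WP, XY → ZWQ, XZ → R}

1. Split the right-hand sides: FC = {X → Z, XY → W, XY → P, XY → Z, XY → W, XY → Q, XZ → R}
2. Remove the duplicate XY → W and the redundant XY → Z (implied by X → Z): FC = {X → Z, XY → W, XY → P, XY → Q, XZ → R}
3. Left-reduce XZ → R: since X → Z, Z is extraneous, giving FC = {X → Z, XY → W, XY → P, XY → Q, X → R}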
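A minimal Python sketch of the canonical-cover steps (split right-hand sides, left-reduce, drop redundant FDs), run on this problem with single-letter attribute names as on the slide. Left-reduction turns XZ → R into X → R, because X → Z already holds. The helper names are illustrative.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: split right-hand sides into single attributes (dropping duplicates).
    G = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            f = (frozenset(lhs), frozenset([a]))
            if f not in G:
                G.append(f)
    # Step 2: left-reduce; b is extraneous if (lhs - {b})+ already contains the RHS.
    for i, (lhs, rhs) in enumerate(G):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, G):
                lhs = lhs - {b}
        G[i] = (lhs, rhs)
    G = list(dict.fromkeys(G))  # left-reduction can create duplicates
    # Step 3: drop FDs implied by the remaining ones.
    for f in list(G):
        rest = [g for g in G if g != f]
        if f[1] <= closure(f[0], rest):
            G = rest
    return G


def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))


def show(f):
    return "".join(sorted(f[0])) + " -> " + "".join(sorted(f[1]))


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
FC = canonical_cover(F)
print([show(f) for f in FC])  # ['X -> Z', 'XY -> P', 'XY -> W', 'XY -> Q', 'X -> R']
```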

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
minimizing redundancy;
minimizing insertion, deletion and update anomalies.

It provides the database designer with:
a formal framework for analyzing relation schemas based on keys and FDs;
a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
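For small schemas, candidate keys (and hence prime attributes) can be found by brute force: a candidate key is a minimal attribute set whose closure is the whole schema. The sketch below is exponential in the number of attributes and only meant for classroom-sized examples; it uses the EMP_PROJ dependencies from the FD examples and the TEACH dependencies from the BCNF slides.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(R, fds):
    """All minimal attribute sets whose closure is R (brute force)."""
    R = set(R)
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, not a key
            if closure(s, fds) >= R:
                keys.append(s)
    return keys


EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F1 = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
      ({"SSN", "PNUMBER"}, {"HOURS"})]
keys1 = candidate_keys(EMP_PROJ, F1)
print(keys1)  # one key: {SSN, PNUMBER}; the other four attributes are nonprime

TEACH = ["STUDENT", "COURSE", "INSTRUCTOR"]
F2 = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
keys2 = candidate_keys(TEACH, F2)
prime2 = set().union(*keys2)
print(sorted(prime2))  # ['COURSE', 'INSTRUCTOR', 'STUDENT']: every attribute is prime
```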

First Normal Form

Disallows composite attributes, multivalued attributes and nested relations: attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.


Normalization into 1NF

Normalization into 1NF

To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
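Technique 1, and technique 2 for comparison, applied to a hypothetical DEPARTMENT relation whose DLOCATIONS attribute is multivalued:

```python
# Hypothetical DEPARTMENT data; DLOCATIONS is multivalued, so this violates 1NF.
DEPARTMENT_NF = [
    (5, "Research", ["Bellaire", "Sugarland", "Houston"]),
    (4, "Administration", ["Stafford"]),
]

# Technique 1: keep only the atomic attributes in DEPARTMENT ...
DEPARTMENT = [(dnumber, dname) for dnumber, dname, _ in DEPARTMENT_NF]
# ... and move the multivalued attribute into DEPT_LOCATIONS together with the key.
DEPT_LOCATIONS = [(dnumber, loc) for dnumber, _, locs in DEPARTMENT_NF for loc in locs]

# Technique 2, for comparison: one tuple per location, so DNAME is repeated.
EXPANDED = [(dnumber, dname, loc) for dnumber, dname, locs in DEPARTMENT_NF for loc in locs]

print(DEPARTMENT)      # [(5, 'Research'), (4, 'Administration')]
print(DEPT_LOCATIONS)  # [(5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston'), (4, 'Stafford')]
print(sum(1 for t in EXPANDED if t[1] == "Research"))  # 3 copies of "Research"
```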

Normalizing nested relations into 1NF

Additional problems from the Schaum's series: pg 178, 51

Second Normal Form

Uses the concepts of FDs and the primary key.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y → Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
{SSN, PNUMBER} → HOURS is a full FD, since neither SSN → HOURS nor PNUMBER → HOURS holds.
{SSN, PNUMBER} → ENAME is not a full FD (it is called a partial dependency), since SSN → ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
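A 2NF check looks for nonprime attributes that depend on a proper subset of the primary key. The sketch below assumes the primary key is the only candidate key (so "nonprime" simply means "not in the key") and uses the EMP_PROJ dependencies from the FD examples; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(R, primary_key, fds):
    """(proper subset of the key, nonprime attribute it determines) pairs."""
    key = set(primary_key)
    nonprime = set(R) - key  # assumes the primary key is the only candidate key
    found = []
    for size in range(1, len(key)):
        for sub in combinations(sorted(key), size):
            for a in sorted(closure(set(sub), fds) & nonprime):
                found.append((sub, a))
    return found


EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F = [({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]
print(partial_dependencies(EMP_PROJ, ["SSN", "PNUMBER"], F))
# [(('PNUMBER',), 'PLOCATION'), (('PNUMBER',), 'PNAME'), (('SSN',), 'ENAME')]
```

HOURS does not appear, so it is fully dependent on the key. The groups suggest the 2NF relations (SSN, PNUMBER, HOURS), (SSN, ENAME) and (PNUMBER, PNAME, PLOCATION).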

Normalizing into 2NF

Conversion to 2NF (figure; the attribute sets shown are {A, B, C, D}, {A, B, C} and {A, D})

Additional problem on second normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID → Prog_Pac_name

1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X → Z that can be derived from two FDs X → Y and Y → Z.

Examples:
SSN → DMGRSSN is a transitive FD, since SSN → DNUMBER and DNUMBER → DMGRSSN hold.
SSN → ENAME is non-transitive, since there is no set of attributes X where SSN → X and X → ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X → Y and Y → Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN → Emp → Salary, and Emp is a candidate key.
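The sketch below uses the general form of the 3NF test (for every given FD X → A, X is a superkey or A is prime), which flags the transitive dependencies above and also accepts the candidate-key case in the NOTE. It checks only the listed FDs, assumed to be given for the whole relation; EMP_DEPT uses the dependencies from the closure example, and the EMP dependencies (including Emp → SSN) are an assumption that makes Emp a candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(R, fds):
    R, keys = set(R), []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= R:
                keys.append(s)
    return keys


def third_nf_violations(R, fds):
    """Given FDs X -> A where X is not a superkey and A is not prime."""
    R = set(R)
    prime = set().union(*candidate_keys(R, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= R
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F1 = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
      ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(third_nf_violations(EMP_DEPT, F1))
# [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]

EMP = ["SSN", "Emp", "Salary"]
F2 = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN"}), ({"Emp"}, {"Salary"})]
print(third_nf_violations(EMP, F2))  # []
```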

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems

Pg 186513


Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X → A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).
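Because every nontrivial FD must have a superkey on its left, a BCNF test can check each given FD's left-hand side with the attribute closure. The sketch below applies it to the TEACH dependencies discussed next; the function names are illustrative.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(R, fds):
    """Nontrivial FDs in fds whose left-hand side is not a superkey of R."""
    R = set(R)
    return [(tuple(sorted(lhs)), tuple(sorted(rhs - lhs)))
            for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= R]


TEACH = ["STUDENT", "COURSE", "INSTRUCTOR"]
F = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # fd1
     ({"INSTRUCTOR"}, {"COURSE"})]             # fd2
print(bcnf_violations(TEACH, F))  # [(('INSTRUCTOR',), ('COURSE',))]
```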

How is BCNF different from 3NF?

For an FD X → A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a candidate key.

BCNF, by contrast, requires 'X' to be a superkey (for a minimal X, a candidate key).

So a relation in BCNF will definitely be in 3NF, but not the other way around.


A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} → instructor
fd2: instructor → course

{student, course} is a candidate key for this relation, so course is a prime attribute and fd2 does not violate 3NF. The relation is therefore in 3NF, but not in BCNF, because instructor is not a superkey.

A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
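One standard BCNF decomposition algorithm repeatedly finds some X → Y that holds inside a relation Ri with X not a superkey of Ri, and splits Ri into X ∪ Y and Ri − Y; the shared attributes X determine X ∪ Y, so each split is lossless. The brute-force sketch below (exponential, for small schemas only; names are illustrative) reaches the third TEACH decomposition and shows that fd1 is lost.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_decompose(R, fds):
    todo, result = [set(R)], []
    while todo:
        Ri = todo.pop()
        violation = None
        for size in range(1, len(Ri)):
            for combo in combinations(sorted(Ri), size):
                X = set(combo)
                Y = (closure(X, fds) & Ri) - X  # what X determines inside Ri
                if Y and not (X | Y) >= Ri:     # X -> Y holds in Ri, X is not a superkey of Ri
                    violation = (X, Y)
                    break
            if violation:
                break
        if violation is None:
            result.append(Ri)
        else:
            X, Y = violation
            todo += [X | Y, Ri - Y]  # lossless: the common part X determines X | Y
    return result


F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
parts = bcnf_decompose({"student", "course", "instructor"}, F)
print(sorted(sorted(p) for p in parts))  # [['course', 'instructor'], ['instructor', 'student']]
# No single relation contains all the attributes of fd1, so fd1 is not preserved.
print(any({"student", "course", "instructor"} <= p for p in parts))  # False
```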

Achieving the BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 11: Module 6

Module 6 11042023

EXAMPLE OF AN INSERT ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Insert Anomaly Cannot insert a project unless an employee is

assigned to it Conversely

Cannot insert an employee unless an heshe is assigned to a project

Module 6 12042023

EXAMPLE OF AN DELETE ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Delete Anomaly When a project is deleted it will result in deleting

all the employees who work on that project Alternately if an employee is the sole employee

on a project deleting that employee would result in deleting the corresponding project

Module 6 13042023

EXAMPLE OF AN UPDATE ANOMALY Consider the relation

EMP_PROJ(Emp Proj Ename Pname No_hours)

Update AnomalyChanging the name of project number P1

from ldquoBillingrdquo to ldquoCustomer-Accountingrdquo may cause this update to be made for all 100 employees working on project P1

Module 6 14042023

Module 6 15042023

Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2

Design a schema that does not suffer from the insertion deletion and update anomalies

If there are any anomalies present then note them so that applications can be made to take them into account

In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries

Module 6 16042023

Problems with Nulls If many attributes are grouped together

as a fat relation it gives rise to many nulls in the tuples

Waste storage Problems in understanding the

meaning of the attributes Difficult while using Nulls in aggregate

operators like count or sum

Module 6 17042023

3 Null Values in Tuples Interpretations of nulls

Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable

GUIDELINE 3 Relations should be designed such that their

tuples will have as few NULL values as possible Attributes that are NULL frequently could be

placed in separate relations (with the primary key) Example-

if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation

Better create a new relation emp_offices(essn office_number)

Module 6 18042023

Example of Spurious Tuples

Module 6 19042023

Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as

the base relations of EMP_PROJ is not a good schema design

Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ

These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid

This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1

Module 6 20042023

Example of Spurious Tuples contd

Module 6 21042023

4 Spurious Tuples Bad designs for a relational database may result

in erroneous results for certain JOIN operations The lossless join property is used to

guarantee meaningful results for join operations

GUIDELINE 4 Design relation schemas so that they can be

joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies

Note that Property (a) is extremely important and cannot be

sacrificed Property (b) is less stringent and may be sacrificed

Module 6 23042023

Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done

during Insertion Modification Deletion

Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Module 6 24042023

Functional dependencies Functional dependencies (FDs)

Is a constraint between two sets of attributes from the database

Assumption The entire database is a single universal

relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes

Module 6 25042023

Definition

FDs are used to specify formal measures of the

goodness of relational designs keys that are used to define normal forms for

relations constraints that are derived from the meaning and

interrelationships of the data attributes A set of attributes X functionally determines

a set of attributes Y if the value of X determines a unique value for Y

Module 6 26042023

Functional Dependencies

A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If

t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r

depend on or are determined by the values of the X component

The values of the X component functionally determines the values of Y component

FDs are derived from the real-world constraints on the attributes

The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure

A department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE). These two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.
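The slides define F+ but do not show how to test whether an FD belongs to it. A standard technique (not from the slides) is the attribute closure X+: X -> Y is in F+ exactly when Y ⊆ X+. Here is a minimal Python sketch. It assumes FDs are given as (LHS set, RHS set) pairs; the function name `closure` is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+: every attribute determined by `attrs` under `fds`.

    fds is a list of (lhs, rhs) pairs of attribute-name sets.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire any FD whose LHS is already inside the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [
    ({"DEPT_NO"}, {"MGR_SSN"}),    # a department has one manager
    ({"MGR_SSN"}, {"MGR_PHONE"}),  # a manager has a unique phone number
]

dept_plus = closure({"DEPT_NO"}, F)
print(sorted(dept_plus))         # ['DEPT_NO', 'MGR_PHONE', 'MGR_SSN']
print("MGR_PHONE" in dept_plus)  # True, so DEPT_NO -> MGR_PHONE is in F+
```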

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}. Some of the inferred functional dependencies are:

SSN -> {DNAME, DMGRSSN}, SSN -> SSN, DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that derive new dependencies from a given set of dependencies. That an FD X -> Y is inferred from F is denoted by F |= X -> Y.
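Listing F+ directly is impractical, so this sketch (an assumption here, not the slides' method) computes X+ for every subset X of the attributes. X -> Y is in F+ whenever Y ⊆ X+. The helper names `implies` and `f_plus` are hypothetical. The enumeration is exponential in the number of attributes, so it is only suitable for tiny examples like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs  iff  rhs is contained in lhs+."""
    return set(rhs) <= closure(lhs, fds)


def f_plus(attributes, fds):
    """Map every nonempty X of the attributes to X+; together these describe all of F+."""
    attrs = sorted(attributes)
    return {
        frozenset(x): frozenset(closure(x, fds))
        for k in range(1, len(attrs) + 1)
        for x in combinations(attrs, k)
    }


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
R = {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}

print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"SSN"}, {"SSN"}))               # True (reflexive)
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```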

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:

IR1 (Reflexive): if Y ⊆ X, then X -> Y

IR2 (Augmentation): if X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)

IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency inferred from F using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F. In other words, correctly applied axioms cannot derive false dependencies.

By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F (that is, F+).

Inference Rules for FDs (contd)

Some additional inference rules that are useful:

Decomposition: if X -> YZ, then X -> Y and X -> Z

Union (additive): if X -> Y and X -> Z, then X -> YZ

Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z

These last three inference rules, as well as any other valid inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
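As a worked sketch of that claim, here are standard derivations of the three additional rules using only IR1 to IR3:

```latex
\begin{aligned}
&\textbf{Decomposition: } X \to YZ \;\vdash\; X \to Y\\
&\quad (1)\; X \to YZ && \text{given}\\
&\quad (2)\; YZ \to Y && \text{IR1, since } Y \subseteq YZ\\
&\quad (3)\; X \to Y && \text{IR3 on (1), (2); } X \to Z \text{ is symmetric}\\[4pt]
&\textbf{Union: } X \to Y,\; X \to Z \;\vdash\; X \to YZ\\
&\quad (1)\; X \to XZ && \text{IR2: augment } X \to Z \text{ by } X \ (XX = X)\\
&\quad (2)\; XZ \to YZ && \text{IR2: augment } X \to Y \text{ by } Z\\
&\quad (3)\; X \to YZ && \text{IR3 on (1), (2)}\\[4pt]
&\textbf{Pseudotransitivity: } X \to Y,\; WY \to Z \;\vdash\; WX \to Z\\
&\quad (1)\; WX \to WY && \text{IR2: augment } X \to Y \text{ by } W\\
&\quad (2)\; WX \to Z && \text{IR3 on (1) and } WY \to Z
\end{aligned}
```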

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
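Here is one possible derivation for each exercise, using only IR1 to IR3. Other orderings of the steps work equally well.

```latex
\begin{aligned}
&\textbf{1. } F = \{A \to B,\; C \to X,\; BX \to Z\} \;\vdash\; AC \to Z\\
&\quad (1)\; A \to B && \text{given}\\
&\quad (2)\; AC \to BC && \text{IR2 on (1), augment by } C\\
&\quad (3)\; C \to X && \text{given}\\
&\quad (4)\; BC \to BX && \text{IR2 on (3), augment by } B\\
&\quad (5)\; AC \to BX && \text{IR3 on (2), (4)}\\
&\quad (6)\; AC \to Z && \text{IR3 on (5) and } BX \to Z\\[4pt]
&\textbf{2. } F = \{A \to B,\; C \to D\},\; C \subseteq B \;\vdash\; A \to D\\
&\quad (1)\; B \to C && \text{IR1, since } C \subseteq B\\
&\quad (2)\; A \to C && \text{IR3 on } A \to B \text{ and (1)}\\
&\quad (3)\; A \to D && \text{IR3 on (2) and } C \to D
\end{aligned}
```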

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
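The definition translates directly into a test: A -> B is redundant iff B ⊆ A+ computed under F - {A -> B}. This minimal Python sketch uses attribute closure (a standard technique assumed here, not shown on the slides). The function names and the example F are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fds, i):
    """fds[i] = (A, B) is redundant iff B is in A+ computed without fds[i]."""
    lhs, rhs = fds[i]
    return rhs <= closure(lhs, fds[:i] + fds[i + 1:])


def remove_redundant(fds):
    """Drop redundant FDs one at a time; the result is equivalent to the input."""
    fds = list(fds)
    i = 0
    while i < len(fds):
        if is_redundant(fds, i):
            del fds[i]  # re-test the FD that shifts into position i
        else:
            i += 1
    return fds


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(is_redundant(F, 2))   # True: A -> C follows from A -> B, B -> C
print(remove_redundant(F))  # [({'A'}, {'B'}), ({'B'}, {'C'})]
```

FDs must be removed one at a time. Two FDs can each be redundant given the other, and dropping both would change F+. The removal order can also affect which (equally valid) reduced set you end up with.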

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Equivalence therefore means that every FD in E can be inferred from F and every FD in F can be inferred from E. E is equivalent to F if both conditions hold: E covers F and F covers E.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Such sets are equivalent to F.
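The covering test can be sketched the same way: F covers E iff, for every X -> Y in E, Y ⊆ X+ computed under F. Here is a minimal Python sketch; the function names and the example sets E, F and G are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E: every FD X -> Y in E has Y inside X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)


E = [({"A"}, {"B"}), ({"A"}, {"C"})]
F = [({"A"}, {"B", "C"})]
G = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(equivalent(E, F))  # True: union/decomposition of the same facts
print(covers(G, E))      # True: A -> C follows from A -> B, B -> C
print(covers(E, G))      # False: B -> C cannot be inferred from E
```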

Minimal Functional Dependencies

A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. F is transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:

(a) the RHS of every dependency in F is a single attribute;

(b) for no X -> A in F is the set $F - \{X \to A\}$ equivalent to F (no redundancies);

(c) for no X -> A in F and proper subset Z of X is $(F - \{X \to A\}) \cup \{Z \to A\}$ equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
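Conditions (a) to (c) can be checked mechanically with attribute closure. This Python sketch assumes FDs are (LHS set, RHS set) pairs, and `is_minimal` is a hypothetical name. For (c), it is enough to try dropping one LHS attribute at a time. Closure is monotone, so if some smaller Z ⊂ X works, then so does any Z' with Z ⊆ Z' ⊂ X.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_minimal(fds):
    """Check conditions (a)-(c) of a minimal cover."""
    # (a) every RHS is a single attribute
    if any(len(rhs) != 1 for _, rhs in fds):
        return False
    for i, (lhs, rhs) in enumerate(fds):
        # (b) F - {X -> A} is equivalent to F iff A is in X+ computed without X -> A
        if rhs <= closure(lhs, fds[:i] + fds[i + 1:]):
            return False
        # (c) replacing X -> A by Z -> A keeps F equivalent iff A is in Z+ under F
        for attr in lhs:
            if rhs <= closure(lhs - {attr}, fds):
                return False
    return True


print(is_minimal([({"A"}, {"B"}), ({"B"}, {"C"})]))                  # True
print(is_minimal([({"A"}, {"B", "C"})]))                             # False, (a)
print(is_minimal([({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]))  # False, (b)
print(is_minimal([({"A", "B"}, {"C"}), ({"A"}, {"B"})]))             # False, (c)
```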

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let $A_1A_2 \to B_1B_2 \in F$. Then $A_1$ is extraneous iff $F \equiv (F - \{A_1A_2 \to B_1B_2\}) \cup \{A_2 \to B_1B_2\}$.
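Both kinds of extraneous attribute can be tested with attribute closure. In this Python sketch (hypothetical function names), an LHS attribute a of X -> Y is extraneous iff Y ⊆ (X - {a})+ under F, which is the equivalence condition above. An RHS attribute b is extraneous iff b ∈ X+ under the set where X -> Y is replaced by X -> (Y - {b}). Each function reports attributes that are extraneous on their own. Remove them one at a time and re-test, because removing one can change whether another is still extraneous.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def extraneous_lhs(fds, i):
    """LHS attributes a of fds[i] = X -> Y with Y contained in (X - {a})+ under F."""
    lhs, rhs = fds[i]
    return {a for a in lhs if rhs <= closure(lhs - {a}, fds)}


def extraneous_rhs(fds, i):
    """RHS attributes b of fds[i] = X -> Y still derivable after shrinking Y to Y - {b}."""
    lhs, rhs = fds[i]
    found = set()
    for b in rhs:
        trial = fds[:i] + [(lhs, rhs - {b})] + fds[i + 1:]
        if b in closure(lhs, trial):
            found.add(b)
    return found


F1 = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
print(extraneous_lhs(F1, 1))  # {'Z'}: X -> Z already gives Z, so X -> R suffices

F2 = [({"A"}, {"B", "C"}), ({"B"}, {"C"})]
print(extraneous_rhs(F2, 0))  # {'C'}: A -> B and B -> C already give C
```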

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.

2. FC is left-reduced.

3. FC is nonredundant.

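Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: since X -> Z, Z is extraneous in XZ -> R, giving FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}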
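The procedure can be automated. This is a sketch of one common canonical-cover procedure: split right-hand sides, left-reduce, then drop redundant FDs. It assumes single-letter attribute names, so a string such as "XY" can stand for an attribute set. The function name `canonical_cover` is hypothetical. Applied to the problem's F, it should reproduce the cover reached in step 3, possibly listing the FDs in a different order.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    # 1. Simple RHS: split X -> A1...Ak into X -> Ai, skipping duplicates.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in cover:
                cover.append(fd)
    # 2. Left-reduce: drop LHS attribute b from X -> A when A is in (X - {b})+.
    for i, (lhs, rhs) in enumerate(cover):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, cover):
                lhs = lhs - {b}
                cover[i] = (lhs, rhs)
    # 3. Nonredundant: drop any FD implied by the remaining ones.
    i = 0
    while i < len(cover):
        lhs, rhs = cover[i]
        if rhs <= closure(lhs, cover[:i] + cover[i + 1:]):
            del cover[i]
        else:
            i += 1
    return cover


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

The `len(lhs) > 1` guard keeps every left-hand side nonempty, which is a simplification. Step 3 also removes duplicates that step 2 may create.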

Normal Forms Based on Primary Keys

1. Normalization of Relations

2. Practical Use of Normal Forms

3. Definitions of Keys and Attributes Participating in Keys

4. First Normal Form

5. Second Normal Form

6. Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Normalization, proposed by Codd, is the process of analysing the given relation schemas based on their FDs and primary keys. Its goal is to achieve the desirable properties of minimizing redundancy and minimizing insertion, deletion and update anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys (contd)

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
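Given the FDs, candidate keys and prime attributes can be found mechanically. K is a superkey iff K+ contains every attribute of R, and a key if no proper subset is also a superkey. This is a brute-force Python sketch, exponential and therefore only for small schemas, with hypothetical function names. It is applied to the TEACH relation discussed later in this module (fd1: {student, course} -> instructor, fd2: instructor -> course).

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal superkeys, searching subsets in order of increasing size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            k = set(combo)
            if any(key <= k for key in keys):
                continue  # contains a smaller key: a superkey but not a key
            if closure(k, fds) >= set(attributes):
                keys.append(k)
    return keys


def prime_attributes(attributes, fds):
    """Prime attributes: members of some candidate key."""
    return set().union(*candidate_keys(attributes, fds))


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

print(candidate_keys(TEACH, F))    # {course, student} and {instructor, student}
print(prime_attributes(TEACH, F))  # all three attributes are prime
```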

First Normal Form

Disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF there are 3 techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
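Here is a minimal Python sketch of technique 1. It uses a hypothetical DEPARTMENT relation (not from the slides) whose DLOCATIONS attribute holds a list of values. The multivalued attribute moves to its own relation together with the primary key DNUMBER. The names `to_1nf` and DLOCATION are illustrative.

```python
def to_1nf(rows, key, multivalued, new_name):
    """Technique 1: move a multivalued attribute into a separate relation with the key."""
    # The base relation keeps every atomic attribute.
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    # One tuple per (key, value) pair; its primary key is {key, new_name}.
    split = [{key: row[key], new_name: value} for row in rows for value in row[multivalued]]
    return base, split


# Hypothetical unnormalized DEPARTMENT: DLOCATIONS is not atomic.
department = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]

dept, dept_locations = to_1nf(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept[2])              # {'DNUMBER': 5, 'DNAME': 'Research'}
print(len(dept_locations))  # 5
```

Compared with technique 2, DNAME is stored once per department. Compared with technique 3, no NULL-padded location columns are needed.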

Normalizing nested relations into 1NF

Additional problems from the Schaum series: pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:

Prime attribute: an attribute that is a member of the primary key K.

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:

{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.

{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
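With attribute closure, full functional dependency can be tested directly. Y -> Z is full iff it holds and Z is not contained in (Y - {a})+ for any a in Y. Dropping one attribute at a time suffices, because closure is monotone. This Python sketch uses the FDs from the examples above. The name `is_full_fd` and the six-attribute relation R built from those FDs are assumptions for illustration.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """True iff lhs -> rhs holds under fds and no proper subset of lhs determines rhs."""
    if not rhs <= closure(lhs, fds):
        return False  # the FD does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
key = {"SSN", "PNUMBER"}
print(is_full_fd(key, {"HOURS"}, F))  # True: a full FD
print(is_full_fd(key, {"ENAME"}, F))  # False: partial, since SSN -> ENAME

# Nonprime attributes only partially dependent on the key violate 2NF.
R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
partial = sorted(a for a in R - key if not is_full_fd(key, {a}, F))
print(partial)  # ['ENAME', 'PLOCATION', 'PNAME']
```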


Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Normalizing into 2NF

Conversion to 2NF (figure): R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).

Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of this relation?

2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:

SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.

SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form (contd)

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
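Here is a Python sketch of a 3NF test. It uses the general form of the rule in the NOTE: a nontrivial X -> A is acceptable if X is a superkey or A is prime (a member of some candidate key). Checking the FDs in the given F, with right-hand sides split, is generally sufficient when testing R itself, though not for a fragment produced by decomposition. The function names and the EMP_DEPT attribute set are illustrative assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal superkeys by brute force over subsets of increasing size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            k = set(combo)
            if not any(key <= k for key in keys) and closure(k, fds) >= attributes:
                keys.append(k)
    return keys


def violates_3nf(attributes, fds):
    """Nontrivial X -> A (A a single attribute) where X is not a superkey and A is nonprime."""
    prime = set().union(*candidate_keys(attributes, fds))
    bad = set()
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # X is a superkey
        for a in rhs - lhs - prime:  # skip trivial and prime right-hand sides
            bad.add((frozenset(lhs), a))
    return bad


EMP_DEPT = {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(violates_3nf(EMP_DEPT, F))  # DNUMBER -> DNAME and DNUMBER -> DMGRSSN

# The NOTE's example: Emp is a candidate key, so SSN -> Emp -> Salary is harmless.
EMP = {"SSN", "Emp", "Salary"}
G = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(violates_3nf(EMP, G))  # set()
```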

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).
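Here is a minimal Python sketch of the BCNF test on R itself. It reports every nontrivial FD in F whose left-hand side is not a superkey. Checking the given F rather than all of F+ is sufficient for this purpose, though not for fragments produced by decomposition. Applied to TEACH, it flags fd2, which is why TEACH is in 3NF but not BCNF. The function name is hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Nontrivial FDs X -> Y in fds where X+ does not cover every attribute of R."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= attributes
    ]


TEACH = {"student", "course", "instructor"}
F = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
print(bcnf_violations(TEACH, F))  # [({'instructor'}, {'course'})]
```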

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.

BCNF makes no such exception: for the relation to be in BCNF, 'X' must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor

fd2: instructor -> course

{student, course} is a candidate key for this relation. The relation is in 3NF, because course (the RHS of fd2) is a prime attribute. It is not in BCNF, because instructor is not a superkey.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Achieving BCNF by Decomposition (contd)

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}

2. {course, instructor} and {course, student}

3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additive join property after decomposition.

Of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additive join property).
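For a decomposition into two relations there is a well-known shortcut test, given here as background rather than taken from the slides. {R1, R2} is lossless with respect to F iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. This Python sketch (hypothetical function name) applies it to the three TEACH decompositions and confirms that only the third is lossless.

```python
def closure(attrs, fds):
    """Attribute closure under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """Binary decomposition test via the closure of the common attributes."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for number, (r1, r2) in enumerate(decompositions, start=1):
    print(number, lossless_binary(r1, r2, F))
# 1 False
# 2 False
# 3 True
```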

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
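For decompositions into more than two relations, a general test is the matrix (chase) algorithm found in standard textbooks. Whether it matches the exact presentation of Example 5.12 is an assumption. The algorithm builds one row per Ri, with a distinguished "a" symbol in column A when A ∈ Ri and a unique "b" symbol otherwise. It repeatedly uses each FD X -> Y to equate the Y symbols of rows that agree on X, preferring "a". The decomposition is lossless iff some row ends up all "a". The test data reuses the EMP_PROJ attribute names from the spurious-tuples example; the FDs and the function name are assumptions.

```python
def lossless_chase(attributes, decomposition, fds):
    """Matrix (chase) test for a lossless-join decomposition of R into R1..Rn."""
    attrs = sorted(attributes)
    # Row i: ("a", A) if A is in R_i, otherwise a symbol ("b", i, A) unique to the row.
    rows = [{A: ("a", A) if A in ri else ("b", i, A) for A in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that currently agree on every LHS column.
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[A] for A in sorted(lhs)), []).append(row)
            for group in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols, preferring the distinguished ("a", A).
                    target = ("a", A) if ("a", A) in symbols else min(symbols)
                    for row in rows:  # rename the symbol everywhere in column A
                        if row[A] in symbols:
                            row[A] = target
                    changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(row[A] == ("a", A) for A in attrs) for row in rows)


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}), ({"SSN", "PNUMBER"}, {"HOURS"})]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(lossless_chase(R, good, F))  # True
print(lossless_chase(R, bad, F))   # False
```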

Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as nontrivial if neither of the above two conditions is satisfied.
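The trivial/nontrivial distinction is purely set-based, so the check is one line. This Python sketch (hypothetical function name) uses the EMP(ENAME, PNAME, DNAME) relation from the 4NF example as R.

```python
def is_trivial_mvd(lhs, rhs, attributes):
    """A ->> B on R is trivial iff B is a subset of A or A ∪ B = R."""
    return rhs <= lhs or (lhs | rhs) == attributes


EMP = {"ENAME", "PNAME", "DNAME"}
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, EMP))           # False: nontrivial
print(is_trivial_mvd({"ENAME"}, {"PNAME", "DNAME"}, EMP))  # True: A ∪ B = R
print(is_trivial_mvd({"ENAME", "PNAME"}, {"PNAME"}, EMP))  # True: B ⊆ A
```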

Fourth Normal Form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
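This Python sketch checks an MVD on a relation state and shows why the 4NF decomposition is lossless. The four EMP tuples are a small illustrative instance, an assumption, since the figure's data is not reproduced here. X ->> Y holds in a state if, for every t1 and t2 that agree on X, the tuple taking its Y values from t1 and its remaining values from t2 is also present. The function names are hypothetical.

```python
ATTRS = ("ENAME", "PNAME", "DNAME")

# Illustrative EMP state: Smith's projects and dependents are independent.
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}


def mvd_holds(rows, lhs, rhs):
    """Check lhs ->> rhs on a state given as a set of tuples ordered like ATTRS."""
    idx = {a: i for i, a in enumerate(ATTRS)}
    for t1 in rows:
        for t2 in rows:
            if all(t1[idx[a]] == t2[idx[a]] for a in lhs):
                # t3: X and Y values from t1, all remaining values from t2.
                t3 = tuple(t1[idx[a]] if a in lhs or a in rhs else t2[idx[a]] for a in ATTRS)
                if t3 not in rows:
                    return False
    return True


def project(rows, attrs):
    idx = [ATTRS.index(a) for a in attrs]
    return {tuple(t[i] for i in idx) for t in rows}


print(mvd_holds(emp, {"ENAME"}, {"PNAME"}))  # True

# 4NF decomposition: EMP_PROJECTS(ENAME, PNAME), EMP_DEPENDENTS(ENAME, DNAME).
emp_projects = project(emp, ("ENAME", "PNAME"))
emp_dependents = project(emp, ("ENAME", "DNAME"))
rejoined = {(e, p, d) for e, p in emp_projects for e2, d in emp_dependents if e == e2}
print(rejoined == emp)  # True: the natural join recreates EMP exactly
```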

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For a relation R with subsets of its attributes denoted A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
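The join dependency definition can be tested on a relation state by joining the projections and comparing the result with the original. This Python sketch uses a small hypothetical SUPPLY-like state with attributes S, P and J (supplier, part, project), not the figure's data. The state satisfies the three-way JD on (S, P), (P, J), (J, S), but it is not the join of just two of those projections. The function names are illustrative.

```python
def project(rows, attrs):
    """Projection of a list of dicts onto attrs, without duplicates."""
    return [dict(t) for t in {frozenset((a, row[a]) for a in attrs) for row in rows}]


def natural_join(left, right):
    """Natural join of two lists of dicts on their common attribute names."""
    out = set()
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.add(frozenset({**l, **r}.items()))
    return [dict(t) for t in out]


def as_set(rows):
    return {frozenset(row.items()) for row in rows}


def satisfies_jd(rows, components):
    """r satisfies JD(R1, ..., Rn) iff r equals the join of its projections on R1..Rn."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return as_set(joined) == as_set(rows)


supply = [
    {"S": "s1", "P": "p1", "J": "j2"},
    {"S": "s1", "P": "p2", "J": "j1"},
    {"S": "s2", "P": "p1", "J": "j1"},
    {"S": "s1", "P": "p1", "J": "j1"},
]
print(satisfies_jd(supply, [("S", "P"), ("P", "J"), ("J", "S")]))  # True
print(satisfies_jd(supply, [("S", "P"), ("P", "J")]))              # False: spurious (s2, p1, j2)
```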

Fifth Normal Form (5NF)

Definition: a relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. Informally, such a relation cannot be losslessly decomposed any further except along its keys.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.


Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice


EXAMPLE OF A DELETE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Delete anomaly: deleting a project results in deleting all the employees who work on that project. Alternatively, if an employee is the sole employee on a project, deleting that employee results in deleting the corresponding project.

EXAMPLE OF AN UPDATE ANOMALY

Consider the relation EMP_PROJ(Emp, Proj, Ename, Pname, No_hours).

Update anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" may require this update to be made in the tuples of all 100 employees working on project P1.
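Both anomalies can be reproduced with Python's built-in sqlite3 module. The three EMP_PROJ rows are hypothetical, since the slides' 100-employee state is not shown. The single-row UPDATE stands in for an application that misses some of the redundant copies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP_PROJ (Emp TEXT, Proj TEXT, Ename TEXT, Pname TEXT, No_hours REAL)")
conn.executemany(
    "INSERT INTO EMP_PROJ VALUES (?, ?, ?, ?, ?)",
    [
        ("E1", "P1", "Ann", "Billing", 10),
        ("E2", "P1", "Bob", "Billing", 20),
        ("E3", "P2", "Cy", "Payroll", 15),  # E3 is the sole employee on P2
    ],
)

# Update anomaly: Pname is repeated per employee, so touching one row is not enough.
conn.execute("UPDATE EMP_PROJ SET Pname = 'Customer-Accounting' WHERE Emp = 'E1'")
p1_names = conn.execute(
    "SELECT DISTINCT Pname FROM EMP_PROJ WHERE Proj = 'P1' ORDER BY Pname"
).fetchall()
print(p1_names)  # [('Billing',), ('Customer-Accounting',)]: P1 now has two names

# Delete anomaly: removing the only employee on P2 also removes all facts about P2.
conn.execute("DELETE FROM EMP_PROJ WHERE Emp = 'E3'")
p2_rows = conn.execute("SELECT COUNT(*) FROM EMP_PROJ WHERE Proj = 'P2'").fetchone()[0]
print(p2_rows)  # 0: the name 'Payroll' is gone as well
```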

Guideline to Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies.

If any anomalies are present, note them so that applications can be made to take them into account.

In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.

Problems with Nulls

If many attributes are grouped together as a fat relation, it gives rise to many nulls in the tuples. This wastes storage, causes problems in understanding the meaning of the attributes, and makes aggregate operators like COUNT or SUM difficult to use.
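This small sqlite3 sketch illustrates the aggregate problem on a hypothetical EMPLOYEE table in which some Salary and Office_number values are NULL. COUNT(*) counts every row, while COUNT(column), SUM and AVG silently skip NULLs. As a result, the average is taken over fewer employees than the table holds.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT, Salary INTEGER, Office_number TEXT)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("111", 30000, "A1"), ("222", 40000, None), ("333", None, None)],
)
counts = conn.execute(
    "SELECT COUNT(*), COUNT(Office_number), SUM(Salary), AVG(Salary) FROM EMPLOYEE"
).fetchone()
print(counts)  # (3, 1, 70000, 35000.0): AVG divides by 2, not by 3
```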

3. Null Values in Tuples

Interpretations of nulls: the attribute is not applicable or invalid; the attribute value is unknown (it may exist); the value is known to exist but is unavailable.

GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are frequently NULL could be placed in separate relations (with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).

Example of Spurious Tuples

Generation of spurious tuples: using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.

The problem is that a natural join of these two relations produces more tuples than the original set of tuples in EMP_PROJ.

These additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious or wrong information that is not valid.

This happens because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
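Spurious tuples are easy to reproduce. This Python sketch uses a hypothetical two-tuple EMP_PROJ state, since the slide's figure data is not reproduced. Two employees work on different projects in the same location, so joining EMP_PROJ1 and EMP_LOCS on PLOCATION pairs each project with both employees. The helper names are illustrative.

```python
def project(rows, attrs):
    """Projection of a list of dicts onto attrs, without duplicates."""
    return [dict(t) for t in {frozenset((a, row[a]) for a in attrs) for row in rows}]


def natural_join(left, right):
    """Natural join on the attribute names the two relations share."""
    out = set()
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                out.add(frozenset({**l, **r}.items()))
    return [dict(t) for t in out]


emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32, "ENAME": "Smith", "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 20, "ENAME": "English", "PNAME": "ProductY", "PLOCATION": "Bellaire"},
]
emp_locs = project(emp_proj, ["ENAME", "PLOCATION"])
emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])

joined = natural_join(emp_proj1, emp_locs)  # the only shared attribute is PLOCATION
original = {frozenset(row.items()) for row in emp_proj}
spurious = [row for row in joined if frozenset(row.items()) not in original]
print(len(joined), len(spurious))  # 4 2
```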

Example of Spurious Tuples (contd)

4. Spurious Tuples

Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:

(a) non-additivity, or losslessness, of the corresponding join;

(b) preservation of the functional dependencies.

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:

Anomalies cause redundant work to be done during insertion, modification and deletion.

Nulls waste storage space and make aggregation operations and joins difficult to perform.

Invalid and spurious data are generated during joins on improperly related base relations.

Functional dependencies

Functional dependencies (FDs): a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, …, An}, where A1, A2, …, An are the attributes.

Definition

FDs are used to specify formal measures of the goodness of relational designs.

FDs and keys are used to define normal forms for relations.

FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). The values of the Y component of a tuple in r depend on, or are determined by, the values of the X component. In other words, the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R further by specifying constraints on its attributes that must hold at all times.
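The definition can be checked literally by comparing every pair of tuples. This is O(n²), but it mirrors the formula "t1[X] = t2[X] implies t1[Y] = t2[Y]" directly. The Python sketch below runs on a hypothetical WORKS_ON-style state. As with any instance check, it can only refute an FD, because an FD must hold in every legal state. The name `fd_holds` is illustrative.

```python
from itertools import combinations


def fd_holds(rows, x, y):
    """Definition check: for every pair t1, t2, t1[X] = t2[X] implies t1[Y] = t2[Y]."""
    for t1, t2 in combinations(rows, 2):
        if all(t1[a] == t2[a] for a in x) and any(t1[a] != t2[a] for a in y):
            return False
    return True


works_on = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 10, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 1, "HOURS": 20, "ENAME": "Wong"},
]
print(fd_holds(works_on, ["SSN"], ["ENAME"]))             # True
print(fd_holds(works_on, ["SSN"], ["HOURS"]))             # False
print(fd_holds(works_on, ["SSN", "PNUMBER"], ["HOURS"]))  # True
```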

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
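A 2NF check amounts to asking whether any proper subset of the primary key already determines a non-prime attribute. The following is a sketch under the assumption that FDs are (left-hand side, right-hand side) frozenset pairs. The function name `partial_dependencies` is hypothetical, and the schema is an EMP_PROJ-style relation built from the FDs listed in this module (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS).

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(schema, primary_key, fds):
    """(subset, attribute) pairs in which a non-prime attribute is determined by a
    proper subset of the primary key; each pair is a 2NF violation."""
    nonprime = schema - primary_key  # the slides define prime w.r.t. the primary key
    violations = []
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            for attr in sorted(closure(set(subset), fds) & nonprime):
                violations.append((frozenset(subset), attr))
    return violations

schema = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
violations = partial_dependencies(schema, {"SSN", "PNUMBER"}, fds)
```

HOURS never shows up because it is fully dependent on {SSN, PNUMBER}. Each reported group suggests one 2NF relation, keyed by SSN or by PNUMBER.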

Normalizing into 2NF

Conversion to 2NF (figure): R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).

Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?
2. Transform it into the next highest form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form: a relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
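A mechanical 3NF test is often phrased in a general form that these slides do not spell out. For every given FD X -> A with A not in X, either X is a superkey or A is a prime attribute. A violating FD means A hangs off the key transitively through a non-key X, and a candidate key in the middle (as in the NOTE) is not flagged. This is a sketch. The helper names are hypothetical, FDs are frozenset pairs, and only the FDs as given are tested, which is the usual shortcut when testing R itself rather than a projection of it.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = set(combo)
            if not any(k <= c for k in keys) and closure(c, fds) >= schema:
                keys.append(c)
    return keys

def third_nf_violations(schema, fds):
    """FDs X -> A with A not in X, X not a superkey, and A non-prime."""
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # lhs is a superkey
        for attr in sorted(rhs - lhs - prime):
            violations.append((lhs, attr))
    return violations

f = frozenset
emp_dept = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
emp_dept_fds = [(f({"SSN"}), f({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
                (f({"DNUMBER"}), f({"DNAME", "DMGRSSN"}))]
emp_dept_violations = third_nf_violations(emp_dept, emp_dept_fds)

# EMP(SSN, Emp, Salary) from the NOTE: Emp is a candidate key.
emp = {"SSN", "Emp", "Salary"}
emp_fds = [(f({"SSN"}), f({"Emp", "Salary"})), (f({"Emp"}), f({"SSN", "Salary"}))]
emp_violations = third_nf_violations(emp, emp_fds)
```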

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form): a relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a candidate key.

To test whether a relation is in BCNF, X must be a superkey (for example, a candidate key).

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition: two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. Because the left-hand side of fd2 (instructor) is not a superkey, the relation is not in BCNF; it is still in 3NF, because course is a prime attribute.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
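The difference between the two tests on TEACH can be checked directly. This sketch reuses the same closure and candidate-key idea (the helper names are hypothetical, and FDs are frozenset pairs). A BCNF violation is any given FD whose left-hand side is not a superkey, and 3NF forgives the ones whose right-hand side is prime.

```python
from itertools import combinations

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            c = set(combo)
            if not any(k <= c for k in keys) and closure(c, fds) >= schema:
                keys.append(c)
    return keys

def bcnf_violations(schema, fds):
    """Given FDs X -> A (A not in X) whose left-hand side X is not a superkey."""
    return [(lhs, attr)
            for lhs, rhs in fds
            if not closure(lhs, fds) >= schema
            for attr in sorted(rhs - lhs)]

def third_nf_violations(schema, fds):
    """BCNF violations that 3NF still forbids: those with a non-prime right-hand side."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, attr) for lhs, attr in bcnf_violations(schema, fds) if attr not in prime]

schema = {"student", "course", "instructor"}
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
in_bcnf = not bcnf_violations(schema, fds)
in_3nf = not third_nf_violations(schema, fds)
```

Only fd2 is reported, and since course is prime, TEACH passes the 3NF test but fails the BCNF test.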

Achieving the BCNF by Decomposition: three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions will lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Out of the above three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
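For a decomposition into two relations there is a standard shortcut test, not stated on these slides. {R1, R2} is lossless if and only if the common attributes R1 ∩ R2 functionally determine R1 - R2 or R2 - R1. A sketch applying it to the three TEACH decompositions (the function name `is_lossless_binary` is hypothetical):

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1)."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus

fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
results = [is_lossless_binary(r1, r2, fds) for r1, r2 in decompositions]
```

Only the third decomposition passes, because its common attribute, instructor, determines course through fd2.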

Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation (by a natural join) from the new relations that have replaced it.

If we can recover the original relation, then the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
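The slides refer to a lossless join algorithm without reproducing it. One common version builds a matrix (tableau) with one row per decomposed relation: "a" symbols mark the attributes that a row's relation contains, and unique "b" symbols mark the rest. The FDs are then applied to equate symbols until nothing changes, and the decomposition is lossless if some row ends up all "a". The sketch below is my own rendering of that idea. The function name and symbol encoding are hypothetical, and it renames a symbol everywhere in its column, as in the usual chase.

```python
def is_lossless(schema, decomposition, fds):
    """Matrix (tableau) test for a decomposition {R1, ..., Rm} of R under FDs."""
    attrs = sorted(schema)
    # One row per Ri: the distinguished symbol ("a", attr) where attr is in Ri,
    # otherwise a symbol ("b", i, attr) unique to that row.
    table = [{attr: ("a", attr) if attr in ri else ("b", i, attr) for attr in attrs}
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lhs_cols = sorted(lhs)
            groups = {}
            for row in table:
                groups.setdefault(tuple(row[c] for c in lhs_cols), []).append(row)
            for rows in groups.values():
                for attr in rhs:
                    symbols = {row[attr] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols, preferring the distinguished one.
                    target = ("a", attr) if ("a", attr) in symbols else min(symbols)
                    for row in table:
                        if row[attr] in symbols:
                            row[attr] = target
                    changed = True
    # Lossless iff some row now consists only of distinguished symbols.
    return any(all(row[c] == ("a", c) for c in attrs) for row in table)

teach = {"student", "course", "instructor"}
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

emp_proj = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
emp_fds = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
           ({"SSN", "PNUMBER"}, {"HOURS"})]
three_way = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
locs_split = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
```

With these inputs, the three-way split of EMP_PROJ is lossless, while the EMP_LOCS / EMP_PROJ1-style split (sharing only PLOCATION) is not.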


Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

Fourth Normal Form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
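An MVD can be tested on a relation state by projecting and re-joining. X ->> Y holds exactly when the state equals the natural join of its projections on X ∪ Y and X ∪ (R - X - Y), which is also why the 4NF decomposition loses nothing. In this sketch the EMP tuples are a hypothetical sample (one employee with projects X, Y and dependents John, Anna), and `project`, `natural_join`, and `mvd_holds` are illustrative helper names.

```python
def project(rows, attrs):
    """Projection: the distinct sub-tuples on attrs, as sorted (attr, value) tuples."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}

def natural_join(left, right):
    """Natural join of two projections produced by project()."""
    result = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result

def mvd_holds(rows, schema, x, y):
    """X ->> Y holds in this state iff it equals the join of its projections
    on X ∪ Y and X ∪ (R - X - Y)."""
    z = schema - x - y
    return natural_join(project(rows, x | y), project(rows, x | z)) == project(rows, schema)

schema = {"ENAME", "PNAME", "DNAME"}
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
# The 4NF decomposition: two projections that re-join to EMP.
emp_projects = project(emp, {"ENAME", "PNAME"})
emp_dependents = project(emp, {"ENAME", "DNAME"})
```

Dropping any one of the four tuples breaks ENAME ->> PNAME, because the join of the projections would bring it back as a spurious tuple.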

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: a relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation is in 5NF if it has no nontrivial join dependency that is not implied by its keys.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form:

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
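A join dependency can be checked on a state the same way as an MVD, except with three projections. In this sketch the attribute names (sname, part_name, proj_name) and the four tuples are hypothetical, chosen so that the state satisfies JD(R1, R2, R3), while joining only two of the projections produces a spurious tuple.

```python
def project(rows, attrs):
    """Projection: the distinct sub-tuples on attrs, as sorted (attr, value) tuples."""
    return {tuple(sorted((a, row[a]) for a in attrs)) for row in rows}

def natural_join(left, right):
    """Natural join of two sets of sorted (attr, value) tuples."""
    result = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(tuple(sorted({**ld, **rd}.items())))
    return result

schema = {"sname", "part_name", "proj_name"}
supply = [
    {"sname": "s1", "part_name": "p1", "proj_name": "j2"},
    {"sname": "s1", "part_name": "p2", "proj_name": "j1"},
    {"sname": "s2", "part_name": "p1", "proj_name": "j1"},
    {"sname": "s1", "part_name": "p1", "proj_name": "j1"},
]
r1 = project(supply, {"sname", "part_name"})
r2 = project(supply, {"part_name", "proj_name"})
r3 = project(supply, {"proj_name", "sname"})
original = project(supply, schema)
two_way = natural_join(r1, r2)       # contains one spurious tuple (s2, p1, j2)
three_way = natural_join(two_way, r3)  # the third projection filters it out
```

No binary split of this state is lossless, yet the three-way split is, which is exactly the situation 5NF addresses.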


Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice


EXAMPLE OF AN UPDATE ANOMALY: consider the relation

EMP_PROJ(Emp, Proj, Ename, Pname, No_hours)

Update Anomaly: changing the name of project number P1 from "Billing" to "Customer-Accounting" may cause this update to be made for all 100 employees working on project P1.
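The anomaly can be reproduced with Python's built-in sqlite3 module. This is a sketch with hypothetical rows (three employees instead of 100) and lowercase table names. It shows that a rename must touch every redundant copy, that a partial rename leaves P1 with two names, and that a normalized PROJECT relation needs a single-row update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized design: the project name is repeated in every EMP_PROJ tuple.
conn.execute("""CREATE TABLE emp_proj (
    emp TEXT, proj TEXT, ename TEXT, pname TEXT, no_hours REAL,
    PRIMARY KEY (emp, proj))""")
conn.executemany("INSERT INTO emp_proj VALUES (?, ?, ?, ?, ?)", [
    ("E1", "P1", "Ann", "Billing", 10),
    ("E2", "P1", "Bob", "Billing", 20),
    ("E3", "P1", "Cid", "Billing", 5),
])
# Normalized alternative: each project name is stored exactly once.
conn.execute("CREATE TABLE project (proj TEXT PRIMARY KEY, pname TEXT)")
conn.execute("INSERT INTO project VALUES ('P1', 'Billing')")

# An application that renames P1 in only one tuple leaves P1 with two names.
conn.execute("UPDATE emp_proj SET pname = 'Customer-Accounting' "
             "WHERE proj = 'P1' AND emp = 'E1'")
p1_names = {name for (name,) in conn.execute(
    "SELECT DISTINCT pname FROM emp_proj WHERE proj = 'P1'")}

# A correct rename must touch one tuple per employee on P1 ...
emp_proj_rows = conn.execute(
    "UPDATE emp_proj SET pname = 'Customer-Accounting' WHERE proj = 'P1'").rowcount
# ... whereas the normalized PROJECT relation needs a single-row update.
project_rows = conn.execute(
    "UPDATE project SET pname = 'Customer-Accounting' WHERE proj = 'P1'").rowcount
```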

Guideline on Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies.

If any anomalies are present, note them so that applications can be made to take them into account.

In general, it is advisable to use anomaly-free base relations and to specify views that include the joins, to place together the attributes frequently referenced in important queries.
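The advice to keep anomaly-free base relations and expose joins through views might look like the following. This is a sketch using sqlite3 with hypothetical table names and rows. The view rebuilds the wide EMP_PROJ shape for queries, while each fact is stored once in the base relations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT);
CREATE TABLE project (pnumber TEXT PRIMARY KEY, pname TEXT);
CREATE TABLE works_on (ssn TEXT REFERENCES employee, pnumber TEXT REFERENCES project,
                       hours REAL, PRIMARY KEY (ssn, pnumber));
-- The view re-assembles the wide EMP_PROJ shape for queries only.
CREATE VIEW emp_proj AS
    SELECT w.ssn, w.pnumber, e.ename, p.pname, w.hours
    FROM works_on w JOIN employee e ON e.ssn = w.ssn
                    JOIN project p ON p.pnumber = w.pnumber;
INSERT INTO employee VALUES ('111', 'Ann'), ('222', 'Bob');
INSERT INTO project VALUES ('P1', 'Billing');
INSERT INTO works_on VALUES ('111', 'P1', 10), ('222', 'P1', 20);
""")
# One update on a base relation is visible through every row of the view.
conn.execute("UPDATE project SET pname = 'Customer-Accounting' WHERE pnumber = 'P1'")
view_rows = conn.execute("SELECT ssn, pname FROM emp_proj ORDER BY ssn").fetchall()
```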

Problems with Nulls: if many attributes are grouped together as a fat relation, it gives rise to many nulls in the tuples. This leads to:
- Wasted storage
- Problems in understanding the meaning of the attributes
- Difficulty when using nulls in aggregate operators like COUNT or SUM
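The aggregate problem is easy to see in SQL, where COUNT(column), SUM, and AVG silently skip NULLs while COUNT(*) does not. This is a sketch in sqlite3 with hypothetical employee rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (ssn TEXT PRIMARY KEY, salary REAL, office_number TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", [
    ("111", 30000, "A-101"),
    ("222", 40000, None),   # no office: stored as NULL
    ("333", None, None),    # salary unknown: also NULL
])
count_all, count_offices, total, average = conn.execute(
    "SELECT COUNT(*), COUNT(office_number), SUM(salary), AVG(salary) FROM employee"
).fetchone()
```

AVG divides by the two non-NULL salaries, not by the three employees, so the result depends on how each NULL is interpreted.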

3. Null Values in Tuples: interpretations of nulls:
- Attribute not applicable or invalid
- Attribute value unknown (may exist)
- Value known to exist but unavailable

GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).
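With a separate emp_offices relation, only employees who actually have an office get a tuple, and NULLs appear only when a query asks for them through an outer join. This is a sketch in sqlite3 with hypothetical rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (ssn TEXT PRIMARY KEY, ename TEXT);
CREATE TABLE emp_offices (essn TEXT PRIMARY KEY REFERENCES employee,
                          office_number TEXT NOT NULL);
INSERT INTO employee VALUES ('111', 'Ann'), ('222', 'Bob'), ('333', 'Cid');
INSERT INTO emp_offices VALUES ('111', 'A-101');
""")
stored_offices = conn.execute("SELECT COUNT(*) FROM emp_offices").fetchone()[0]
wide = conn.execute("""
    SELECT e.ssn, o.office_number
    FROM employee e LEFT JOIN emp_offices o ON o.essn = e.ssn
    ORDER BY e.ssn""").fetchall()
```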


Example of Spurious Tuples

Generation of spurious tuples: using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.

The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.

These additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious or wrong information that is not valid.

This happens because the PLOCATION attribute, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
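The effect can be reproduced with two hypothetical EMP_PROJ tuples that share a project location. Splitting them into EMP_LOCS and EMP_PROJ1 and rejoining on PLOCATION with SQL's NATURAL JOIN doubles the row count. This is a sketch in sqlite3, and the attribute names follow the slides while the values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE emp_proj (ssn TEXT, pnumber INTEGER, hours REAL,
                       ename TEXT, pname TEXT, plocation TEXT);
INSERT INTO emp_proj VALUES
    ('111', 1, 10, 'Ann', 'ProductX', 'Bellaire'),
    ('222', 2, 20, 'Bob', 'ProductY', 'Bellaire');
-- The poor decomposition: the two relations share only PLOCATION.
CREATE TABLE emp_locs AS SELECT DISTINCT ename, plocation FROM emp_proj;
CREATE TABLE emp_proj1 AS
    SELECT DISTINCT ssn, pnumber, hours, pname, plocation FROM emp_proj;
""")
joined = conn.execute(
    "SELECT ssn, pnumber, hours, ename, pname, plocation "
    "FROM emp_proj1 NATURAL JOIN emp_locs").fetchall()
spurious = conn.execute("""
    SELECT ssn, pnumber, hours, ename, pname, plocation
    FROM emp_proj1 NATURAL JOIN emp_locs
    EXCEPT
    SELECT ssn, pnumber, hours, ename, pname, plocation FROM emp_proj""").fetchall()
```

The two spurious rows pair each SSN with the other employee's name, purely because both projects are in Bellaire.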

Example of Spurious Tuples (contd.)

4. Spurious Tuples: bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:
(a) Non-additivity, or losslessness, of the corresponding join
(b) Preservation of the functional dependencies

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:
- Anomalies that cause redundant work to be done during insertion, modification, and deletion
- Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values
- Generation of invalid and spurious data during joins on improperly related base relations

Functional dependencies

Functional dependencies (FDs): an FD is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition

- FDs are used to specify formal measures of the goodness of relational designs.
- FDs and keys are used to define normal forms for relations.
- FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420

Continent -> Name does not hold in this instance, since NA appears with Superior, Huron, and Michigan. Name -> Length is satisfied by this instance.


Graphical representation of Functional Dependencies

Examples of FD constraints:
- Social security number uniquely determines employee name: SSN -> ENAME
- Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
- Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.

How it works:
1. Sort the tuples of the relation r on the A attributes, so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If every group meets the condition in step 2, the output of the algorithm is true; otherwise it is false.
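The three steps map directly onto a sort followed by `itertools.groupby`. This is a sketch that represents a relation as a list of dicts (an assumption for illustration), run on the Lakes of the world table from this module. The function name `satisfies` simply mirrors the algorithm's name.

```python
from itertools import groupby

def satisfies(rows, lhs, rhs):
    """Sort-based check that the relation `rows` (a list of dicts) satisfies lhs -> rhs."""
    def on_lhs(row):
        return tuple(row[a] for a in lhs)
    # Step 1: sort so that tuples with equal lhs values are adjacent.
    for _, group in groupby(sorted(rows, key=on_lhs), key=on_lhs):
        # Step 2: all tuples in a group must agree on the rhs attributes.
        if len({tuple(row[a] for a in rhs) for row in group}) > 1:
            return False  # Step 3: one failing group makes the answer false
    return True

lakes = [
    {"name": "Caspian Sea", "continent": "Asia-Europe", "area": 143244, "length": 760},
    {"name": "Superior", "continent": "NA", "area": 31700, "length": 350},
    {"name": "Victoria", "continent": "Africa", "area": 26828, "length": 250},
    {"name": "Aral Sea", "continent": "Asia", "area": 24904, "length": 280},
    {"name": "Huron", "continent": "NA", "area": 23000, "length": 206},
    {"name": "Michigan", "continent": "NA", "area": 22300, "length": 307},
    {"name": "Tanganyika", "continent": "Africa", "area": 12700, "length": 420},
]
```

As with any instance-based check, a true result only means that this state does not contradict the FD. It cannot prove that the FD holds for all states.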

Relation state of TEACH

TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz

TEXT -> COURSE is satisfied by this state. TEACHER -> COURSE is not, since Smith teaches two different courses.

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time consuming.

So inference rules (axioms) are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure: a department has one manager (DEPT_NO -> MGR_SSN), and a manager has a unique phone number (MGR_SSN -> MGR_PHONE). These two dependencies together imply DEPT_NO -> MGR_PHONE.

This defines a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

The inferred functional dependencies include:
- SSN -> {DNAME, DMGRSSN}
- SSN -> SSN
- DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That X -> Y can be inferred from F is denoted by F |= X -> Y.
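Listing all of F+ is rarely practical. A standard technique, not named on these slides, is to compute the attribute closure X+, the set of all attributes that X determines under F. Then F |= X -> Y exactly when Y is contained in X+. This is a sketch with hypothetical helper names, run on the F above.

```python
def attribute_closure(attrs, fds):
    """Return X+, all attributes functionally determined by attrs under fds.
    fds is a list of (lhs, rhs) pairs of attribute-name sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already in X+, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F |= lhs -> rhs exactly when rhs is contained in lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)

F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
```

For example, SSN+ contains every attribute, so SSN -> {DNAME, DMGRSSN} is in F+. DNUMBER+ does not contain SSN, so DNUMBER -> SSN is not in F+.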

Inference Rules for FDs: given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.

Armstrong's inference rules:
- IR1 (Reflexive): if Y ⊆ X, then X -> Y
- IR2 (Augmentation): if X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
- IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z

IR1, IR2, and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs: some additional inference rules that are useful:
- Decomposition: if X -> YZ, then X -> Y and X -> Z
- Union (additive): if X -> Y and X -> Z, then X -> YZ
- Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
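One possible derivation for each exercise, reading exercise 1 as F = {A -> B, C -> X, BX -> Z} and exercise 2 as F = {A -> B, C -> D} with C ⊆ B. Each step cites the Armstrong rule it uses. Other orderings of the same rules work equally well.

```latex
\begin{aligned}
&\text{Exercise 1: } F = \{A \to B,\ C \to X,\ BX \to Z\}\\
&1.\ A \to B && \text{given}\\
&2.\ AC \to BC && \text{IR2 on 1, augmenting by } C\\
&3.\ C \to X && \text{given}\\
&4.\ BC \to BX && \text{IR2 on 3, augmenting by } B\\
&5.\ AC \to BX && \text{IR3 on 2 and 4}\\
&6.\ BX \to Z && \text{given}\\
&7.\ AC \to Z && \text{IR3 on 5 and 6}\\[4pt]
&\text{Exercise 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B\\
&1.\ B \to C && \text{IR1, since } C \subseteq B\\
&2.\ A \to C && \text{IR3 on } A \to B \text{ and 1}\\
&3.\ A \to D && \text{IR3 on 2 and } C \to D
\end{aligned}
```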

Redundant functional dependencies: given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F if and only if A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
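The definition translates into a closure test against F minus the FD under test. FDs are removed one at a time and re-tested, because two FDs can each be redundant given the other (for example, a duplicated FD) without both being removable. This is a sketch with a hypothetical function name, run on the DEPT_NO / MGR_SSN / MGR_PHONE example from this module.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def remove_redundant(fds):
    """Drop FDs that follow from the others, one at a time, re-testing after each removal."""
    result = list(fds)
    i = 0
    while i < len(result):
        lhs, rhs = result[i]
        rest = result[:i] + result[i + 1:]
        if rhs <= closure(lhs, rest):
            result = rest  # redundant: remove it and test the next FD at the same index
        else:
            i += 1
    return result

F = [
    (frozenset({"DEPT_NO"}), frozenset({"MGR_SSN"})),
    (frozenset({"MGR_SSN"}), frozenset({"MGR_PHONE"})),
    (frozenset({"DEPT_NO"}), frozenset({"MGR_PHONE"})),
]
minimized = remove_redundant(F)
```

Here DEPT_NO -> MGR_PHONE is dropped, since it follows from the other two by transitivity.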

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F, and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
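Both directions of "covers" can be checked with attribute closures, without building E+ or F+. In this sketch, attributes are single letters, so a string such as "AC" stands for the set {A, C}. That is a convenience assumption, and the two example sets are illustrative.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, e):
    """True if F covers E: every FD in E can be inferred from F."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """E and F are equivalent when each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)

E = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
F = [("A", "CD"), ("E", "AH")]
```

E and F are equivalent even though F has only two FDs, which is the kind of smaller generating set the paragraph above describes.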

Minimal Functional Dependencies (minimal cover): useful in eliminating unnecessary functional dependencies; also called an irreducible set of F.

F is transformed so that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that each have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) every RHS of each dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of the left-hand side).
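A minimal cover can be computed by enforcing the three conditions in turn. The usual algorithm splits right-hand sides, removes extraneous left-hand-side attributes, and then drops redundant FDs. Left-reduction is normally done before removing redundant FDs, so the code's step order differs from the (a), (b), (c) labels. This is a sketch, with single-letter attributes written as strings and a hypothetical function name. The example set {B -> A, D -> A, AB -> D} is illustrative.

```python
def closure(attrs, fds):
    """attrs+ under FDs given as (frozenset lhs, single rhs attribute) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result

def minimal_cover(fds):
    # (a) Split each right-hand side into single attributes (dropping duplicates).
    g = list(dict.fromkeys((frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)))
    # (c) Remove extraneous attributes: b is extraneous in X -> a if a is in (X - b)+.
    reduced = []
    for lhs, a in g:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs.discard(b)
        reduced.append((frozenset(lhs), a))
    g = list(dict.fromkeys(reduced))
    # (b) Remove redundant FDs, one at a time.
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g

cover = minimal_cover([("B", "A"), ("D", "A"), ("AB", "D")])
```

A is extraneous in AB -> D (since B -> A), which turns it into B -> D. After that, B -> A follows from B -> D and D -> A and is removed.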

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 ∈ F. Then A1 is extraneous if and only if F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.

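Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: since X -> Z, the attribute Z is extraneous in XZ -> R, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}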
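A candidate answer can be checked mechanically. It must be equivalent to F, and, to be left-reduced, no left-hand-side attribute may be extraneous. In this sketch, single-letter attributes are written as strings (an assumption for brevity) and the helper names are hypothetical. It checks the set obtained after removing the duplicate and redundant FDs, and the same set with XZ -> R left-reduced to X -> R.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def covers(f, e):
    """True if every FD in e can be inferred from f."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    return covers(e, f) and covers(f, e)

def extraneous_lhs(fds):
    """(fd, attribute) pairs where the attribute can be dropped from the FD's LHS."""
    found = []
    for lhs, rhs in fds:
        for b in sorted(lhs):
            if len(lhs) > 1 and set(rhs) <= closure(set(lhs) - {b}, fds):
                found.append(((lhs, rhs), b))
    return found

F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
nonredundant = [("X", "Z"), ("XY", "W"), ("XY", "P"), ("XY", "Q"), ("XZ", "R")]
left_reduced = [("X", "Z"), ("XY", "W"), ("XY", "P"), ("XY", "Q"), ("X", "R")]
```

Both sets are equivalent to F, but only the second has no extraneous attribute. In the first, X+ = {X, Z, R} already contains R, so Z can be dropped from XZ -> R.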

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).

4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
- minimizing redundancy
- minimizing anomalies

It provides the database designer with:
- a formal framework for analyzing relation schemas based on their keys and FDs
- a series of normal form tests

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 14: Module 6

Module 6 14042023

Module 6 15042023

Guideline to Redundant Information in Tuples and Update Anomalies GUIDELINE 2

Design a schema that does not suffer from the insertion deletion and update anomalies

If there are any anomalies present then note them so that applications can be made to take them into account

In general it is advisable to use anomaly free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries

Module 6 16042023

Problems with Nulls If many attributes are grouped together

as a fat relation it gives rise to many nulls in the tuples

Waste storage Problems in understanding the

meaning of the attributes Difficult while using Nulls in aggregate

operators like count or sum

Module 6 17042023

3 Null Values in Tuples Interpretations of nulls

Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable

GUIDELINE 3 Relations should be designed such that their

tuples will have as few NULL values as possible Attributes that are NULL frequently could be

placed in separate relations (with the primary key) Example-

if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation

Better create a new relation emp_offices(essn office_number)

Module 6 18042023

Example of Spurious Tuples

Module 6 19042023

Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as

the base relations of EMP_PROJ is not a good schema design

Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ

These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid

This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1

Module 6 20042023

Example of Spurious Tuples contd

Module 6 21042023

4 Spurious Tuples Bad designs for a relational database may result

in erroneous results for certain JOIN operations The lossless join property is used to

guarantee meaningful results for join operations

GUIDELINE 4 Design relation schemas so that they can be

joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies

Note that Property (a) is extremely important and cannot be

sacrificed Property (b) is less stringent and may be sacrificed

Summary and Discussion of Design Guidelines

Problems pointed out:
- Anomalies that cause redundant work to be done during insertion, modification and deletion
- Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values
- Generation of invalid and spurious data during joins on improperly related base relations

Functional Dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition

FDs are used to specify formal measures of the "goodness" of relational designs. FDs and keys are used to define normal forms for relations. FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. Formally, for any two tuples t1 and t2 in any relation instance r(R):

if t1[X] = t2[X], then t1[Y] = t2[Y]

X -> Y in R specifies a constraint on all relation instances r(R). It means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; equivalently, the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes. The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

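Lakes of the world

Name          Continent     Area (sq mi)   Length (mi)
Caspian Sea   Asia-Europe   143244         760
Superior      NA            31700          350
Victoria      Africa        26828          250
Aral Sea      Asia          24904          280
Huron         NA            23000          206
Michigan      NA            22300          307
Tanganyika    Africa        12700          420

Continent -> Name does not hold: the continent NA appears with three different lakes (Superior, Huron, Michigan).
Name -> Length is satisfied: each lake name appears once, with a single length.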

Graphical representation of Functional Dependencies

Examples of FD constraints
- A social security number uniquely determines the employee name: SSN -> ENAME
- A project number uniquely determines the project name and location: PNUMBER -> {PNAME, PLOCATION}
- An employee SSN and a project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints (contd.)
- An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
- It must be defined explicitly by someone who knows the semantics of the attributes of R.
- The constraint must hold on every relation instance r(R).
- If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation state r satisfies, or does not satisfy, a given functional dependency A -> B.

How it works:
1. Sort the tuples of the relation r on the A attributes, so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under the attributes A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.

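Relation state of TEACH

TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     ooad              Horowitz

TEACHER -> COURSE does not hold in this state (Smith teaches two different courses).
TEXT -> COURSE may hold (no two tuples share the same TEXT value, so this state does not violate it).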
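The Satisfies algorithm can be sketched in a few lines of Python and applied to the TEACH state above. This is an illustrative sketch, assuming a relation state is a list of dicts keyed by attribute name; `satisfies` is a hypothetical helper name. A True result only says that this particular state does not violate the FD; it cannot prove that the FD holds for every state.

```python
def satisfies(r, A, B):
    """Sort-based test of whether relation state r satisfies the FD A -> B.

    r is a list of dicts; A and B are lists of attribute names.
    """
    def a_value(t):
        return tuple(t[a] for a in A)

    # Step 1: sort on A so that tuples with equal A-values are adjacent.
    rows = sorted(r, key=a_value)
    # Step 2: adjacent tuples with equal A-values must also agree on B.
    for prev, cur in zip(rows, rows[1:]):
        if a_value(prev) == a_value(cur) and any(prev[b] != cur[b] for b in B):
            return False  # Step 3: condition violated
    return True


# The TEACH state shown above.
TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "ooad", "TEXT": "Horowitz"},
]

print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```

Comparing only neighbours is enough: within a sorted group of equal A-values, agreement between every adjacent pair implies agreement among all of them.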

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming, so inference rules (axioms) are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R. Schema designers specify the most obvious FDs; the other dependencies can be inferred, or deduced, from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F. The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

The inferred functional dependencies include:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. The fact that X -> Y can be inferred from F is denoted F |= X -> Y.
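A standard way to decide whether F |= X -> Y, used in most textbooks although not shown on these slides, is to compute the attribute closure X+ (all attributes functionally determined by X under F) and test whether Y ⊆ X+. The sketch below assumes each FD is represented as a (lhs, rhs) pair of Python sets; the function names are illustrative.

```python
def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            # If the LHS is already determined by X, so is the RHS.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(F, X, Y):
    """F |= X -> Y exactly when Y is a subset of X+."""
    return set(Y) <= attribute_closure(X, F)


# The set F from this example.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```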

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): if Y ⊆ X, then X -> Y
IR2 (Augmentation): if X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z

IR1, IR2 and IR3 form a sound and complete set of inference rules.
- Sound: any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F. In other words, correctly applied, the axioms cannot derive false dependencies.
- Complete: applying IR1 through IR3 repeatedly until no more dependencies can be inferred produces the complete set of all dependencies implied by F, that is, F+.

Inference Rules for FDs (contd.)

Some additional inference rules that are useful:
- Decomposition: if X -> YZ, then X -> Y and X -> Z
- Union (additive): if X -> Y and X -> Z, then X -> YZ
- Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z

These three rules, as well as any other valid inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
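One possible derivation for each exercise, using only IR1 to IR3 (other orders of applying the rules work equally well). In step (4) of the first exercise, CB -> XB is written as BC -> BX, since attribute order inside a set does not matter.

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\textbf{1. } F = \{A \to B,\ C \to X,\ BX \to Z\};\ \text{show } F \models AC \to Z}\\
(1) & A \to B   & \text{given}\\
(2) & AC \to BC & \text{IR2 on (1), augmenting with } C\\
(3) & C \to X   & \text{given}\\
(4) & BC \to BX & \text{IR2 on (3), augmenting with } B\\
(5) & AC \to BX & \text{IR3 on (2) and (4)}\\
(6) & BX \to Z  & \text{given}\\
(7) & AC \to Z  & \text{IR3 on (5) and (6)}\\[6pt]
\multicolumn{3}{l}{\textbf{2. } F = \{A \to B,\ C \to D\},\ C \subseteq B;\ \text{show } F \models A \to D}\\
(1) & A \to B & \text{given}\\
(2) & B \to C & \text{IR1, since } C \subseteq B\\
(3) & A \to C & \text{IR3 on (1) and (2)}\\
(4) & C \to D & \text{given}\\
(5) & A \to D & \text{IR3 on (3) and (4)}
\end{array}
```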

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from F.

Eliminating redundant FDs allows us to minimize the set of FDs.
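Redundancy can be tested mechanically: A -> B is redundant exactly when B is contained in the closure of A computed from F - {A -> B}. A minimal Python sketch, assuming FDs are (lhs, rhs) pairs of sets; the three-FD example set is hypothetical.

```python
def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fd, F):
    """An FD X -> Y of F is redundant iff it follows from F - {X -> Y}."""
    lhs, rhs = fd
    rest = [g for g in F if g != fd]
    return rhs <= attribute_closure(lhs, rest)


# Hypothetical example: A -> C can be derived from A -> B and B -> C.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print([is_redundant(fd, F) for fd in F])  # [False, False, True]
```

When removing redundant FDs, drop them one at a time and recheck the rest against the smaller set: two FDs can each be redundant only because of the other.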

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F, and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
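Covering and equivalence reduce to closure computations: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. The following Python sketch assumes FDs are (lhs, rhs) pairs of sets; the sets E, F and G are hypothetical examples.

```python
def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(F, E):
    """F covers E if every FD in E can be inferred from F."""
    return all(rhs <= attribute_closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E and F are equivalent iff each covers the other."""
    return covers(F, E) and covers(E, F)


# Hypothetical examples.
E = [({"A"}, {"B", "C"})]
F = [({"A"}, {"B"}), ({"A"}, {"C"})]
G = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(equivalent(E, F))  # True
print(equivalent(E, G))  # False
```

G covers E (both A -> B and A -> C follow from G), but E does not cover G, since B -> C cannot be inferred from E; one direction alone is not enough for equivalence.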

Minimal Functional Dependencies

A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. F is transformed so that each FD in it that has more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) every dependency in F has a single attribute on its RHS;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
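Conditions (a), (b) and (c) can be checked directly. A minimal Python sketch, assuming FDs are (lhs, rhs) pairs of sets; for condition (c) it uses the fact that, because Z ⊆ X, replacing X -> A by Z -> A yields an equivalent set exactly when F already implies Z -> A. The example sets are hypothetical.

```python
from itertools import combinations


def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_minimal(F):
    """Check conditions (a), (b) and (c) of a minimal set of FDs."""
    # (a) every right-hand side is a single attribute
    if any(len(rhs) != 1 for _, rhs in F):
        return False
    for fd in F:
        lhs, rhs = fd
        # (b) X -> A must not be derivable from the other FDs
        rest = [g for g in F if g != fd]
        if rhs <= attribute_closure(lhs, rest):
            return False
        # (c) no proper subset Z of X may already determine A under F
        for size in range(len(lhs)):
            for Z in combinations(sorted(lhs), size):
                if rhs <= attribute_closure(Z, F):
                    return False
    return True


# B is extraneous in AB -> C because A -> B.
print(is_minimal([({"A"}, {"B"}), ({"A", "B"}, {"C"})]))  # False
print(is_minimal([({"A"}, {"B"}), ({"A"}, {"C"})]))       # True
```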

Extraneous Attributes

The FDs of F can be reduced further by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous iff

F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}

CANONICAL COVER (FC)

A canonical cover FC satisfies:
1. Every FD of FC is simple: its RHS has a single attribute.
2. FC is left-reduced (no FD has an extraneous LHS attribute).
3. FC is nonredundant.

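Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: Z is extraneous in XZ -> R, because X -> Z and XZ -> R give X -> R. Replacing XZ -> R with X -> R gives FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}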
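The computation can be automated. The sketch below is a Python version of the usual minimal-cover procedure (split right-hand sides, left-reduce, then drop redundant FDs, in that order), assuming FDs are (lhs, rhs) pairs of sets; `canonical_cover` is an illustrative name. For this F it returns {X -> Z, XY -> P, XY -> W, XY -> Q, X -> R}: Z is extraneous in XZ -> R because X -> Z, so the result is left-reduced as well as nonredundant.

```python
def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(F):
    """Canonical (minimal) cover of F, a list of (lhs, rhs) pairs of sets."""
    # Step 1: single-attribute right-hand sides, duplicates dropped.
    G = []
    for lhs, rhs in F:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in G:
                G.append(fd)
    # Step 2: left-reduce by removing extraneous LHS attributes.
    reduced = []
    for lhs, rhs in G:
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= attribute_closure(lhs - {b}, G):
                lhs = lhs - {b}
        if (lhs, rhs) not in reduced:
            reduced.append((lhs, rhs))
    G = reduced
    # Step 3: drop FDs that can be derived from the remaining ones.
    for fd in list(G):
        rest = [g for g in G if g != fd]
        if fd[1] <= attribute_closure(fd[0], rest):
            G = rest
    return G


F = [
    ({"X"}, {"Z"}),
    ({"X", "Y"}, {"W", "P"}),
    ({"X", "Y"}, {"Z", "W", "Q"}),
    ({"X", "Z"}, {"R"}),
]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
```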

Normal Forms Based on Primary Keys

1. Normalization of Relations

2. Practical Use of Normal Forms

3. Definitions of Keys and Attributes Participating in Keys

4. First Normal Form

5. Second Normal Form

6. Third Normal Form

Normalization of Relations

- 2NF, 3NF and BCNF are based on the keys and FDs of a relation schema (relational design by analysis)
- 4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis)
- Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation

Normalization of Relations

Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
- minimizing redundancy
- minimizing anomalies

It provides the database designer with:
- a formal framework for analysing relation schemas based on their keys and FDs
- a series of normal form tests

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

- Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
- Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys (contd.)

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is an attribute that is not prime, that is, it is not a member of any candidate key.
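For small schemas, candidate keys and prime attributes can be found by brute force: a set X is a superkey when X+ = R, and a key when no proper subset of it is a superkey. A minimal Python sketch, assuming FDs as (lhs, rhs) pairs of sets and using the EMP_PROJ FDs from the earlier examples (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS). It enumerates every attribute subset, so it is exponential and only suitable for small schemas.

```python
from itertools import combinations


def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(R, F):
    """All candidate keys of R under F, by brute force over attribute subsets."""
    R = set(R)
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            X = set(combo)
            # Subsets are tried smallest first, so a superkey that contains
            # no key found earlier is minimal, i.e. a candidate key.
            if attribute_closure(X, F) >= R and not any(k <= X for k in keys):
                keys.append(X)
    return keys


def prime_attributes(R, F):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(R, F))


R = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(candidate_keys(R, F))
print("nonprime:", sorted(set(R) - prime_attributes(R, F)))
```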

First Normal Form

- Disallows composite attributes, multivalued attributes and nested relations: attributes whose values for an individual tuple are non-atomic.
- Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
- It is considered to be part of the definition of a relation.

Normalization into 1NF

Normalization into 1NF: there are three techniques to achieve 1NF.
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best, because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
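Technique 1 can be sketched as a simple transformation in Python. The DEPARTMENT relation with a multivalued DLOCATIONS attribute, its values, and the helper `to_1nf` are illustrative assumptions, not taken from the slides' figure; the sketch also assumes a single-attribute primary key.

```python
def to_1nf(relation, key, mv_attr, new_attr):
    """Technique 1: move a multivalued attribute into its own relation,
    together with the (single-attribute) primary key."""
    # Base relation: every attribute except the multivalued one.
    base = [{a: v for a, v in t.items() if a != mv_attr} for t in relation]
    # New relation: one tuple per (key, value) pair.
    split = [{key: t[key], new_attr: value}
             for t in relation for value in t[mv_attr]]
    return base, split


# Hypothetical non-1NF state: DLOCATIONS holds several values per tuple.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research",
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]

DEPT, DEPT_LOCATIONS = to_1nf(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
for t in DEPT_LOCATIONS:
    print(t)
```

The primary key of the new relation DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}, and no value is stored redundantly in DEPT.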

Normalization of nested relations into 1NF

Additional problems from Schaum's series: pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
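A 2NF check amounts to looking for partial dependencies: a nonprime attribute that lies in the closure of a proper subset of the primary key. The Python sketch below assumes FDs as (lhs, rhs) pairs of sets, and simplifies by treating the primary key as the only candidate key (so every other attribute is nonprime). It is applied to EMP_PROJ with the FDs listed in the earlier examples.

```python
from itertools import combinations


def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(R, pk, F):
    """(subset of pk, attribute) pairs where a nonprime attribute depends
    on a proper subset of the primary key, i.e. 2NF violations.
    Assumes pk is the only candidate key."""
    nonprime = set(R) - set(pk)
    found = []
    for size in range(1, len(pk)):
        for part in combinations(sorted(pk), size):
            for a in sorted(nonprime & attribute_closure(part, F)):
                found.append((part, a))
    return found


R = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
PK = ["SSN", "PNUMBER"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
for part, a in partial_dependencies(R, PK, F):
    print(part, "->", a)
```

HOURS is not reported, since it needs the whole key; ENAME, PNAME and PLOCATION are, so EMP_PROJ is not in 2NF.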

Normalizing into 2NF

Conversion to 2NF: R(A, B, C, D) is converted to R1(A, B, C) and R2(A, D).

Additional problem on second normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
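3NF can also be checked with the general form of the definition that the NOTE hints at: for each FD X -> A with A not in X, either X is a superkey or A is a prime attribute. This Python sketch checks only the given FDs rather than all of F+, assumes FDs as (lhs, rhs) pairs of sets, and expects the caller to supply the set of prime attributes. The EMP_DEPT attributes and FDs follow the SSN/DNUMBER examples above; for EMP(SSN, Emp, Salary) it assumes Emp -> SSN, so that Emp is a candidate key.

```python
def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(R, F, prime):
    """FDs X -> A from F (A not in X) that break 3NF: X is not a superkey
    and A is not a prime attribute. `prime` is supplied by the caller."""
    R = set(R)
    bad = []
    for lhs, rhs in F:
        is_superkey = attribute_closure(lhs, F) >= R
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad


EMP_DEPT = ["ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F_DEPT = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(third_nf_violations(EMP_DEPT, F_DEPT, prime={"SSN"}))

# EMP(SSN, Emp, Salary) from the NOTE, assuming Emp -> SSN.
EMP = ["SSN", "Emp", "Salary"]
F_EMP = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(third_nf_violations(EMP, F_EMP, prime={"SSN", "Emp"}))  # []
```

The EMP_DEPT violations come from DNUMBER -> DNAME and DNUMBER -> DMGRSSN; in EMP the middle attribute Emp is itself a key, so nothing is reported.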

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd Normal Form

BCNF (Boyce-Codd Normal Form): a relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one:
- every 2NF relation is in 1NF
- every 3NF relation is in 2NF
- every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey.

BCNF has no such exception: X must be a superkey.

So a relation in BCNF is always in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. In fd2, the left-hand side instructor is not a superkey, but the right-hand side course is a prime attribute, so the relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
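The BCNF test on a schema only needs the closure of each left-hand side: a nontrivial FD violates BCNF when its LHS is not a superkey. A minimal Python sketch, assuming FDs as (lhs, rhs) pairs of sets, applied to TEACH with fd1 and fd2. Checking only the given FDs is enough for R itself, but not for the projections produced by a decomposition.

```python
def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(R, F):
    """Nontrivial FDs in F whose left-hand side is not a superkey of R."""
    R = set(R)
    return [(lhs, rhs) for lhs, rhs in F
            if not rhs <= lhs and not attribute_closure(lhs, F) >= R]


TEACH = ["STUDENT", "COURSE", "INSTRUCTOR"]
F = [
    ({"STUDENT", "COURSE"}, {"INSTRUCTOR"}),  # fd1
    ({"INSTRUCTOR"}, {"COURSE"}),             # fd2
]
print(bcnf_violations(TEACH, F))  # [({'INSTRUCTOR'}, {'COURSE'})]
```

Only fd2 is reported. Its RHS COURSE belongs to the candidate key {STUDENT, COURSE}, which is why 3NF still accepts it.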

Achieving BCNF by Decomposition (contd.)

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
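For a decomposition into two relations there is a simple standard test (not shown on these slides): {R1, R2} has the non-additive join property iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A Python sketch applying it to the three TEACH decompositions, assuming FDs as (lhs, rhs) pairs of sets:

```python
def attribute_closure(X, F):
    """Return X+, every attribute functionally determined by X under F."""
    result = set(X)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def nonadditive_binary(R1, R2, F):
    """Lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    R1, R2 = set(R1), set(R2)
    common_plus = attribute_closure(R1 & R2, F)
    return (R1 - R2) <= common_plus or (R2 - R1) <= common_plus


F = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
decompositions = [
    ({"STUDENT", "INSTRUCTOR"}, {"STUDENT", "COURSE"}),
    ({"COURSE", "INSTRUCTOR"}, {"COURSE", "STUDENT"}),
    ({"INSTRUCTOR", "COURSE"}, {"INSTRUCTOR", "STUDENT"}),
]
print([nonadditive_binary(r1, r2, F) for r1, r2 in decompositions])  # [False, False, True]
```

In the third decomposition the shared attribute INSTRUCTOR determines COURSE, which is all of {INSTRUCTOR, COURSE} - {INSTRUCTOR, STUDENT}, so the test succeeds.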

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
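For a decomposition into any number of relations, the usual lossless-join test is the matrix (chase) algorithm. The following is a Python sketch of that standard algorithm, not the referenced example's own procedure: row i gets a distinguished symbol in each column of Ri, the FDs are applied to equate symbols until nothing changes, and the decomposition is lossless iff some row becomes entirely distinguished. The attribute lists assumed for EMP, PROJECT, WORKS_ON, EMP_LOCS and EMP_PROJ1 are illustrative.

```python
def lossless_join(R, D, F):
    """Matrix (chase) test for the nonadditive join property.

    R: list of attributes; D: list of attribute sets (the decomposition);
    F: list of (lhs, rhs) pairs of sets.
    """
    # Row i: distinguished ("a", attr) where attr is in Ri, else ("b", i, attr).
    S = [{a: ("a", a) if a in Ri else ("b", i, a) for a in R}
         for i, Ri in enumerate(D)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in F:
            # Group rows that agree on every LHS attribute.
            groups = {}
            for row in S:
                groups.setdefault(tuple(row[x] for x in sorted(lhs)), []).append(row)
            for rows in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # Prefer the distinguished symbol; otherwise pick one "b".
                    target = ("a", a) if ("a", a) in symbols else min(symbols)
                    # Rename every occurrence of the equated symbols in column a.
                    for row in S:
                        if row[a] in symbols and row[a] != target:
                            row[a] = target
                            changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(row[a] == ("a", a) for a in R) for row in S)


R = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"},
        {"SSN", "PNUMBER", "HOURS"}]                       # EMP, PROJECT, WORKS_ON
bad = [{"ENAME", "PLOCATION"},
       {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]  # EMP_LOCS, EMP_PROJ1
print(lossless_join(R, good, F))  # True
print(lossless_join(R, bad, F))   # False
```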

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial:
- An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.
- An MVD is nontrivial if neither of the above two conditions is satisfied.

Fourth Normal Form (4NF)

A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

4NF is used to remove multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
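Whether a given state satisfies an MVD can be tested directly from its definition: for any t1, t2 that agree on X, the tuple combining t1's Y-values with t2's remaining (Z) values must also be present. A Python sketch, assuming states as lists of dicts; the EMP tuples (employee Smith with projects X and Y and dependents John and Anna) and the helper `mvd_holds` are illustrative.

```python
def mvd_holds(r, X, Y):
    """Check X ->> Y on relation state r (list of dicts); Z = R - X - Y.

    For t1, t2 agreeing on X there must be a t3 with t3[X] = t1[X],
    t3[Y] = t1[Y] and t3[Z] = t2[Z].
    """
    Z = [a for a in r[0] if a not in X and a not in Y]

    def proj(t, attrs):
        return tuple(t[a] for a in attrs)

    present = {(proj(t, X), proj(t, Y), proj(t, Z)) for t in r}
    for t1 in r:
        for t2 in r:
            same_x = proj(t1, X) == proj(t2, X)
            if same_x and (proj(t1, X), proj(t1, Y), proj(t2, Z)) not in present:
                return False
    return True


EMP = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(EMP, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(EMP[:3], ["ENAME"], ["PNAME"]))  # False
```

Every project must be paired with every dependent, which is the redundancy that splitting EMP into EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME) removes.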

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For a relation R with subsets of the attributes of R denoted A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: a relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.

In other words, a 5NF relation has no nontrivial join dependency except those in which every component is a superkey.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF, but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
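A join dependency can be tested on a state by joining the projections and comparing the result with the original. The Python sketch below assumes SUPPLY(SNAME, PARTNAME, PROJNAME) with R1 = {SNAME, PARTNAME}, R2 = {SNAME, PROJNAME}, R3 = {PARTNAME, PROJNAME}; these attribute names, the sample tuples and the helper `jd_holds` are assumptions for illustration.

```python
def project(r, attrs):
    """Projection as a set of value tuples (duplicates removed)."""
    return {tuple(t[a] for a in attrs) for t in r}


def jd_holds(r, components):
    """True if r equals the natural join of its projections on components."""
    result = [dict(zip(components[0], v)) for v in project(r, components[0])]
    for comp in components[1:]:
        part = [dict(zip(comp, v)) for v in project(r, comp)]
        # Natural join: keep pairs that agree on the shared attributes.
        result = [{**t1, **t2} for t1 in result for t2 in part
                  if all(t1[a] == t2[a] for a in comp if a in t1)]
    attrs = list(r[0])
    return project(result, attrs) == project(r, attrs)


rows = [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]
SUPPLY = [dict(zip(("SNAME", "PARTNAME", "PROJNAME"), t)) for t in rows]
R1, R2, R3 = ["SNAME", "PARTNAME"], ["SNAME", "PROJNAME"], ["PARTNAME", "PROJNAME"]

print(jd_holds(SUPPLY, [R1, R2, R3]))  # True
print(jd_holds(SUPPLY, [R1, R2]))      # False
```

The three-way join reproduces this state exactly, while the two-way join of R1 and R2 alone adds spurious tuples, so only the decomposition into all three projections is lossless here.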

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence 5NF is rarely used in practice.

Guideline for Redundant Information in Tuples and Update Anomalies

GUIDELINE 2: Design a schema that does not suffer from insertion, deletion and update anomalies.

If any anomalies are present, note them so that applications can be made to take them into account.

In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.

Problems with Nulls

If many attributes are grouped together as a "fat" relation, it gives rise to many nulls in the tuples. This:
- wastes storage
- causes problems in understanding the meaning of the attributes
- causes difficulties when using nulls in aggregate operators such as COUNT or SUM

Module 6 17042023

3 Null Values in Tuples Interpretations of nulls

Attribute not applicable or invalid Attribute value unknown (may exist) Value known to exist but unavailable

GUIDELINE 3 Relations should be designed such that their

tuples will have as few NULL values as possible Attributes that are NULL frequently could be

placed in separate relations (with the primary key) Example-

if only 10 of employees have individual offices it is better not to include office_number as an attribute in the employee relation

Better create a new relation emp_offices(essn office_number)

Module 6 18042023

Example of Spurious Tuples

Module 6 19042023

Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as

the base relations of EMP_PROJ is not a good schema design

Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ

These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid

This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1

Module 6 20042023

Example of Spurious Tuples contd

Module 6 21042023

4 Spurious Tuples Bad designs for a relational database may result

in erroneous results for certain JOIN operations The lossless join property is used to

guarantee meaningful results for join operations

GUIDELINE 4 Design relation schemas so that they can be

joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies

Note that Property (a) is extremely important and cannot be

sacrificed Property (b) is less stringent and may be sacrificed

Module 6 23042023

Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done

during Insertion Modification Deletion

Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Module 6 24042023

Functional dependencies Functional dependencies (FDs)

Is a constraint between two sets of attributes from the database

Assumption The entire database is a single universal

relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes

Module 6 25042023

Definition

FDs are used to specify formal measures of the

goodness of relational designs keys that are used to define normal forms for

relations constraints that are derived from the meaning and

interrelationships of the data attributes A set of attributes X functionally determines

a set of attributes Y if the value of X determines a unique value for Y

Module 6 26042023

Functional Dependencies

A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If

t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r

depend on or are determined by the values of the X component

The values of the X component functionally determines the values of Y component

FDs are derived from the real-world constraints on the attributes

The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema $R = \{A_1, A_2, \ldots, A_n\}$ is a set of attributes $S \subseteq R$ with the property that no two distinct tuples $t_1$ and $t_2$ in any legal relation state r of R will have $t_1[S] = t_2[S]$.

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is an attribute that is not prime, that is, it is not a member of any candidate key.
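Candidate keys, and therefore prime attributes, can be found mechanically from the FDs by checking which minimal attribute sets have the whole schema as their closure. The sketch below is a brute-force illustration (exponential in the number of attributes), applied to EMP_PROJ with the FDs from the "Examples of FD constraints" slide; the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not minimal
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


SCHEMA = frozenset({"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"})
FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]

print(candidate_keys(SCHEMA, FDS))    # a single key: {SSN, PNUMBER}
print(prime_attributes(SCHEMA, FDS))  # SSN and PNUMBER; all other attributes are nonprime
```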

First Normal Form

Disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

1NF is considered to be part of the definition of a relation.

Normalization into 1NF

There are three techniques to achieve 1NF:

1. Remove the attribute that violates 1NF and place it in a separate relation, along with the primary key of the original relation.

2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3. If the maximum number of values is known for the attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
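The first technique can be sketched in a few lines of Python. The DEPARTMENT data and the names `split_multivalued`, `DLOCATIONS` and `DLOCATION` are hypothetical illustrations, not taken from the slides' figure; relations are modelled as lists of dicts.

```python
def split_multivalued(rows, key, attr, new_name):
    """Technique 1: move the multivalued `attr` into its own relation with the key."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], new_name: value} for row in rows for value in row[attr]]
    return base, side


# Hypothetical non-1NF DEPARTMENT state: DLOCATIONS holds a list of values.
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]

department, dept_locations = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(department)      # no DLOCATIONS column left
print(dept_locations)  # one atomic tuple per (DNUMBER, DLOCATION) pair
```

The new relation's key is the combination {DNUMBER, DLOCATION}, so no value is repeated and no NULLs are needed, which is why this technique is preferred.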

Normalizing nested relations into 1NF

Additional problems from the Schaum's series: p. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
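A partial dependency can be detected with attribute closures: a nonprime attribute A depends on a proper subset of the primary key iff A is in the closure of that subset. The Python sketch below (illustrative function names) finds partial dependencies in EMP_PROJ and groups them into 2NF relations. It follows the slide's convention that "prime" means "in the primary key".

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, pk, fds):
    """Map each nonprime attribute that depends on a proper subset of `pk` to that subset.

    Closure is monotone, so testing the subsets pk - {b} is enough.
    """
    found = {}
    for a in sorted(schema - pk):
        for b in sorted(pk):
            subset = pk - {b}
            if a in closure(subset, fds):
                found[a] = subset
                break
    return found


def decompose_2nf(schema, pk, fds):
    """Move each group of partially dependent attributes out with the key part it depends on."""
    partial = partial_dependencies(schema, pk, fds)
    groups = {}
    for a, subset in partial.items():
        groups.setdefault(subset, set()).add(a)
    relations = [subset | attrs for subset, attrs in groups.items()]
    relations.append(schema - set(partial))  # primary key plus fully dependent attributes
    return relations


SCHEMA = frozenset({"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"})
PK = frozenset({"SSN", "PNUMBER"})
FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]

partial = partial_dependencies(SCHEMA, PK, FDS)
print(partial)                            # ENAME via SSN; PNAME, PLOCATION via PNUMBER
print(decompose_2nf(SCHEMA, PK, FDS))     # three 2NF relations
```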

Normalizing into 2NF

Figure: Conversion to 2NF, decomposing R(A, B, C, D) into R1(A, B, C) and R2(A, D)

Additional problem on 2NF

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of Prog_task?

2. Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
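The sketch below checks the general form of 3NF (for every FD X -> A, X is a superkey or A is prime), which extends the primary-key definition above to all candidate keys. It is applied to EMP_DEPT with the FDs used elsewhere in this module, and to the EMP example from the NOTE, where Emp -> SSN is assumed so that Emp is a candidate key. Checking only the given FDs is a simplification that is adequate for testing a single schema.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def threenf_violations(schema, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey: always allowed
        violations += [(lhs, a) for a in sorted(rhs - lhs) if a not in prime]
    return violations


EMP_DEPT = frozenset({"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"})
EMP_DEPT_FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(threenf_violations(EMP_DEPT, EMP_DEPT_FDS))  # DNUMBER -> DMGRSSN and DNUMBER -> DNAME

EMP = frozenset({"SSN", "Emp", "Salary"})
EMP_FDS = [  # Emp -> SSN is an assumption encoding "Emp is a candidate key"
    (frozenset({"SSN"}), frozenset({"Emp"})),
    (frozenset({"Emp"}), frozenset({"SSN", "Salary"})),
]
print(threenf_violations(EMP, EMP_FDS))  # []: the transitive FD passes through a key
```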

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: p. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one:
- every 2NF relation is in 1NF
- every 3NF relation is in 2NF
- every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a superkey.

To be in BCNF, X must be a superkey (it must contain a candidate key).

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation, and so is {student, instructor}. fd2 is allowed in 3NF because course is a prime attribute, but it violates BCNF because instructor is not a superkey. So this relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
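The 3NF/BCNF difference for TEACH can be checked mechanically. This Python sketch (illustrative names, brute-force key search) reports BCNF violations as FDs whose left-hand side is not a superkey, and 3NF violations as the subset of those whose right-hand side is also nonprime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def bcnf_violations(schema, fds):
    """Nontrivial X -> A in `fds` whose X is not a superkey."""
    return [(lhs, a) for lhs, rhs in fds
            if not closure(lhs, fds) >= schema
            for a in sorted(rhs - lhs)]


def threenf_violations(schema, fds):
    """Like BCNF, but X -> A is also allowed when A is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, a) for lhs, a in bcnf_violations(schema, fds) if a not in prime]


TEACH = frozenset({"student", "course", "instructor"})
FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

print(candidate_keys(TEACH, FDS))      # {course, student} and {instructor, student}
print(bcnf_violations(TEACH, FDS))     # fd2: instructor is not a superkey
print(threenf_violations(TEACH, FDS))  # []: course is prime, so 3NF holds
```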

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive join property after decomposition.

Of the three, only the third decomposition does not generate spurious tuples after a join (and hence has the nonadditive join property), because its common attribute, instructor, functionally determines course.

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, p. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
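One widely used lossless-join test is the tableau (chase) algorithm, sketched below in Python; its presentation may differ from the textbook's Example 5.12. Each component Ri gets a row with distinguished symbols in its own columns; FDs repeatedly equate symbols, and the decomposition is lossless iff some row ends up fully distinguished. The three TEACH decompositions serve as the test case.

```python
def is_lossless(schema, decomposition, fds):
    """Tableau (chase) test for a lossless (nonadditive) join decomposition.

    schema: set of attributes; decomposition: list of attribute sets R1..Rm;
    fds: list of (lhs, rhs) attribute sets.
    """
    attrs = sorted(schema)
    # Row i holds the distinguished symbol ("a", A) in column A if A is in Ri,
    # and a row-specific symbol ("b", i, A) otherwise.
    table = [
        {a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    if any(table[i][x] != table[j][x] for x in lhs):
                        continue
                    # Rows agree on lhs, so they must agree on rhs: equate symbols,
                    # keeping the distinguished "a" symbol if either row has it.
                    for y in rhs:
                        vi, vj = table[i][y], table[j][y]
                        if vi == vj:
                            continue
                        keep, drop = (vi, vj) if vi[0] == "a" else (vj, vi)
                        for row in table:
                            if row[y] == drop:
                                row[y] = keep
                        changed = True
    # Lossless iff some row consists entirely of distinguished symbols.
    return any(all(row[a][0] == "a" for a in attrs) for row in table)


TEACH = {"student", "course", "instructor"}
FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

D1 = [{"student", "instructor"}, {"student", "course"}]
D2 = [{"course", "instructor"}, {"course", "student"}]
D3 = [{"instructor", "course"}, {"instructor", "student"}]
print([is_lossless(TEACH, d, FDS) for d in (D1, D2, D3)])  # [False, False, True]
```

For a decomposition into exactly two relations, this reduces to the simpler check that the common attributes functionally determine all the attributes of one of the two components.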

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

4NF is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

(a) The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

(b) Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
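The sketch below checks an MVD on one relation state using the tuple-swapping definition (for any t1, t2 agreeing on X there must be a t3 with t1's X and Y values and t2's remaining values), then verifies that joining the two 4NF projections rebuilds EMP. The EMP rows are hypothetical sample data standing in for the figure; relations are lists of dicts.

```python
from itertools import product


def is_trivial_mvd(x, y, schema):
    """X ->> Y is trivial if Y is a subset of X or X union Y is the whole schema."""
    return set(y) <= set(x) or set(x) | set(y) == set(schema)


def mvd_holds(rows, x, y, schema):
    """Check X ->> Y on one relation state (rows are dicts)."""
    z = [a for a in schema if a not in x and a not in y]
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            t3 = {a: t1[a] for a in list(x) + list(y)}
            t3.update({a: t2[a] for a in z})
            if t3 not in rows:
                return False
    return True


def project(rows, attrs):
    return {tuple(row[a] for a in attrs) for row in rows}


SCHEMA = ["ENAME", "PNAME", "DNAME"]
EMP = [dict(zip(SCHEMA, t)) for t in [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]]

print(mvd_holds(EMP, ["ENAME"], ["PNAME"], SCHEMA))      # True
print(mvd_holds(EMP[:3], ["ENAME"], ["PNAME"], SCHEMA))  # False: (Smith, Y, John) missing

# 4NF decomposition and natural join on ENAME.
EMP_PROJECTS = project(EMP, ["ENAME", "PNAME"])
EMP_DEPENDENTS = project(EMP, ["ENAME", "DNAME"])
joined = {(e, p, d) for e, p in EMP_PROJECTS for e2, d in EMP_DEPENDENTS if e == e2}
print(joined == project(EMP, SCHEMA))  # True: the decomposition is lossless
```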

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
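Following the definition directly, a join dependency can be tested on a relation state by joining its projections and comparing with the original. In this Python sketch the SUPPLY rows are hypothetical sample data; tuples are modelled as frozensets of (attribute, value) pairs so that projections and natural joins stay generic.

```python
def relation(attrs, rows):
    """A relation state as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return out


def satisfies_jd(rel, components):
    """JD(R1, ..., Rn) holds iff rel equals the join of its projections on R1..Rn."""
    joined = project(rel, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rel, comp))
    return joined == rel


SUPPLY = relation(("SNAME", "PART_NAME", "PROJ_NAME"), [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
])
R1 = {"SNAME", "PART_NAME"}
R2 = {"SNAME", "PROJ_NAME"}
R3 = {"PART_NAME", "PROJ_NAME"}

print(satisfies_jd(SUPPLY, [R1, R2, R3]))  # True: the three projections rebuild SUPPLY
print(len(natural_join(project(SUPPLY, R1), project(SUPPLY, R2))))  # 9 tuples vs 7 originals
```

The second line shows why only the three-way decomposition works for this state: any two of the projections alone produce spurious tuples.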

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than those in which every component is a superkey.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence 5NF is rarely used in practice.

Page 16: Module 6

Problems with Nulls

If many attributes are grouped together as a fat relation, many nulls appear in the tuples. This:
- wastes storage
- causes problems in understanding the meaning of the attributes
- makes it difficult to use aggregate operators such as COUNT or SUM

3. Null Values in Tuples

Interpretations of nulls:
- the attribute is not applicable or invalid
- the attribute value is unknown (it may exist)
- the value is known to exist but is unavailable

GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible. Attributes that are NULL frequently could be placed in separate relations (with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute in the employee relation. Instead, create a new relation emp_offices(essn, office_number).
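The following sketch uses Python's built-in sqlite3 module to contrast the two designs. The table names `employee_fat` and `employee` and all data are hypothetical. It shows the aggregate problem (COUNT(*) and COUNT(office_number) disagree because aggregates skip NULLs) and how emp_offices stores no NULLs while a LEFT JOIN can still rebuild the wide view.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# "Fat" design: office_number is NULL for every employee without an office.
conn.execute("CREATE TABLE employee_fat (essn TEXT PRIMARY KEY, ename TEXT, office_number INTEGER)")

# Hypothetical data: 10 employees, only the first has an individual office (10%).
rows = [(f"{i:09d}", f"emp{i}", 101 if i == 0 else None) for i in range(10)]
conn.executemany("INSERT INTO employee_fat VALUES (?, ?, ?)", rows)

# GUIDELINE 3 design: offices live in their own relation, keyed by essn.
conn.execute("CREATE TABLE employee (essn TEXT PRIMARY KEY, ename TEXT)")
conn.execute(
    "CREATE TABLE emp_offices ("
    " essn TEXT PRIMARY KEY REFERENCES employee(essn),"
    " office_number INTEGER NOT NULL)"
)
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(e, n) for e, n, _ in rows])
conn.executemany(
    "INSERT INTO emp_offices VALUES (?, ?)",
    [(e, o) for e, _, o in rows if o is not None],
)

# Aggregates silently skip NULLs, so COUNT(*) and COUNT(office_number) disagree.
count_all, count_office = conn.execute(
    "SELECT COUNT(*), COUNT(office_number) FROM employee_fat"
).fetchone()

# The split design stores no NULLs; a LEFT JOIN rebuilds the wide view on demand.
stored_offices = conn.execute("SELECT COUNT(*) FROM emp_offices").fetchone()[0]
rebuilt = conn.execute(
    "SELECT COUNT(*) FROM employee e LEFT JOIN emp_offices o ON e.essn = o.essn"
).fetchone()[0]

print(count_all, count_office, stored_offices, rebuilt)  # 10 1 1 10
```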

Example of Spurious Tuples

Generation of spurious tuples

Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.

The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.

These additional tuples that were not in EMP_PROJ are called spurious tuples, because they represent spurious or wrong information that is not valid.

This happens because the attribute PLOCATION, which is used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
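A small Python simulation makes the effect concrete. The EMP_PROJ rows below are hypothetical sample data; EMP_LOCS and EMP_PROJ1 are computed as projections of it, and joining them back on PLOCATION alone produces tuples that were never in EMP_PROJ.

```python
def relation(attrs, rows):
    """A relation state as a set of tuples; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return out


COLS = ("SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION")
EMP_PROJ = relation(COLS, [
    ("123", 1, 32.5, "Smith", "ProductX", "Bellaire"),
    ("123", 2, 7.5, "Smith", "ProductY", "Sugarland"),
    ("453", 1, 20.0, "English", "ProductX", "Bellaire"),
    ("453", 2, 20.0, "English", "ProductY", "Sugarland"),
    ("333", 2, 10.0, "Wong", "ProductY", "Sugarland"),
])

EMP_LOCS = project(EMP_PROJ, {"ENAME", "PLOCATION"})
EMP_PROJ1 = project(EMP_PROJ, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"})

joined = natural_join(EMP_LOCS, EMP_PROJ1)  # the only common attribute is PLOCATION
spurious = joined - EMP_PROJ
print(len(EMP_PROJ), len(joined), len(spurious))  # 5 13 8
```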

Example of Spurious Tuples contd

4. Spurious Tuples

Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:
(a) nonadditivity, or losslessness, of the corresponding join
(b) preservation of the functional dependencies

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:
- anomalies that cause redundant work to be done during insertion, modification and deletion
- waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values
- generation of invalid and spurious data during joins on improperly related base relations

Functional dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema $R = \{A_1, A_2, \ldots, A_n\}$, where $A_1, A_2, \ldots, A_n$ are the attributes.

Definition

FDs are used to specify:
- formal measures of the "goodness" of relational designs
- keys, which are used to define normal forms for relations
- constraints that are derived from the meaning and interrelationships of the data attributes

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; equivalently, the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name          Continent     Area (sq mi)   Length (mi)
Caspian Sea   Asia-Europe   143244         760
Superior      NA            31700          350
Victoria      Africa        26828          250
Aral Sea      Asia          24904          280
Huron         NA            23000          206
Michigan      NA            22300          307
Tanganyika    Africa        12700          420

Continent -> Name is not satisfied by this state (NA appears with Superior, Huron and Michigan), whereas Name -> Length is.

Graphical representation of Functional Dependencies

Examples of FD constraints

Social security number uniquely determines employee name: SSN -> ENAME

Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}

Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.

How it works:
1. Sort the tuples of the relation r on the A attributes, so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
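A direct Python rendering of the three steps, tried on the "Lakes of the world" state from earlier in this module. Rows are modelled as dicts (an assumption of this sketch); `itertools.groupby` walks the runs of adjacent tuples with equal A-values produced by the sort.

```python
from itertools import groupby


def satisfies(rows, a_attrs, b_attrs):
    """Return True if the relation state `rows` satisfies A -> B."""
    def a_value(row):
        return tuple(row[x] for x in a_attrs)

    # Step 1: sort so tuples with equal A-values are adjacent.
    ordered = sorted(rows, key=a_value)
    # Step 2: within each run of equal A-values, all B-values must be equal.
    for _, run in groupby(ordered, key=a_value):
        if len({tuple(row[x] for x in b_attrs) for row in run}) > 1:
            return False
    # Step 3: every run passed the check.
    return True


LAKES = [dict(zip(("Name", "Continent", "Area", "Length"), row)) for row in [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]]

print(satisfies(LAKES, ["Name"], ["Length"]))     # True
print(satisfies(LAKES, ["Continent"], ["Name"]))  # False: NA has three different lakes
```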

Relation state of TEACH

TEACH

Teacher   Course            Text
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz

TEACHER -> COURSE does not hold in this state (Smith teaches two different courses), while TEXT -> COURSE may hold.

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time consuming.

So inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This motivates the concept of closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that include F, as well as all dependencies that can be inferred from F, is called the closure of F, denoted by F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That X -> Y is inferred from F is denoted by F |= X -> Y.
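In practice, F |= X -> Y is usually decided without enumerating F+: compute the attribute closure X+ under F and check that Y is contained in it. A minimal Python sketch on the example F above (function names are illustrative):

```python
def closure(attrs, fds):
    """Attribute closure X+: repeatedly fire every FD whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y iff Y is contained in the closure of X under F."""
    return set(rhs) <= closure(set(lhs), fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```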

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.

Armstrong's inference rules:
- IR1 (Reflexive): if Y ⊆ X, then X -> Y
- IR2 (Augmentation): if X -> Y, then XZ -> YZ (notation: XZ stands for X ∪ Z)
- IR3 (Transitive): if X -> Y and Y -> Z, then X -> Z

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:
- Decomposition: if X -> YZ, then X -> Y and X -> Z
- Union (additive): if X -> Y and X -> Z, then X -> YZ
- Pseudotransitive: if X -> Y and WY -> Z, then WX -> Z

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
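One way to write out both derivations step by step with Armstrong's rules IR1 to IR3. These are worked solutions added for illustration; other derivation orders (for example, using the pseudotransitive rule) are equally valid.

```latex
\begin{aligned}
&\textbf{Example 1}\\
&1.\ A \to B           && \text{given}\\
&2.\ AC \to BC         && \text{IR2 on (1), augmenting with } C\\
&3.\ C \to X           && \text{given}\\
&4.\ BC \to BX         && \text{IR2 on (3), augmenting with } B\\
&5.\ AC \to BX         && \text{IR3 on (2) and (4)}\\
&6.\ BX \to Z          && \text{given}\\
&7.\ AC \to Z          && \text{IR3 on (5) and (6)}\\[6pt]
&\textbf{Example 2}\\
&1.\ A \to B           && \text{given}\\
&2.\ B \to C           && \text{IR1, since } C \subseteq B\\
&3.\ A \to C           && \text{IR3 on (1) and (2)}\\
&4.\ C \to D           && \text{given}\\
&5.\ A \to D           && \text{IR3 on (3) and (4)}
\end{aligned}
```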

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
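The redundancy condition translates directly into a closure check: A -> B is redundant iff B is in the closure of A computed under F - {A -> B}. The Python sketch below applies it to the DEPT_NO / MGR_SSN / MGR_PHONE dependencies from the closure example; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(dep, fds):
    """dep is redundant iff it can be derived from the other FDs."""
    lhs, rhs = dep
    rest = [d for d in fds if d != dep]
    return rhs <= closure(lhs, rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time, re-checking against what remains."""
    result = list(fds)
    for dep in list(fds):
        if is_redundant(dep, result):
            result.remove(dep)
    return result


F = [
    (frozenset({"DEPT_NO"}), frozenset({"MGR_SSN"})),
    (frozenset({"MGR_SSN"}), frozenset({"MGR_PHONE"})),
    (frozenset({"DEPT_NO"}), frozenset({"MGR_PHONE"})),  # follows from the first two
]

print([is_redundant(d, F) for d in F])  # [False, False, True]
print(remove_redundant(F) == F[:2])     # True
```

Removing FDs one at a time matters: two FDs that each follow from the other must not both be dropped, which is why each check uses the current, already reduced set.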

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F, and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent to F.
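Checking "F covers E" never requires building F+: for each X -> Y in E it is enough to test that Y is in the closure of X under F. Equivalence is then two such checks. The FD sets below are an illustrative example with one-character attribute names (an assumption of this sketch, as is the helper `fd`).

```python
def fd(lhs, rhs):
    """Hypothetical helper: each character of the strings is one attribute."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E if every X -> Y in E is in F+, i.e. Y is in the closure of X under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    return covers(e, f) and covers(f, e)


F = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
G = [fd("A", "CD"), fd("E", "AH")]

print(equivalent(F, G))  # True: each set can derive every FD of the other
print(covers(F, [fd("A", "CD")]), covers([fd("A", "CD")], F))  # True False
```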

Minimal Functional Dependencies

A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F. F is transformed such that each FD in it that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies)

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency whose left-hand side is a proper subset of the original left-hand side)

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 17: Module 6

Module 6 17042023

3. Null Values in Tuples

Interpretations of nulls:
Attribute not applicable or invalid
Attribute value unknown (may exist)
Value known to exist but unavailable

GUIDELINE 3: Design relations so that their tuples have as few NULL values as possible. Attributes that are frequently NULL can be placed in separate relations (together with the primary key).

Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute of the employee relation. Instead, create a new relation emp_offices(essn, office_number).

Example of Spurious Tuples

Generation of spurious tuples: using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.

The problem is that a natural join of these two relations produces more tuples than the original set of tuples in EMP_PROJ.

These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious or wrong information that is not valid.

This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
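A small sketch can reproduce the effect. It uses a toy EMP_PROJ state with only the attributes SSN, PNUMBER, ENAME and PLOCATION. The values are hypothetical and are not taken from the figure. The helpers project and natural_join are our own, and each relation is modeled as an (attribute tuple, set of value tuples) pair. The sketch splits the state into EMP_LOCS and EMP_PROJ1 and then joins them back on PLOCATION. The result contains tuples that were never in EMP_PROJ.

```python
# Relations are (attribute names, set of value tuples); set semantics drops duplicates.

def project(rel, keep):
    """Projection of rel onto the attributes in keep."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(row[i] for i in idx) for row in rows}


def natural_join(r, s):
    """Natural join: match tuples on every attribute name the relations share."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    extra = tuple(a for a in sa if a not in ra)
    out = set()
    for t in rrows:
        for u in srows:
            if all(t[ra.index(a)] == u[sa.index(a)] for a in common):
                out.add(t + tuple(u[sa.index(a)] for a in extra))
    return ra + extra, out


# Hypothetical state: two employees on different projects at the same location.
EMP_PROJ = (("SSN", "PNUMBER", "ENAME", "PLOCATION"),
            {("111", 1, "Smith", "Bellaire"), ("222", 2, "Wong", "Bellaire")})

EMP_LOCS = project(EMP_PROJ, ("ENAME", "PLOCATION"))
EMP_PROJ1 = project(EMP_PROJ, ("SSN", "PNUMBER", "PLOCATION"))

# Join on PLOCATION (neither a key nor a foreign key), then restore column order.
rejoined = project(natural_join(EMP_PROJ1, EMP_LOCS), EMP_PROJ[0])
spurious = rejoined[1] - EMP_PROJ[1]
print(len(EMP_PROJ[1]), len(rejoined[1]))  # 2 4
print(sorted(spurious))
```

The two extra tuples wrongly pair each employee with the other employee's project.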

Example of Spurious Tuples (contd.)

4. Spurious Tuples

Bad designs for a relational database may produce erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Spurious Tuples

There are two important properties of decompositions:
(a) non-additivity, or losslessness, of the corresponding join;
(b) preservation of the functional dependencies.

Property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:
Anomalies that cause redundant work to be done during insertion, modification and deletion.
Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins in the presence of null values.
Generation of invalid and spurious data during joins on improperly related base relations.

Functional Dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition

FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X → Y holds if, whenever two tuples have the same value for X, they have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X], then t1[Y] = t2[Y].

X → Y in R specifies a constraint on all relation instances r(R). It means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component. In other words, the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the World

Name         Continent    Area     Length
Caspian Sea  Asia-Europe  143244   760
Superior     NA           31700    350
Victoria     Africa       26828    250
Aral Sea     Asia         24904    280
Huron        NA           23000    206
Michigan     NA           22300    307
Tanganyika   Africa       12700    420

Continent ↛ Name (NA occurs with Superior, Huron and Michigan)
Name → Length
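The FD definition can be applied literally by comparing every pair of tuples. In the sketch below, violates is a hypothetical helper. It returns a pair of lake tuples that agree on X but differ on Y, or None if no such pair exists. Keep in mind that a single relation state can only refute an FD. Finding no counterexample does not prove that the FD holds for the schema.

```python
from itertools import combinations

# The "Lakes of the World" state from the slide: (Name, Continent, Area, Length).
ATTRS = ("Name", "Continent", "Area", "Length")
LAKES = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]


def violates(rows, x, y):
    """Return a pair (t1, t2) with t1[X] = t2[X] but t1[Y] != t2[Y], or None."""
    xi = [ATTRS.index(a) for a in x]
    yi = [ATTRS.index(a) for a in y]
    for t1, t2 in combinations(rows, 2):
        if all(t1[i] == t2[i] for i in xi) and any(t1[i] != t2[i] for i in yi):
            return t1, t2
    return None


print(violates(LAKES, ["Continent"], ["Name"]))  # Superior and Huron share NA
print(violates(LAKES, ["Name"], ["Length"]))     # None: no counterexample
```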

Graphical representation of Functional Dependencies

Examples of FD constraints

Social security number uniquely determines employee name: SSN → ENAME
Project number uniquely determines project name and location: PNUMBER → {PNAME, PLOCATION}
Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} → HOURS

Examples of FD constraints (contd.)

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
It must be defined explicitly by someone who knows the semantics of the attributes of R.
The constraint must hold on every relation instance r(R).
If K is a key of R, then K functionally determines all attributes in R, since we never have two distinct tuples with t1[K] = t2[K].

Satisfies algorithm

Why it is used: to determine whether a relation state r satisfies a given functional dependency A → B.

How it works:
1. Sort the tuples of r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under A also have equal values under B.
3. If condition 2 is met, the algorithm outputs true; otherwise it outputs false.

Relation state of TEACH

TEACH
TEACHER  COURSE           TEXT
Smith    Data Structures  Bartram
Smith    Data Management  Martin
Hall     Compilers        Hoffmann
Brown    OOAD             Horowitz

TEACHER ↛ COURSE (Smith teaches two different courses)
TEXT → COURSE (not violated by this state)
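Below is a minimal Python sketch of the Satisfies algorithm, applied to the TEACH state above. It assumes that each tuple is a dict keyed by attribute name. The function name satisfies is ours. After sorting, the sketch only compares neighbouring tuples. This is enough because equality is transitive.

```python
# TEACH state from the slide, one dict per tuple.
TEACH = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]


def satisfies(r, a, b):
    """Sort-based check of whether relation state r satisfies A -> B."""
    def key(t):
        return tuple(t[x] for x in a)

    # Step 1: sort so that tuples with equal A-values are adjacent.
    ordered = sorted(r, key=key)
    # Step 2: inside each run of equal A-values, the B-values must agree.
    for prev, cur in zip(ordered, ordered[1:]):
        if key(prev) == key(cur) and any(prev[y] != cur[y] for y in b):
            return False
    # Step 3: condition 2 was met for every run.
    return True


print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
```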

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming, so inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.
Schema designers specify the most obvious FDs.
The other dependencies can be inferred, or deduced, from the FDs in F.

Example of Closure

A department has one manager (DEPT_NO → MGR_SSN), and a manager has a unique phone number (MGR_SSN → MGR_PHONE). Together, these two dependencies imply DEPT_NO → MGR_PHONE.

This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F. The set of all dependencies that includes F, together with all dependencies that can be inferred from F, is called the closure of F and is denoted F+.

Example

F = {SSN → {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER → {DNAME, DMGRSSN}}

Inferred functional dependencies include:
SSN → {DNAME, DMGRSSN}
SSN → SSN
DNUMBER → DNAME

To infer dependencies systematically, we need a set of inference rules that derive new dependencies from a given set of dependencies. We write F ⊨ X → Y to denote that the FD X → Y is inferred from F.
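One standard way to decide F ⊨ X → Y mechanically is the attribute-closure algorithm, which this slide does not spell out. The algorithm computes X+, the set of all attributes determined by X under F, and then checks whether Y ⊆ X+. The sketch below assumes FDs are written as (LHS set, RHS set) pairs. The function names are ours.

```python
def closure(x, fds):
    """Attribute closure X+ of the attribute set x under the FDs fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the LHS is determined, everything on the RHS is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y is a subset of X+."""
    return set(rhs) <= closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))          # False
```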

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.

Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X → Y.
IR2 (Augmentation): If X → Y, then XZ → YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X → Y and Y → Z, then X → Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F. In other words, if the axioms are correctly applied, they cannot derive false dependencies.

By complete we mean that applying IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs (contd.)

Some additional inference rules that are useful:
Decomposition: If X → YZ, then X → Y and X → Z.
Union (additive): If X → Y and X → Z, then X → YZ.
Pseudotransitive: If X → Y and WY → Z, then WX → Z.

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1. Given the set F = {A → B, C → X, BX → Z}, derive AC → Z using the inference axioms.
2. Given F = {A → B, C → D} with C ⊆ B, show that F ⊨ A → D.
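Below are possible derivations for the two exercises. The first exercise is read as F = {A → B, C → X, BX → Z}. Each step names the Armstrong rule it uses.

```latex
\begin{align*}
&\text{Exercise 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \models AC \to Z\\
&(1)\; A \to B && \text{given}\\
&(2)\; AC \to BC && \text{IR2 on (1), augment with } C\\
&(3)\; C \to X && \text{given}\\
&(4)\; BC \to BX && \text{IR2 on (3), augment with } B\\
&(5)\; AC \to BX && \text{IR3 on (2), (4)}\\
&(6)\; BX \to Z && \text{given}\\
&(7)\; AC \to Z && \text{IR3 on (5), (6)}\\[6pt]
&\text{Exercise 2: } F = \{A \to B,\; C \to D\},\ C \subseteq B \models A \to D\\
&(1)\; B \to C && \text{IR1, since } C \subseteq B\\
&(2)\; A \to B && \text{given}\\
&(3)\; A \to C && \text{IR3 on (2), (1)}\\
&(4)\; C \to D && \text{given}\\
&(5)\; A \to D && \text{IR3 on (3), (4)}
\end{align*}
```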

Redundant functional dependencies

Given a set F of FDs, an FD A → B of F is said to be redundant with respect to the FDs of F iff A → B can be derived from the set of FDs F - {A → B}.
Redundant FDs are extra and unnecessary, and can be safely removed from F.
Eliminating redundant FDs allows us to minimize the set of FDs.
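The redundancy test can be sketched with attribute closure: A → B is redundant exactly when B is contained in the closure of A under the remaining FDs. The helper names below (closure, is_redundant, remove_redundant, fd) are illustrative. The fd shorthand assumes single-letter attribute names. Each FD is tested only against the FDs still kept. This prevents two FDs that imply each other from both being dropped.

```python
def closure(x, fds):
    """Attribute closure X+ under fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(dep, fds):
    """dep is redundant iff it can be derived from the other FDs in fds."""
    lhs, rhs = dep
    rest = [g for g in fds if g != dep]
    return rhs <= closure(lhs, rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time, re-testing against what remains."""
    result = list(fds)
    for dep in list(fds):
        if is_redundant(dep, result):
            result.remove(dep)
    return result


def fd(lhs, rhs):
    """Shorthand for single-letter attributes: fd("AB", "C") means AB -> C."""
    return frozenset(lhs), frozenset(rhs)


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(remove_redundant(F) == [fd("A", "B"), fd("B", "C")])  # True: A -> C follows by IR3
```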

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Equivalence therefore means that every FD in E can be inferred from F and every FD in F can be inferred from E. So E is equivalent to F if both conditions hold: E covers F, and F covers E.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
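Cover and equivalence can be tested the same way. F covers E if, for every X → Y in E, Y lies in the closure of X under F. The sketch below uses hypothetical helper names and assumes single-letter attributes.

```python
def closure(x, fds):
    """Attribute closure X+ under fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E: every FD X -> Y in E satisfies Y subset of X+ under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff E covers F and F covers E."""
    return covers(e, f) and covers(f, e)


def fd(lhs, rhs):
    """Shorthand for single-letter attributes: fd("AB", "C") means AB -> C."""
    return frozenset(lhs), frozenset(rhs)


E = [fd("A", "B"), fd("B", "C")]
F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
G = [fd("A", "BC")]
print(equivalent(E, F))  # True
print(equivalent(E, G))  # False: G does not imply B -> C
```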

Minimal Functional Dependencies

A minimal cover, also called an irreducible set of F, is useful for eliminating unnecessary functional dependencies. F is transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X → A in F is the set F - {X → A} equivalent to F (no redundancies);
(c) for no X → A in F and proper subset Z of X is (F - {X → A}) ∪ {Z → A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The size of the FDs in F can be reduced further by removing extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R, and let A1A2 → B1B2 be an FD in F. Then A1 is extraneous iff F ≡ (F - {A1A2 → B1B2}) ∪ {A2 → B1B2}.

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.

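Problem

Given a set F of FDs, find a canonical cover for F.

F = {X → Z, XY → WP, XY → ZWQ, XZ → R}

1. Split the right-hand sides: FC = {X → Z, XY → W, XY → P, XY → Z, XY → W, XY → Q, XZ → R}
2. Remove the duplicate XY → W and the redundant XY → Z (implied by X → Z): FC = {X → Z, XY → W, XY → P, XY → Q, XZ → R}
3. Left-reduce: Z is extraneous in XZ → R, because X → Z and XZ → R give X → R. FC = {X → Z, X → R, XY → W, XY → P, XY → Q}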
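These steps can be automated. The sketch below works in three passes. It first splits right-hand sides, then removes extraneous left-hand-side attributes, and finally drops redundant FDs. The helper names are hypothetical, and single-letter attributes are assumed. On the F of this problem it returns {X → Z, XY → P, XY → W, XY → Q, X → R}. When several equivalent covers exist, the one produced can depend on the order in which FDs are examined.

```python
def closure(x, fds):
    """Attribute closure X+ under fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Shorthand for single-letter attributes: fd("XY", "W") means XY -> W."""
    return frozenset(lhs), frozenset(rhs)


def minimal_cover(fds):
    # Step 1: one attribute per right-hand side (decomposition rule), no duplicates.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            dep = (frozenset(lhs), frozenset({a}))
            if dep not in g:
                g.append(dep)
    # Step 2: B in X is extraneous in X -> A if A is in (X - {B})+.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction can create duplicates
    # Step 3: drop FDs derivable from the ones that remain.
    for dep in list(g):
        rest = [h for h in g if h != dep]
        if dep[1] <= closure(dep[0], rest):
            g = rest
    return g


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
```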

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).
4NF: based on keys and multivalued dependencies (MVDs). 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).
Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization means analysing the given relations, based on their FDs and primary keys, to achieve two desirable properties: minimizing redundancy and minimizing anomalies.

Normalization provides the database designer with:
a formal framework for analyzing relation schemas based on keys and FDs;
a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets. It therefore indicates the degree to which the relation has been normalized.

Normalization of Relations

Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).
Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the following property: no two distinct tuples t1 and t2 in any legal relation state r of R have t1[S] = t2[S].

A key K is a superkey with an additional property: removing any attribute from K causes K to no longer be a superkey.

Definitions of Keys and Attributes Participating in Keys (contd.)

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
A prime attribute is a member of some candidate key.
A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
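Keys can be found by brute force. A set S is a superkey exactly when the closure of S is all of R, and candidate keys are the minimal such sets. The sketch below tries subsets in order of increasing size. It uses the EMP_PROJ attributes and FDs from the earlier FD examples, and the function names are ours. The search is exponential in the number of attributes, which is acceptable for classroom-sized schemas. The prime attributes are then simply the union of the candidate keys.

```python
from itertools import combinations


def closure(x, fds):
    """Attribute closure X+ under fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            # A superset of a key already found is a superkey but not a key.
            if any(k <= s for k in keys):
                continue
            if closure(s, fds) == set(attrs):
                keys.append(s)
    return keys


R = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
keys = candidate_keys(R, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])
print("nonprime:", sorted(set(R) - prime))
```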

First Normal Form

1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.
Hence 1NF disallows relations within relations, or relations as attribute values within tuples.
1NF is considered to be part of the definition of a relation.

Normalization into 1NF

Normalization into 1NF

There are three techniques to achieve 1NF:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that the original relation has a separate tuple for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best. It does not suffer from redundancy, and it is completely general, placing no limit on the maximum number of values.
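The sketch below applies technique 1 to a hypothetical DEPARTMENT state in which DLOCATIONS holds a set of locations. The multivalued attribute moves to a new relation DEPT_LOCATIONS(DNUMBER, DLOCATION), whose key is the pair {DNUMBER, DLOCATION}. The data and the helper name split_multivalued are illustrative only.

```python
# Hypothetical DEPARTMENT state whose DLOCATIONS value is a set (violates 1NF).
DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": {"Stafford"}},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": {"Houston"}},
]


def split_multivalued(rows, key, attr):
    """Technique 1: move multivalued attribute attr into its own relation with the key."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = sorted((row[key], value) for row in rows for value in row[attr])
    return base, side


department, dept_locations = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS")
print(department)
print(dept_locations)
# [(1, 'Houston'), (4, 'Stafford'), (5, 'Bellaire'), (5, 'Houston'), (5, 'Sugarland')]
```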

Normalizing nested relations into 1NF

Additional problems: Schaum's Outline series, pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y → Z in which removing any attribute from Y means the FD no longer holds.

Examples:
{SSN, PNUMBER} → HOURS is a full FD, since neither SSN → HOURS nor PNUMBER → HOURS holds.
{SSN, PNUMBER} → ENAME is not a full FD, since SSN → ENAME also holds. Such a dependency is called a partial dependency.

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.
R can be decomposed into 2NF relations via the process of 2NF normalization.
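The sketch below flags partial dependencies by computing the closure of every proper subset of the primary key. It assumes the primary key is the only candidate key, so every other attribute is nonprime. Any nonprime attribute that it reports violates 2NF. The function names are ours.

```python
from itertools import combinations


def closure(x, fds):
    """Attribute closure X+ under fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(attrs, pk, fds):
    """Map each nonprime attribute that depends on a proper subset of the
    primary key to the first such subset found (assumes pk is the only key)."""
    nonprime = [a for a in attrs if a not in pk]
    found = {}
    for size in range(1, len(pk)):
        for subset in combinations(sorted(pk), size):
            determined = closure(set(subset), fds)
            for a in nonprime:
                if a in determined and a not in found:
                    found[a] = subset
    return found


EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(partial_dependencies(EMP_PROJ, {"SSN", "PNUMBER"}, F))
```

For EMP_PROJ, the sketch reports ENAME (determined by SSN) and PNAME and PLOCATION (determined by PNUMBER). HOURS is not reported. This is why 2NF normalization splits the relation into EP1(SSN, PNUMBER, HOURS), EP2(SSN, ENAME) and EP3(PNUMBER, PNAME, PLOCATION).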

Normalizing into 2NF

Conversion to 2NF (figure): R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).

Convert to

Additional problem on second normal form:
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID → Prog_Pac_name

1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X → Z that can be derived from two FDs X → Y and Y → Z.

Examples:
SSN → DMGRSSN is a transitive FD, since SSN → DNUMBER and DNUMBER → DMGRSSN hold.
SSN → ENAME is non-transitive, since there is no set of attributes X where SSN → X and X → ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.
R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X → Y and Y → Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, the transitive dependency causes no problem. For example, consider EMP(SSN, Emp#, Salary). Here SSN → Emp# → Salary, and Emp# is a candidate key.
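Transitive dependencies can be spotted from the given FDs, provided the relation is already in 2NF and its candidate keys are supplied. Any FD Y → A in which Y is not a superkey and A is nonprime yields key → Y → A. Because Y must not be a superkey, the case described in the NOTE (Y a candidate key) is not reported. The function name is ours, and the sketch inspects only the FDs as given.

```python
def closure(x, fds):
    """Attribute closure X+ under fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(attrs, fds, keys):
    """Return (A, Y) pairs where key -> Y -> A with Y not a superkey and A nonprime.
    Assumes the relation is already in 2NF and keys lists all candidate keys."""
    prime = set().union(*keys)
    found = []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attrs):
            continue  # Y is a superkey (e.g. a candidate key): not a 3NF problem
        for a in sorted(rhs - lhs):
            if a not in prime:
                found.append((a, tuple(sorted(lhs))))
    return found


EMP_DEPT = ["SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"]
F1 = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
      ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]
print(transitive_dependencies(EMP_DEPT, F1, [{"SSN"}]))
# [('DMGRSSN', ('DNUMBER',)), ('DNAME', ('DNUMBER',))]

# The NOTE's case: EMP(SSN, Emp#, Salary), where Emp# is also a candidate key.
EMP = ["SSN", "Emp#", "Salary"]
F2 = [({"SSN"}, {"Emp#"}), ({"Emp#"}, {"SSN", "Salary"})]
print(transitive_dependencies(EMP, F2, [{"SSN"}, {"Emp#"}]))  # []
```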

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X → A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.
There exist relations that are in 3NF but not in BCNF.
The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X → A, 3NF allows the dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a candidate key.
BCNF requires X to be a superkey.
So a relation in BCNF is always in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} → instructor
fd2: instructor → course

{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey. However, 3NF allows fd2 because course is a prime attribute. So this relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so that the result has the non-additive (lossless) join property. This may mean giving up the preservation of some functional dependencies in the decomposed relations.
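The 3NF/BCNF distinction for TEACH can be checked with a short sketch (helper names ours). An FD whose left side is not a superkey violates BCNF. It also violates 3NF only when its right side is nonprime. Candidate keys are found by brute force over attribute subsets.

```python
from itertools import combinations


def closure(x, fds):
    """Attribute closure X+ under fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) == set(attrs):
                keys.append(s)
    return keys


def normal_form_violations(attrs, fds):
    """Return (BCNF violations, 3NF violations) as lists of (LHS, attribute)."""
    prime = set().union(*candidate_keys(attrs, fds))
    bcnf, nf3 = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attrs):
            continue  # LHS is a superkey: allowed in both BCNF and 3NF
        for a in sorted(rhs - lhs):  # ignore the trivial part of the FD
            bcnf.append((tuple(sorted(lhs)), a))
            if a not in prime:  # 3NF still tolerates a prime right-hand side
                nf3.append((tuple(sorted(lhs)), a))
    return bcnf, nf3


TEACH = ["student", "course", "instructor"]
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2
print([sorted(k) for k in candidate_keys(TEACH, F)])
print(normal_form_violations(TEACH, F))  # ([(('instructor',), 'course')], [])
```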

Achieving the BCNF by Decomposition (contd.)

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Of the three, only the third decomposition does not generate spurious tuples after a join, and hence it alone has the non-additivity property.
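When a relation is decomposed into exactly two relations, there is a standard shortcut. The decomposition {R1, R2} is lossless with respect to F iff (R1 ∩ R2) → (R1 - R2) or (R1 ∩ R2) → (R2 - R1) is in F+. The sketch below applies this test to the three TEACH decompositions. The function name is ours.

```python
def closure(x, fds):
    """Attribute closure X+ under fds."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """{R1, R2} is lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2
DECOMPOSITIONS = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for n, (r1, r2) in enumerate(DECOMPOSITIONS, start=1):
    print(n, binary_lossless(r1, r2, F))
# 1 False
# 2 False
# 3 True
```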

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it. If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
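The sketch below shows the general matrix (chase-style) test used by lossless-join algorithms. It works for a decomposition into any number of relations. The steps are:

1. Build a matrix with one row per Ri and one column per attribute.
2. Put a distinguished symbol wherever the attribute belongs to Ri.
3. Repeatedly equate symbols using the FDs.
4. Declare the decomposition lossless if some row ends up containing only distinguished symbols.

The symbol encoding and function name are our own. The sketch is not necessarily identical to the book's worked example.

```python
def lossless(attributes, decomposition, fds):
    """Matrix (chase) test for a nonadditive join decomposition."""
    # Row i, column A: distinguished ("a", A) if A is in R_i, else ("b", i, A).
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in attributes}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue
                    for y in rhs:
                        if r1[y] != r2[y]:
                            # Equate the two symbols in column y, preferring the
                            # "a" symbol ("a" sorts before "b"); rename everywhere.
                            keep, drop = sorted((r1[y], r2[y]))
                            for r in rows:
                                if r[y] == drop:
                                    r[y] = keep
                            changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(r[a] == ("a", a) for a in attributes) for r in rows)


TEACH = ["student", "course", "instructor"]
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F))  # True
print(lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F))     # False
```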

Fourth Normal Form (4NF)

Multivalued dependency (MVD): a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C, and the sets of values for B and C are independent of each other.

A multivalued dependency can be further classified as trivial or nontrivial. An MVD A →→ B in relation R is trivial if (a) B is a subset of A or (b) A ∪ B = R. An MVD is nontrivial if neither (a) nor (b) is satisfied.

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.
It is used to remove multivalued dependencies.
In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME →→ PNAME and ENAME →→ DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
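An MVD can be checked on a relation state. X →→ Y holds if, within each group of tuples that share an X-value, every Y-value appears with every value of the remaining attributes. The sketch uses a hypothetical EMP state in which Smith works on projects X and Y and has dependents John and Anna. It then performs the 4NF decomposition and rejoins the result. The function name is ours.

```python
from collections import defaultdict
from itertools import product

ATTRS = ("ENAME", "PNAME", "DNAME")


def mvd_holds(rows, x, y):
    """X ->> Y holds in this state iff, within every group of tuples that agree
    on X, the (Y-value, Z-value) pairs form a full cross product (Z = the rest)."""
    z = [a for a in ATTRS if a not in x and a not in y]

    def vals(t, cols):
        return tuple(t[ATTRS.index(c)] for c in cols)

    groups = defaultdict(set)
    for t in rows:
        groups[vals(t, x)].add((vals(t, y), vals(t, z)))
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        if pairs != set(product(ys, zs)):
            return False
    return True


# Hypothetical EMP state: Smith works on projects X and Y, dependents John and Anna.
EMP = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}
print(mvd_holds(EMP, ["ENAME"], ["PNAME"]))  # True

# 4NF decomposition into EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME).
EMP_PROJECTS = {(e, p) for e, p, _ in EMP}
EMP_DEPENDENTS = {(e, d) for e, _, d in EMP}
rejoined = {(e, p, d) for e, p in EMP_PROJECTS for e2, d in EMP_DEPENDENTS if e == e2}
print(rejoined == EMP)  # True: no spurious tuples
```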

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For a relation R with subsets of its attributes denoted A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if the following holds: for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency with a component Ri that is not a superkey.

Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form

(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
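A join dependency can be tested on a relation state by joining the projections and comparing the result with the original. The SUPPLY tuples below form an illustrative state, not necessarily the one in the figure. The components are R1(Sname, Part_name), R2(Sname, Proj_name) and R3(Part_name, Proj_name). The helper names are ours.

```python
def project(rel, keep):
    """Projection of rel onto the attributes in keep (duplicates removed)."""
    attrs, rows = rel
    idx = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(r[i] for i in idx) for r in rows}


def natural_join(r, s):
    """Natural join on all shared attribute names."""
    (ra, rrows), (sa, srows) = r, s
    common = [a for a in ra if a in sa]
    extra = tuple(a for a in sa if a not in ra)
    out = set()
    for t in rrows:
        for u in srows:
            if all(t[ra.index(a)] == u[sa.index(a)] for a in common):
                out.add(t + tuple(u[sa.index(a)] for a in extra))
    return ra + extra, out


def jd_holds(rel, components):
    """JD(R1, ..., Rn) holds in this state iff joining the projections gives rel back."""
    attrs, rows = rel
    joined = project(rel, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rel, comp))
    return project(joined, attrs)[1] == rows


SUPPLY = (("Sname", "Part_name", "Proj_name"), {
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
})
R1, R2, R3 = ("Sname", "Part_name"), ("Sname", "Proj_name"), ("Part_name", "Proj_name")

print(jd_holds(SUPPLY, [R1, R2, R3]))  # True
print(jd_holds(SUPPLY, [R1, R2]))      # False
```

Joining all three projections reproduces SUPPLY exactly. Joining only R1 and R2 adds spurious tuples, such as (Smith, Nut, ProjX).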

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes. Hence 5NF is rarely used in practice.

Page 18: Module 6

Module 6 18042023

Example of Spurious Tuples

Module 6 19042023

Generation of spurious tuples The two relations EMP_PROJ1 and EMP_LOCS as

the base relations of EMP_PROJ is not a good schema design

Problem is if a Natural Join is performed on the above two relations it produces more tuples than original set of tuples in EMP_PROJ

These additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid

This is because the PLOCATION attribute which is used for joining is neither a primary key nor a foreign key in either EMP_LOCS AND EMP_PROJ1

Module 6 20042023

Example of Spurious Tuples contd

Module 6 21042023

4 Spurious Tuples Bad designs for a relational database may result

in erroneous results for certain JOIN operations The lossless join property is used to

guarantee meaningful results for join operations

GUIDELINE 4 Design relation schemas so that they can be

joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies

Note that Property (a) is extremely important and cannot be

sacrificed Property (b) is less stringent and may be sacrificed

Module 6 23042023

Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done

during Insertion Modification Deletion

Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Module 6 24042023

Functional dependencies Functional dependencies (FDs)

Is a constraint between two sets of attributes from the database

Assumption The entire database is a single universal

relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes

Module 6 25042023

Definition

FDs are used to specify formal measures of the

goodness of relational designs keys that are used to define normal forms for

relations constraints that are derived from the meaning and

interrelationships of the data attributes A set of attributes X functionally determines

a set of attributes Y if the value of X determines a unique value for Y

Module 6 26042023

Functional Dependencies

A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If

t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r

depend on or are determined by the values of the X component

The values of the X component functionally determines the values of Y component

FDs are derived from the real-world constraints on the attributes

The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join (nonadditive join) property: it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: it ensures that each functional dependency is represented in some individual relation resulting after decomposition.
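Dependency preservation can be tested without computing F+. A minimal sketch of the closure-based test commonly given in textbooks: grow a set Z from X using only what each decomposed relation Ri can see on its own, (Z ∩ Ri)+ ∩ Ri, and report X -> Y as preserved if Y ends up in Z. This checks whether the FD is implied by the union of the projected FDs, which is slightly more permissive than requiring it to appear literally in one relation. The example encodes the TEACH FDs fd1 and fd2 from the BCNF discussion in this module; the encoding is an assumption.

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]  # (left-hand side, right-hand side)


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_preserved(dep: FD, fds: List[FD], parts: List[FrozenSet[str]]) -> bool:
    """Grow Z from X using only (Z ∩ Ri)+ ∩ Ri for each relation Ri; preserved iff Y ⊆ Z."""
    lhs, rhs = dep
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in parts:
            gained = closure(frozenset(z & ri), fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


def preserves_dependencies(fds: List[FD], parts: List[FrozenSet[str]]) -> bool:
    return all(is_preserved(dep, fds, parts) for dep in fds)


# TEACH: fd1 {student, course} -> instructor, fd2 instructor -> course
fd1: FD = (frozenset({"student", "course"}), frozenset({"instructor"}))
fd2: FD = (frozenset({"instructor"}), frozenset({"course"}))
TEACH_FDS = [fd1, fd2]
parts = [frozenset({"instructor", "course"}), frozenset({"instructor", "student"})]
print(is_preserved(fd1, TEACH_FDS, parts))  # False: fd1 is lost
print(is_preserved(fd2, TEACH_FDS, parts))  # True: fd2 lives in {instructor, course}
```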

Module 6 52042023

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
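Because S is a superkey exactly when its attribute closure S+ contains every attribute of R, these definitions can be checked by brute force on small schemas. A sketch in Python, assuming the EMP_PROJ attributes and FDs used as examples in this module (SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, {SSN, PNUMBER} -> HOURS) and the TEACH FDs; the function names are illustrative:

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_superkey(attrs: Iterable[str], schema: FrozenSet[str], fds: List[FD]) -> bool:
    return closure(frozenset(attrs), fds) >= schema


def candidate_keys(schema: FrozenSet[str], fds: List[FD]) -> List[FrozenSet[str]]:
    """Minimal superkeys, found by trying attribute subsets in order of increasing size."""
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys


def prime_attributes(schema: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    return frozenset().union(*candidate_keys(schema, fds))


EMP_PROJ = frozenset({"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"})
EMP_PROJ_FDS: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
TEACH = frozenset({"student", "course", "instructor"})
TEACH_FDS: List[FD] = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
print(candidate_keys(EMP_PROJ, EMP_PROJ_FDS))
print(candidate_keys(TEACH, TEACH_FDS))
```

Enumerating subsets is exponential in the number of attributes, so this is only suitable for small textbook schemas.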

Module 6 55042023

First Normal Form

1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF

To achieve 1NF there are three techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best, because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
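Techniques 1 and 2 can be contrasted on a small example. This Python sketch assumes a hypothetical DEPARTMENT relation whose Dlocations attribute is multivalued (stored as a list); the data values and function names are invented for illustration:

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

# Hypothetical non-1NF relation: DEPARTMENT(Dnumber, Dname, Dlocations), Dlocations multivalued
DEPARTMENT: List[Row] = [
    {"Dnumber": 1, "Dname": "Headquarters", "Dlocations": ["Houston"]},
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
]


def move_to_new_relation(rows: List[Row], key: str, attr: str) -> Tuple[List[Row], List[Tuple[Any, Any]]]:
    """Technique 1: drop `attr` from the relation and store (key, value) pairs separately."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    extracted = [(row[key], value) for row in rows for value in row[attr]]
    return base, extracted


def expand_key(rows: List[Row], attr: str, new_attr: str) -> List[Row]:
    """Technique 2: one tuple per value; the other attributes (e.g. Dname) get repeated."""
    return [
        {**{k: v for k, v in row.items() if k != attr}, new_attr: value}
        for row in rows
        for value in row[attr]
    ]


dept, dept_locations = move_to_new_relation(DEPARTMENT, "Dnumber", "Dlocations")
print(dept)            # [{'Dnumber': 1, 'Dname': 'Headquarters'}, {'Dnumber': 5, 'Dname': 'Research'}]
print(dept_locations)  # [(1, 'Houston'), (5, 'Bellaire'), (5, 'Sugarland'), (5, 'Houston')]
```

With technique 2 the same four locations give four tuples in one relation, and "Research" is stored three times, which is the redundancy the slide warns about.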

Module 6 58042023

Module 6 59042023

Normalizing nested relations into 1NF

Additional problems: Schaum's series, p. 178, 5.1

Module 6 60042023

Module 6 61042023

Second Normal Form

Uses the concepts of FDs and the primary key.

Definitions:

Prime attribute: an attribute that is a member of the primary key K.

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:

- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.

- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
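The full-FD definition translates directly into a closure test: Y -> Z is full if Z ⊆ Y+ but Z ⊄ (Y - {a})+ for every a in Y. A sketch that lists the 2NF violations (partial dependencies on the primary key), assuming the EMP_PROJ attributes and FDs from the examples above; the function names are illustrative:

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_full_fd(lhs: FrozenSet[str], rhs: FrozenSet[str], fds: List[FD]) -> bool:
    """Y -> Z is full if it holds and stops holding when any one attribute is removed from Y.

    Closure is monotone, so testing the subsets Y - {a} covers every proper subset of Y.
    """
    if not rhs <= closure(lhs, fds):
        return False
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


def partial_dependencies(schema: FrozenSet[str], key: FrozenSet[str], fds: List[FD]) -> List[str]:
    """Non-prime attributes (relative to the primary key, as on the slide) not fully dependent on it."""
    return sorted(a for a in schema - key if not is_full_fd(key, frozenset({a}), fds))


EMP_PROJ = frozenset({"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"})
EMP_PROJ_FDS: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
KEY = frozenset({"SSN", "PNUMBER"})
print(is_full_fd(KEY, frozenset({"HOURS"}), EMP_PROJ_FDS))  # True
print(is_full_fd(KEY, frozenset({"ENAME"}), EMP_PROJ_FDS))  # False, since SSN -> ENAME
print(partial_dependencies(EMP_PROJ, KEY, EMP_PROJ_FDS))    # ['ENAME', 'PLOCATION', 'PNAME']
```

An empty list would mean the schema is in 2NF with respect to that primary key; here the partial dependencies point to splitting off {SSN, ENAME} and {PNUMBER, PNAME, PLOCATION}.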

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF (figure): R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).

Module 6 64042023

Additional problem on 2NF: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with Prog_Pack_ID -> Prog_Pac_name.

1. What is the highest normal form of Prog_task?

2. Transform it into the next higher normal form.

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:

- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.

- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Module 6 67042023

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
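A sketch of a 3NF test in Python. It uses the general form of the definition (for every nontrivial FD X -> A, either X is a superkey or A is prime) rather than the primary-key and transitive-dependency form above; for the two examples here they agree. Checking only the listed FDs is enough when testing R itself: if an implied FD violated 3NF, the listed FD that first adds the offending attribute while computing X+ would violate it too. The EMP_DEPT attributes come from the SSN/DNUMBER/DMGRSSN example; encoding Emp as a candidate key through Emp -> {SSN, Salary} is an assumption.

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(schema: FrozenSet[str], fds: List[FD]) -> List[FrozenSet[str]]:
    keys: List[FrozenSet[str]] = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def violations_3nf(schema: FrozenSet[str], fds: List[FD]) -> List[Tuple[Tuple[str, ...], str]]:
    """Listed FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    prime = frozenset().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey: always allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


EMP_DEPT = frozenset({"SSN", "ENAME", "DNUMBER", "DNAME", "DMGRSSN"})
EMP_DEPT_FDS: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(violations_3nf(EMP_DEPT, EMP_DEPT_FDS))
# [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]

# The NOTE's EMP(SSN, Emp, Salary), assuming Emp is a candidate key: Emp -> {SSN, Salary}
EMP = frozenset({"SSN", "Emp", "Salary"})
EMP_FDS: List[FD] = [
    (frozenset({"SSN"}), frozenset({"Emp", "Salary"})),
    (frozenset({"Emp"}), frozenset({"SSN", "Salary"})),
]
print(violations_3nf(EMP, EMP_FDS))  # []
```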

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems: p. 186, 5.13

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

Module 6 80042023

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a candidate key.

BCNF makes no such exception: for a relation to be in BCNF, X must be a superkey.

So a relation in BCNF is definitely in 3NF, but not the other way around.

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor

fd2: instructor -> course

{student, course} is a candidate key for this relation, so course is a prime attribute and fd2 does not violate 3NF. However, instructor is not a superkey, so the relation is in 3NF but not in BCNF.

A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
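The BCNF condition only needs a superkey test for each left-hand side. A Python sketch applied to TEACH, assuming fd1 and fd2 are the complete set of FDs; the names R1 and R2 for the two relations of the third decomposition are hypothetical:

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(schema: FrozenSet[str], fds: List[FD]) -> List[Tuple[Tuple[str, ...], Tuple[str, ...]]]:
    """Nontrivial listed FDs X -> Y whose LHS X is not a superkey of the schema."""
    return [
        (tuple(sorted(lhs)), tuple(sorted(rhs - lhs)))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


TEACH = frozenset({"student", "course", "instructor"})
fd1: FD = (frozenset({"student", "course"}), frozenset({"instructor"}))
fd2: FD = (frozenset({"instructor"}), frozenset({"course"}))
TEACH_FDS = [fd1, fd2]
print(bcnf_violations(TEACH, TEACH_FDS))  # [(('instructor',), ('course',))]

# Hypothetical R1, R2: {instructor, course} keeps fd2; {instructor, student} has no nontrivial FD
R1 = frozenset({"instructor", "course"})
R2 = frozenset({"instructor", "student"})
print(bcnf_violations(R1, [fd2]), bcnf_violations(R2, []))  # [] []
```

When testing a decomposed relation, only the FDs that project onto it should be passed in; this sketch lists them by hand.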

Module 6 83042023

Achieving BCNF by Decomposition (contd.)

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}

2. {course, instructor} and {course, student}

3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property): its common attribute, instructor, functionally determines course and is therefore a key of {instructor, course}.
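For a decomposition into two relations there is a simple closure-based test: {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A Python sketch applying it to the three TEACH decompositions, assuming fd1 and fd2 are the only FDs:

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def binary_lossless(r1: FrozenSet[str], r2: FrozenSet[str], fds: List[FD]) -> bool:
    """Lossless iff the common attributes determine all of R1 - R2 or all of R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


TEACH_FDS: List[FD] = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    (frozenset({"student", "instructor"}), frozenset({"student", "course"})),
    (frozenset({"course", "instructor"}), frozenset({"course", "student"})),
    (frozenset({"instructor", "course"}), frozenset({"instructor", "student"})),
]
print([binary_lossless(r1, r2, TEACH_FDS) for r1, r2 in decompositions])  # [False, False, True]
```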

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, p. 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm: Example 5.12
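The general lossless join test is usually presented as a tableau ("chase") algorithm: build one row per decomposed relation, with a distinguished symbol in the columns that relation contains, apply the FDs to equate symbols, and declare the decomposition lossless if some row becomes entirely distinguished. A Python sketch, assuming the EMP_PROJ schema and FDs used earlier in this module; the relation names EMP, PROJ and WORKS_ON, the attribute list of EMP_PROJ1 and the symbol encoding are assumptions, not taken from the referenced example:

```python
from typing import Dict, FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def is_lossless(schema: FrozenSet[str], parts: List[FrozenSet[str]], fds: List[FD]) -> bool:
    attrs = sorted(schema)
    # Row i holds the distinguished symbol ("a", A) where A is in R_i, else ("b", i, A).
    rows: List[Dict[str, tuple]] = [
        {A: ("a", A) if A in ri else ("b", i, A) for A in attrs}
        for i, ri in enumerate(parts)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    if any(rows[i][A] != rows[j][A] for A in lhs):
                        continue
                    for A in rhs:
                        s, t = rows[i][A], rows[j][A]
                        if s == t:
                            continue
                        # Equate the two symbols in column A, keeping the "a" symbol if present.
                        keep, drop = (s, t) if s[0] == "a" else (t, s)
                        for row in rows:
                            if row[A] == drop:
                                row[A] = keep
                        changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(row[A][0] == "a" for A in attrs) for row in rows)


EMP_PROJ = frozenset({"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"})
EMP_PROJ_FDS: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
EMP = frozenset({"SSN", "ENAME"})
PROJ = frozenset({"PNUMBER", "PNAME", "PLOCATION"})
WORKS_ON = frozenset({"SSN", "PNUMBER", "HOURS"})
EMP_LOCS = frozenset({"ENAME", "PLOCATION"})
EMP_PROJ1 = frozenset({"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"})
print(is_lossless(EMP_PROJ, [EMP, PROJ, WORKS_ON], EMP_PROJ_FDS))  # True
print(is_lossless(EMP_PROJ, [EMP_LOCS, EMP_PROJ1], EMP_PROJ_FDS))  # False
```

The second call models the EMP_LOCS / EMP_PROJ1 design criticized under "Generation of spurious tuples", and the chase reports it as lossy.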

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
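The informal MVD description corresponds to a tuple-swapping condition: X ->> Y holds in a state r if, whenever two tuples t1 and t2 agree on X, r also contains the tuple that takes its X and Y values from t1 and its remaining values from t2. A Python sketch of that check and of the triviality rule above, assuming a hypothetical EMP(ENAME, PNAME, DNAME) state with invented values:

```python
from typing import Dict, List, Sequence

Row = Dict[str, str]


def mvd_holds(rows: List[Row], schema: Sequence[str], x: Sequence[str], y: Sequence[str]) -> bool:
    """X ->> Y holds in r iff for every t1, t2 agreeing on X, the tuple with
    t1's values on X and Y and t2's values on Z = R - X - Y is also in r."""
    z = [a for a in schema if a not in x and a not in y]
    present = {tuple(r[a] for a in schema) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                t3 = {a: t1[a] for a in list(x) + list(y)}
                t3.update({a: t2[a] for a in z})
                if tuple(t3[a] for a in schema) not in present:
                    return False
    return True


def is_trivial_mvd(x: Sequence[str], y: Sequence[str], schema: Sequence[str]) -> bool:
    """Trivial iff Y is a subset of X or X ∪ Y = R."""
    return set(y) <= set(x) or set(x) | set(y) == set(schema)


SCHEMA = ["ENAME", "PNAME", "DNAME"]
EMP: List[Row] = [  # hypothetical: every project paired with every dependent
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(EMP, SCHEMA, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(EMP[:3], SCHEMA, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing
```

Like the Satisfies algorithm for FDs, this only tests one state; an MVD is a constraint on all legal states of the schema.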

Module 6 90042023

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
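Whether a particular relation state satisfies a join dependency can be checked literally: project onto each component, join the projections back together, and compare with the original. A Python sketch, assuming a hypothetical SUPPLY(Sname, Part_name, Proj_name) state with invented values; it satisfies the three-way JD but not the two-way one, which is the kind of situation described for SUPPLY:

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    out: List[Row] = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:  # relations are sets: drop duplicates
            out.append(p)
    return out


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    out: List[Row] = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


def satisfies_jd(rows: List[Row], components: List[Sequence[str]]) -> bool:
    """True iff rows equals the natural join of its projections on the components."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))

    def as_set(rs: List[Row]):
        return {tuple(sorted(r.items())) for r in rs}

    return as_set(joined) == as_set(rows)


SUPPLY: List[Row] = [  # hypothetical values
    {"Sname": "s1", "Part_name": "p1", "Proj_name": "j2"},
    {"Sname": "s1", "Part_name": "p2", "Proj_name": "j1"},
    {"Sname": "s2", "Part_name": "p1", "Proj_name": "j1"},
    {"Sname": "s1", "Part_name": "p1", "Proj_name": "j1"},
]
THREE_WAY = [["Sname", "Part_name"], ["Part_name", "Proj_name"], ["Proj_name", "Sname"]]
print(satisfies_jd(SUPPLY, THREE_WAY))                                       # True
print(satisfies_jd(SUPPLY, [["Sname", "Part_name"], ["Part_name", "Proj_name"]]))  # False
```

Dropping the last tuple breaks the JD even though all three projections stay the same, which is why such constraints are hard to spot in practice.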

Module 6 94042023

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than ones whose components are all superkeys.

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence, 5NF is rarely used in practice.


Module 6 19042023

Generation of spurious tuples

Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations of EMP_PROJ is not a good schema design.

The problem is that if a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.

These additional tuples, which were not in EMP_PROJ, are called spurious tuples because they represent spurious or wrong information that is not valid.

This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
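The effect is easy to reproduce. This Python sketch builds a small hypothetical EMP_PROJ state (the values are invented), projects it onto EMP_LOCS(ENAME, PLOCATION) and EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION), whose attribute lists are assumed, and joins the projections back on PLOCATION:

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def project(rows: List[Row], attrs: Sequence[str]) -> List[Row]:
    out: List[Row] = []
    for r in rows:
        p = {a: r[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(left: List[Row], right: List[Row]) -> List[Row]:
    """Combine every pair of tuples that agree on all shared attributes."""
    out: List[Row] = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in l.keys() & r.keys()):
                merged = {**l, **r}
                if merged not in out:
                    out.append(merged)
    return out


EMP_PROJ: List[Row] = [  # hypothetical state
    {"SSN": "123", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith", "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "123", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith", "PNAME": "ProductY", "PLOCATION": "Sugarland"},
    {"SSN": "453", "PNUMBER": 1, "HOURS": 20.0, "ENAME": "English", "PNAME": "ProductX", "PLOCATION": "Bellaire"},
    {"SSN": "453", "PNUMBER": 2, "HOURS": 20.0, "ENAME": "English", "PNAME": "ProductY", "PLOCATION": "Sugarland"},
]
emp_locs = project(EMP_PROJ, ["ENAME", "PLOCATION"])
emp_proj1 = project(EMP_PROJ, ["SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"])
joined = natural_join(emp_proj1, emp_locs)  # PLOCATION is the only shared attribute
spurious = [r for r in joined if r not in EMP_PROJ]
print(len(EMP_PROJ), len(joined), len(spurious))  # 4 8 4
```

Each spurious tuple pairs an employee's project row with another employee who merely works at the same location, for example SSN 123 with ENAME English.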

Module 6 20042023

Example of Spurious Tuples (contd.)

Module 6 21042023

4. Spurious Tuples

Bad designs for a relational database may result in erroneous results for certain JOIN operations. The lossless join property is used to guarantee meaningful results for join operations.

GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys, in a way that guarantees that no spurious tuples are generated.

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions:

(a) non-additivity, or losslessness, of the corresponding join;

(b) preservation of the functional dependencies.

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Module 6 23042023

Summary and Discussion of Design Guidelines

Problems pointed out:

- Anomalies that cause redundant work to be done during insertion into, modification of, and deletion from a relation.

- Waste of storage space due to nulls, and the difficulty of performing aggregation operations and joins due to null values.

- Generation of invalid and spurious data during joins on improperly related base relations.

Module 6 24042023

Functional dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Module 6 25042023

Definition

FDs are used to specify formal measures of the goodness of relational designs.

FDs and keys are used to define normal forms for relations.

FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Module 6 26042023

Functional Dependencies

A functional dependency X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.

The values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Module 6 27042023

Lakes of the world

Name          Continent     Area (sq mi)   Length (mi)
Caspian Sea   Asia-Europe   143244         760
Superior      NA            31700          350
Victoria      Africa        26828          250
Aral Sea      Asia          24904          280
Huron         NA            23000          206
Michigan      NA            22300          307
Tanganyika    Africa        12700          420

In this state, Name -> Length holds, but Continent -> Name does not: NA, for example, appears with Superior, Huron and Michigan.

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints

Social security number uniquely determines employee name: SSN -> ENAME

Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}

Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Module 6 30042023

Examples of FD constraints (contd.)

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Module 6 31042023

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.

How it works:

1. Sort the tuples of the relation r on the A attributes, so that tuples with equal values under A are next to each other.

2. Check that tuples with equal values under attributes A also have equal values under B.

3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.

Module 6 32042023

Relation state of TEACH

TEACH

Teacher   Course            Text
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz

TEXT -> COURSE may hold, but TEACHER -> COURSE does not hold in this state, since Smith teaches both Data Structures and Data Management.
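The Satisfies algorithm can be sketched in a few lines of Python. After sorting, `itertools.groupby` collects the adjacent tuples that share the same A value, so step 2 becomes a check that each group has a single B value. The TEACH state above is encoded as a list of dictionaries (an assumed representation):

```python
from itertools import groupby
from typing import Dict, List, Sequence

Row = Dict[str, str]


def satisfies(rows: List[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """Does this relation state satisfy lhs -> rhs?"""
    def lhs_value(row: Row):
        return tuple(row[a] for a in lhs)

    ordered = sorted(rows, key=lhs_value)  # step 1: equal LHS values become adjacent
    for _, run in groupby(ordered, key=lhs_value):
        rhs_values = {tuple(row[a] for a in rhs) for row in run}
        if len(rhs_values) > 1:            # step 2 fails for this run
            return False
    return True                            # step 3: the condition holds for every run


TEACH: List[Row] = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]
print(satisfies(TEACH, ["TEXT"], ["COURSE"]))     # True
print(satisfies(TEACH, ["TEACHER"], ["COURSE"]))  # False
```

A result of True only says that this particular state does not violate the FD. Since an FD is a property of the schema, a state can refute an FD but never prove it.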

Module 6 33042023

Drawbacks of Satisfies algorithm

Using this algorithm is tedious and time-consuming.

So inference axioms are used instead.

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Module 6 35042023

Example of Closure

A department has one manager (DEPT_NO -> MGR_SSN), and a manager has a unique phone number (MGR_SSN -> MGR_PHONE). These two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Module 6 36042023

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:

SSN -> {DNAME, DMGRSSN}

SSN -> SSN

DNUMBER -> DNAME

To infer dependencies in a systematic way, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That X -> Y is inferred from F is denoted by F |= X -> Y.
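A convenient way to decide F |= X -> Y is the attribute-closure algorithm: F |= X -> Y exactly when Y ⊆ X+, where X+ is obtained by repeatedly adding the right-hand side of any FD whose left-hand side is already included. A Python sketch using the F of this example and the DEPT_NO / MGR_SSN example; the set-of-strings encoding is an assumption:

```python
from typing import FrozenSet, Iterable, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    """X+: start from X and keep adding RHSs of FDs whose LHS is already contained."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(fds: List[FD], lhs: Iterable[str], rhs: Iterable[str]) -> bool:
    """F |= X -> Y iff Y ⊆ X+ computed under F."""
    return frozenset(rhs) <= closure(frozenset(lhs), fds)


F: List[FD] = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(sorted(closure(frozenset({"SSN"}), F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']

DEPT_FDS: List[FD] = [
    (frozenset({"DEPT_NO"}), frozenset({"MGR_SSN"})),
    (frozenset({"MGR_SSN"}), frozenset({"MGR_PHONE"})),
]
print(implies(DEPT_FDS, {"DEPT_NO"}, {"MGR_PHONE"}))  # True
```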

Module 6 37042023

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:

IR1 (Reflexive): If Y ⊆ X, then X -> Y.

IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)

IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency we infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Module 6 38042023

Inference Rules for FDs

Some additional inference rules that are useful:

Decomposition: If X -> YZ, then X -> Y and X -> Z.

Union (additive): If X -> Y and X -> Z, then X -> YZ.

Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Module 6 39042023

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
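One possible derivation for each exercise, using only IR1 to IR3 and assuming the sets of FDs read as stated in the two exercises:

```latex
\begin{align*}
&\text{Exercise 1: } F = \{A \to B,\ C \to X,\ BX \to Z\}\\
&(1)\ A \to B && \text{given}\\
&(2)\ AC \to BC && \text{IR2 on (1), augmenting with } C\\
&(3)\ C \to X && \text{given}\\
&(4)\ BC \to BX && \text{IR2 on (3), augmenting with } B\\
&(5)\ AC \to BX && \text{IR3 on (2) and (4)}\\
&(6)\ BX \to Z && \text{given}\\
&(7)\ AC \to Z && \text{IR3 on (5) and (6)}\\[6pt]
&\text{Exercise 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B\\
&(1)\ B \to C && \text{IR1, since } C \subseteq B\\
&(2)\ A \to C && \text{IR3 on } A \to B \text{ and (1)}\\
&(3)\ A \to D && \text{IR3 on (2) and } C \to D
\end{align*}
```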

Module 6 40042023

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
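Both directions of the cover test reduce to attribute closures: F covers E iff, for every X -> Y in E, Y ⊆ X+ computed under F. A Python sketch, assuming single-letter attribute names (the `fd` helper is hypothetical); the last check confirms that F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R} from the canonical cover problem is equivalent to {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}:

```python
from typing import FrozenSet, List, Tuple

FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: str, rhs: str) -> FD:
    # Hypothetical helper: single-letter attributes, so fd("AB", "C") is AB -> C
    return frozenset(lhs), frozenset(rhs)


def closure(attrs: FrozenSet[str], fds: List[FD]) -> FrozenSet[str]:
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def covers(f: List[FD], e: List[FD]) -> bool:
    """F covers E iff every X -> Y in E has Y ⊆ X+ under F (that is, E ⊆ F+)."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e: List[FD], f: List[FD]) -> bool:
    return covers(e, f) and covers(f, e)


F_ABC = [fd("A", "B"), fd("B", "C")]
E_ABC = [fd("A", "B"), fd("B", "C"), fd("A", "C")]  # adds the redundant A -> C
G_ABC = [fd("A", "B"), fd("A", "C")]                # loses B -> C
print(equivalent(F_ABC, E_ABC))                     # True
print(covers(F_ABC, G_ABC), covers(G_ABC, F_ABC))   # True False

F_PROBLEM = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
FC = [fd("X", "Z"), fd("X", "R"), fd("XY", "W"), fd("XY", "P"), fd("XY", "Q")]
print(equivalent(F_PROBLEM, FC))                    # True
```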

Module 6 42042023

Minimal Functional Dependencies

A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F.

F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 20: Module 6

Module 6 20042023

Example of Spurious Tuples contd

Module 6 21042023

4 Spurious Tuples Bad designs for a relational database may result

in erroneous results for certain JOIN operations The lossless join property is used to

guarantee meaningful results for join operations

GUIDELINE 4 Design relation schemas so that they can be

joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies

Note that Property (a) is extremely important and cannot be

sacrificed Property (b) is less stringent and may be sacrificed

Module 6 23042023

Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done

during Insertion Modification Deletion

Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Functional dependencies

A functional dependency (FD) is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition

FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. That is, for any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420

Name -> Length holds in this state, but Continent -> Name does not: the continent NA appears with three different lakes (Superior, Huron, and Michigan).

Graphical representation of Functional Dependencies

Examples of FD constraints

A social security number uniquely determines the employee name: SSN -> ENAME
A project number uniquely determines the project name and location: PNUMBER -> {PNAME, PLOCATION}
An employee SSN and a project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.

How it works:
1. Sort the tuples of the relation r on the A attributes, so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.
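A minimal Python sketch of the three steps, assuming a relation state is a list of tuples plus a separate list of column names (the function and variable names are illustrative, not from the slides). The sample data is the TEACH state shown on the next slide.

```python
from itertools import groupby


def satisfies(rows, columns, lhs, rhs):
    """Return True if the relation state `rows` satisfies the FD lhs -> rhs."""
    index = {name: i for i, name in enumerate(columns)}

    def values(row, attrs):
        return tuple(row[index[a]] for a in attrs)

    # Step 1: sort so that tuples with equal lhs values are adjacent.
    ordered = sorted(rows, key=lambda row: values(row, lhs))
    # Step 2: each run of equal lhs values must agree on rhs.
    for _, group in groupby(ordered, key=lambda row: values(row, lhs)):
        if len({values(row, rhs) for row in group}) > 1:
            return False  # Step 3: condition 2 is violated
    return True


TEACH_COLUMNS = ["TEACHER", "COURSE", "TEXT"]
TEACH_ROWS = [
    ("Smith", "Data Structures", "Bartram"),
    ("Smith", "Data Management", "Martin"),
    ("Hall", "Compilers", "Hoffmann"),
    ("Brown", "OOAD", "Horowitz"),
]

print(satisfies(TEACH_ROWS, TEACH_COLUMNS, ["TEXT"], ["COURSE"]))     # True
print(satisfies(TEACH_ROWS, TEACH_COLUMNS, ["TEACHER"], ["COURSE"]))  # False
```

A True result only says that this particular state does not violate the FD; because an FD is a property of the schema, one state can refute it but never prove it.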

Relation state of TEACH

TEACH
Teacher   Course            Text
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz

TEXT -> COURSE may hold, but TEACHER -> COURSE does not hold in this state (Smith teaches two different courses).

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming, so inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred, or deduced, from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. The fact that X -> Y can be inferred from F is denoted F |= X -> Y.
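The slides do not give a procedure for deciding F |= X -> Y. A standard technique from the textbook literature (not stated on these slides) is to compute the attribute closure X+ by repeatedly applying every FD whose left-hand side is already contained in the result; X -> Y follows from F exactly when Y ⊆ X+. A minimal Python sketch, assuming FDs are represented as pairs of attribute-name sets; the function names are illustrative.

```python
def attribute_closure(attrs, fds):
    """Compute X+: every attribute functionally determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole LHS is already determined, the RHS is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs  iff  rhs is contained in lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(sorted(attribute_closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```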

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y
IR2 (Augmentation): If X -> Y, then XZ -> YZ (Notation: XZ stands for X ∪ Z)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z

IR1, IR2, and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that using IR1 through IR3 repeatedly, until no more dependencies can be inferred, yields the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z
Union (additive): If X -> Y and X -> Z, then X -> YZ
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z

These three inference rules, as well as any other valid inference rule, can be deduced from IR1, IR2, and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
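Worked derivations for the two exercises, reading the garbled sets as F = {A -> B, C -> X, BX -> Z} for Example 1 and F = {A -> B, C -> D} for Example 2 (this reading of the extracted text is an assumption). Each step names the Armstrong rule it uses.

```latex
\begin{aligned}
&\textbf{Example 1:}\\
&1.\ A \to B     && \text{given}\\
&2.\ AC \to BC   && \text{IR2 on (1), augmenting with } C\\
&3.\ C \to X     && \text{given}\\
&4.\ BC \to BX   && \text{IR2 on (3), augmenting with } B\\
&5.\ AC \to BX   && \text{IR3 on (2), (4)}\\
&6.\ BX \to Z    && \text{given}\\
&7.\ AC \to Z    && \text{IR3 on (5), (6)}\\[6pt]
&\textbf{Example 2:}\\
&1.\ B \to C     && \text{IR1, since } C \subseteq B\\
&2.\ A \to C     && \text{IR3 on } A \to B \text{ (given) and (1)}\\
&3.\ A \to D     && \text{IR3 on (2) and } C \to D \text{ (given)}
\end{aligned}
```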

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
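The definition turns directly into a test: A -> B is redundant iff B ⊆ A+, where the closure is computed under F - {A -> B}. A minimal Python sketch, assuming FDs are pairs of attribute sets; it removes redundant FDs one at a time, because two FDs can each be redundant given the other while only one of them may be dropped. The function names and the sample F are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def remove_redundant(fds):
    """Drop FDs, one at a time, that can be derived from the remaining ones."""
    kept = list(fds)
    i = 0
    while i < len(kept):
        lhs, rhs = kept[i]
        others = kept[:i] + kept[i + 1:]
        if rhs <= closure(lhs, others):
            kept = others  # redundant with respect to F - {lhs -> rhs}
        else:
            i += 1
    return kept


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(remove_redundant(F))  # [({'A'}, {'B'}), ({'B'}, {'C'})]
```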

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are equivalent to F.
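Covering and equivalence can be checked without enumerating F+: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. A minimal Python sketch, assuming FDs are pairs of attribute sets; the names `covers` and `equivalent` and the sample sets E, F, G are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """True if F covers E: every FD in E can be inferred from F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff E covers F and F covers E (E+ = F+)."""
    return covers(e, f) and covers(f, e)


E = [({"A"}, {"B"}), ({"B"}, {"C"})]
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
G = [({"A"}, {"B"}), ({"A"}, {"C"})]

print(equivalent(E, F))  # True
print(covers(F, G))      # True: A -> B and A -> C both follow from F
print(equivalent(F, G))  # False: B -> C cannot be inferred from G
```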

Minimal Functional Dependencies

A minimal cover, also called an irreducible set of F, is useful in eliminating unnecessary functional dependencies. F is transformed so that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) the RHS of every dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 ∈ F.

A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
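Because A2 -> B1B2 always implies A1A2 -> B1B2, the equivalence in the definition only has to be checked in one direction: A1 is extraneous exactly when F already implies A2 -> B1B2, i.e. B1B2 ⊆ A2+ under F. A minimal Python sketch of that test, assuming FDs are pairs of attribute sets; the sample F is taken from the canonical-cover problem that follows, and the function name is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_extraneous_left(attr, fd, fds):
    """Is `attr` extraneous on the LHS of `fd` (an FD that belongs to `fds`)?

    Replacing A1A2 -> B by A2 -> B keeps F equivalent exactly when
    F already implies A2 -> B, i.e. B is contained in A2+ under F.
    """
    lhs, rhs = fd
    return rhs <= closure(lhs - {attr}, fds)


XZ_R = ({"X", "Z"}, {"R"})
F = [({"X"}, {"Z"}), XZ_R]

print(is_extraneous_left("Z", XZ_R, F))  # True: X -> Z, so X alone determines R
print(is_extraneous_left("X", XZ_R, F))  # False: Z+ = {Z}
```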

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.

2. FC is left-reduced.

3. FC is nonredundant.

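Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Make every RHS a single attribute: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (it follows from X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: Z is extraneous in XZ -> R, because X -> Z and XZ -> R give X -> R, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}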
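The three steps can be automated. The sketch below is a minimal Python version, assuming FDs are pairs of attribute sets: it splits right-hand sides, removes extraneous left-hand attributes, then drops redundant FDs one at a time. The function names are illustrative; results of such procedures can depend on the order in which FDs are examined, so other valid canonical covers may exist for some inputs. On the F above it returns the five FDs X -> Z, X -> R, XY -> W, XY -> P, and XY -> Q.

```python
def closure(attrs, fds):
    """Attribute closure under simple FDs given as (lhs frozenset, rhs attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def canonical_cover(fds):
    # 1. Make every FD simple (one RHS attribute), skipping exact duplicates.
    simple = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), a)
            if fd not in simple:
                simple.append(fd)
    # 2. Left-reduce: drop an LHS attribute if the smaller LHS still determines a.
    for i, (lhs, a) in enumerate(simple):
        for attr in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {attr}, simple):
                lhs = lhs - {attr}
        simple[i] = (lhs, a)
    # 3. Remove FDs that follow from the others, one at a time.
    i = 0
    while i < len(simple):
        lhs, a = simple[i]
        others = simple[:i] + simple[i + 1:]
        if a in closure(lhs, others):
            simple = others
        else:
            i += 1
    return simple


F = [
    ({"X"}, {"Z"}),
    ({"X", "Y"}, {"W", "P"}),
    ({"X", "Y"}, {"Z", "W", "Q"}),
    ({"X", "Z"}, {"R"}),
]
for lhs, a in canonical_cover(F):
    print("".join(sorted(lhs)), "->", a)
```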

Normal Forms Based on Primary Keys

1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Normalization of Relations

2NF, 3NF, and BCNF are based on keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides the database designer with:
a formal framework for analyzing relation schemas based on keys and FDs
a series of normal form tests

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF, or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
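Assuming the legal relation states are exactly those that satisfy a given set of FDs F, S is a superkey when S+ contains every attribute of R, and a key is a minimal such set. A brute-force Python sketch (exponential in the number of attributes, so only suitable for classroom-sized schemas) that finds all candidate keys and hence the prime and nonprime attributes. The schema name EMP_PROJ is assumed; its FDs are the FD examples given earlier in this module, and the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Test attribute sets in order of increasing size; keep the minimal superkeys."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

keys = candidate_keys(EMP_PROJ, F)
prime = set().union(*keys)
print(keys)                      # one key: SSN and PNUMBER together
print(sorted(EMP_PROJ - prime))  # ['ENAME', 'HOURS', 'PLOCATION', 'PNAME']
```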

First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
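Technique 1 can be sketched in Python, assuming each tuple is a dict and the multivalued attribute holds a list. The DEPARTMENT, DLOCATIONS, and DLOCATION names and values are hypothetical sample data, and `split_multivalued` is an illustrative helper. The key travels with every value into the new relation, so no redundancy and no NULLs are introduced.

```python
def split_multivalued(rows, key, attr, atomic_name):
    """Technique 1: move multivalued `attr` into a separate relation with the key."""
    # Base relation: every attribute except the multivalued one.
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    # New relation: one tuple per (key, single value) pair.
    detail = [{key: row[key], atomic_name: value} for row in rows for value in row[attr]]
    return base, detail


DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]

dept, dept_locations = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept)
print(dept_locations)
```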

Normalization of nested relations into 1NF

Additional problems from Schaum's series: Pg 178, 5.1

Second Normal Form

Uses the concepts of FDs and the primary key.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
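A minimal Python sketch of the 2NF test, following this slide's definition of a prime attribute (a member of the primary key): it reports every nonprime attribute that is already determined by a proper subset of the primary key. FDs are pairs of attribute sets; the schema name EMP_PROJ is assumed, and its FDs SSN -> ENAME, PNUMBER -> {PNAME, PLOCATION}, and {SSN, PNUMBER} -> HOURS come from earlier in this module.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """(proper subset of the key, nonprime attribute) pairs that violate 2NF."""
    nonprime = set(schema) - set(primary_key)  # slide's definition: prime = in the primary key
    found = []
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            for a in sorted(closure(set(subset), fds) & nonprime):
                found.append((set(subset), a))
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
violations = partial_dependencies(EMP_PROJ, {"SSN", "PNUMBER"}, F)
for lhs, a in violations:
    print(", ".join(sorted(lhs)), "->", a)
# PNUMBER -> PLOCATION
# PNUMBER -> PNAME
# SSN -> ENAME
```

In this example the reported pairs suggest the relations (SSN, ENAME) and (PNUMBER, PNAME, PLOCATION), leaving (SSN, PNUMBER, HOURS) for the fully dependent attribute.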

Normalizing into 2NF

Conversion to 2NF

[Figure: a relation with attributes A, B, C, D decomposed into the relations (A, B, C) and (A, D)]

Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of Prog_task?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE:
In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
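The slide states 3NF in terms of transitive dependency on the primary key. An equivalent general test, standard in textbooks though not stated on these slides, is: for every nontrivial FD X -> A in F, either X is a superkey or A is a prime attribute; for a given F it is enough to check the FDs in F itself. A brute-force Python sketch, assuming FDs are pairs of attribute sets. EMP_DEPT reuses the FDs from the closure example earlier in this module; for the NOTE's EMP relation the FD Emp -> SSN is an added assumption that makes Emp a candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """FDs X -> A (A not in X) where X is not a superkey and A is not prime."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):
            if not closure(lhs, fds) >= schema and a not in prime:
                bad.append((lhs, a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F_EMP_DEPT = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(violations_3nf(EMP_DEPT, F_EMP_DEPT))
# [({'DNUMBER'}, 'DMGRSSN'), ({'DNUMBER'}, 'DNAME')]

EMP = {"SSN", "Emp", "Salary"}
F_EMP = [({"SSN"}, {"Emp"}), ({"Emp"}, {"Salary"}), ({"Emp"}, {"SSN"})]
print(violations_3nf(EMP, F_EMP))  # []
```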

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: Pg 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).
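A minimal Python sketch of the BCNF test, assuming FDs are pairs of attribute sets. For the schema on which F is declared it is enough to check the FDs in F; a schema produced by decomposition would need the projected FDs, which this sketch does not compute. The sample is the TEACH relation with fd1 and fd2 from the following slides; the function name is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y in F whose LHS X is not a superkey of the schema."""
    schema = set(schema)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


TEACH = {"student", "course", "instructor"}
F_TEACH = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
print(bcnf_violations(TEACH, F_TEACH))  # [({'instructor'}, {'course'})]
```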

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.

For a relation to be in BCNF, 'X' must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. So this relation is in 3NF but not in BCNF: in fd2, instructor is not a superkey, but course is a prime attribute, which 3NF permits and BCNF does not.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Achieving the BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
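Why only the third decomposition is lossless can be checked with the standard binary-decomposition test (a textbook property, not stated on this slide): {R1, R2} has the non-additive join property iff the shared attributes functionally determine R1 - R2 or R2 - R1. A minimal Python sketch, assuming FDs are pairs of attribute sets; the function name is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds` (a list of (lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 & R2) -> (R1 - R2) or (R1 & R2) -> (R2 - R1)."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


F_TEACH = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for r1, r2 in decompositions:
    print(is_lossless_binary(r1, r2, F_TEACH))  # False, False, True
```

Only in the third case is the shared attribute (instructor) a key of one component, through fd2.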

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, Pg 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
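The general test for an n-way decomposition is usually presented as a matrix (tableau) algorithm: one row per Ri, a distinguished symbol a in column Aj when Aj belongs to Ri and a row-specific symbol b otherwise; then, repeatedly, for each FD X -> Y, rows that agree on X are made to agree on Y (preferring an a symbol). The decomposition is lossless iff some row ends up containing only a symbols. The Python below is a minimal sketch of that idea under illustrative representation choices (symbols as tuples, FDs as pairs of attribute sets); it is not a transcription of the referenced example. The schema reuses the EMP_PROJ FDs from earlier in this module.

```python
def is_lossless(schema, decomposition, fds):
    # Tableau: ("a", A) is the distinguished symbol for column A,
    # ("b", i, A) is a symbol unique to row i.
    rows = [
        {a: ("a", a) if a in ri else ("b", i, a) for a in schema}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue
                    for y in rhs:
                        if r1[y] != r2[y]:
                            # Equate the two symbols throughout column y,
                            # keeping the distinguished "a" symbol if present.
                            if r1[y][0] == "a":
                                keep, drop = r1[y], r2[y]
                            else:
                                keep, drop = r2[y], r1[y]
                            for r in rows:
                                if r[y] == drop:
                                    r[y] = keep
                            changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(row[a] == ("a", a) for a in schema) for row in rows)


SCHEMA = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
F = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(is_lossless(SCHEMA, good, F))  # True
print(is_lossless(SCHEMA, bad, F))   # False
```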

Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C, and the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
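An MVD X ->> Y holds in a relation state if, for any two tuples t1 and t2 that agree on X, the tuple that takes X and Y from t1 and the remaining attributes from t2 is also present. A minimal Python sketch of that check (the tuple-swap definition is standard textbook material rather than something stated on these slides); the EMP tuples are illustrative sample data, one employee with two projects and two dependents, and the function name is hypothetical.

```python
def satisfies_mvd(rows, columns, lhs, rhs):
    """Check lhs ->> rhs on a relation state given as a list of tuples."""
    index = {name: i for i, name in enumerate(columns)}
    rest = [c for c in columns if c not in lhs and c not in rhs]
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[index[c]] == t2[index[c]] for c in lhs):
                # t3 agrees with t1 on lhs and rhs, and with t2 on the rest.
                t3 = tuple(t2[index[c]] if c in rest else t1[index[c]] for c in columns)
                if t3 not in present:
                    return False
    return True


COLUMNS = ["ENAME", "PNAME", "DNAME"]
EMP = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]
print(satisfies_mvd(EMP, COLUMNS, ["ENAME"], ["PNAME"]))      # True
print(satisfies_mvd(EMP[:3], COLUMNS, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) missing

# 4NF decomposition: project onto {ENAME, PNAME} and {ENAME, DNAME}.
EMP_PROJECTS = sorted({(e, p) for e, p, _ in EMP})
EMP_DEPENDENTS = sorted({(e, d) for e, _, d in EMP})
print(EMP_PROJECTS)    # [('Smith', 'X'), ('Smith', 'Y')]
print(EMP_DEPENDENTS)  # [('Smith', 'Anna'), ('Smith', 'John')]
```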

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
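The join-dependency definition can be tested directly on a relation state: project onto each component, natural-join the projections, and compare the result with the original. A minimal Python sketch, with tuples represented as frozensets of (attribute, value) pairs; the S/P/J sample (supplier, part, project) is a commonly used textbook-style instance, not data from these slides, and the function names are illustrative. It satisfies the three-way JD(SP, PJ, JS) even though joining SP and PJ alone produces a spurious tuple.

```python
def make_rows(columns, tuples):
    return {frozenset(zip(columns, t)) for t in tuples}


def project(rows, attrs):
    """Projection with set semantics (duplicates disappear)."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}


def natural_join(left, right):
    result = set()
    for l in left:
        lvals = dict(l)
        for r in right:
            # Compatible if the two tuples agree on every shared attribute.
            if all(lvals.get(a, v) == v for a, v in r):
                result.add(l | r)
    return result


def satisfies_jd(rows, components):
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == rows


SPJ = make_rows(("S", "P", "J"), [
    ("s1", "p1", "j2"),
    ("s1", "p2", "j1"),
    ("s2", "p1", "j1"),
    ("s1", "p1", "j1"),
])
print(satisfies_jd(SPJ, [{"S", "P"}, {"P", "J"}, {"J", "S"}]))  # True
print(satisfies_jd(SPJ, [{"S", "P"}, {"P", "J"}]))              # False: spurious (s2, p1, j2)
```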

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 21: Module 6

Module 6 21042023

4 Spurious Tuples Bad designs for a relational database may result

in erroneous results for certain JOIN operations The lossless join property is used to

guarantee meaningful results for join operations

GUIDELINE 4 Design relation schemas so that they can be

joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated

Module 6 22042023

Spurious Tuples

There are two important properties of decompositions Non-additive or losslessness of the corresponding join Preservation of the functional dependencies

Note that Property (a) is extremely important and cannot be

sacrificed Property (b) is less stringent and may be sacrificed

Module 6 23042023

Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done

during Insertion Modification Deletion

Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Module 6 24042023

Functional dependencies Functional dependencies (FDs)

Is a constraint between two sets of attributes from the database

Assumption The entire database is a single universal

relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes

Module 6 25042023

Definition

FDs are used to specify formal measures of the

goodness of relational designs keys that are used to define normal forms for

relations constraints that are derived from the meaning and

interrelationships of the data attributes A set of attributes X functionally determines

a set of attributes Y if the value of X determines a unique value for Y

Module 6 26042023

Functional Dependencies

A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If

t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r

depend on or are determined by the values of the X component

The values of the X component functionally determines the values of Y component

FDs are derived from the real-world constraints on the attributes

The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Normalization into 1NF

To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
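Technique 1 can be illustrated on a small, hypothetical DEPARTMENT relation whose DLOCATIONS attribute holds a list of values (a non-atomic value). The function name `split_multivalued` and the sample rows are assumptions made for illustration; the sketch moves the multivalued attribute into a separate relation together with the primary key, one tuple per value.

```python
def split_multivalued(rows, key, attr, new_attr):
    """1NF technique 1: move multivalued attribute `attr` into its own relation
    together with the primary key `key`, producing one tuple per value."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    detail = [{key: row[key], new_attr: value} for row in rows for value in row[attr]]
    return base, detail


# Hypothetical unnormalized DEPARTMENT rows: DLOCATIONS holds a list of values.
department = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]

dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept)            # DEPARTMENT(DNUMBER, DNAME): every value is atomic
print(dept_locations)  # DEPT_LOCATIONS(DNUMBER, DLOCATION): the key is the pair
```

Only the key value is repeated in the new relation, which is why this technique avoids both the redundancy of technique 2 and the NULLs of technique 3.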

Normalizing nested relations into 1NF

Additional problems: Schaum's series, pg. 178, problem 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
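A 2NF test follows directly from the definition of full functional dependency: look for a proper subset of the primary key whose attribute closure already contains a nonprime attribute. The Python sketch below is a minimal illustration that assumes FDs are given as (left side, right side) pairs of attribute sets; `partial_dependencies` and `is_2nf` are hypothetical helper names, and EMP_PROJ reuses the SSN/PNUMBER examples from the previous slide.

```python
from itertools import combinations


def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, schema, fds):
    """Map each nonprime attribute to the smallest proper subset of the
    primary key that already determines it (a partial dependency)."""
    key = set(key)
    nonprime = set(schema) - key  # primary-key based definition used on this slide
    found = {}
    for size in range(1, len(key)):  # proper subsets only
        for subset in combinations(sorted(key), size):
            for a in sorted(nonprime & closure(subset, fds)):
                found.setdefault(a, set(subset))
    return found


def is_2nf(key, schema, fds):
    return not partial_dependencies(key, schema, fds)


F = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}

print(partial_dependencies({"SSN", "PNUMBER"}, EMP_PROJ, F))
print(is_2nf({"SSN", "PNUMBER"}, {"SSN", "PNUMBER", "HOURS"}, F))  # True
```

Each entry suggests a 2NF decomposition: (SSN, ENAME) and (PNUMBER, PNAME, PLOCATION) are split off, leaving (SSN, PNUMBER, HOURS).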

Normalizing into 2NF

Conversion to 2NF

[Figure: a relation with attributes A, B, C, D is converted into one relation with A, B, C and another with A, D]

Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?
2. Transform it into the next higher normal form.

Third Normal Form

Definition:
- Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
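The note above is what the general, candidate-key based form of the 3NF test captures: a nontrivial FD X -> A is acceptable if X is a superkey or A is a prime attribute. The Python sketch below applies that test, assuming (as in the usual textbook treatment) that checking the given FDs is enough when testing the whole schema R. The EMP_DEPT schema name and the FD Emp -> SSN (added so that Emp is a candidate key, as the note states) are illustrative assumptions, and the helper names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def third_nf_violations(schema, fds):
    """Nontrivial X -> A where X is not a superkey and A is nonprime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        for a in sorted(rhs - lhs):  # skip the trivial part of the FD
            if not is_superkey and a not in prime:
                bad.append((set(lhs), a))
    return bad


# SSN -> DNUMBER -> DMGRSSN, where DNUMBER is not a key: a 3NF violation.
EMP_DEPT = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
F1 = [fd("SSN", "ENAME DNUMBER"), fd("DNUMBER", "DMGRSSN")]
print(third_nf_violations(EMP_DEPT, F1))  # [({'DNUMBER'}, 'DMGRSSN')]

# EMP(SSN, Emp, Salary): SSN -> Emp -> Salary, but Emp is a candidate key.
EMP = {"SSN", "Emp", "Salary"}
F2 = [fd("SSN", "Emp"), fd("Emp", "SSN"), fd("Emp", "Salary")]
print(third_nf_violations(EMP, F2))  # []
```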

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute, even when 'X' is not a candidate key.

For a relation to be in BCNF, 'X' must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. In fd2, instructor is not a superkey, but course is a prime attribute, so fd2 is allowed by 3NF but violates BCNF. Hence this relation is in 3NF but not in BCNF.

A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
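The difference between the two tests can be checked mechanically on TEACH. In this Python sketch (helper names are hypothetical), a BCNF violation is any nontrivial FD whose left side is not a superkey, while the 3NF test additionally tolerates FDs whose right side consists of prime attributes; only the given FDs are examined.

```python
from itertools import combinations


def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= schema


def is_bcnf(schema, fds):
    # every nontrivial FD X -> A must have a superkey on the left
    return all(is_superkey(lhs, schema, fds) for lhs, rhs in fds if rhs - lhs)


def is_3nf(schema, fds):
    # 3NF also accepts X -> A when every attribute in A - X is prime
    prime = set().union(*candidate_keys(schema, fds))
    return all(is_superkey(lhs, schema, fds) or (rhs - lhs) <= prime
               for lhs, rhs in fds if rhs - lhs)


TEACH = {"student", "course", "instructor"}
F = [fd("student course", "instructor"),  # fd1
     fd("instructor", "course")]           # fd2

print(candidate_keys(TEACH, F))
print(is_3nf(TEACH, F), is_bcnf(TEACH, F))  # True False
```

Both {student, course} and {student, instructor} turn out to be candidate keys, so every attribute of TEACH is prime; that is why fd2 passes the 3NF test but fails BCNF.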

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)
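For a decomposition into exactly two relations, a well-known test avoids building any join: {R1, R2} is lossless with respect to F iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. This test is not stated on the slide; the Python sketch below (hypothetical helper names) applies it to the three candidate decompositions of TEACH.

```python
def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


F = [fd("student course", "instructor"), fd("instructor", "course")]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for n, (r1, r2) in enumerate(decompositions, start=1):
    print(n, is_lossless_binary(r1, r2, F))
```

Only the third pair shares an attribute (instructor) that determines one side completely, through fd2.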

Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
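The general lossless join test works on a matrix with one row per relation Ri and one column per attribute, then repeatedly applies the FDs to equate symbols. Since the slide only points to a worked example, the Python sketch below is one common formulation (a chase on a tableau) with hypothetical names: ('a', A) plays the role of the distinguished a-symbol and ('b', i, A) of a row-specific b-symbol. The decomposition is lossless iff some row ends up containing only a-symbols.

```python
def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def chase_lossless(schema, decomposition, fds):
    attrs = sorted(schema)
    # Row i gets the distinguished symbol ("a", A) for attributes of R_i,
    # and a symbol ("b", i, A) unique to that row otherwise.
    table = [{A: ("a", A) if A in ri else ("b", i, A) for A in attrs}
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    ti, tj = table[i], table[j]
                    if not all(ti[A] == tj[A] for A in lhs):
                        continue
                    for A in rhs:
                        if ti[A] == tj[A]:
                            continue
                        # Equate the two symbols, preferring the distinguished one.
                        keep, old = (ti[A], tj[A]) if ti[A][0] == "a" else (tj[A], ti[A])
                        for row in table:  # rename `old` everywhere in column A
                            if row[A] == old:
                                row[A] = keep
                        changed = True
    return any(all(row[A][0] == "a" for A in attrs) for row in table)


TEACH = {"student", "course", "instructor"}
F = [fd("student course", "instructor"), fd("instructor", "course")]
print(chase_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F))  # True
print(chase_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F))     # False
```

Unlike the two-relation test, this works for any number of relations, e.g. splitting EMP_PROJ into (SSN, ENAME), (PNUMBER, PNAME, PLOCATION) and (SSN, PNUMBER, HOURS).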

Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
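Whether an MVD X ->> Y holds in a particular relation state can be checked from its formal definition: for every pair of tuples t1, t2 that agree on X, the tuple taking X and Y from t1 and the remaining attributes Z = R - X - Y from t2 must also be present. The Python sketch below (function names and the EMP sample rows are hypothetical) implements that check together with the trivial-MVD test from this slide. Like an FD, an MVD is a property of the schema, so one state can refute it but cannot prove it.

```python
def mvd_holds(rows, schema, x, y):
    """Check X ->> Y on one relation state (rows are dicts)."""
    x, y = set(x), set(y) - set(x)
    z = set(schema) - x - y
    attrs = sorted(schema)
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 agrees with t1 on X and Y, and with t2 on Z
                t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


def is_trivial_mvd(schema, x, y):
    """X ->> Y is trivial if Y is a subset of X or X union Y is all of R."""
    return set(y) <= set(x) or set(x) | set(y) == set(schema)


EMP_SCHEMA = ["ENAME", "PNAME", "DNAME"]
emp = [  # hypothetical state: Smith works on projects X and Y, dependents John and Anna
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(emp, EMP_SCHEMA, {"ENAME"}, {"PNAME"}))      # True
print(mvd_holds(emp[:3], EMP_SCHEMA, {"ENAME"}, {"PNAME"}))  # False: (Smith, Y, John) missing
print(is_trivial_mvd(EMP_SCHEMA, {"ENAME"}, {"PNAME"}))      # False
```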

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
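The 4NF decomposition of EMP is a pair of projections, and because ENAME ->> PNAME holds, their natural join gives back exactly the original tuples. A minimal Python sketch, using a hypothetical EMP state and hypothetical helpers (`project`, `natural_join`); tuples are stored as frozensets of (attribute, value) pairs so that projections eliminate duplicates as relational sets do.

```python
def project(rows, attrs):
    """Relational projection with duplicate elimination."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(r, s):
    """Natural join of two relations stored as sets of frozenset((attr, value)) tuples."""
    out = set()
    for t in r:
        for u in s:
            td, ud = dict(t), dict(u)
            if all(td[a] == ud[a] for a in td.keys() & ud.keys()):
                out.add(frozenset({**td, **ud}.items()))
    return out


emp = [  # hypothetical EMP state
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
EMP_PROJECTS = project(emp, ["ENAME", "PNAME"])
EMP_DEPENDENTS = project(emp, ["ENAME", "DNAME"])
print(len(EMP_PROJECTS), len(EMP_DEPENDENTS))  # 2 2
print(natural_join(EMP_PROJECTS, EMP_DEPENDENTS) == project(emp, ["ENAME", "PNAME", "DNAME"]))  # True
```

The four EMP tuples are replaced by 2 + 2 tuples; the saving grows with more projects and dependents per employee, because EMP must store every project/dependent combination.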

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except those in which every component is a superkey.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
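A join dependency can be checked on a given state by projecting onto each component and joining all the projections back. The Python sketch below does this for a hypothetical SUPPLY(SNAME, PART_NAME, PROJ_NAME) state chosen to satisfy JD(R1, R2, R3) with R1 = (SNAME, PART_NAME), R2 = (SNAME, PROJ_NAME) and R3 = (PART_NAME, PROJ_NAME); the attribute names, rows and helpers are assumptions for illustration. It also shows that joining only two of the projections produces spurious tuples.

```python
from functools import reduce


def project(rows, attrs):
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(r, s):
    out = set()
    for t in r:
        for u in s:
            td, ud = dict(t), dict(u)
            if all(td[a] == ud[a] for a in td.keys() & ud.keys()):
                out.add(frozenset({**td, **ud}.items()))
    return out


def satisfies_jd(rows, components):
    """True if the state equals the natural join of its projections on the components."""
    all_attrs = sorted(set().union(*components))
    joined = reduce(natural_join, [project(rows, c) for c in components])
    return joined == project(rows, all_attrs)


cols = ["SNAME", "PART_NAME", "PROJ_NAME"]
supply = [dict(zip(cols, t)) for t in [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"), ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"), ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]]
R1, R2, R3 = ["SNAME", "PART_NAME"], ["SNAME", "PROJ_NAME"], ["PART_NAME", "PROJ_NAME"]

print(satisfies_jd(supply, [R1, R2, R3]))                           # True
print(len(natural_join(project(supply, R1), project(supply, R2))))  # 9
```

The two-way join yields 9 tuples for 7 original ones (for example, (Smith, Nut, ProjX) is spurious), while adding R3 filters the spurious tuples out again.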

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence 5NF is rarely used in practice.

Spurious Tuples

There are two important properties of decompositions:
(a) Non-additivity, or losslessness, of the corresponding join.
(b) Preservation of the functional dependencies.

Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines

Problems pointed out:
- Anomalies that cause redundant work to be done during insertion, modification and deletion.
- Waste of storage space due to nulls, and difficulty of performing aggregation operations and joins due to null values.
- Generation of invalid and spurious data during joins on improperly related base relations.

Functional dependencies

Functional dependencies (FDs):
- An FD is a constraint between two sets of attributes from the database.
- Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ... are the attributes.

Definition

FDs are used to specify formal measures of the "goodness" of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component; equivalently, the values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420

Continent -> Name does not hold in this state (NA appears with Superior, Huron and Michigan), whereas Name -> Length is satisfied.
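The FD definition can be applied directly to a relation state by comparing every pair of tuples. The Python sketch below (function name hypothetical) does this for the Lakes state. A state can only show that an FD is violated; it cannot prove that the FD holds for the schema, since an FD is a property of R rather than of one state.

```python
def fd_holds_in_state(rows, lhs, rhs):
    """True unless two tuples agree on all lhs attributes but differ on some rhs attribute."""
    for i, t1 in enumerate(rows):
        for t2 in rows[i + 1:]:
            if all(t1[a] == t2[a] for a in lhs) and any(t1[b] != t2[b] for b in rhs):
                return False
    return True


cols = ["Name", "Continent", "Area", "Length"]
lakes = [dict(zip(cols, t)) for t in [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]]

print(fd_holds_in_state(lakes, ["Continent"], ["Name"]))  # False
print(fd_holds_in_state(lakes, ["Name"], ["Length"]))     # True
```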

Graphical representation of Functional Dependencies

Examples of FD constraints:
- Social security number uniquely determines employee name: SSN -> ENAME
- Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
- Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies, or does not satisfy, a given functional dependency A -> B.

How it works:
1. Sort the tuples of the relation r on the A attributes, so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.

Relation state of TEACH

TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     ooad              Horowitz

TEXT -> COURSE may hold, but TEACHER -> COURSE does not hold in this state (Smith teaches two different courses).
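The Satisfies procedure can be sketched in Python with sorting and grouping, here run on the TEACH state above. The function name and the dict-based row format are assumptions; `itertools.groupby` only groups adjacent items, which is why step 1 sorts on the A attributes first.

```python
from itertools import groupby


def satisfies(rows, a_attrs, b_attrs):
    """Sort-based test of whether a relation state satisfies A -> B."""
    def a_key(t):
        return tuple(t[x] for x in a_attrs)

    ordered = sorted(rows, key=a_key)  # step 1: equal A values become adjacent
    for _, group in groupby(ordered, key=a_key):
        b_values = {tuple(t[x] for x in b_attrs) for t in group}
        if len(b_values) > 1:  # step 2 fails: same A value, different B values
            return False
    return True


cols = ["TEACHER", "COURSE", "TEXT"]
teach = [dict(zip(cols, t)) for t in [
    ("Smith", "Data Structures", "Bartram"),
    ("Smith", "Data Management", "Martin"),
    ("Hall", "Compilers", "Hoffmann"),
    ("Brown", "ooad", "Horowitz"),
]]

print(satisfies(teach, ["TEXT"], ["COURSE"]))     # True
print(satisfies(teach, ["TEACHER"], ["COURSE"]))  # False
```

Sorting costs O(n log n), compared with O(n^2) for comparing every pair of tuples directly.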

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming, so inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred, or deduced, from the FDs in F.

Example of Closure

If a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}. Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That X -> Y is inferred from F is denoted by F |= X -> Y.
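A standard way to decide F |= X -> Y without enumerating F+ is to compute the attribute closure X+ (all attributes determined by X under F) and check whether Y ⊆ X+. The algorithm is not spelled out on the slide; the Python sketch below (helper names hypothetical) checks the inferred FDs listed in the example.

```python
def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    """X+: keep firing FDs whose left side is already contained in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    """F |= lhs -> rhs iff rhs is contained in the closure of lhs under F."""
    return set(rhs.split()) <= closure(lhs.split(), fds)


F = [fd("SSN", "ENAME BDATE ADDRESS DNUMBER"), fd("DNUMBER", "DNAME DMGRSSN")]
print(sorted(closure(["SSN"], F)))
print(implies(F, "SSN", "DNAME DMGRSSN"), implies(F, "SSN", "SSN"), implies(F, "DNUMBER", "DNAME"))  # True True True
print(implies(F, "DNUMBER", "SSN"))  # False
```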

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules.

By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state of R that satisfies the dependencies in F (in other words, if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
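Worked derivations for the two examples, as one possible sequence of steps using IR1 to IR3 (other orders of application work too):

```latex
\begin{aligned}
&\textbf{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \models AC \to Z\\
&1.\; A \to B && \text{given}\\
&2.\; AC \to BC && \text{IR2 on 1, augment with } C\\
&3.\; C \to X && \text{given}\\
&4.\; BC \to BX && \text{IR2 on 3, augment with } B\\
&5.\; AC \to BX && \text{IR3 on 2, 4}\\
&6.\; BX \to Z && \text{given}\\
&7.\; AC \to Z && \text{IR3 on 5, 6}\\[6pt]
&\textbf{Example 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B,\; F \models A \to D\\
&1.\; A \to B && \text{given}\\
&2.\; B \to C && \text{IR1, since } C \subseteq B\\
&3.\; A \to C && \text{IR3 on 1, 2}\\
&4.\; C \to D && \text{given}\\
&5.\; A \to D && \text{IR3 on 3, 4}
\end{aligned}
```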

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
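Redundancy can be tested with the attribute closure: A -> B is redundant iff B ⊆ A+ computed under F - {A -> B}. The Python sketch below (hypothetical helper names; F = {A -> B, B -> C, A -> C} is a made-up example) tests one FD and then removes redundant FDs one at a time.

```python
def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(fds, target):
    """target is redundant iff it follows from the other FDs."""
    rest = [f for f in fds if f != target]
    lhs, rhs = target
    return rhs <= closure(lhs, rest)


def remove_redundant(fds):
    result = list(fds)
    for f in list(fds):
        if is_redundant(result, f):
            result.remove(f)
    return result


F = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(is_redundant(F, fd("A", "C")))  # True: A -> B and B -> C give A -> C
print(remove_redundant(F) == [fd("A", "B"), fd("B", "C")])  # True
```

The result can depend on the order in which FDs are examined, since removing one FD may make another one necessary.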

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
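Coverage and equivalence reduce to closure computations: F covers E iff every X -> Y in E has Y ⊆ X+ under F. A Python sketch with hypothetical names, applied to a made-up pair of FD sets F = {A -> C, AC -> D, E -> AD, E -> H} and G = {A -> CD, E -> AH}:

```python
def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """f covers e: every FD in e can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    return covers(e, f) and covers(f, e)


F = [fd("A", "C"), fd("A C", "D"), fd("E", "A D"), fd("E", "H")]
G = [fd("A", "C D"), fd("E", "A H")]
print(covers(F, G), covers(G, F), equivalent(F, G))  # True True True
```

G is the smaller of the two sets, which is the kind of reduction the last paragraph motivates.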

Minimal Functional Dependencies

A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. F is transformed such that each FD in it that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) every RHS of each dependency is a single attribute;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of the left-hand side).

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 ∈ F. Then A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
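The equivalence in the definition can be tested with a closure instead of comparing whole FD sets: A1 is extraneous in A1A2 -> B1B2 iff B1B2 ⊆ (A2)+ computed under F. For a right attribute B of X -> Y, one common test replaces X -> Y by X -> (Y - B) and checks whether B is still in X+. This Python sketch uses hypothetical helper names; the first example uses the FDs X -> Z and XZ -> R from the canonical cover problem that follows, and the second is made up.

```python
def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def left_extraneous(fds, target, attr):
    """attr in X is extraneous in X -> Y iff Y is contained in (X - attr)+ under F."""
    lhs, rhs = target
    return attr in lhs and rhs <= closure(lhs - {attr}, fds)


def right_extraneous(fds, target, attr):
    """attr in Y is extraneous in X -> Y iff attr is in X+ once X -> Y is replaced by X -> (Y - attr)."""
    lhs, rhs = target
    if attr not in rhs:
        return False
    reduced = [f for f in fds if f != target] + [(lhs, rhs - {attr})]
    return attr in closure(lhs, reduced)


F = [fd("X", "Z"), fd("X Z", "R")]
print(left_extraneous(F, fd("X Z", "R"), "Z"))  # True: X -> Z already gives Z
print(left_extraneous(F, fd("X Z", "R"), "X"))  # False

G = [fd("A", "B"), fd("B", "C"), fd("A", "B C")]
print(right_extraneous(G, fd("A", "B C"), "C"))  # True: A -> B -> C
```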

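CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.

Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce: Z is extraneous in XZ -> R because X -> Z holds, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}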
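The three properties can be enforced automatically. The Python sketch below uses one common ordering (split right-hand sides, left-reduce, then drop redundant FDs) with hypothetical helper names, applied to the F of this problem. A canonical cover need not be unique, and a different processing order can yield a different but equivalent cover.

```python
def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: one attribute per right-hand side, dropping exact duplicates.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            f = (frozenset(lhs), frozenset({a}))
            if f not in g:
                g.append(f)
    # Step 2: left-reduce; attribute b is extraneous in X -> A if (X - b)+ contains A.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # reduction can create duplicates; keep first occurrence
    # Step 3: drop FDs implied by the remaining ones.
    for f in list(g):
        rest = [h for h in g if h != f]
        if f[1] <= closure(f[0], rest):
            g = rest
    return g


F = [fd("X", "Z"), fd("X Y", "W P"), fd("X Y", "Z W Q"), fd("X Z", "R")]
for lhs, rhs in canonical_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

The left-reduction step finds that Z is extraneous in XZ -> R, because X -> Z already holds, so the output contains X -> R in place of XZ -> R.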

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).

4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of:
- minimizing redundancies
- minimizing anomalies

Provides the database designer with:
- a formal framework for analyzing relation schemas based on keys and FDs
- a series of normal form tests

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: it ensures that each functional dependency is represented in some individual relation resulting after decomposition.
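Dependency preservation can be tested without computing F+. One standard approach (not given on the slide) grows a set of attributes from the left side of each FD, using only closures restricted to the individual relations Ri; the FD is preserved iff its right side is eventually reached. A Python sketch with hypothetical helper names, applied to the lossless BCNF decomposition of TEACH (which loses fd1) and to a made-up R(A, B, C) with F = {A -> B, B -> C}:

```python
def fd(lhs, rhs):
    return (frozenset(lhs.split()), frozenset(rhs.split()))


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(fds, decomposition):
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # what R_i alone lets us derive from the attributes reached so far
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False  # this FD cannot be enforced inside the individual relations
    return True


TEACH_F = [fd("student course", "instructor"), fd("instructor", "course")]
print(preserves_dependencies(TEACH_F, [{"instructor", "course"}, {"instructor", "student"}]))  # False

F = [fd("A", "B"), fd("B", "C")]
print(preserves_dependencies(F, [{"A", "B"}, {"B", "C"}]))  # True
```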

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 23: Module 6

Module 6 23042023

Summary and Discussion of Design GuidelinesProblems pointed out Anomalies cause redundant work to be done

during Insertion Modification Deletion

Waste of storage space due to nulls and difficulty of performing aggregation operations and joins due to null values

Generation of invalid and spurious data during joins on improperly related base relations

Module 6 24042023

Functional dependencies Functional dependencies (FDs)

Is a constraint between two sets of attributes from the database

Assumption The entire database is a single universal

relation schema R=A1A2hellipAn Where A1A2 hellip are the attributes

Module 6 25042023

Definition

FDs are used to specify formal measures of the

goodness of relational designs keys that are used to define normal forms for

relations constraints that are derived from the meaning and

interrelationships of the data attributes A set of attributes X functionally determines

a set of attributes Y if the value of X determines a unique value for Y

Module 6 26042023

Functional Dependencies

A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If

t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r

depend on or are determined by the values of the X component

The values of the X component functionally determines the values of Y component

FDs are derived from the real-world constraints on the attributes

The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:

IR1 (Reflexive): If Y is a subset of X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules. By sound we mean that any dependency we infer from F using IR1 through IR3 holds in every relation state that satisfies the dependencies in F (if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that applying IR1 through IR3 repeatedly until no more dependencies can be inferred yields the complete set of all dependencies that can be inferred from F.

Inference Rules for FDs
Some additional inference rules that are useful:

Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

These three inference rules, like any other valid inference rule, can be deduced from IR1, IR2 and IR3 (completeness property).

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C a subset of B, show that F |= A -> D.
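A worked derivation for both examples, using only IR1 to IR3, is sketched below. The set F in Example 1 is reconstructed from garbled notes and is assumed here to be {A -> B, C -> X, BX -> Z}; if the intended set differs, the steps change accordingly.

```latex
\begin{aligned}
&\textbf{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \\
&1.\; A \to B   && \text{given} \\
&2.\; AC \to BC && \text{IR2 on (1), augment with } C \\
&3.\; C \to X   && \text{given} \\
&4.\; BC \to BX && \text{IR2 on (3), augment with } B \\
&5.\; AC \to BX && \text{IR3 on (2), (4)} \\
&6.\; BX \to Z  && \text{given} \\
&7.\; AC \to Z  && \text{IR3 on (5), (6)} \\[6pt]
&\textbf{Example 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B \\
&1.\; B \to C   && \text{IR1, since } C \subseteq B \\
&2.\; A \to B   && \text{given} \\
&3.\; A \to C   && \text{IR3 on (2), (1)} \\
&4.\; C \to D   && \text{given} \\
&5.\; A \to D   && \text{IR3 on (3), (4)}
\end{aligned}
```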

Redundant functional dependencies
Given a set F of FDs, an FD A -> B in F is said to be redundant with respect to F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from F.

Eliminating redundant FDs allows us to minimize the set of FDs.
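The redundancy test above reduces to one closure computation: A -> B is redundant iff B is contained in A+ computed under F - {A -> B}. A minimal sketch, assuming FDs are (lhs, rhs) pairs of Python sets; the function names are hypothetical.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_redundant(fd, fds):
    """fd is redundant in fds iff it can be derived from fds - {fd}."""
    rest = list(fds)
    rest.remove(fd)  # drop one occurrence of fd
    lhs, rhs = fd
    return rhs <= attribute_closure(lhs, rest)


def remove_redundant(fds):
    """Remove redundant FDs one at a time; the result can depend on the order."""
    result = list(fds)
    for fd in list(result):
        if is_redundant(fd, result):
            result.remove(fd)
    return result


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(is_redundant(({"A"}, {"C"}), F))  # True: A -> B and B -> C give A -> C
print(remove_redundant(F))
```

Removal is done one FD at a time because two FDs can each be redundant with respect to the other while only one of them may be dropped.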

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
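Both conditions can be tested without building E+ or F+: F covers E iff, for every X -> Y in E, Y is contained in X+ computed under F. A minimal sketch under the same assumption as before (FDs as pairs of Python sets); `covers` and `equivalent` are hypothetical names.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def covers(f, e):
    """True if F covers E: every FD X -> Y in E has Y contained in X+ under F."""
    return all(rhs <= attribute_closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)


E = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
F = [({"A"}, {"B"}), ({"B"}, {"C"})]
G = [({"A"}, {"B", "C"})]
print(equivalent(E, F))  # True
print(covers(E, G))      # True: A -> BC can be inferred from E
print(covers(G, E))      # False: B -> C cannot be inferred from G
```

Here F is a smaller set equivalent to E, which is exactly what the last paragraph asks for.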

Minimal Functional Dependencies
A minimal cover (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover
A set of FDs F is minimal if:

(a) the RHS of every dependency in F is a single attribute;

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
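The equivalence test above can be reduced to a single closure check. The replacement set always implies the original FD (A1A2 -> B follows from A2 -> B by augmenting with A1 and then decomposing), so the two sets are equivalent exactly when F already implies A2 -> B, that is, when B is contained in A2+ under F. A minimal sketch; `is_extraneous_lhs` is a hypothetical name and FDs are assumed to be pairs of Python sets.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_extraneous_lhs(attr, fd, fds):
    """True if attr can be dropped from the left-hand side of fd without changing F+."""
    lhs, rhs = fd
    if attr not in lhs:
        raise ValueError(f"{attr} is not on the left-hand side")
    # attr is extraneous iff rhs is already determined by the remaining LHS.
    return rhs <= attribute_closure(lhs - {attr}, fds)


F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
print(is_extraneous_lhs("Z", ({"X", "Z"}, {"R"}), F))  # True, since X -> Z
print(is_extraneous_lhs("X", ({"X", "Z"}, {"R"}), F))  # False
```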

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.

2. FC is left-reduced: no FD has an extraneous attribute on its LHS.

3. FC is nonredundant: no FD of FC can be derived from the others.

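Problem

Given the set F of FDs below, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: since X -> Z, Z is extraneous in XZ -> R, giving FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}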
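The steps of this problem can be automated. The sketch below follows the usual minimal-cover order (split right-hand sides, remove extraneous left-hand attributes, then remove redundant FDs); it is a hypothetical helper, not the course's reference solution, and different processing orders can yield different but equally valid covers. On this F it also left-reduces XZ -> R to X -> R, because X -> Z puts Z in X+.

```python
def minimal_cover(fds):
    """Compute a minimal (canonical) cover of fds.

    fds: list of (lhs, rhs) pairs of attribute sets. The result is a list of
    (frozenset lhs, single attribute) pairs.
    """
    def closure(attrs, cover):
        result = set(attrs)
        changed = True
        while changed:
            changed = False
            for lhs, a in cover:
                if lhs <= result and a not in result:
                    result.add(a)
                    changed = True
        return result

    # Step 1: make every RHS a single attribute, dropping duplicates.
    split = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), a)
            if fd not in split:
                split.append(fd)

    # Step 2: left-reduce. B is extraneous in an LHS if A is still in (LHS - B)+.
    # Closures are taken under `split`, which stays equivalent to F.
    reduced = []
    for lhs, a in split:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, split):
                lhs.discard(b)
        fd = (frozenset(lhs), a)
        if fd not in reduced:
            reduced.append(fd)

    # Step 3: drop FDs that can be derived from the remaining ones.
    result = list(reduced)
    for lhs, a in list(result):
        rest = [fd for fd in result if fd != (lhs, a)]
        if a in closure(lhs, rest):
            result = rest
    return result


def show(cover):
    return ", ".join("".join(sorted(lhs)) + " -> " + a for lhs, a in cover)


F = [(set("X"), set("Z")), (set("XY"), set("WP")),
     (set("XY"), set("ZWQ")), (set("XZ"), set("R"))]
print(show(minimal_cover(F)))
# X -> Z, XY -> P, XY -> W, XY -> Q, X -> R
```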

Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on the keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
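Candidate keys, and therefore prime attributes, can be found mechanically from the FDs: a set of attributes is a superkey if its closure is the whole schema, and a key if no proper subset is also a superkey. The brute-force sketch below is exponential in the number of attributes, so it only suits small textbook schemas; the function names are hypothetical, and the example reuses the TEACH FDs from the BCNF slides.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """Return every candidate key (minimal superkey) of the schema."""
    attributes = set(attributes)
    keys = []
    # Checking subsets in order of increasing size means keys are found first.
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey but not a key
            if attributes <= attribute_closure(s, fds):
                keys.append(s)  # s is a superkey and no proper subset is
    return keys


def prime_attributes(attributes, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(attributes, fds))


teach = {"student", "course", "instructor"}
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(candidate_keys(teach, teach_fds))    # {student, course} and {student, instructor}
print(prime_attributes(teach, teach_fds))  # all three attributes are prime
```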

First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

Considered to be part of the definition of a relation.

Normalization into 1NF

Normalization into 1NF
To achieve 1NF there are three techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2. Expand the key so that there will be a separate tuple in the original relation for each value of the offending attribute. This has the disadvantage of introducing redundancy.

3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
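Technique 1 can be sketched on in-memory rows as follows. The DEPARTMENT rows, the attribute names and the function `split_multivalued` are hypothetical illustrations, and the sketch assumes a single-attribute primary key.

```python
def split_multivalued(rows, key, multivalued, value_name):
    """1NF technique 1: move a multivalued attribute into a separate relation.

    Returns (base, detail). base keeps every attribute except `multivalued`;
    detail holds one {key, value_name} tuple per value of the multivalued attribute.
    """
    base, detail = [], []
    for row in rows:
        base.append({a: v for a, v in row.items() if a != multivalued})
        for value in row[multivalued]:
            detail.append({key: row[key], value_name: value})
    return base, detail


# Hypothetical sample data: DLOCATIONS is a non-atomic (list-valued) attribute.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]
dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept)            # DNUMBER and DNAME only, all values atomic
print(dept_locations)  # one (DNUMBER, DLOCATION) tuple per location: 4 tuples
```

The new relation's key is the pair (DNUMBER, DLOCATION), and no NULLs or repeated DNAME values are introduced, which is why the slide prefers this technique.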

Normalizing nested relations into 1NF

Additional problems from the Schaum's series: p. 178, 5.1

Second Normal Form
Uses the concepts of FDs and the primary key.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
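A 2NF check amounts to looking for partial dependencies: a nonprime attribute determined by a proper subset of the primary key. The sketch follows the slide's definition, so "nonprime" means "not in the primary key" (the general definition uses every candidate key instead); the function name is hypothetical, and the FDs are the SSN/PNUMBER examples from the earlier slides.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def partial_dependencies(attributes, primary_key, fds):
    """Return (proper subset of the key, nonprime attribute) pairs that violate 2NF."""
    nonprime = set(attributes) - set(primary_key)
    key = sorted(primary_key)
    violations = []
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for subset in combinations(key, size):
            determined = attribute_closure(set(subset), fds)
            for a in sorted(nonprime & determined):
                violations.append((set(subset), a))
    return violations


emp_proj = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
for lhs, a in partial_dependencies(emp_proj, {"SSN", "PNUMBER"}, fds):
    print(sorted(lhs), "->", a)
# ['PNUMBER'] -> PLOCATION
# ['PNUMBER'] -> PNAME
# ['SSN'] -> ENAME
```

HOURS does not appear, matching the slide: it is fully functionally dependent on {SSN, PNUMBER}. 2NF normalization would move each violating group into its own relation keyed on the subset.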

Normalizing into 2NF

Conversion to 2NF

[Figure: a relation on attributes A, B, C, D converted to 2NF as two relations on (A, B, C) and (A, D)]

Additional problem on second normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with the FD Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of this relation?

2. Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form
A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
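The slides phrase 3NF through transitive dependency on the primary key. An equivalent and easier-to-automate form, swapped in here, is the general definition: every nontrivial X -> A must have X a superkey or A a prime attribute. This formulation also encodes the NOTE, because a transitive step through a candidate key Y passes the superkey test. A minimal sketch with hypothetical function names; for the EMP example, Emp -> SSN is added as an assumption so that Emp is a candidate key, as the NOTE states.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """All minimal superkeys, found by brute force over subsets of increasing size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and attributes <= attribute_closure(s, fds):
                keys.append(s)
    return keys


def third_nf_violations(attributes, fds):
    """Return (lhs, attribute) pairs from fds that violate 3NF.

    X -> A is allowed if A is in X (trivial), X is a superkey, or A is prime.
    """
    attributes = set(attributes)
    primes = set().union(*candidate_keys(attributes, fds))
    violations = []
    for lhs, rhs in fds:
        is_superkey = attributes <= attribute_closure(lhs, fds)
        for a in sorted(rhs - lhs):  # skip the trivial part of the FD
            if not is_superkey and a not in primes:
                violations.append((set(lhs), a))
    return violations


emp_dept = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
emp_dept_fds = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(third_nf_violations(emp_dept, emp_dept_fds))
# [({'DNUMBER'}, 'DMGRSSN'), ({'DNUMBER'}, 'DNAME')]

emp = {"SSN", "Emp", "Salary"}
emp_fds = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(third_nf_violations(emp, emp_fds))  # []
```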

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: p. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation even when 'X' is not a candidate key, provided 'A' is a prime attribute (a member of some candidate key).

To be in BCNF, 'X' must be a superkey; the prime-attribute exception does not apply.

So a relation in BCNF is definitely in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. In fd2 the LHS instructor is not a superkey, but the RHS course is a prime attribute, so the relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
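The claim that TEACH is in 3NF but not in BCNF can be checked mechanically: BCNF flags every nontrivial FD whose LHS is not a superkey, and 3NF excuses such an FD when all of its RHS attributes are prime. A minimal sketch with hypothetical function names, using fd1 and fd2 as given above.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(attributes, fds):
    """All minimal superkeys, found by brute force over subsets of increasing size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if not any(k <= s for k in keys) and attributes <= attribute_closure(s, fds):
                keys.append(s)
    return keys


def bcnf_violations(attributes, fds):
    """FDs X -> Y (nontrivial part only) whose LHS X is not a superkey."""
    attributes = set(attributes)
    return [
        (set(lhs), sorted(rhs - lhs))
        for lhs, rhs in fds
        if rhs - lhs and not attributes <= attribute_closure(lhs, fds)
    ]


def in_3nf(attributes, fds):
    """3NF also accepts a non-superkey LHS when every RHS attribute is prime."""
    primes = set().union(*candidate_keys(set(attributes), fds))
    return all(set(rhs) <= primes for _, rhs in bcnf_violations(attributes, fds))


teach = {"student", "course", "instructor"}
fds = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
print(bcnf_violations(teach, fds))  # [({'instructor'}, ['course'])]
print(in_3nf(teach, fds))           # True
```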

Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Of the three, only the third decomposition does not generate spurious tuples after a join (and hence has the non-additive join property).
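For a decomposition into two relations there is a standard shortcut (not stated on this slide): {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The sketch applies it to the three TEACH decompositions; `is_lossless_binary` is a hypothetical name.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_lossless_binary(r1, r2, fds):
    """Lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    common_closure = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for r1, r2 in decompositions:
    print(sorted(r1), sorted(r2), is_lossless_binary(r1, r2, fds))
# only the third pair prints True
```

The third decomposition passes because its shared attribute, instructor, determines course (fd2).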

Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, p. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
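The notes do not spell out the lossless join algorithm; a widely used general test for decompositions into any number of relations is the tableau (chase) method, sketched below under that assumption. Each relation Ri gets a row with distinguished symbols in its own columns; FDs are applied to equate symbols until nothing changes, and the decomposition is lossless iff some row becomes fully distinguished. The function name and symbol encoding are hypothetical.

```python
def is_lossless(attributes, decomposition, fds):
    """Tableau (chase) test: is the decomposition of R lossless under fds?

    attributes: list of attribute names of R.
    decomposition: list of attribute sets R1, ..., Rn.
    fds: list of (lhs, rhs) pairs of attribute sets.
    """
    # Row i holds the distinguished symbol ("a", A) for each attribute A in Ri
    # and a unique non-distinguished symbol ("b", i, A) everywhere else.
    rows = [
        {a: ("a", a) if a in ri else ("b", i, a) for a in attributes}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    ri, rj = rows[i], rows[j]
                    if not all(ri[a] == rj[a] for a in lhs):
                        continue
                    for a in rhs:
                        x, y = ri[a], rj[a]
                        if x == y:
                            continue
                        # Equate the two symbols, preferring a distinguished one,
                        # and rename every occurrence of the other in column a.
                        keep, old = (y, x) if y[0] == "a" else (x, y)
                        for row in rows:
                            if row[a] == old:
                                row[a] = keep
                        changed = True
    # Lossless iff some row now consists only of distinguished symbols.
    return any(all(row[a][0] == "a" for a in attributes) for row in rows)


R = ["student", "course", "instructor"]
fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(is_lossless(R, [{"instructor", "course"}, {"instructor", "student"}], fds))  # True
print(is_lossless(R, [{"student", "instructor"}, {"student", "course"}], fds))     # False

emp_proj = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
emp_fds = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"}),
           ({"PNUMBER"}, {"PNAME", "PLOCATION"})]
parts = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(is_lossless(emp_proj, parts, emp_fds))  # True: a three-way lossless split
```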

Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C, and the sets of values for B and C are independent of each other.

A multivalued dependency can be further classified as trivial or nontrivial.

An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.

An MVD is nontrivial if neither of these two conditions is satisfied.

Fourth Normal Form (4NF)
A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

4NF is used to remove multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form
(a) The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
(b) Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
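On a given relation state, X ->> Y holds iff the state equals the natural join of its projections on XY and XZ, where Z = R - X - Y. The sketch uses that test and then performs the 4NF decomposition above. The EMP tuples are hypothetical sample data (the figure's contents are not reproduced in these notes), and relations are modeled as sets of frozensets of (attribute, value) pairs.

```python
def relation(attributes, *tuples):
    """Build a relation as a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attributes, t)) for t in tuples}


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            # Combine tuples that agree on every shared attribute.
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)
    return out


def mvd_holds(rel, attributes, x, y):
    """X ->> Y holds iff rel = (projection on XY) join (projection on XZ), Z = R - X - Y."""
    z = set(attributes) - set(x) - set(y)
    return natural_join(project(rel, set(x) | set(y)), project(rel, set(x) | z)) == rel


attrs = ("ENAME", "PNAME", "DNAME")
emp = relation(attrs,
               ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
               ("Smith", "X", "Anna"), ("Smith", "Y", "John"))
print(mvd_holds(emp, attrs, {"ENAME"}, {"PNAME"}))  # True

emp_projects = project(emp, {"ENAME", "PNAME"})
emp_dependents = project(emp, {"ENAME", "DNAME"})
print(len(emp_projects), len(emp_dependents))             # 2 2
print(natural_join(emp_projects, emp_dependents) == emp)  # True: no information lost
```

Removing any one EMP tuple breaks the MVD, which is exactly the kind of update anomaly the 4NF decomposition avoids.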

Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
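The join dependency definition translates directly into a check on a relation state: JD(R1, ..., Rn) holds iff joining the projections on R1, ..., Rn gives back exactly the original relation. The SUPPLY tuples below are hypothetical sample data chosen so that the three-way JD holds while no two-way split is lossless; the helper names are also hypothetical.

```python
from functools import reduce


def relation(attributes, *tuples):
    """Build a relation as a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attributes, t)) for t in tuples}


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    out = set()
    for t in r:
        dt = dict(t)
        for u in s:
            du = dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(t | u)  # tuples agree on shared attributes
    return out


def jd_holds(rel, components):
    """JD(R1, ..., Rn) holds iff rel equals the natural join of its projections on each Ri."""
    return reduce(natural_join, (project(rel, c) for c in components)) == rel


attrs = ("SNAME", "PARTNAME", "PROJNAME")
supply = relation(attrs,
                  ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
                  ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
                  ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
                  ("Smith", "Bolt", "ProjY"))
R1, R2, R3 = {"SNAME", "PARTNAME"}, {"SNAME", "PROJNAME"}, {"PARTNAME", "PROJNAME"}
print(jd_holds(supply, [R1, R2, R3]))  # True
print(jd_holds(supply, [R1, R2]))      # False: the two-way join adds spurious tuples
```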

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency except those in which every component is a superkey.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.


Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice


Functional dependencies
Functional dependencies (FDs)

An FD is a constraint between two sets of attributes from the database.

Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition

FDs are used to specify formal measures of the goodness of relational designs.
FDs and keys are used to define normal forms for relations.
FDs are constraints that are derived from the meaning and interrelationships of the data attributes.
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if whenever two tuples have the same value for X, they must have the same value for Y: for any two tuples t1 and t2 in any relation instance r(R), if t1[X] = t2[X] then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.

The values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron         NA            23000    206
Michigan      NA            22300    307
Tanganyika    Africa        12700    420

Name -> Length holds in this state (each lake name appears once), but Continent -> Name does not: NA is paired with Superior, Huron and Michigan.

Graphical representation of Functional Dependencies

Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME

Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}

Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints
An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Fourth Normal Form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or project-join normal form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, R has no nontrivial join dependency other than those in which every component is a superkey.

Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
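
Whether a state obeys a join dependency can be checked by joining its projections. This Python sketch assumes a hypothetical SUPPLY(Sname, Part_name, Proj_name) state with invented values (the slide's figure is not reproduced) and takes R1, R2 and R3 to be the three two-attribute projections, which is also an assumption; project, natural_join and satisfies_jd are illustrative helpers. The state satisfies JD(R1, R2, R3), yet joining any two of the projections produces spurious tuples, which is why no two-way decomposition works and a three-way 5NF decomposition is needed.

```python
from functools import reduce
from itertools import product

# Hypothetical SUPPLY state (invented values) chosen so that JD(R1, R2, R3)
# holds but no two-way decomposition is lossless.
SUPPLY = [
    {"Sname": s, "Part_name": p, "Proj_name": j}
    for s, p, j in [
        ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
        ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
        ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
        ("Smith", "Bolt", "ProjY"),
    ]
]
R1 = {"Sname", "Part_name"}
R2 = {"Sname", "Proj_name"}
R3 = {"Part_name", "Proj_name"}

def project(rows, attrs):
    """Projection onto attrs; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all common attributes."""
    result = set()
    for t1, t2 in product(r1, r2):
        d1, d2 = dict(t1), dict(t2)
        if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
            result.add(frozenset({**d1, **d2}.items()))
    return result

def satisfies_jd(rows, components):
    """JD(R1, ..., Rn) holds in this state iff the join of its projections equals it."""
    joined = reduce(natural_join, [project(rows, c) for c in components])
    return joined == {frozenset(row.items()) for row in rows}

print(satisfies_jd(SUPPLY, [R1, R2, R3]))  # True
print(satisfies_jd(SUPPLY, [R1, R2]))      # False
print(satisfies_jd(SUPPLY, [R1, R3]))      # False
print(satisfies_jd(SUPPLY, [R2, R3]))      # False
```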


Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice


Definition

FDs are used to specify formal measures of the "goodness" of relational designs.

FDs and keys are used to define normal forms for relations.

FDs are constraints that are derived from the meaning and interrelationships of the data attributes.

A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies

A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y. For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].

X -> Y in R specifies a constraint on all relation instances r(R). This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component.

The values of the X component functionally determine the values of the Y component.

FDs are derived from the real-world constraints on the attributes.

The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the world

Name         Continent    Area    Length
Caspian Sea  Asia-Europe  143244  760
Superior     NA           31700   350
Victoria     Africa       26828   250
Aral Sea     Asia         24904   280
Huron        NA           23000   206
Michigan     NA           22300   307
Tanganyika   Africa       12700   420

Continent -> Name does not hold (NA appears with Superior, Huron, and Michigan); Name -> Length holds in this state.

Graphical representation of Functional Dependencies

Examples of FD constraints:

Social security number uniquely determines employee name: SSN -> ENAME

Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}

Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation state r satisfies or does not satisfy a given functional dependency A -> B.

How it works:

1 Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.

2 Check that tuples with equal values under attributes A also have equal values under B.

3 If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.
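
The Satisfies algorithm translates directly into code. In this Python sketch (the function name satisfies is illustrative), a relation state is a list of dictionaries, and the lakes table from the earlier slide serves as input. A result of True only means that this particular state does not violate the FD; as noted earlier, an FD is a property of the schema and cannot be established from one state.

```python
def satisfies(rows, lhs, rhs):
    """Satisfies algorithm: does this relation state satisfy lhs -> rhs?

    1. Sort the tuples on the lhs attributes so equal lhs values are adjacent.
    2. Check that neighbours with equal lhs values also agree on rhs.
    Equality is transitive, so comparing neighbours covers every pair in a group.
    """
    def key(row):
        return tuple(row[a] for a in lhs)

    ordered = sorted(rows, key=key)
    for prev, curr in zip(ordered, ordered[1:]):
        if key(prev) == key(curr) and any(prev[b] != curr[b] for b in rhs):
            return False
    return True

# The "Lakes of the world" state from the slide.
LAKES = [
    {"Name": name, "Continent": continent, "Area": area, "Length": length}
    for name, continent, area, length in [
        ("Caspian Sea", "Asia-Europe", 143244, 760),
        ("Superior", "NA", 31700, 350),
        ("Victoria", "Africa", 26828, 250),
        ("Aral Sea", "Asia", 24904, 280),
        ("Huron", "NA", 23000, 206),
        ("Michigan", "NA", 22300, 307),
        ("Tanganyika", "Africa", 12700, 420),
    ]
]

print(satisfies(LAKES, ["Continent"], ["Name"]))  # False: three NA lakes
print(satisfies(LAKES, ["Name"], ["Length"]))     # True
```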

Relation state of TEACH

TEACHER  COURSE           TEXT
Smith    Data Structures  Bartram
Smith    Data Management  Martin
Hall     Compilers        Hoffmann
Brown    ooad             Horowitz

TEXT -> COURSE may hold, since this state does not violate it. TEACHER -> COURSE does not hold: Smith teaches both Data Structures and Data Management.

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming.

So inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:

SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. That F implies X -> Y is denoted by F |= X -> Y.
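
One systematic procedure is the attribute-closure algorithm (a standard technique justified by Armstrong's rules; it is not spelled out on these slides). It computes X+, the set of all attributes that X determines under F, and F |= X -> Y holds exactly when Y ⊆ X+. This Python sketch applies it to the F above; fd, closure and implies are illustrative names.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+: keep applying FDs whose left side is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y is contained in X+ computed under F."""
    return set(rhs) <= closure(lhs, fds)

F = [
    fd("SSN", "ENAME BDATE ADDRESS DNUMBER"),
    fd("DNUMBER", "DNAME DMGRSSN"),
]
print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"DNAME"}))         # True
print(implies(F, {"DNUMBER"}, {"ENAME"}))         # False
```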

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:

IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2, and IR3 form a sound and complete set of inference rules. By sound we mean that any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (that is, if the axioms are correctly applied, they cannot derive false dependencies).

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:

Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).

Examples

1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2 Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
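
Reading the two exercises as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D} with C ⊆ B (the set braces and arrows were lost from the slide, so this reading is an assumption), one possible derivation of each uses only IR1 to IR3:

```latex
% Example 1: F = {A -> B, C -> X, BX -> Z}; derive AC -> Z
\begin{array}{lll}
1. & A \to B   & \text{given} \\
2. & AC \to BC & \text{IR2 on 1, augmenting with } C \\
3. & C \to X   & \text{given} \\
4. & BC \to BX & \text{IR2 on 3, augmenting with } B \\
5. & AC \to BX & \text{IR3 on 2 and 4} \\
6. & BX \to Z  & \text{given} \\
7. & AC \to Z  & \text{IR3 on 5 and 6}
\end{array}

% Example 2: F = {A -> B, C -> D} with C \subseteq B; show F |= A -> D
\begin{array}{lll}
1. & A \to B & \text{given} \\
2. & B \to C & \text{IR1, since } C \subseteq B \\
3. & A \to C & \text{IR3 on 1 and 2} \\
4. & C \to D & \text{given} \\
5. & A \to D & \text{IR3 on 3 and 4}
\end{array}
```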

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
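
Redundancy can be tested with attribute closure: A -> B is redundant in F when B ⊆ A+ computed under F - {A -> B}. In this Python sketch the FDs come from the DEPT_NO/MGR_SSN/MGR_PHONE closure example, with DEPT_NO -> MGR_PHONE listed explicitly as an assumption so that one FD is redundant; fd, closure, is_redundant and remove_redundant are illustrative names.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_redundant(f, fds):
    """f = X -> Y is redundant in fds iff X -> Y follows from fds - {f}."""
    lhs, rhs = f
    rest = [g for g in fds if g != f]
    return rhs <= closure(lhs, rest)

def remove_redundant(fds):
    """Drop redundant FDs one at a time; each removal keeps the set equivalent."""
    result = list(fds)
    for f in list(result):
        if is_redundant(f, result):
            result.remove(f)
    return result

F = [
    fd("DEPT_NO", "MGR_SSN"),
    fd("MGR_SSN", "MGR_PHONE"),
    fd("DEPT_NO", "MGR_PHONE"),
]
for lhs, rhs in remove_redundant(F):
    print(" ".join(sorted(lhs)), "->", " ".join(sorted(rhs)))
# DEPT_NO -> MGR_SSN
# MGR_SSN -> MGR_PHONE
```

Removing FDs one at a time always leaves a set equivalent to F, but when several FDs are redundant, the set that survives can depend on the order in which they are examined.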

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, E is said to be covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Such sets are equivalent to F.
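
Both cover conditions reduce to attribute closures: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. This Python sketch uses two hypothetical sets of FDs, F and G, over attributes A, C, D, E and H (an illustrative exercise, not taken from the slides); covers and equivalent are names chosen here.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, e):
    """F covers E iff every X -> Y in E satisfies Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)

# Hypothetical sets of FDs.
F = [fd("A", "C"), fd("A C", "D"), fd("E", "A D"), fd("E", "H")]
G = [fd("A", "C D"), fd("E", "A H")]
print(equivalent(F, G))               # True
print(equivalent(F, [fd("A", "C")]))  # False: {A -> C} does not imply AC -> D
```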

Minimal Functional Dependencies

A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called the irreducible set of F.

F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:

(a) every dependency in F has a single attribute on its RHS;

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of the left-hand side).

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 ∈ F.

A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
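
The extraneous-attribute condition also reduces to a closure test. Replacing A1A2 -> B1B2 by the stronger A2 -> B1B2 keeps the set equivalent to F exactly when F already implies A2 -> B1B2, that is, when B1B2 ⊆ A2+ under F. The Python sketch below checks left-hand-side attributes only, on a hypothetical F = {A -> B, AB -> C}; is_extraneous_lhs is an illustrative name.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_extraneous_lhs(attr, f, fds):
    """attr is extraneous in X -> Y iff F implies (X - {attr}) -> Y."""
    lhs, rhs = f
    return attr in lhs and rhs <= closure(lhs - {attr}, fds)

# Hypothetical F = {A -> B, AB -> C}.
F = [fd("A", "B"), fd("A B", "C")]
print(is_extraneous_lhs("B", fd("A B", "C"), F))  # True: A+ = {A, B, C}
print(is_extraneous_lhs("A", fd("A B", "C"), F))  # False: B+ = {B}
```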

CANONICAL COVER (FC)

1 Every FD of FC is simple: its RHS has one attribute.

2 FC is left-reduced.

3 FC is nonredundant.

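Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1 Make every RHS a single attribute: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2 Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3 Left-reduce XZ -> R: since X -> Z, X+ contains Z and therefore R, so Z is extraneous: FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}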
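
The steps can be carried out mechanically. The Python sketch below follows a common textbook procedure for a minimal (canonical) cover: split right-hand sides into single attributes, remove extraneous left-hand-side attributes, then drop FDs implied by the rest. Function names are this sketch's own, and single-letter attribute names are assumed for this problem. Applied to F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}, it returns {X -> Z, XY -> P, XY -> W, XY -> Q, X -> R}; XZ -> R becomes X -> R because X -> Z puts Z in X+.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def canonical_cover(fds):
    """Minimal/canonical cover in three steps."""
    # Step 1: one attribute per right-hand side, skipping duplicates.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            f = (frozenset(lhs), frozenset({a}))
            if f not in g:
                g.append(f)
    # Step 2: left-reduce; B is extraneous in X -> A if A is in (X - {B})+.
    for i, (lhs, rhs) in enumerate(g):
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left reduction can create duplicates
    # Step 3: drop every FD that the remaining ones imply.
    i = 0
    while i < len(g):
        lhs, rhs = g[i]
        rest = g[:i] + g[i + 1:]
        if rhs <= closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g

def show(fds):
    """Render FDs compactly, e.g. 'XY -> W'."""
    return ["".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in fds]

F = [fd("X", "Z"), fd("X Y", "W P"), fd("X Y", "Z W Q"), fd("X Z", "R")]
print(show(canonical_cover(F)))
# ['X -> Z', 'XY -> P', 'XY -> W', 'XY -> Q', 'X -> R']
```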

Normal Forms Based on Primary Keys

1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes Participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form

Normalization of Relations

2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)

4NF: based on keys and multivalued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides the database designer with:
a formal framework for analyzing relation schemas based on their keys and FDs;
a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
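
Dependency preservation can be tested without computing F+ explicitly. The Python sketch below uses a standard polynomial-time check: for each X -> Y in F, it grows a set Z from X by repeatedly adding (Z ∩ Ri)+ ∩ Ri for every Ri, and X -> Y is preserved exactly when Y ends up in Z. Function names are illustrative; the example applies it to the TEACH FDs and the decomposition {instructor, course}, {instructor, student}.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lost_dependencies(fds, decomposition):
    """Return the FDs of F that the decomposition does not preserve."""
    lost = []
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # Only what is derivable inside one Ri may be added.
                t = closure(z & ri, fds) & ri
                if not t <= z:
                    z |= t
                    changed = True
        if not rhs <= z:
            lost.append((lhs, rhs))
    return lost

F = [fd("student course", "instructor"), fd("instructor", "course")]
third = [{"instructor", "course"}, {"instructor", "student"}]
print(lost_dependencies(F, third) == [F[0]])  # True: fd1 is lost
```

The result confirms that this decomposition of TEACH, although lossless, does not preserve fd1: {student, course} -> instructor.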

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
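
For small schemas, candidate keys and prime attributes can be found by brute force with attribute closure: a set is a superkey when its closure is the whole schema, and a key when no smaller key is contained in it. This Python sketch (illustrative names; exponential in the number of attributes) applies it to TEACH with fd1: {student, course} -> instructor and fd2: instructor -> course.

```python
from itertools import combinations

def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema.

    Subsets are tried in increasing size; supersets of a key already found
    are skipped because they are superkeys but not keys.
    """
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue
            if closure(s, fds) >= set(schema):
                keys.append(s)
    return keys

def prime_attributes(schema, fds):
    """Prime attributes: members of some candidate key."""
    return set().union(*candidate_keys(schema, fds))

SCHEMA = ["student", "course", "instructor"]
F = [fd("student course", "instructor"), fd("instructor", "course")]
print([sorted(k) for k in candidate_keys(SCHEMA, F)])
# [['course', 'student'], ['instructor', 'student']]
print(sorted(prime_attributes(SCHEMA, F)))
# ['course', 'instructor', 'student']
```

Besides {student, course}, {student, instructor} is also a candidate key of TEACH, so all three of its attributes are prime.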

First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF, there are three techniques:

1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2 Expand the key so that there will be a separate tuple in the original relation for each value of the violating attribute. This has the disadvantage of introducing redundancy.

3 If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
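
Technique 1 can be illustrated with a hypothetical DEPARTMENT relation whose Dlocations attribute holds a list of values; the relation name, attributes, values and the split_multivalued helper are all assumptions for this Python sketch. The multivalued attribute is moved to a new relation together with the primary key, one tuple per value.

```python
# Hypothetical DEPARTMENT state (invented values); Dlocations is multivalued,
# so the relation is not in 1NF.
DEPARTMENT = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
    {"Dnumber": 1, "Dname": "Headquarters", "Dlocations": ["Houston"]},
]

def split_multivalued(rows, key, multivalued, new_name):
    """Technique 1: move a multivalued attribute into its own relation.

    Returns (base, extra): base keeps every other attribute; extra holds one
    atomic tuple (key value, single value) per element of the multivalued attribute.
    """
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    extra = [
        {key: row[key], new_name: value}
        for row in rows
        for value in row[multivalued]
    ]
    return base, extra

DEPT, DEPT_LOCATIONS = split_multivalued(DEPARTMENT, "Dnumber", "Dlocations", "Dlocation")
print(DEPT[0])              # {'Dnumber': 5, 'Dname': 'Research'}
print(len(DEPT_LOCATIONS))  # 5
```

The new relation holds one atomic tuple per (Dnumber, Dlocation) pair, so its key is the combination {Dnumber, Dlocation}.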

Normalizing nested relations into 1NF

Additional problems: Schaum's series, pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:

Prime attribute: an attribute that is a member of the primary key K.

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:

{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.

{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
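
Full and partial dependencies can be tested with attribute closure: Y -> Z is full when Z is not in (Y - {a})+ for any attribute a of Y. This Python sketch groups the slides' example FDs into one hypothetical schema, EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION), with primary key {SSN, PNUMBER}; the schema name and the functions is_full_fd and partial_dependencies are illustrative.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """Y -> Z (assumed to hold) is full iff no attribute can be dropped from Y.

    Dropping one attribute at a time is enough: if a smaller subset of Y
    determined Z, every one-attribute-smaller superset of it would too.
    """
    return all(not rhs <= closure(set(lhs) - {a}, fds) for a in lhs)

def partial_dependencies(schema, primary_key, fds):
    """Nonprime attributes not fully dependent on the primary key (2NF violations).

    Following the slide, prime means "member of the primary key".
    """
    nonprime = set(schema) - set(primary_key)
    return sorted(a for a in nonprime if not is_full_fd(primary_key, {a}, fds))

EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
F = [
    fd("SSN PNUMBER", "HOURS"),
    fd("SSN", "ENAME"),
    fd("PNUMBER", "PNAME PLOCATION"),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, F))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, F))  # False
print(partial_dependencies(EMP_PROJ, ["SSN", "PNUMBER"], F))
# ['ENAME', 'PLOCATION', 'PNAME']
```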

Normalizing into 2NF

Conversion to 2NF: (A, B, C, D) is converted to (A, B, C) and (A, D).

Additional problem on 2nd normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1 What is the highest normal form?

2 Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:

SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.

SSN -> ENAME is nontransitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.

BCNF does not allow this exception: 'X' must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.
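
The difference between the two tests can be sketched in Python: BCNF rejects any nontrivial X -> A whose X is not a superkey, while 3NF additionally accepts it when every attribute of A - X is prime. The sketch examines only the FDs listed in F; this is known to suffice when testing a whole schema against its own F, but not when testing a relation produced by a decomposition. The prime attributes of TEACH are passed in as an assumption (they follow from its candidate keys {student, course} and {student, instructor}), and the function names are illustrative.

```python
def fd(lhs, rhs):
    """Build an FD from space-separated attribute names."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """X is a superkey iff X+ is the whole schema."""
    return closure(attrs, fds) >= set(schema)

def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> A whose left side X is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not is_superkey(lhs, schema, fds)]

def third_nf_violations(schema, fds, prime):
    """BCNF violations that 3NF also rejects: some attribute of A - X is nonprime."""
    return [(lhs, rhs) for lhs, rhs in bcnf_violations(schema, fds)
            if not (rhs - lhs) <= prime]

TEACH = ["student", "course", "instructor"]
F = [fd("student course", "instructor"), fd("instructor", "course")]
PRIME = {"student", "course", "instructor"}  # every attribute of TEACH is prime
print(bcnf_violations(TEACH, F) == [F[1]])   # True: instructor is not a superkey
print(third_nf_violations(TEACH, F, PRIME))  # []: course is prime
```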

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition: two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 26: Module 6

Module 6 26042023

Functional Dependencies

A functional dependency X -gt Y holds if whenever two tuples have the same value for X they must have the same value for Y For any two tuples t1 and t2 in any relation instance r(R) If

t1[X]=t2[X] then t1[Y]=t2[Y] X -gt Y in R specifies a constraint on all relation instances r(R) This means that the values of the Y component of a tuple in r

depend on or are determined by the values of the X component

The values of the X component functionally determines the values of Y component

FDs are derived from the real-world constraints on the attributes

The main use of FD is to describe R by specifying constraints on its attributes that must hold at all times

Module 6 27042023

Lakes of the worldName Continent Area lengthCaspian Sea Asia-Europe 143244 760

Superior NA 31700 350

Victoria Africa 26828 250

Aral Sea Asia 24904 280

Huron NA 23000 206

Michigan NA 22300 307

Tanganyika Africa 12700 420

Continent -gtName Name -gtLength

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalizing nested relations into 1NF

Additional problems from the Schaum's series: pg. 178, 5.1

Second Normal Form
Uses the concepts of FDs and primary key.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
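The full/partial distinction gives a direct 2NF test: a nonprime attribute violates 2NF if some proper subset of the primary key already determines it. The sketch below assumes FDs are (LHS, RHS) set pairs and, following the slide's definition, treats the primary-key attributes as the prime ones. The relation name EMP_PROJ and its attribute set (SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION, taken from the FD examples in this module) and the function names are assumptions for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """Map each nonprime attribute that is partially dependent on the key to
    the smallest proper subset of the key that determines it."""
    key = set(primary_key)
    nonprime = set(schema) - key  # prime = member of the primary key, as in the slide
    found = {}
    for size in range(1, len(key)):  # proper, nonempty subsets only
        for subset in combinations(sorted(key), size):
            for attr in sorted(nonprime & closure(subset, fds)):
                found.setdefault(attr, subset)
    return found


emp_proj = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
violations = partial_dependencies(emp_proj, {"SSN", "PNUMBER"}, fds)
print(violations)
```

HOURS does not appear because {SSN, PNUMBER} -> HOURS is a full FD. Each reported group suggests one 2NF relation: (SSN, ENAME), (PNUMBER, PNAME, PLOCATION), and the remaining (SSN, PNUMBER, HOURS).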

Normalizing into 2NF

Conversion to 2NF: [Figure: R(A, B, C, D) decomposed into R1(A, B, C) and R2(A, D)]

Additional problem on second normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with the FD Prog_Pack_ID -> Prog_Pac_name
1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp#, Salary). Here SSN -> Emp# -> Salary, and Emp# is a candidate key.
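A mechanical 3NF check is easiest with the general form of the definition, which also covers the NOTE's candidate-key case: for every nontrivial FD X -> A, either X is a superkey or A is a prime attribute (a member of some candidate key). The sketch below applies that test to each given FD and each of its right-hand-side attributes; by a standard result, checking the given FDs rather than all of F+ is enough when testing a whole schema. Candidate keys are found by brute force, the EMP_DEPT attribute set comes from the transitive-dependency examples, and all function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, smallest subsets first (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def third_nf_violations(schema, fds):
    """(LHS, A) pairs for FDs X -> A where X is not a superkey and A is not prime."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey: allowed
        for attr in sorted(rhs - lhs):  # the trivial part of the FD is skipped
            if attr not in prime:
                violations.append((tuple(sorted(lhs)), attr))
    return violations


emp_dept = {"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
emp_dept_fds = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(third_nf_violations(emp_dept, emp_dept_fds))
# [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]

# NOTE example: EMP(SSN, EMP#, SALARY); EMP# is a candidate key only if EMP# -> SSN
emp_fds_key = [({"SSN"}, {"EMP#"}), ({"EMP#"}, {"SSN", "SALARY"})]
emp_fds_nokey = [({"SSN"}, {"EMP#"}), ({"EMP#"}, {"SALARY"})]
print(third_nf_violations({"SSN", "EMP#", "SALARY"}, emp_fds_key))    # []
print(third_nf_violations({"SSN", "EMP#", "SALARY"}, emp_fds_nokey))  # [(('EMP#',), 'SALARY')]
```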

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)
A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).
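The BCNF definition translates into an even simpler check: every nontrivial FD must have a superkey on its left-hand side. The sketch below (hypothetical names; FDs as (LHS, RHS) attribute-set pairs) checks the given FDs, which is sufficient when testing a whole schema. Applied to the TEACH relation discussed in this module, it reports fd2, instructor -> course, as the violation.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left-hand side is not a superkey of the schema."""
    schema = set(schema)
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD, always allowed
        if not closure(lhs, fds) >= schema:
            violations.append((tuple(sorted(lhs)), tuple(sorted(rhs - lhs))))
    return violations


teach = {"student", "course", "instructor"}
fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(bcnf_violations(teach, fds))  # [(('instructor',), ('course',))]
```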

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a candidate key.

BCNF, in contrast, insists that for this dependency to remain in the relation, X must be a superkey (it must contain a candidate key).

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition
Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but 3NF allows it because course is a prime attribute. So this relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions will lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Out of the above three, only the third decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
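For a decomposition into exactly two relations R1 and R2 there is a well-known shortcut for the non-additive join property: the decomposition is lossless if and only if the common attributes R1 ∩ R2 functionally determine R1 - R2 or R2 - R1 (the FD must be in F+). This binary test is not stated on the slide, so treat the sketch below (hypothetical function names) as a supplementary check; it confirms that only the third TEACH decomposition passes.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary test: the common attributes must determine R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for number, (r1, r2) in enumerate(decompositions, start=1):
    print(number, is_lossless_binary(r1, r2, fds))
# 1 False
# 2 False
# 3 True
```

Decomposition 3 passes because its common attribute, instructor, determines course through fd2.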

Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise, it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
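The general test for decompositions into any number of relations is usually presented as a matrix (tableau) algorithm. The sketch below follows that idea under stated assumptions: one row per Ri and one column per attribute, a distinguished symbol ("a", A) where Ri contains attribute A and a unique ("b", i, A) elsewhere, then repeated passes over the FDs that make the right-hand-side symbols equal in rows agreeing on the left-hand side (preferring the distinguished symbol). The decomposition is lossless exactly when some row ends up entirely distinguished. The function name and the EMP_PROJ test decompositions are illustrative, not taken from the referenced example.

```python
def is_lossless(attributes, decomposition, fds):
    """Tableau (chase) test for the non-additive join property.

    attributes: list of the attributes of R; decomposition: list of attribute
    sets R1..Rm; fds: list of (lhs, rhs) attribute-set pairs.
    """
    # Row i gets ('a', A) for attributes in Ri and a unique ('b', i, A) otherwise.
    rows = [
        {a: ("a", a) if a in ri else ("b", i, a) for a in attributes}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that carry the same symbols on every LHS attribute.
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Prefer the distinguished symbol; otherwise reuse one 'b' symbol.
                    target = ("a", a) if ("a", a) in symbols else min(symbols)
                    for row in group:
                        row[a] = target
                    changed = True
    # Lossless exactly when some row has become entirely distinguished.
    return any(all(row[a] == ("a", a) for a in attributes) for row in rows)


emp_proj = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(is_lossless(emp_proj, good, fds))  # True
print(is_lossless(emp_proj, bad, fds))   # False
```

Each update only replaces symbols by smaller ones (a distinguished symbol or the smallest "b" in the group), so the loop always terminates.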

Fourth Normal Form (4NF)
Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if (a) B is a subset of A or (b) A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
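Whether an MVD X ->> Y holds in a given relation state can be checked directly from the definition: within each group of tuples sharing an X value, the Y values and the values of the remaining attributes Z = R - X - Y must occur in every combination. The sketch below implements that check with a hypothetical function name; the EMP values follow the ENAME ->> PNAME, ENAME ->> DNAME example used in this module but are illustrative assumptions. Like the Satisfies algorithm for FDs, it only tests one state, not the schema.

```python
def mvd_holds(names, rows, x, y):
    """True if X ->> Y holds in this state of a relation with attributes `names`."""
    z = [a for a in names if a not in x and a not in y]

    def pick(row, attrs):
        return tuple(row[names.index(a)] for a in attrs)

    groups = {}
    for row in rows:
        groups.setdefault(pick(row, x), set()).add((pick(row, y), pick(row, z)))
    for pairs in groups.values():
        ys = {y_val for y_val, _ in pairs}
        zs = {z_val for _, z_val in pairs}
        # pairs is always a subset of ys x zs, so equal sizes mean every combination occurs
        if len(pairs) != len(ys) * len(zs):
            return False
    return True


names = ["ENAME", "PNAME", "DNAME"]
emp = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]
print(mvd_holds(names, emp, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(names, emp[:3], ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing
```

When X ∪ Y covers all attributes, Z is empty and the check always succeeds, matching condition (b) for a trivial MVD.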

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.

Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than those implied by its keys.

Relation SUPPLY with join dependency and conversion to fifth normal form:
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
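A join dependency can be checked on a concrete relation state by projecting onto each component and joining the projections back together. The sketch below does this with plain Python sets. The SUPPLY tuples and the component schemas R1(SNAME, PART_NAME), R2(SNAME, PROJ_NAME), R3(PART_NAME, PROJ_NAME) are assumptions for illustration, and the helper names are hypothetical. In this state the three-way JD holds, while joining R1 and R2 alone produces spurious tuples, which is the situation described above. One state can refute a JD but cannot prove that it holds for the schema.

```python
def project(rel, attrs):
    """pi_attrs(rel); a relation is (attribute tuple, set of value tuples)."""
    names, rows = rel
    idx = [names.index(a) for a in attrs]
    return tuple(attrs), {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, right):
    """Natural join on all attributes the two relations share."""
    lnames, lrows = left
    rnames, rrows = right
    common = [a for a in lnames if a in rnames]
    out_names = lnames + tuple(a for a in rnames if a not in lnames)
    out = set()
    for lrow in lrows:
        ld = dict(zip(lnames, lrow))
        for rrow in rrows:
            rd = dict(zip(rnames, rrow))
            if all(ld[a] == rd[a] for a in common):
                merged = {**ld, **rd}
                out.add(tuple(merged[a] for a in out_names))
    return out_names, out


def satisfies_jd(rel, components):
    """True if rel equals the join of its projections on the components."""
    joined = project(rel, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rel, comp))
    return project(joined, rel[0])[1] == rel[1]


# Illustrative SUPPLY state (values are assumptions, not taken from the slide)
supply = (
    ("SNAME", "PART_NAME", "PROJ_NAME"),
    {
        ("Smith", "Bolt", "ProjX"),
        ("Smith", "Nut", "ProjY"),
        ("Adamsky", "Bolt", "ProjY"),
        ("Walton", "Nut", "ProjZ"),
        ("Adamsky", "Nail", "ProjX"),
        ("Adamsky", "Bolt", "ProjX"),
        ("Smith", "Bolt", "ProjY"),
    },
)
R1 = ("SNAME", "PART_NAME")
R2 = ("SNAME", "PROJ_NAME")
R3 = ("PART_NAME", "PROJ_NAME")
print(satisfies_jd(supply, [R1, R2, R3]))  # True: JD(R1, R2, R3) holds in this state
print(satisfies_jd(supply, [R1, R2]))      # False: R1 join R2 adds spurious tuples
```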

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes. Hence, 5NF is rarely used in practice.


Lakes of the world

Name          Continent     Area (sq mi)   Length (mi)
Caspian Sea   Asia-Europe   143244         760
Superior      NA            31700          350
Victoria      Africa        26828          250
Aral Sea      Asia          24904          280
Huron         NA            23000          206
Michigan      NA            22300          307
Tanganyika    Africa        12700          420

Continent -> Name does not hold in this state (NA is paired with Superior, Huron, and Michigan), whereas Name -> Length does.

Graphical representation of Functional Dependencies


Examples of FD constraints
Social security number uniquely determines employee name: SSN -> ENAME
Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}
Employee SSN and project number uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm
Why it is used: to determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How it works:
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.
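The three steps can be sketched directly in Python. The representation (a list of attribute names plus a list of value tuples) and the function name are assumptions for illustration; the data is the Lakes of the world state. After sorting, comparing each tuple with its neighbour is enough, because all tuples with the same A value form one contiguous group.

```python
def satisfies(names, rows, a, b):
    """Sort-based test of whether this relation state satisfies A -> B."""
    ia = [names.index(x) for x in a]
    ib = [names.index(x) for x in b]

    def on(row, idx):
        return tuple(row[i] for i in idx)

    # Step 1: sort so tuples with equal A values are adjacent.
    ordered = sorted(rows, key=lambda row: on(row, ia))
    # Step 2: neighbours that agree on A must also agree on B.
    for prev, cur in zip(ordered, ordered[1:]):
        if on(prev, ia) == on(cur, ia) and on(prev, ib) != on(cur, ib):
            return False
    # Step 3: condition 2 held for every group.
    return True


names = ["Name", "Continent", "Area", "Length"]
lakes = [
    ("Caspian Sea", "Asia-Europe", 143244, 760),
    ("Superior", "NA", 31700, 350),
    ("Victoria", "Africa", 26828, 250),
    ("Aral Sea", "Asia", 24904, 280),
    ("Huron", "NA", 23000, 206),
    ("Michigan", "NA", 22300, 307),
    ("Tanganyika", "Africa", 12700, 420),
]
print(satisfies(names, lakes, ["Name"], ["Length"]))     # True
print(satisfies(names, lakes, ["Continent"], ["Name"]))  # False
```

A True result only says this one state satisfies the FD; it cannot establish that the FD holds for the schema.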

Relation state of TEACH

TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz

TEXT -> COURSE may hold, but TEACHER -> COURSE is ruled out: Smith teaches both Data Structures and Data Management.

Drawbacks of the Satisfies algorithm
Using this algorithm is tedious and time-consuming, and it only shows whether one particular relation state satisfies an FD.
So inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure
If a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

The inferred functional dependencies include:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. We write F |= X -> Y to denote that the FD X -> Y is inferred from F.
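A standard way to decide F |= X -> Y without listing all of F+ is to compute the attribute closure X+ (every attribute functionally determined by X) and check Y ⊆ X+. The algorithm is not spelled out on these slides, so the sketch below is a supplementary illustration with hypothetical function names; it uses the F from the example above and confirms the inferred dependencies.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Whenever an FD's whole LHS is already determined, add its RHS.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, x, y):
    """F |= X -> Y exactly when Y is a subset of X+."""
    return set(y) <= closure(x, fds)


F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(sorted(closure({"SSN"}, F)))
# ['ADDRESS', 'BDATE', 'DMGRSSN', 'DNAME', 'DNUMBER', 'ENAME', 'SSN']
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False
```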

Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.

Armstrong's inference rules:
IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2, and IR3 form a sound and complete set of inference rules.
By sound we mean that any dependency we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F (in other words, if the axioms are correctly applied, they cannot derive false dependencies).
By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Some additional inference rules that are useful:
Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).

Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
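One possible derivation for each exercise, using only IR1 to IR3 (other derivations exist, for instance via the pseudotransitive rule):

```latex
\begin{array}{lll}
\multicolumn{3}{l}{\text{Example 1: } F = \{A \to B,\ C \to X,\ BX \to Z\};\ \text{show } F \models AC \to Z}\\
1. & C \to X   & \text{given}\\
2. & BC \to BX & \text{IR2 on 1 (augment with } B\text{)}\\
3. & BX \to Z  & \text{given}\\
4. & BC \to Z  & \text{IR3 on 2 and 3}\\
5. & A \to B   & \text{given}\\
6. & AC \to BC & \text{IR2 on 5 (augment with } C\text{)}\\
7. & AC \to Z  & \text{IR3 on 6 and 4}\\[6pt]
\multicolumn{3}{l}{\text{Example 2: } F = \{A \to B,\ C \to D\} \text{ with } C \subseteq B;\ \text{show } F \models A \to D}\\
1. & A \to B & \text{given}\\
2. & B \to C & \text{IR1, since } C \subseteq B\\
3. & A \to C & \text{IR3 on 1 and 2}\\
4. & C \to D & \text{given}\\
5. & A \to D & \text{IR3 on 3 and 4}
\end{array}
```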

Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
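The redundancy test above reduces to an attribute-closure check: A -> B is redundant when B ⊆ A+ computed from the other FDs. The sketch below (hypothetical names, FDs as (LHS, RHS) set pairs) removes redundant FDs one at a time; note that when several FDs imply one another, the result can depend on the order in which they are examined. The data reuses the DEPT_NO / MGR_SSN / MGR_PHONE example from this module.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def remove_redundant(fds):
    """Drop each FD that can be derived from the FDs still kept."""
    kept = list(fds)
    i = 0
    while i < len(kept):
        lhs, rhs = kept[i]
        rest = kept[:i] + kept[i + 1:]
        if rhs <= closure(lhs, rest):
            kept = rest  # kept[i] is derivable from the others: drop it
        else:
            i += 1
    return kept


F = [
    ({"DEPT_NO"}, {"MGR_SSN"}),
    ({"MGR_SSN"}, {"MGR_PHONE"}),
    ({"DEPT_NO"}, {"MGR_PHONE"}),  # derivable from the two FDs above
]
for lhs, rhs in remove_redundant(F):
    print(sorted(lhs), "->", sorted(rhs))
```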

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; E is equivalent to F if both conditions (E covers F and F covers E) hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent to F.
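Coverage and equivalence can be tested without computing E+ or F+: F covers E when, for every X -> Y in E, Y ⊆ X+ computed under F. The sketch below uses hypothetical function names, and the two FD sets F and G are illustrative assumptions that happen to be equivalent.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """True if every FD in e can be inferred from f (e is covered by f)."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E+ = F+ exactly when each set covers the other."""
    return covers(e, f) and covers(f, e)


F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
G = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
print(equivalent(F, G))  # True
```

Here G has fewer FDs than F but generates the same closure, which is the kind of smaller equivalent set the paragraph above asks for.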

A minimal set of functional dependencies (minimal cover) is useful in eliminating unnecessary functional dependencies; it is also called an irreducible set of F.

F is transformed such that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:
(a) every dependency in F has a single attribute on its RHS;
(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of the left-hand side).
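Conditions (a) to (c) can each be checked with attribute closures. For (b), F - {X -> A} is equivalent to F exactly when A ∈ X+ under the remaining FDs. For (c), replacing X -> A by Z -> A keeps F equivalent exactly when A ∈ Z+ under F; checking every X - {B} suffices, because closures only grow as the attribute set grows. The sketch below is an illustration with a hypothetical function name.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_minimal(fds):
    """Check conditions (a)-(c) for a list of (lhs, rhs) attribute-set pairs."""
    for i, (lhs, rhs) in enumerate(fds):
        if len(rhs) != 1:
            return False  # (a) the RHS must be a single attribute
        rest = fds[:i] + fds[i + 1:]
        if rhs <= closure(lhs, rest):
            return False  # (b) X -> A is redundant
        for attr in lhs:
            if rhs <= closure(lhs - {attr}, fds):
                return False  # (c) X -> A can be replaced by (X - {attr}) -> A
    return True


print(is_minimal([({"A"}, {"B"}), ({"B"}, {"C"})]))       # True
print(is_minimal([({"A"}, {"B"}), ({"A", "B"}, {"C"})]))  # False, by (c)
```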

Extraneous Attributes

The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 ∈ F. Then A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.
2. FC is left-reduced.
3. FC is nonredundant.

Problem

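Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
3. Left-reduce XZ -> R: Z is extraneous because X -> Z already holds, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}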
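The steps of the problem can be automated. The sketch below computes a minimal (canonical) cover in the usual order: split right-hand sides, remove extraneous left-hand-side attributes, then drop redundant FDs. It is an illustrative implementation with hypothetical names, not a prescribed procedure from the slides; canonical covers are not unique in general, and the result can depend on the order in which FDs are examined. Run on this problem, it yields {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    """Minimal cover of fds, given as (lhs, rhs) attribute-set pairs."""
    # Step 1: split every right-hand side into single attributes (dropping duplicates).
    g = []
    for lhs, rhs in fds:
        for attr in sorted(rhs):
            fd = (frozenset(lhs), frozenset([attr]))
            if fd not in g:
                g.append(fd)
    # Step 2: left-reduce; B is extraneous in X -> A if A is in (X - {B})+.
    for i, (lhs, rhs) in enumerate(g):
        for attr in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {attr}, g):
                lhs = lhs - {attr}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction can create duplicates
    # Step 3: drop FDs that are derivable from the remaining ones.
    i = 0
    while i < len(g):
        rest = g[:i] + g[i + 1:]
        if g[i][1] <= closure(g[i][0], rest):
            g = rest
        else:
            i += 1
    return g


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}), ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(rhs))
```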

Normal Forms Based on Primary Keys
1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)

4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Proposed by Codd.
Normalization: the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing insertion, deletion, and update anomalies.

It provides the database designer with:
- a formal framework for analyzing relation schemas based on their keys and FDs;
- a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Lossless join or nonadditive join property: it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: it ensures that each functional dependency is represented in some individual relation resulting after decomposition.
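Dependency preservation can be tested without computing F+ or the projected FD sets explicitly. The sketch below follows a standard polynomial-time approach (an assumption here, not a method from the slides): for each X -> Y in F, grow a set Z starting from X, repeatedly adding ((Z ∩ Ri)+ ∩ Ri) for every Ri, and declare the FD preserved if Y ends up inside Z. This checks whether the FDs projected onto the Ri together still imply F, which is the general form of the property above. The EMP_PROJ and TEACH data reuse examples from this module; function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) attribute-set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves_dependencies(decomposition, fds):
    """True if the FDs projected onto the Ri together imply every FD in fds."""
    for lhs, rhs in fds:
        # Grow Z using only what each Ri can determine from its own part of Z.
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                gained = closure(z & ri, fds) & ri
                if not gained <= z:
                    z |= gained
                    changed = True
        if not rhs <= z:
            return False  # this FD is not preserved
    return True


emp_proj_fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
emp_proj_parts = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
teach_parts = [{"instructor", "course"}, {"instructor", "student"}]
print(preserves_dependencies(emp_proj_parts, emp_proj_fds))  # True
print(preserves_dependencies(teach_parts, teach_fds))        # False: fd1 is lost
```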

Practical Use of Normal Forms
Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 28: Module 6

Module 6 28042023

Graphical representation of Functional Dependencies

Module 6 29042023

Examples of FD constraints Social security number uniquely determines

employee name SSN -gt ENAME

Project number uniquely determines project name and location PNUMBER -gt PNAME PLOCATION

Employee ssn and project number uniquely determines the hours per week that the employee works on the project SSN PNUMBER -gt HOURS

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Normalization of Relations

Normalization, proposed by Codd, is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
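Dependency preservation can be tested without computing F+. The Python sketch below (hypothetical helper names) grows the closure of X using only the attributes that each decomposed schema Ri can see; it tests whether the union of the projections of F implies X -> Y, which is the usual formal version of the property. The example uses TEACH(student, course, instructor) with fd1 and fd2 from the BCNF discussion in this module.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def preserves(dep, fds, schemas):
    """Is X -> Y implied by the union of the projections of F on `schemas`?

    Grow Z from X using only what each Ri can see: Z := Z ∪ ((Z ∩ Ri)+ ∩ Ri).
    """
    lhs, rhs = dep
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return rhs <= z


def is_dependency_preserving(fds, schemas):
    return all(preserves(dep, fds, schemas) for dep in fds)


# fd1: {student, course} -> instructor, fd2: instructor -> course
F = [fd(["student", "course"], ["instructor"]), fd(["instructor"], ["course"])]
D = [{"instructor", "course"}, {"instructor", "student"}]
print(preserves(F[0], F, D), preserves(F[1], F, D))  # False True
```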

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF or 4NF).

Denormalization is the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
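A superkey test is a single closure check, and for small schemas the candidate keys and prime attributes can be found by brute force. The Python sketch below uses hypothetical helper names and is exponential in the number of attributes; the attributes and FDs come from the FD examples in this module (SSN, PNUMBER, HOURS and so on), gathered into a relation called EMP_PROJ here for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def is_superkey(attrs, schema, fds):
    """S is a superkey of R iff S+ contains every attribute of R."""
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys, found by brute force in order of size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):  # contains a key, so it is not minimal
                continue
            if is_superkey(s, schema, fds):
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [fd(["SSN"], ["ENAME"]),
     fd(["PNUMBER"], ["PNAME", "PLOCATION"]),
     fd(["SSN", "PNUMBER"], ["HOURS"])]
print(candidate_keys(EMP_PROJ, F))
print(prime_attributes(EMP_PROJ, F))
```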

First Normal Form

First normal form (1NF) disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

1NF is considered to be part of the formal definition of a relation.

Normalization into 1NF

To achieve 1NF there are three techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key of the original relation.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for the attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
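Technique 1 can be sketched in a few lines of Python. The DEPARTMENT rows, the attribute names DNUMBER, DNAME, DLOCATIONS and DLOCATION, and the helper `split_multivalued` are hypothetical sample data and names, not from the slides.

```python
def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: move multivalued `attr` into its own relation with the primary key.

    Returns (base relation without attr, new relation with one tuple per value).
    """
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], new_attr: value} for row in rows for value in row[attr]]
    return base, side


# Hypothetical non-1NF DEPARTMENT rows: DLOCATIONS holds a list of values.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]
dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept)
print(dept_locations)
```

Every value in both resulting relations is atomic, and the number of locations per department is unbounded, which is why this technique is preferred over techniques 2 and 3.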

Normalizing nested relations into 1NF

Additional problems: Schaum's series, p. 178, 5.1

Second Normal Form

Second normal form uses the concepts of FDs and the primary key.

Definitions:

Prime attribute: an attribute that is a member of the primary key K.

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
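Whether an FD that holds is full can be decided with closures of the left-hand side minus one attribute. A minimal Python sketch with hypothetical helper names, using the SSN/PNUMBER dependencies from this module:

```python
def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def is_full_fd(lhs, rhs, fds):
    """Y -> Z (assumed to hold) is full iff Z does not follow from Y minus any one attribute.

    Trying Y - {a} for each a suffices: if some smaller subset of Y determined Z,
    a superset of it of the form Y - {a} would determine Z as well.
    """
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


F = [fd(["SSN"], ["ENAME"]),
     fd(["PNUMBER"], ["PNAME", "PLOCATION"]),
     fd(["SSN", "PNUMBER"], ["HOURS"])]
print(is_full_fd(["SSN", "PNUMBER"], ["HOURS"], F))  # True
print(is_full_fd(["SSN", "PNUMBER"], ["ENAME"], F))  # False: SSN -> ENAME holds
```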


Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
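A simplified 2NF decomposition can be sketched in Python under these assumptions: only members of the primary key are treated as prime (matching the definition above), transitive dependencies are left for 3NF, and `decompose_2nf` and the other helpers are hypothetical names. For the SSN/PNUMBER example it produces one relation per partial dependency plus one for the full key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def decompose_2nf(schema, key, fds):
    """Split off nonprime attributes that depend on only part of the primary key.

    For each proper subset S of the key (smallest first), nonprime attributes in S+
    that are not yet placed move to a new relation S ∪ {those attributes};
    whatever is left stays with the full key.
    """
    nonprime = set(schema) - set(key)
    placed = set()
    partial_relations = []
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            dependent = (closure(part, fds) & nonprime) - placed
            if dependent:
                partial_relations.append(set(part) | dependent)
                placed |= dependent
    return [set(key) | (nonprime - placed)] + partial_relations


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [fd(["SSN"], ["ENAME"]),
     fd(["PNUMBER"], ["PNAME", "PLOCATION"]),
     fd(["SSN", "PNUMBER"], ["HOURS"])]
print(decompose_2nf(EMP_PROJ, ["SSN", "PNUMBER"], F))
```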

Normalizing into 2NF

[Figure: Conversion to 2NF, splitting (A, B, C, D) into (A, B, C) and (A, D)]

Additional problem on 2NF

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of this relation?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary): here SSN -> Emp -> Salary, and Emp is a candidate key.
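The check below is a Python sketch of the general, candidate-key form of 3NF (every nontrivial X -> A has X a superkey or A prime), which is a swap from the primary-key phrasing above; it also covers the NOTE, because a dependency whose left side is a candidate key is allowed. Helper names are hypothetical, and EMP_DEPT is an illustrative relation built from the SSN/DNUMBER attributes used in this module. For EMP, Emp -> SSN is included because the NOTE states that Emp is a candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def candidate_keys(schema, fds):
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """FDs X -> A in `fds` that break 3NF: nontrivial, X not a superkey, A nonprime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= set(schema)
        for a in sorted(rhs - lhs):
            if not lhs_is_superkey and a not in prime:
                bad.append((set(lhs), a))
    return bad


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F1 = [fd(["SSN"], ["ENAME", "BDATE", "ADDRESS", "DNUMBER"]),
      fd(["DNUMBER"], ["DNAME", "DMGRSSN"])]
print(violations_3nf(EMP_DEPT, F1))  # DNUMBER -> DMGRSSN and DNUMBER -> DNAME

EMP = {"SSN", "Emp", "Salary"}
F2 = [fd(["SSN"], ["Emp"]), fd(["Emp"], ["SSN"]), fd(["Emp"], ["Salary"])]
print(violations_3nf(EMP, F2))  # []
```

Checking only the FDs in the given set (with their right-hand sides split) is enough for a single schema; FDs inferred from them do not need to be enumerated.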

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: p. 186, 5.13

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).
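Testing BCNF is a superkey check on the left side of each nontrivial FD. A minimal Python sketch with hypothetical helper names, applied to the TEACH relation used later in this module:

```python
def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y in `fds` whose LHS X is not a superkey of `schema`.

    For a schema and the FDs declared on it, checking the given FDs is enough;
    a decomposed relation would need its projected dependencies checked instead.
    """
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= set(schema)]


TEACH = {"student", "course", "instructor"}
F = [fd(["student", "course"], ["instructor"]), fd(["instructor"], ["course"])]
print(bcnf_violations(TEACH, F))  # only instructor -> course
```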

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a candidate key.

To be in BCNF, 'X' must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation, so course is a prime attribute and fd2 does not violate 3NF. However, instructor is not a superkey, so this relation is in 3NF but not in BCNF.

A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join, and hence it has the nonadditive join property.
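For a decomposition into two relations there is a standard shortcut: {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A minimal Python sketch of that test (hypothetical helper names) confirms that only the third TEACH decomposition is lossless:

```python
def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 ∩ R2)+ contains R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [fd(["student", "course"], ["instructor"]), fd(["instructor"], ["course"])]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for r1, r2 in decompositions:
    print(sorted(r1), sorted(r2), is_lossless_binary(r1, r2, F))
```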

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, p. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
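The general lossless join (matrix) test can be sketched in Python as follows. The "a"/"b<i>" symbol encoding and the names `chase_lossless` and `fd` are assumptions of this sketch, and the two decompositions below are illustrative, not taken from the referenced textbook example: the first is lossless, the second (which shares only PLOCATION) is lossy.

```python
def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def chase_lossless(schema, decomposition, fds):
    """Matrix test for a lossless (nonadditive) join decomposition.

    Row i holds "a" in the columns of Ri and the row-specific symbol "b<i>" elsewhere.
    For each FD X -> Y, rows that agree on X get their Y symbols equated ("a" wins).
    The decomposition is lossless iff some row ends up all "a".
    """
    cols = sorted(schema)
    rows = [{c: "a" if c in ri else f"b{i}" for c in cols}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[c] for c in sorted(lhs)), []).append(row)
            for group in groups.values():
                for c in rhs:
                    symbols = {row[c] for row in group}
                    if len(symbols) > 1:
                        target = "a" if "a" in symbols else min(symbols)
                        for row in rows:  # rename the merged symbols in the whole column
                            if row[c] in symbols:
                                row[c] = target
                        changed = True
    return any(all(row[c] == "a" for c in cols) for row in rows)


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [fd(["SSN"], ["ENAME"]),
     fd(["PNUMBER"], ["PNAME", "PLOCATION"]),
     fd(["SSN", "PNUMBER"], ["HOURS"])]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(chase_lossless(R, good, F))  # True
print(chase_lossless(R, bad, F))   # False
```

Merging a symbol everywhere in its column (not only in the rows that triggered the FD) keeps the symbols consistent; the loop stops because each merge reduces the number of distinct symbols.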

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as nontrivial if neither of the above two conditions is satisfied.
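On a given relation instance, an MVD can be checked directly from its tuple-swapping definition. The Python sketch below uses hypothetical helper names and a hypothetical EMP instance (ENAME, PNAME, DNAME), in the spirit of the EMP example in this module.

```python
def mvd_holds(rows, x, y, attrs):
    """Does X ->> Y hold in this relation instance?

    For every t1, t2 agreeing on X there must be a tuple t3 with
    t3[X] = t1[X], t3[Y] = t1[Y] and t3[Z] = t2[Z], where Z = R - (X ∪ Y).
    """
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                t3 = dict(t2)  # Z values come from t2
                t3.update({a: t1[a] for a in [*x, *y]})  # X and Y values from t1
                if tuple(t3[a] for a in attrs) not in present:
                    return False
    return True


def is_trivial_mvd(x, y, attrs):
    """X ->> Y is trivial if Y ⊆ X or X ∪ Y = R."""
    return set(y) <= set(x) or set(x) | set(y) == set(attrs)


ATTRS = ["ENAME", "PNAME", "DNAME"]
# Hypothetical instance: Smith works on projects X and Y and has dependents John and Anna.
emp = [dict(zip(ATTRS, t)) for t in [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]]
print(mvd_holds(emp, ["ENAME"], ["PNAME"], ATTRS))      # True
print(mvd_holds(emp[:3], ["ENAME"], ["PNAME"], ATTRS))  # False: (Smith, Y, John) missing
```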

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

It is used to remove multivalued dependencies.

In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME, is decomposed into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
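The join dependency definition translates directly into "project, join back, compare". The following Python sketch uses hypothetical helper names and the same hypothetical EMP instance as the MVD example; a two-component JD holds here exactly when the corresponding MVD does.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each tuple becomes a frozenset of (attr, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two relations represented as sets of frozensets of pairs."""
    out = set()
    for left_t in left:
        for right_t in right:
            ld, rd = dict(left_t), dict(right_t)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def satisfies_jd(rows, components):
    """R satisfies JD(R1, ..., Rn) iff the join of its projections on R1..Rn is exactly R."""
    joined = project(rows, components[0])
    for comp in components[1:]:
        joined = natural_join(joined, project(rows, comp))
    return joined == {frozenset(row.items()) for row in rows}


ATTRS = ["ENAME", "PNAME", "DNAME"]
emp = [dict(zip(ATTRS, t)) for t in [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]]
jd = [["ENAME", "PNAME"], ["ENAME", "DNAME"]]
print(satisfies_jd(emp, jd))      # True
print(satisfies_jd(emp[:3], jd))  # False: the join adds a spurious tuple
```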

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), also called project-join normal form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no join dependency other than those whose components are all superkeys.

Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form

(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence, 5NF is rarely used in practice.

Examples of FD constraints

Social security number uniquely determines employee name: SSN -> ENAME

Project number uniquely determines project name and location: PNUMBER -> {PNAME, PLOCATION}

Employee SSN and project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints

An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.

It must be defined explicitly by someone who knows the semantics of the attributes of R.

The constraint must hold on every relation instance r(R).

If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm

Why it is used: to determine whether a relation r satisfies a given functional dependency A -> B.

How it works:

1. Sort the tuples of the relation r on the A attributes, so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under attributes A also have equal values under B.
3. If the condition in step 2 is met, the output of the algorithm is true; otherwise it is false.

Relation state of TEACH

TEACH
TEACHER   COURSE            TEXT
Smith     Data Structures   Bartram
Smith     Data Management   Martin
Hall      Compilers         Hoffmann
Brown     OOAD              Horowitz

TEACHER -> COURSE does not hold in this state (Smith teaches two different courses).
TEXT -> COURSE is satisfied by this state.
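The Satisfies algorithm can be sketched in Python with a sort followed by `itertools.groupby`, which groups adjacent tuples with equal A-values. The function name `satisfies` is hypothetical; the rows are the TEACH state above.

```python
from itertools import groupby


def satisfies(rows, a, b):
    """Satisfies algorithm: does relation state `rows` satisfy the FD A -> B?"""
    def a_values(row):
        return tuple(row[x] for x in a)

    # Step 1: sort on A so that tuples with equal A-values are adjacent.
    for _, run in groupby(sorted(rows, key=a_values), key=a_values):
        # Step 2: every run of equal A-values must share a single B-value.
        if len({tuple(row[y] for y in b) for row in run}) > 1:
            return False
    return True


teach = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffmann"},
    {"TEACHER": "Brown", "COURSE": "OOAD", "TEXT": "Horowitz"},
]
print(satisfies(teach, ["TEACHER"], ["COURSE"]))  # False
print(satisfies(teach, ["TEXT"], ["COURSE"]))     # True
```

A true result only says that this one state does not violate the FD; it cannot prove that the FD holds on every legal state.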

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming, and it only examines a single relation state, whereas an FD must hold on every legal state.

So inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure

If each department has one manager (DEPT_NO -> MGR_SSN) and each manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This leads to a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Example

F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER}, DNUMBER -> {DNAME, DMGRSSN}}

Some of the inferred functional dependencies are:
SSN -> {DNAME, DMGRSSN}
SSN -> SSN
DNUMBER -> DNAME

To infer dependencies systematically, we need a set of inference rules that can be used to infer new dependencies from a given set of dependencies. The fact that X -> Y is inferred from F is denoted by F |= X -> Y.
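F+ itself can be exponentially large, so in practice F |= X -> Y is decided by computing the attribute closure X+ and checking Y ⊆ X+; this is valid because Armstrong's rules are sound and complete. A minimal Python sketch with hypothetical helper names, checked against the inferred FDs listed above and the DEPT_NO example:

```python
def closure(attrs, fds):
    """Compute X+: start from X and keep adding the RHS of any FD whose LHS is already in X+."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets of attribute names (pass lists of names)."""
    return frozenset(lhs), frozenset(rhs)


def implies(fds, lhs, rhs):
    """F |= X -> Y exactly when Y ⊆ X+ computed under F."""
    return set(rhs) <= closure(lhs, fds)


F = [fd(["SSN"], ["ENAME", "BDATE", "ADDRESS", "DNUMBER"]),
     fd(["DNUMBER"], ["DNAME", "DMGRSSN"])]
print(sorted(closure({"SSN"}, F)))
print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # True
print(implies(F, {"DNUMBER"}, {"SSN"}))           # False

DEPT = [fd(["DEPT_NO"], ["MGR_SSN"]), fd(["MGR_SSN"], ["MGR_PHONE"])]
print(implies(DEPT, {"DEPT_NO"}, {"MGR_PHONE"}))  # True
```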

Inference Rules for FDs

Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold. Armstrong's inference rules:

IR1 (Reflexive): If Y ⊆ X, then X -> Y.
IR2 (Augmentation): If X -> Y, then XZ -> YZ. (Notation: XZ stands for X ∪ Z.)
IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z.

IR1, IR2 and IR3 form a sound and complete set of inference rules. By sound we mean that any dependency we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F; in other words, if the axioms are correctly applied, they cannot derive false dependencies.

By complete we mean that using IR1 through IR3 repeatedly to infer dependencies, until no more dependencies can be inferred, results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs

Some additional inference rules that are useful:

Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
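As an illustration of that claim, each of the three rules can be derived from IR1 to IR3 in two or three steps (one possible derivation for each):

```latex
\begin{aligned}
&\textbf{Decomposition: } X \to YZ \text{ (given)},\; YZ \to Y \text{ (IR1)} \;\Rightarrow\; X \to Y \text{ (IR3); likewise } X \to Z \\
&\textbf{Union: } X \to Y \Rightarrow X \to XY \text{ (IR2 with } X,\ XX = X),\; X \to Z \Rightarrow XY \to YZ \text{ (IR2 with } Y) \\
&\qquad \Rightarrow\; X \to YZ \text{ (IR3)} \\
&\textbf{Pseudotransitive: } X \to Y \Rightarrow WX \to WY \text{ (IR2 with } W),\; WY \to Z \text{ (given)} \;\Rightarrow\; WX \to Z \text{ (IR3)}
\end{aligned}
```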

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D.
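One possible derivation for each exercise, using only IR1 to IR3:

```latex
\begin{aligned}
&\textbf{Example 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \\
&1.\; A \to B && \text{given} \\
&2.\; AC \to BC && \text{IR2 on (1) with } C \\
&3.\; C \to X && \text{given} \\
&4.\; BC \to BX && \text{IR2 on (3) with } B \\
&5.\; AC \to BX && \text{IR3 on (2), (4)} \\
&6.\; BX \to Z && \text{given} \\
&7.\; AC \to Z && \text{IR3 on (5), (6)} \\[4pt]
&\textbf{Example 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B \\
&1.\; A \to B && \text{given} \\
&2.\; B \to C && \text{IR1, since } C \subseteq B \\
&3.\; A \to C && \text{IR3 on (1), (2)} \\
&4.\; C \to D && \text{given} \\
&5.\; A \to D && \text{IR3 on (3), (4)}
\end{aligned}
```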

Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
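Redundancy is again a closure test. One design point matters: redundant FDs must be removed one at a time and the remaining ones re-tested, because two FDs can each be redundant only thanks to the other. A minimal Python sketch with hypothetical helper names and illustrative FD sets:

```python
def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets; "AB" means the attributes A and B."""
    return frozenset(lhs), frozenset(rhs)


def redundant_fds(fds):
    """Each FD X -> Y of F with Y ⊆ X+ computed under F - {X -> Y}."""
    return [dep for dep in fds
            if dep[1] <= closure(dep[0], [g for g in fds if g != dep])]


def remove_redundant(fds):
    """Drop redundant FDs one at a time, re-testing against the shrinking set."""
    result = list(fds)
    for dep in list(result):
        others = [g for g in result if g != dep]
        if dep[1] <= closure(dep[0], others):
            result = others
    return result


F1 = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
print(redundant_fds(F1) == [fd("A", "C")])  # True

# A -> C and B -> C are each redundant, but removing both would lose C.
F2 = [fd("A", "B"), fd("B", "A"), fd("A", "C"), fd("B", "C")]
print(len(redundant_fds(F2)), len(remove_redundant(F2)))  # 2 3
```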

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, we say that E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; E is equivalent to F if both conditions, E covers F and F covers E, hold.
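Both conditions can be checked without computing E+ or F+: F covers E when, for each X -> Y in E, Y ⊆ X+ under F. A minimal Python sketch with hypothetical helper names and an illustrative pair of FD sets:

```python
def closure(attrs, fds):
    """Attribute closure: every attribute implied by `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """An FD as a pair of frozensets; "AC" means the attributes A and C."""
    return frozenset(lhs), frozenset(rhs)


def covers(f, e):
    """F covers E iff every X -> Y in E has Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(e, f) and covers(f, e)


F = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
G = [fd("A", "CD"), fd("E", "AH")]
print(equivalent(F, G))  # True
```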

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 30: Module 6

Module 6 30042023

Examples of FD constraints A FD is a property of the attributes in the

schema R not of a particular legal relation state r of R

It must be defined explicitly by someone who knows the semantics of the attributes of R

The constraint must hold on every relation instance r(R)

If K is a key of R then K functionally determines all attributes in R (since we never have two distinct tuples with

t1[K]=t2[K])

Module 6 31042023

Satisfies algorithm

Why it is used To determine whether a relation r satisfies or does not satisfy a given functional dependency A B

How it works Sort the tuples of the relation r on the A attributes so

that tuples with equal values under A are next to each other

Check that tuples with equal values under attributes A also have equal values under B

If it meets the condition 2 then the output of the algorithm is true else it is false

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is said to be covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find a set that contains fewer FDs than F and still generates all the FDs of F+. Such a set is equivalent to F.
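
Both "covers" checks reduce to closure computations: F covers E iff, for every X -> Y in E, Y ⊆ X+ computed under F. A minimal Python sketch, assuming FDs are (left side, right side) pairs of sets; the sets E, F and G are hypothetical, chosen only to illustrate the test:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(F, E):
    """F covers E iff every FD X -> Y in E has Y contained in X+ under F."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E and F are equivalent iff E covers F and F covers E."""
    return covers(E, F) and covers(F, E)


E = [({"A"}, {"B"}), ({"B"}, {"C"})]
F = [({"A"}, {"B", "C"}), ({"B"}, {"C"})]
G = [({"A"}, {"B"})]

print(equivalent(E, F))            # True
print(covers(E, G), covers(G, E))  # True False
```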

Minimal Functional Dependencies: a minimal cover (also called an irreducible set) of F is useful for eliminating unnecessary functional dependencies. As a first step, F is transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs that each have a single attribute on the RHS.

Minimal cover: a set F of FDs is minimal if

(a) the RHS of every dependency is a single attribute;

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}.
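
The definition can be checked literally: replace the FD with its shrunken version and test whether the two sets are equivalent. A minimal Python sketch, assuming FDs are (left side, right side) pairs of sets; the helper names are illustrative, and the two FDs are borrowed from the canonical cover problem below (X -> Z and XZ -> R):

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(F, E):
    """F covers E iff every FD in E can be inferred from F."""
    return all(rhs <= closure(lhs, F) for lhs, rhs in E)


def left_extraneous(attr, i, fds):
    """attr is extraneous in fds[i] = (lhs, rhs) iff replacing that FD by
    (lhs - {attr}) -> rhs gives a set equivalent to fds."""
    lhs, rhs = fds[i]
    replaced = fds[:i] + [(lhs - {attr}, rhs)] + fds[i + 1:]
    return covers(fds, replaced) and covers(replaced, fds)


F = [({"X"}, {"Z"}), ({"X", "Z"}, {"R"})]
print(left_extraneous("Z", 1, F))  # True: with X -> Z, X alone gives R
print(left_extraneous("X", 1, F))  # False: Z alone determines nothing
```

Since a smaller left side always yields a stronger FD, only the "original covers replaced" direction can actually fail; checking both keeps the code identical to the slide's definition.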

CANONICAL COVER (FC)

1 Every FD of FC is simple: its RHS has a single attribute

2 FC is left-reduced: no FD has an extraneous left attribute

3 FC is nonredundant

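Problem

Given a set F of FDs, find a canonical cover for F:

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1 Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2 Remove redundant FDs (XY -> Z follows from X -> Z, and XY -> W appears twice): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3 Left-reduce: X+ = {X, Z, R}, so Z is extraneous in XZ -> R, giving FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}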
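
A Python sketch of one common canonical-cover procedure (split right-hand sides, left-reduce, then drop redundant FDs), assuming single-letter attribute names so that a string such as "XY" stands for {X, Y}. Canonical covers are not unique in general, and the order of the steps and of the scan can change which equivalent cover is returned:

```python
def closure(attrs, fds):
    """Attribute closure under FDs of the form (frozenset lhs, single rhs attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result


def canonical_cover(fds):
    """fds: list of (lhs, rhs) strings of single-letter attributes, e.g. ("XY", "WP")."""
    # Step 1: one attribute per right-hand side (decomposition rule), no duplicates.
    fc = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), a)
            if fd not in fc:
                fc.append(fd)
    # Step 2: left-reduce; attr is extraneous if a is still in (lhs - {attr})+.
    for i, (lhs, a) in enumerate(fc):
        for attr in sorted(lhs):
            smaller = lhs - {attr}
            if smaller and a in closure(smaller, fc):
                lhs = smaller
        fc[i] = (lhs, a)
    fc = list(dict.fromkeys(fc))  # left-reduction can create duplicates
    # Step 3: remove FDs that follow from the remaining ones.
    i = 0
    while i < len(fc):
        lhs, a = fc[i]
        if a in closure(lhs, fc[:i] + fc[i + 1:]):
            del fc[i]
        else:
            i += 1
    return fc


F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
for lhs, a in canonical_cover(F):
    print("".join(sorted(lhs)), "->", a)
# X -> Z
# XY -> P
# XY -> W
# XY -> Q
# X -> R
```

On this F the sketch returns the set {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}.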

Normal Forms Based on Primary Keys

1 Normalization of Relations
2 Practical Use of Normal Forms
3 Definitions of Keys and Attributes Participating in Keys
4 First Normal Form
5 Second Normal Form
6 Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis)

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis)

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation

Normalization of Relations

Proposed by Codd. Normalization is the process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually they go up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
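
These definitions can be tested mechanically: S is a superkey iff S+ contains every attribute of R, and the candidate keys are the minimal superkeys. A brute-force Python sketch (exponential in the number of attributes, so only for small schemas), assuming FDs are (left side, right side) pairs of sets; the data is TEACH(student, course, instructor) with the two FDs used in the BCNF section of this module:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """S is a superkey iff S+ contains every attribute of R."""
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Try attribute sets from smallest to largest and keep the superkeys
    that do not contain an already-found (smaller) key."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if is_superkey(s, schema, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


TEACH = {"student", "course", "instructor"}
FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]

keys = candidate_keys(TEACH, FDS)
prime = set().union(*keys)
print(sorted(sorted(k) for k in keys))  # [['course', 'student'], ['instructor', 'student']]
print(sorted(prime))                    # ['course', 'instructor', 'student']
```

Here every attribute of TEACH is prime, which is why TEACH can be in 3NF without being in BCNF.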

First Normal Form

Disallows composite attributes, multivalued attributes and nested relations: attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations, or relations as attribute values within tuples

Considered to be part of the definition of a relation


Normalization into 1NF

Normalization into 1NF: there are 3 techniques to achieve 1NF:

1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3 If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
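
Technique 1 can be sketched in Python, assuming a relation is a list of dicts and the non-atomic attribute holds a Python list. The DEPARTMENT rows and attribute names below are hypothetical, and split_multivalued is an illustrative helper, not a library function:

```python
def split_multivalued(rows, key, mv_attr):
    """1NF technique 1: move the multivalued attribute mv_attr into a separate
    relation together with the primary key, one tuple per value."""
    base, side = [], []
    for row in rows:
        base.append({k: v for k, v in row.items() if k != mv_attr})
        for value in row[mv_attr]:
            side.append({key: row[key], mv_attr: value})
    return base, side


# Hypothetical DEPARTMENT data in which DLOCATIONS is a list, i.e. not atomic.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]
dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS")
print(dept)
# [{'DNUMBER': 5, 'DNAME': 'Research'}, {'DNUMBER': 4, 'DNAME': 'Administration'}]
print(dept_locations)  # one tuple per (DNUMBER, location) pair
```

No value is repeated outside the key, and any number of locations per department is allowed, which matches why the slide prefers this technique.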

Normalizing nested relations into 1NF

Additional problems: Schaum's series, pg. 178, 5.1

Second Normal Form: uses the concepts of FDs and primary key

Definitions:
Prime attribute: an attribute that is a member of the primary key K
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds


Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
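
A 2NF check amounts to looking for partial dependencies: a nonprime attribute that is already determined by a proper subset of the primary key. A minimal Python sketch following the slide's definition (prime = member of the primary key), assuming FDs are (left side, right side) pairs of sets; the attributes and FDs are the SSN / PNUMBER / HOURS / ENAME examples above:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, pk, fds):
    """Pairs (proper subset of pk, nonprime attribute) where the attribute
    already depends on that part of the primary key."""
    nonprime = sorted(set(schema) - set(pk))
    found = []
    for size in range(1, len(pk)):
        for part in combinations(sorted(pk), size):
            reach = closure(set(part), fds)
            found += [(part, a) for a in nonprime if a in reach]
    return found


def is_2nf(schema, pk, fds):
    return not partial_dependencies(schema, pk, fds)


SCHEMA = {"SSN", "PNUMBER", "HOURS", "ENAME"}
PK = {"SSN", "PNUMBER"}
FDS = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]

print(partial_dependencies(SCHEMA, PK, FDS))  # [(('SSN',), 'ENAME')]
print(is_2nf(SCHEMA, PK, FDS))                # False
# After 2NF normalization: ENAME moves to its own relation keyed by SSN.
print(is_2nf({"SSN", "PNUMBER", "HOURS"}, PK, FDS), is_2nf({"SSN", "ENAME"}, {"SSN"}, FDS))  # True True
```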


Normalizing into 2NF

Conversion to 2NF

[Figure: attributes A, B, C, D of the original relation redistributed into the relations (A, B, C) and (A, D)]

Additional problem on 2nd normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1 What is the highest normal form?

2 Transform it into the next highest normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME

Third Normal Form: A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp#, Salary). Here SSN -> Emp# -> Salary, and Emp# is a candidate key.
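
The NOTE is easiest to see with the general textbook form of 3NF (not the wording on this slide, but equivalent to it for these examples): for every nontrivial X -> A in F, X is a superkey or A is prime. A minimal Python sketch, assuming F is given on R itself and FDs are (left side, right side) pairs of sets. Reading "Emp# is a candidate key" as Emp# -> {SSN, Salary} is an assumption:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys, found by brute force from the smallest sets up."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if is_superkey(s, schema, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_3nf(schema, fds):
    """For each nontrivial X -> A in F: X is a superkey or A is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        for a in rhs - lhs:  # ignore the trivial part of the FD
            if not is_superkey(lhs, schema, fds) and a not in prime:
                return False
    return True


# SSN -> DNUMBER -> DMGRSSN, and DNUMBER is not a candidate key.
emp_dept = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
emp_dept_fds = [({"SSN"}, {"ENAME", "DNUMBER"}), ({"DNUMBER"}, {"DMGRSSN"})]

# EMP(SSN, Emp#, Salary) from the NOTE: Emp# is itself a candidate key.
emp = {"SSN", "Emp#", "Salary"}
emp_fds = [({"SSN"}, {"Emp#"}), ({"Emp#"}, {"SSN", "Salary"})]

print(is_3nf(emp_dept, emp_dept_fds))  # False
print(is_3nf(emp, emp_fds))            # True
```

Checking only the FDs in F (not all of F+) is enough here, because any violating FD in F+ forces some FD of F to violate the same condition.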

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form): A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey.

BCNF makes no such exception: for every nontrivial FD X -> A, X must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition: two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation, and so is {student, instructor}. Because course is therefore a prime attribute, fd2 does not violate 3NF; but instructor is not a superkey, so the relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
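
Because the BCNF condition only asks whether each left side is a superkey, a violation check is a few lines. A minimal Python sketch, assuming F is given on R and FDs are (left side, right side) pairs of sets; bcnf_violations is an illustrative name:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y in F whose left side X is not a superkey of R."""
    return [
        (lhs, rhs) for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= set(schema)
    ]


TEACH = {"student", "course", "instructor"}
fd1 = ({"student", "course"}, {"instructor"})
fd2 = ({"instructor"}, {"course"})

violations = bcnf_violations(TEACH, [fd1, fd2])
print([sorted(lhs) for lhs, rhs in violations])  # [['instructor']]: only fd2 violates BCNF
```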

Achieving BCNF by Decomposition: three possible decompositions for relation TEACH:

1 {student, instructor} and {student, course}
2 {course, instructor} and {course, student}
3 {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.

Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the nonadditivity property).
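
For a decomposition into two relations there is a standard shortcut test: {R1, R2} is lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A minimal Python sketch applying it to the three TEACH decompositions, assuming FDs are (left side, right side) pairs of sets:

```python
def closure(attrs, fds):
    """Attribute closure attrs+ under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff the common attributes determine R1 - R2 or R2 - R1."""
    reach = closure(r1 & r2, fds)
    return (r1 - r2) <= reach or (r2 - r1) <= reach


FDS = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([lossless_binary(r1, r2, FDS) for r1, r2 in decompositions])  # [False, False, True]
```

Only the third split shares an attribute (instructor) that determines one whole side, via fd2.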

Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
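
The slide cites the algorithm without reproducing it. A widely used general form is the tableau (chase) test: build one row per component relation, put a distinguished symbol in the columns that component contains, repeatedly force rows that agree on an FD's left side to agree on its right side, and declare the decomposition lossless iff some row becomes all distinguished symbols. A Python sketch of that test (not necessarily the exact procedure of Example 5.12); the SSN / PNUMBER attributes in the demo are an illustrative assumption:

```python
def lossless_chase(schema, decomposition, fds):
    """Tableau (chase) test for a lossless, i.e. nonadditive, join decomposition."""
    cols = sorted(schema)
    # Row i holds the distinguished symbol ("a", col) for columns of R_i,
    # and a unique placeholder ("b", i, col) everywhere else.
    rows = [{c: ("a", c) if c in ri else ("b", i, c) for c in cols}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r in rows:
                for s in rows:
                    if r is s or any(r[c] != s[c] for c in lhs):
                        continue
                    # r and s agree on lhs, so force them to agree on rhs.
                    for c in rhs:
                        if r[c] != s[c]:
                            keep = r[c] if r[c][0] == "a" else s[c]
                            drop = s[c] if keep == r[c] else r[c]
                            for t in rows:  # rename drop -> keep in column c
                                if t[c] == drop:
                                    t[c] = keep
                            changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(v[0] == "a" for v in row.values()) for row in rows)


R = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
     ({"SSN", "PNUMBER"}, {"HOURS"})]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(lossless_chase(R, good, F), lossless_chase(R, bad, F))  # True False
```

Each rename removes one distinct symbol from a column, so the loop always terminates.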

Fourth Normal Form (4NF): Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

Fourth Normal Form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies

It is used to remove multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form: the EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME, is decomposed into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
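
An MVD X ->> Y can be checked on a relation instance with its tuple definition: whenever t1 and t2 agree on X, the tuple (t1[X], t1[Y], t2[Z]) must also be present, where Z = R - X - Y. A minimal Python sketch; the EMP tuples are hypothetical data, not taken from the slide:

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on a relation instance given as tuples over attrs."""
    z = [a for a in attrs if a not in x and a not in y]

    def proj(t, cols):
        return tuple(t[attrs.index(c)] for c in cols)

    present = {(proj(t, x), proj(t, y), proj(t, z)) for t in rows}
    return all(
        (proj(t1, x), proj(t1, y), proj(t2, z)) in present
        for t1 in rows for t2 in rows
        if proj(t1, x) == proj(t2, x)
    )


ATTRS = ["ENAME", "PNAME", "DNAME"]
# Hypothetical EMP instance: Smith works on projects X and Y and has dependents John and Anna.
EMP = [("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")]

print(mvd_holds(EMP, ATTRS, ["ENAME"], ["PNAME"]))      # True
print(mvd_holds(EMP[:3], ATTRS, ["ENAME"], ["PNAME"]))  # False: (Smith, Y, John) is missing

# 4NF decomposition, then a natural join back on ENAME.
emp_projects = {(e, p) for e, p, _ in EMP}
emp_dependents = {(e, d) for e, _, d in EMP}
rejoined = {(e, p, d) for e, p in emp_projects for e2, d in emp_dependents if e == e2}
print(rejoined == set(EMP))  # True
```

The decomposed relations store each project and each dependent once per employee, instead of every project/dependent combination.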

Fifth Normal Form (5NF): Join dependency

Describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, R has no nontrivial join dependency except those in which every component is a superkey.

Relation SUPPLY with join dependency and conversion to Fifth Normal Form: (c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
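
A join dependency can be checked on an instance by projecting onto each component, joining the projections back together, and comparing with the original. A Python sketch, assuming SUPPLY has attributes SNAME, PART_NAME and PROJ_NAME and R1, R2, R3 are the three attribute pairs; these names and the tuples are hypothetical:

```python
def project(rows, attrs, cols):
    """Projection of a set of tuples onto cols (duplicates removed)."""
    idx = [attrs.index(c) for c in cols]
    return {tuple(t[i] for i in idx) for t in rows}


def natural_join(r, r_attrs, s, s_attrs):
    """Natural join of two tuple sets; returns (tuples, attribute list)."""
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in r_attrs]
    out = set()
    for t in r:
        for u in s:
            if all(t[r_attrs.index(a)] == u[s_attrs.index(a)] for a in common):
                out.add(t + tuple(u[s_attrs.index(a)] for a in extra))
    return out, r_attrs + extra


def satisfies_jd(rows, attrs, components):
    """JD(R1, ..., Rn) holds iff the join of the projections equals the relation."""
    joined, j_attrs = project(rows, attrs, components[0]), list(components[0])
    for comp in components[1:]:
        joined, j_attrs = natural_join(joined, j_attrs, project(rows, attrs, comp), list(comp))
    order = [j_attrs.index(a) for a in attrs]  # back to the original column order
    return {tuple(t[i] for i in order) for t in joined} == set(rows)


ATTRS = ["SNAME", "PART_NAME", "PROJ_NAME"]
JD = [["SNAME", "PART_NAME"], ["SNAME", "PROJ_NAME"], ["PART_NAME", "PROJ_NAME"]]
SUPPLY = [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"), ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"), ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]
print(satisfies_jd(SUPPLY, ATTRS, JD))      # True: the three-way JD holds
print(satisfies_jd(SUPPLY, ATTRS, JD[:2]))  # False: the two-way split adds spurious tuples
broken = [t for t in SUPPLY if t != ("Adamsky", "Bolt", "ProjX")]
print(satisfies_jd(broken, ATTRS, JD))      # False: the join recreates the removed tuple
```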


Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice


Satisfies algorithm

Why it is used: to determine whether a relation state r satisfies or does not satisfy a given functional dependency A -> B.

How it works:
1 Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2 Check that tuples with equal values under attributes A also have equal values under B.
3 If condition 2 is met, the output of the algorithm is true; otherwise it is false.

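Relation state of TEACH

TEACH
Teacher | Course          | Text
Smith   | Data Structures | Bartram
Smith   | Data Management | Martin
Hall    | Compilers       | Hoffmann
Brown   | ooad            | Horowitz

TEACHER -> COURSE does not hold in this state: Smith appears with two different courses.

TEXT -> COURSE may hold: each text appears with only one course in this state.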
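
The Satisfies algorithm can be sketched in Python and run on the TEACH state above, assuming each tuple is a dict keyed by attribute name; satisfies is an illustrative name. After sorting, tuples with equal A values are adjacent, so comparing neighbours is enough:

```python
def satisfies(rows, lhs, rhs):
    """Satisfies algorithm: does the relation state rows obey lhs -> rhs?"""
    def a_values(t):
        return tuple(t[a] for a in lhs)

    ordered = sorted(rows, key=a_values)        # step 1: sort on the A attributes
    for prev, cur in zip(ordered, ordered[1:]):  # step 2: compare neighbours
        if a_values(prev) == a_values(cur) and any(prev[b] != cur[b] for b in rhs):
            return False                         # equal A values, different B values
    return True


TEACH = [
    {"Teacher": "Smith", "Course": "Data Structures", "Text": "Bartram"},
    {"Teacher": "Smith", "Course": "Data Management", "Text": "Martin"},
    {"Teacher": "Hall", "Course": "Compilers", "Text": "Hoffmann"},
    {"Teacher": "Brown", "Course": "ooad", "Text": "Horowitz"},
]
print(satisfies(TEACH, ["Teacher"], ["Course"]))  # False
print(satisfies(TEACH, ["Text"], ["Course"]))     # True
```

A True result only says this particular state does not violate the FD; it cannot prove that the FD holds for every legal state of the schema.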

Drawbacks of the Satisfies algorithm

Using this algorithm is tedious and time-consuming.

So inference axioms are used instead.

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R.

Schema designers specify the most obvious FDs.

The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure: if a department has one manager (DEPT_NO -> MGR_SSN) and a manager has a unique phone number (MGR_SSN -> MGR_PHONE), then these two dependencies together imply DEPT_NO -> MGR_PHONE.

This motivates a concept called closure, which includes all possible dependencies that can be inferred from the given set F.

The set of all dependencies that includes F as well as all dependencies that can be inferred from F is called the closure of F, denoted F+.

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 32: Module 6

Module 6 32042023

Relation state of TEACH

TEACH

TEACHER COURSE TEXT

Teacher Course Text

Smith Data Structures

Bartram

Smith Data Management

Martin

Hall Compilers Hoffmann

Brown ooad Horowitz

TEACHER -gt COURSE

TEXT -gt COURSE

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs

Some additional inference rules that are useful:

Decomposition: If X -> YZ, then X -> Y and X -> Z.
Union (additive): If X -> Y and X -> Z, then X -> YZ.
Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z.

The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2 and IR3 (completeness property).
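As a check on the completeness claim, each of the three extra rules can be derived from IR1 to IR3 alone. One possible derivation of each, following the usual textbook argument, is:

```latex
\begin{align*}
\text{Decomposition: } & X \to YZ \ (\text{given}),\ YZ \to Y \ (\text{IR1}) \;\Rightarrow\; X \to Y \ (\text{IR3}); \text{ similarly } X \to Z.\\
\text{Union: } & X \to Y \Rightarrow X \to XY \ (\text{IR2, augment with } X);\ X \to Z \Rightarrow XY \to YZ \ (\text{IR2, augment with } Y);\\
& X \to XY,\ XY \to YZ \;\Rightarrow\; X \to YZ \ (\text{IR3}).\\
\text{Pseudotransitive: } & X \to Y \Rightarrow WX \to WY \ (\text{IR2, augment with } W);\ WX \to WY,\ WY \to Z \;\Rightarrow\; WX \to Z \ (\text{IR3}).
\end{align*}
```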

Examples

1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C a subset of B, show that F |= A -> D.
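Reading the two exercises as F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D} (the arrows and braces were lost from the slide text, so this reading is a reconstruction), one possible derivation of each result is:

```latex
\begin{align*}
\text{1. } & A \to B \Rightarrow AC \to BC \ (\text{IR2});\quad C \to X \Rightarrow BC \to BX \ (\text{IR2});\\
& AC \to BC,\ BC \to BX \Rightarrow AC \to BX \ (\text{IR3});\quad AC \to BX,\ BX \to Z \Rightarrow AC \to Z \ (\text{IR3}).\\
\text{2. } & C \subseteq B \Rightarrow B \to C \ (\text{IR1});\quad A \to B,\ B \to C \Rightarrow A \to C \ (\text{IR3});\\
& A \to C,\ C \to D \Rightarrow A \to D \ (\text{IR3}).
\end{align*}
```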

Redundant functional dependencies: given a set F of FDs, an FD A -> B in F is said to be redundant with respect to the FDs of F if and only if A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and can be safely removed from the set F.

Eliminating redundant FDs allows us to minimize the set of FDs.
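A direct way to apply this definition is to drop the FD, compute the closure of its left-hand side under the remaining FDs, and see whether the right-hand side is still reached. A minimal Python sketch, using a small hypothetical F; the helper names are ours, and the set left by `remove_redundant` can depend on the order in which FDs are examined.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def is_redundant(fd, fds):
    """X -> Y in F is redundant iff F - {X -> Y} still implies X -> Y."""
    rest = list(fds)
    rest.remove(fd)  # drop one occurrence of fd
    lhs, rhs = fd
    return rhs <= attribute_closure(lhs, rest)


def remove_redundant(fds):
    """Drop redundant FDs one at a time; the result can depend on scan order."""
    result = list(fds)
    for fd in list(fds):
        if is_redundant(fd, result):
            result.remove(fd)
    return result


# Hypothetical F in which A -> C follows from A -> B and B -> C.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print(is_redundant(({"A"}, {"C"}), F))  # True
print(remove_redundant(F))              # [({'A'}, {'B'}), ({'B'}, {'C'})]
```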

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
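Covering and equivalence can be tested with attribute closures: F covers E when, for every X -> Y in E, Y is contained in X+ computed under F. The sketch below uses two hypothetical FD sets over attributes A, C, D, E, H (a common textbook pair); the helper names are ours.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def covers(F, E):
    """F covers E iff every X -> Y in E has Y contained in X+ computed under F."""
    return all(rhs <= attribute_closure(lhs, F) for lhs, rhs in E)


def equivalent(E, F):
    """E and F are equivalent iff each covers the other (E+ = F+)."""
    return covers(E, F) and covers(F, E)


F = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
G = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
print(equivalent(F, G))  # True
```

G has fewer FDs than F yet generates the same closure, which is exactly the kind of smaller equivalent set described above.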

Minimal Functional Dependencies

A minimal cover is useful in eliminating unnecessary functional dependencies; it is also called an irreducible set of F. To obtain it, F is first transformed so that each FD that has more than one attribute on the RHS is replaced by a set of FDs that each have a single attribute on the RHS.

Minimal cover

A set of FDs F is minimal if it satisfies the following conditions:

(a) Every dependency in F has a single attribute on its RHS.

(b) For no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies).

(c) For no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

We can further reduce the size of the FDs in F by removing extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 be in F. A1 is extraneous in the left-hand side iff F is equivalent to (F - {A1A2 -> B1B2}) U {A2 -> B1B2}.

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.

2. FC is left-reduced.

3. FC is nonredundant.

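Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R} (every RHS split into single attributes)

2. FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R} (XY -> Z is redundant because it follows from X -> Z, and the duplicate XY -> W is dropped)

3. FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q} (Z is extraneous in XZ -> R because X -> Z holds, so XZ -> R is left-reduced to X -> R)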
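The same steps can be sketched in Python: split every RHS, left-reduce each FD by testing whether an LHS attribute is extraneous, then drop redundant FDs. This is one possible ordering of the standard steps, and in general a set of FDs can have more than one canonical cover; the function names are ours. Unlike step 2 above, the sketch also left-reduces XZ -> R to X -> R, because X -> Z makes Z extraneous.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def canonical_cover(fds):
    # Step 1: single-attribute RHS, skipping exact duplicates.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in cover:
                cover.append(fd)
    # Step 2: left-reduce. b is extraneous in X -> A if A is in (X - {b})+.
    for i, (lhs, rhs) in enumerate(cover):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= attribute_closure(smaller, cover):
                lhs = smaller
        cover[i] = (lhs, rhs)
    # Step 3: remove duplicates created by step 2, then redundant FDs.
    result = []
    for fd in cover:
        if fd not in result:
            result.append(fd)
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] <= attribute_closure(fd[0], rest):
            result = rest
    return result


def show(fds):
    return ["".join(sorted(l)) + " -> " + "".join(sorted(r)) for l, r in fds]


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}), ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
print(show(canonical_cover(F)))
# ['X -> Z', 'XY -> P', 'XY -> W', 'XY -> Q', 'X -> R']
```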

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing insertion, deletion and update anomalies.

Normalization provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created by the decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S, a subset of R, with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
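Candidate keys, and hence prime and nonprime attributes, can be found for a small schema by checking attribute closures of increasingly larger attribute sets. The brute-force sketch below is exponential in the number of attributes, so it only suits small examples; it uses the TEACH relation and its FDs fd1 and fd2 from the BCNF part of this module, and the function names are ours.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    everything = set(attrs)
    keys = []
    for size in range(1, len(everything) + 1):
        for combo in combinations(sorted(everything), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # a superset of a key is a superkey, not a key
            if attribute_closure(s, fds) >= everything:
                keys.append(s)
    return keys


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
keys = candidate_keys(TEACH, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['course', 'student'], ['instructor', 'student']]
print(sorted(TEACH - prime))      # []: every TEACH attribute is prime
```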

First Normal Form

1NF disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

Normalization into 1NF

To achieve 1NF there are three techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
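Technique 1 can be sketched in Python on a hypothetical DEPARTMENT relation with a multivalued DLOCATIONS attribute; the attribute names, the data and the `split_multivalued` helper are illustrative assumptions, not taken from the slides.

```python
def split_multivalued(rows, key, attr):
    """Technique 1: move multivalued `attr` into its own relation with the primary key."""
    # The original relation keeps every attribute except the multivalued one.
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    # One tuple per (key value, single attribute value); {key, attr} keys the new relation.
    split = [{key: row[key], attr: value} for row in rows for value in row[attr]]
    return base, split


# Hypothetical DEPARTMENT state in which DLOCATIONS holds a list (not in 1NF).
department = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]
dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS")
print(dept)
print(dept_locations)
```

Each location becomes its own tuple keyed by {DNUMBER, DLOCATIONS}, so DNAME is not repeated and no limit is placed on the number of locations per department.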

Normalizing nested relations into 1NF

Additional problems from the Schaum's series: pg. 178, 5.1

Second Normal Form

2NF uses the concepts of FDs and the primary key.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
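The 2NF test amounts to looking for partial dependencies. Because closure is monotonic, it is enough to check whether a non-prime attribute is determined by the key with one attribute removed. A minimal sketch, assuming a single candidate key and a hypothetical schema EMP_PROJ(SSN, PNUMBER, HOURS, ENAME) with the two FDs from the full/partial dependency examples; the function names are ours.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def partial_dependencies(attrs, key, fds):
    """Non-prime attributes determined by a proper subset of the primary key.

    Assumes `key` is the only candidate key, so attributes outside it are non-prime.
    Closure is monotonic, so testing the subsets key - {b} is enough.
    """
    found = []
    for a in sorted(attrs - key):
        for b in sorted(key):
            if a in attribute_closure(key - {b}, fds):
                found.append((a, sorted(key - {b})))
                break
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME"}
KEY = {"SSN", "PNUMBER"}
F = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]
print(partial_dependencies(EMP_PROJ, KEY, F))  # [('ENAME', ['SSN'])]
```

An empty result means every non-prime attribute is fully functionally dependent on the key, i.e. the schema passes this 2NF test.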

Normalizing into 2NF

Conversion to 2NF: R(A, B, C, D) is decomposed into R1(A, B, C) and R2(A, D).

Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of Prog_task?

2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
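A 3NF check can be sketched with the general form of the test: every nontrivial FD X -> A needs X to be a superkey or A to be a prime attribute, which also captures the NOTE's exception for candidate keys. The sketch assumes the caller supplies the prime attributes and that the FDs are those given for the schema being tested. EMP_DEPT is a hypothetical schema; for the NOTE's EMP(SSN, Emp, Salary) we assume Emp -> {SSN, Salary}, so that Emp is a candidate key.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def violates_3nf(attrs, fds, prime):
    """FDs X -> A (A not in X) with X not a superkey and A non-prime."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = attribute_closure(lhs, fds) >= attrs
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad


# Hypothetical EMP_DEPT schema whose only key is SSN.
EMP_DEPT = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
F1 = [({"SSN"}, {"ENAME", "DNUMBER"}), ({"DNUMBER"}, {"DMGRSSN"})]
print(violates_3nf(EMP_DEPT, F1, prime={"SSN"}))  # [(['DNUMBER'], 'DMGRSSN')]

# The NOTE's EMP(SSN, Emp, Salary), assuming Emp -> {SSN, Salary}.
EMP = {"SSN", "Emp", "Salary"}
F2 = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(violates_3nf(EMP, F2, prime={"SSN", "Emp"}))  # []
```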

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a candidate key.

To be in BCNF, X must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. Because course is a prime attribute, fd2 does not violate 3NF; but instructor is not a superkey, so fd2 violates BCNF. Hence this relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
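The BCNF test needs only the superkey check. A minimal sketch on TEACH, assuming the FD list is the complete set F given for the schema (for the schema on which F is defined, checking the FDs of F suffices); the function names are ours.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def bcnf_violations(attrs, fds):
    """Nontrivial FDs X -> Y in fds whose LHS X is not a superkey of the schema."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not attribute_closure(lhs, fds) >= attrs
    ]


TEACH = {"student", "course", "instructor"}
F = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
print(bcnf_violations(TEACH, F))  # [(['instructor'], ['course'])]
```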

Achieving BCNF by Decomposition

Three possible decompositions of the relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive join property of the decomposition.

Of the three, only the third decomposition will not generate spurious tuples after the join (and hence has the nonadditive join property).
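For a decomposition of R into two relations, a standard test says the decomposition is nonadditive (lossless) if and only if the common attributes functionally determine R1 - R2 or R2 - R1 under F. A sketch applying it to the three TEACH decompositions; the function names are ours.

```python
def attribute_closure(attrs, fds):
    """X+ under fds, where fds is a list of (lhs, rhs) pairs of attribute sets."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 & R2) -> (R1 - R2) or (R1 & R2) -> (R2 - R1) is in F+."""
    common = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([lossless_binary(a, b, F) for a, b in decompositions])  # [False, False, True]
```

Only the third decomposition passes, because its common attribute, instructor, determines course through fd2.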

Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation, by a natural join of the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
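For decompositions into more than two relations, the usual lossless join algorithm builds a matrix with one row per Ri, fills it with "a" symbols for the attributes Ri contains and distinct "b" symbols elsewhere, and repeatedly uses the FDs to equate symbols; the decomposition is lossless if some row becomes all "a". The Python sketch below is our own rendering of that idea (it renames merged symbols throughout a column), not necessarily the exact procedure of the referenced textbook example.

```python
def lossless_chase(attrs, decomposition, fds):
    """Matrix test: is `decomposition` (a list of attribute sets) lossless under fds?"""
    cols = sorted(attrs)
    # Row i represents Ri: ("a", col) where Ri has the attribute, else a unique ("b", i, col).
    rows = [
        {c: ("a", c) if c in ri else ("b", i, c) for c in cols}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[c] != r2[c] for c in lhs):
                        continue
                    for c in rhs:
                        if r1[c] != r2[c]:
                            # Rows agree on the LHS, so they must agree on the RHS:
                            # keep the "a" symbol if either has it, and rename the
                            # other symbol everywhere in this column.
                            keep = r2[c] if r2[c][0] == "a" else r1[c]
                            drop = r1[c] if keep == r2[c] else r2[c]
                            for row in rows:
                                if row[c] == drop:
                                    row[c] = keep
                            changed = True
    # Lossless iff some row now consists only of "a" symbols.
    return any(all(row[c][0] == "a" for c in cols) for row in rows)


TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(lossless_chase(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F))  # True
print(lossless_chase(TEACH, [{"student", "instructor"}, {"student", "course"}], F))     # False
```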

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further classified as trivial or nontrivial.

An MVD A ->> B in relation R is trivial if B is a subset of A or A U B = R.

An MVD is nontrivial if neither of the above two conditions is satisfied.
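Whether an MVD X ->> Y holds in a given relation state can be checked directly from the definition: for any two tuples that agree on X, the tuple that takes X and Y from the first and the remaining attributes from the second must also be present. A sketch with hypothetical EMP(ENAME, PNAME, DNAME) rows; the helper names are ours.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on a relation state `rows` (a list of dicts over `attrs`).

    For every t1, t2 that agree on X, the tuple taking X and Y from t1 and all
    remaining attributes from t2 must also be in the relation.
    """
    present = {tuple(sorted(t.items())) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                t3 = {a: t1[a] if a in x | y else t2[a] for a in attrs}
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True


def mvd_trivial(attrs, x, y):
    """X ->> Y is trivial if Y is a subset of X or X U Y = R."""
    return y <= x or x | y == attrs


EMP = {"ENAME", "PNAME", "DNAME"}
rows = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(rows, EMP, {"ENAME"}, {"PNAME"}))      # True
print(mvd_holds(rows[:3], EMP, {"ENAME"}, {"PNAME"}))  # False: (Smith, Y, John) is missing
print(mvd_trivial(EMP, {"ENAME"}, {"PNAME"}))          # False: nontrivial
```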

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and in which, for every nontrivial multivalued dependency X ->> Y, X is a superkey.

It is used to remove multivalued dependencies.

In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
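The decomposition can be sketched with relational projection and natural join over hypothetical EMP rows: the 4NF relations store each project and each dependent once, and joining them back reproduces EMP without spurious tuples. The `project` and `natural_join` helpers are minimal illustrations, not a library API.

```python
def project(rows, attrs):
    """Projection onto `attrs`, as a set of sorted (attribute, value) tuples."""
    return {tuple((a, t[a]) for a in sorted(attrs)) for t in rows}


def natural_join(r, s):
    """Natural join of two relations in the representation used by `project`."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out


# Hypothetical EMP state: Smith works on projects X and Y and has dependents John and Anna.
emp = [
    {"ENAME": "Smith", "PNAME": p, "DNAME": d}
    for p in ("X", "Y")
    for d in ("John", "Anna")
]
emp_projects = project(emp, {"ENAME", "PNAME"})
emp_dependents = project(emp, {"ENAME", "DNAME"})
print(len(emp), len(emp_projects), len(emp_dependents))  # 4 2 2
rejoined = natural_join(emp_projects, emp_dependents)
print(rejoined == project(emp, {"ENAME", "PNAME", "DNAME"}))  # True
```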

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: a relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency unless every component of that dependency is a superkey.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
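A join dependency can be checked on a relation state by joining its projections. The sketch below uses a small hypothetical SUPPLY state over assumed attributes SNAME, PARTNAME and PROJNAME, with R1 = {SNAME, PARTNAME}, R2 = {SNAME, PROJNAME} and R3 = {PARTNAME, PROJNAME}: it satisfies JD(R1, R2, R3), while joining only two of the projections produces a spurious tuple. The helper names are ours.

```python
from functools import reduce


def project(rows, attrs):
    """Projection onto `attrs`, as a set of sorted (attribute, value) tuples."""
    return {tuple((a, t[a]) for a in sorted(attrs)) for t in rows}


def natural_join(r, s):
    """Natural join of two relations in the representation used by `project`."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(tuple(sorted({**dt, **du}.items())))
    return out


def satisfies_jd(rows, components):
    """True if `rows` equals the natural join of its projections on the components."""
    attrs = set().union(*components)
    joined = reduce(natural_join, [project(rows, c) for c in components])
    return joined == project(rows, attrs)


R1, R2, R3 = {"SNAME", "PARTNAME"}, {"SNAME", "PROJNAME"}, {"PARTNAME", "PROJNAME"}
supply = [
    {"SNAME": "s1", "PARTNAME": "p1", "PROJNAME": "j2"},
    {"SNAME": "s1", "PARTNAME": "p2", "PROJNAME": "j1"},
    {"SNAME": "s2", "PARTNAME": "p1", "PROJNAME": "j1"},
    {"SNAME": "s1", "PARTNAME": "p1", "PROJNAME": "j1"},
]
print(satisfies_jd(supply, [R1, R2, R3]))      # True
print(satisfies_jd(supply, [R1, R2]))          # False: spurious (s1, p2, j2)
print(satisfies_jd(supply[:3], [R1, R2, R3]))  # False: the join adds (s1, p1, j1)
```

The last line shows why such constraints are hard to spot: whether the JD holds depends on a combination of three tuples, not on any single pair of attributes.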

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 33: Module 6

Module 6 33042023

Drawbacks of Satifies algorithm

Using this algorithm is tedious and time consuming

So inference axioms are used

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 34: Module 6

Module 6 34042023

Inference Rules for Functional Dependencies

F is the set of functional dependencies that are specified on relation schema R

Schema designers specifies the most obvious FDs

The other dependencies can be inferred or deduced from FDs in F

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Examples

1. Given F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.

2. Given F = {A -> B, C -> D} with C subset-of B, show that F |= A -> D.
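Possible solutions, assuming the extracted sets read F = {A -> B, C -> X, BX -> Z} and F = {A -> B, C -> D}; each step names the axiom used.

```latex
\begin{align*}
&\textbf{1. } F = \{A \to B,\ C \to X,\ BX \to Z\} \ \vdash\ AC \to Z \\
&\quad (a)\ AC \to BC && \text{IR2: augment } A \to B \text{ by } C \\
&\quad (b)\ BC \to BX && \text{IR2: augment } C \to X \text{ by } B \\
&\quad (c)\ AC \to BX && \text{IR3 on (a) and (b)} \\
&\quad (d)\ AC \to Z && \text{IR3 on (c) and } BX \to Z \\[6pt]
&\textbf{2. } F = \{A \to B,\ C \to D\},\ C \subseteq B \ \vdash\ A \to D \\
&\quad (a)\ B \to C && \text{IR1, since } C \subseteq B \\
&\quad (b)\ A \to C && \text{IR3 on } A \to B \text{ and (a)} \\
&\quad (c)\ A \to D && \text{IR3 on (b) and } C \to D
\end{align*}
```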

Redundant functional dependencies

Given a set F of FDs, an FD A -> B in F is said to be redundant with respect to F if and only if A -> B can be derived from the set of FDs F - {A -> B}.

Redundant FDs are extra and unnecessary, and they can be safely removed from F.

Eliminating redundant FDs allows us to minimize the set of FDs.
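The definition gives a direct test: an FD is redundant if its right-hand side is still in the closure of its left-hand side once the FD itself is removed. The Python sketch below applies that test once to each FD in order; the function names and the three-FD example set are hypothetical, and single letters stand for attributes.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def remove_redundant(fds):
    """Drop every FD that can be derived from the FDs that remain."""
    kept = list(fds)
    for f in list(kept):
        rest = [g for g in kept if g != f]
        lhs, rhs = f
        if rhs <= closure(lhs, rest):  # f follows from F - {f}
            kept = rest
    return kept


F = [
    (frozenset("A"), frozenset("B")),
    (frozenset("B"), frozenset("C")),
    (frozenset("A"), frozenset("C")),  # redundant: A -> B and B -> C give A -> C
]

for lhs, rhs in remove_redundant(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```

Because each removal changes what the remaining FDs can derive, the result can depend on the order in which FDs are examined; every removal keeps the set equivalent to F.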

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence, equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find a set that contains fewer FDs than F and still generates all the FDs of F+. Such a set is equivalent to F.
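Both covering and equivalence reduce to attribute closures: F covers E when, for every X -> Y in E, Y is contained in X+ computed under F. A minimal Python sketch, with hypothetical function names and example sets over single-letter attributes:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """True if every FD in e is in f+ (f covers e)."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent when each covers the other."""
    return covers(f, e) and covers(e, f)


def fd(lhs, rhs):
    return (frozenset(lhs), frozenset(rhs))  # each character is one attribute


E = [fd("A", "B"), fd("B", "C"), fd("A", "C")]
F = [fd("A", "B"), fd("B", "C")]
G = [fd("A", "BC")]

print(equivalent(E, F))  # True: A -> C in E is inferable from F
print(covers(F, G))      # True: A -> BC follows from A -> B and B -> C
print(covers(G, F))      # False: B -> C cannot be inferred from G
```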

Minimal Functional Dependencies

A minimal cover of F (also called an irreducible set of F) is useful in eliminating unnecessary functional dependencies. To obtain it, F is first transformed so that each FD with more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if it satisfies the following conditions:

(a) Every dependency in F has a single attribute on its RHS.

(b) For no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies).

(c) For no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).
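One common way to compute a minimal cover applies the conditions in turn: split right-hand sides (condition (a)), remove extraneous left-hand-side attributes (condition (c)), then drop redundant FDs (condition (b)). The Python sketch below follows that order; the function names and the example set F = {B -> A, D -> A, AB -> D} are illustrative assumptions, and a set can have more than one minimal cover depending on the order in which FDs are examined.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # (a) one attribute on every right-hand side
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            f = (frozenset(lhs), frozenset({a}))
            if f not in g:
                g.append(f)
    # (c) drop an LHS attribute b from X -> A when A is already in (X - {b})+
    for i in range(len(g)):
        lhs, rhs = g[i]
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and rhs <= closure(smaller, g):
                lhs = smaller
                g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # reduced FDs may now be duplicates
    # (b) drop every FD implied by the remaining ones
    result = list(g)
    for f in list(result):
        rest = [h for h in result if h != f]
        if f[1] <= closure(f[0], rest):
            result = rest
    return result


F = [
    (frozenset("B"), frozenset("A")),
    (frozenset("D"), frozenset("A")),
    (frozenset("AB"), frozenset("D")),
]

for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# D -> A
# B -> D
```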

Extraneous Attributes

The FDs of F can be reduced further by removing extraneous left-side attributes or extraneous right-side attributes with respect to F.

Let F be a set of FDs over schema R, and let A1A2 -> B1B2 be an FD in F. Then A1 is extraneous if and only if F ≡ (F - {A1A2 -> B1B2}) U {A2 -> B1B2}.
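The equivalence above can be checked with a closure: replacing A1A2 -> B1B2 by A2 -> B1B2 only strengthens the FD, so the two sets are equivalent exactly when B1B2 is contained in A2+ computed under F. A Python sketch with a hypothetical helper name and a made-up F written in the slide's notation:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_extraneous_lhs(attr, fd, fds):
    """attr is extraneous in X -> Y (an FD of fds) if Y is contained in (X - {attr})+ under fds."""
    lhs, rhs = fd
    return attr in lhs and len(lhs) > 1 and rhs <= closure(lhs - {attr}, fds)


target = (frozenset({"A1", "A2"}), frozenset({"B1", "B2"}))
F = [
    (frozenset({"A2"}), frozenset({"B1", "B2"})),
    target,
]

print(is_extraneous_lhs("A1", target, F))  # True: A2 alone already determines B1B2
print(is_extraneous_lhs("A2", target, F))  # False: A1+ is just {A1}
```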

CANONICAL COVER (FC)

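A canonical cover FC of F is equivalent to F and satisfies:

1. Every FD of FC is simple: its RHS has exactly one attribute.

2. FC is left-reduced: no FD in FC has an extraneous attribute on its LHS.

3. FC is nonredundant: no FD in FC can be derived from the others.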

Problem

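Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split each RHS into single attributes: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z, which follows from X -> Z: FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: Z is extraneous in XZ -> R, because X -> Z and XZ -> R together give X -> R. FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}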
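A quick way to double-check an answer like this is to test the canonical-cover conditions directly: the candidate FC must be equivalent to F, simple, left-reduced, and nonredundant. A Python sketch, assuming single-letter attributes and a hypothetical parser that reads FDs written as "XY->W":

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def parse(spec):
    """Parse 'X->Z, XY->WP' into (lhs, rhs) frozenset pairs of single-letter attributes."""
    pairs = []
    for part in spec.split(","):
        lhs, rhs = part.split("->")
        pairs.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return pairs


def implies_all(f, g):
    """True if every FD in g can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)


F = parse("X->Z, XY->WP, XY->ZWQ, XZ->R")
step2 = parse("X->Z, XY->W, XY->P, XY->Q, XZ->R")
FC = parse("X->Z, X->R, XY->W, XY->P, XY->Q")

equivalent = implies_all(F, FC) and implies_all(FC, F)
simple = all(len(rhs) == 1 for _, rhs in FC)
left_reduced = all(
    not rhs <= closure(lhs - {a}, FC)
    for lhs, rhs in FC if len(lhs) > 1
    for a in lhs
)
nonredundant = all(
    not rhs <= closure(lhs, [g for g in FC if g != (lhs, rhs)])
    for lhs, rhs in FC
)
print(equivalent, simple, left_reduced, nonredundant)  # True True True True

# In the step-2 set, X alone already determines R, so Z is extraneous in XZ -> R.
print(frozenset("R") <= closure({"X"}, step2))  # True
```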

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF, and BCNF are based on keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting from the decomposition.
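Dependency preservation can be tested without computing F+ by a standard closure-based check: starting from X, repeatedly add (closure of (result ∩ Ri)) ∩ Ri for each decomposed schema Ri; X -> Y is preserved if Y ends up in the result. This is the usual textbook algorithm, not something stated on the slide. The Python sketch below uses hypothetical names; the first example reuses the SSN/DNUMBER dependencies from the closure example, split so that each FD lies inside one schema, and the second is a made-up R(A, B, C).

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fd, schemas, fds):
    """Check that X -> Y is enforceable from the projections of fds onto the schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in schemas:
            gained = closure(result & ri, fds) & ri  # what Ri alone can contribute
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


def preserves_all(schemas, fds):
    return all(preserves(f, schemas, fds) for f in fds)


emp_fds = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
emp_split = [
    frozenset({"SSN", "ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    frozenset({"DNUMBER", "DNAME", "DMGRSSN"}),
]
print(preserves_all(emp_split, emp_fds))  # True

abc_fds = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
abc_split = [frozenset("AB"), frozenset("AC")]
print(preserves_all(abc_split, abc_fds))  # False: B -> C is lost
```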

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S subset-of R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
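For small schemas, candidate keys can be found by brute force: try attribute subsets in increasing size, keep each subset whose closure is all of R, and skip supersets of keys already found (they are superkeys but not minimal). The prime attributes are then the union of the candidate keys. A Python sketch with hypothetical names and a made-up R(A, B, C, D) with F = {AB -> C, C -> A, C -> D}; the search is exponential, so it only suits small examples like this one.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    attrs = frozenset(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            if any(k <= s for k in keys):
                continue  # superset of a key: a superkey, but not minimal
            if closure(s, fds) >= attrs:  # s determines every attribute
                keys.append(s)
    return keys


R = "ABCD"
F = [
    (frozenset("AB"), frozenset("C")),
    (frozenset("C"), frozenset("A")),
    (frozenset("C"), frozenset("D")),
]

keys = candidate_keys(R, F)
prime = frozenset().union(*keys)
print(["".join(sorted(k)) for k in keys])                              # ['AB', 'BC']
print("".join(sorted(prime)), "".join(sorted(frozenset(R) - prime)))  # ABC D
```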

First Normal Form

1NF disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.

1NF is considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF, there are three techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.

2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
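Technique 1 can be sketched on plain Python dictionaries. The department rows, the attribute names DLOCATIONS and DLOCATION, and the helper split_multivalued are hypothetical; the point is that each (key, value) pair becomes its own tuple in the new relation, with no repeated non-key data and no limit on the number of values.

```python
def split_multivalued(rows, key, attr, new_name):
    """Technique 1: move a multivalued attribute into its own relation with the key."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    moved = [{key: row[key], new_name: value} for row in rows for value in row[attr]]
    return base, moved


# Hypothetical non-1NF rows: DLOCATIONS holds a list of values per department.
departments = [
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
]

department, dept_locations = split_multivalued(departments, "DNUMBER", "DLOCATIONS", "DLOCATION")

print(department)
# [{'DNUMBER': 1, 'DNAME': 'Headquarters'}, {'DNUMBER': 5, 'DNAME': 'Research'}]
for row in dept_locations:
    print(row["DNUMBER"], row["DLOCATION"])
# 1 Houston
# 5 Bellaire
# 5 Sugarland
# 5 Houston
```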

Normalizing nested relations into 1NF

Additional problems: Schaum's series, pg 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:
Prime attribute: an attribute that is a member of the primary key K.
Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
{SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
{SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
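A 2NF check amounts to looking for partial dependencies: a nonprime attribute that is already determined by a proper subset of the primary key. The Python sketch below assumes the primary key is the only candidate key (so every non-key attribute is nonprime) and uses the SSN/PNUMBER attributes from the full-FD examples above; the FD PNUMBER -> {PNAME, PLOCATION} and the function name are assumptions for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attrs, fds):
    """List (key subset, attribute) pairs where a nonprime attribute depends on part of the key."""
    key = frozenset(key)
    nonprime = frozenset(attrs) - key  # assumes the primary key is the only candidate key
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for a in sorted(closure(subset, fds) & nonprime):
                found.append((subset, a))
    return found


attrs = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),  # assumed FD
]

result = partial_dependencies({"SSN", "PNUMBER"}, attrs, F)
for subset, a in result:
    print(subset, "->", a)
# ('PNUMBER',) -> PLOCATION
# ('PNUMBER',) -> PNAME
# ('SSN',) -> ENAME
```

HOURS does not appear because only the whole key determines it; moving (SSN, ENAME) and (PNUMBER, PNAME, PLOCATION) into their own relations removes the partial dependencies.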

Normalizing into 2NF

[Figure: Conversion to 2NF, a diagram over attributes A, B, C, D]

Additional problem on second normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with the FD Prog_Pack_ID -> Prog_Pac_name.

1. What is the highest normal form of Prog_task?

2. Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp#, Salary). Here SSN -> Emp# -> Salary, and Emp# is a candidate key.
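The NOTE's caveat about candidate keys is captured by the general form of the 3NF test found in most textbooks: for every nontrivial FD X -> A, either X is a superkey or A is a prime attribute. The Python sketch below uses that general form in place of the primary-key wording above, brute-forces the candidate keys, and checks only the FDs in the given F, one right-hand-side attribute at a time. Function names are hypothetical; the first example reuses the SSN/DNUMBER dependencies from the closure example, and the second encodes the NOTE's EMP relation with Emp# as a candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    attrs = frozenset(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def violations_3nf(attrs, fds):
    """Return (X, A) pairs where X -> A breaks 3NF: X is not a superkey and A is nonprime."""
    prime = frozenset().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= frozenset(attrs)
        for a in sorted(rhs - lhs):  # only the nontrivial part of the FD
            if not is_superkey and a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


emp_dept = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
emp_dept_fds = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(violations_3nf(emp_dept, emp_dept_fds))
# [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]

emp = {"SSN", "Emp#", "Salary"}
emp_fds = [
    (frozenset({"SSN"}), frozenset({"Emp#"})),
    (frozenset({"Emp#"}), frozenset({"SSN", "Salary"})),  # Emp# is a candidate key
]
print(violations_3nf(emp, emp_fds))  # []
```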

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute, even when X is not a candidate key.

BCNF allows it only if X is a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. The relation is in 3NF, because the right-hand side of fd2 (course) is a prime attribute, but it is not in BCNF, because the left-hand side of fd2 (instructor) is not a superkey.

A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
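The TEACH analysis can be reproduced mechanically: brute-force the candidate keys, then apply the BCNF test (every nontrivial FD has a superkey on its left) and the general 3NF test (a superkey on the left, or only prime attributes on the right). A Python sketch with hypothetical function names; checking only the FDs given in F is the usual shortcut when testing a single schema.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    attrs = frozenset(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= attrs:
                keys.append(s)
    return keys


def is_superkey(x, attrs, fds):
    return closure(x, fds) >= frozenset(attrs)


def is_bcnf(attrs, fds):
    return all(is_superkey(lhs, attrs, fds) for lhs, rhs in fds if not rhs <= lhs)


def is_3nf(attrs, fds):
    prime = frozenset().union(*candidate_keys(attrs, fds))
    return all(
        is_superkey(lhs, attrs, fds) or (rhs - lhs) <= prime
        for lhs, rhs in fds
        if not rhs <= lhs
    )


teach = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

print([sorted(k) for k in candidate_keys(teach, F)])
# [['course', 'student'], ['instructor', 'student']]
print(is_3nf(teach, F), is_bcnf(teach, F))  # True False
```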

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the nonadditivity property).
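For a decomposition into exactly two relations there is a standard shortcut: {R1, R2} has the nonadditive join property with respect to F if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. The Python sketch below (hypothetical function names) applies this test to the three TEACH decompositions.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if the shared attributes functionally determine one side's remaining attributes."""
    shared_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= shared_closure or (r2 - r1) <= shared_closure


F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    (frozenset({"student", "instructor"}), frozenset({"student", "course"})),
    (frozenset({"course", "instructor"}), frozenset({"course", "student"})),
    (frozenset({"instructor", "course"}), frozenset({"instructor", "student"})),
]

results = [lossless_binary(r1, r2, F) for r1, r2 in decompositions]
print(results)  # [False, False, True]
```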

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise, it is lossy.

Example 5.11, pg 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
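A widely used general test for n-way decompositions is the matrix (chase) algorithm: build one row per schema Ri, put the distinguished symbol a in the columns of Ri's attributes and a row-specific b symbol elsewhere, then repeatedly equate the symbols of rows that agree on the left-hand side of some FD. The decomposition is lossless if some row ends up made entirely of a symbols. The Python sketch below is a hypothetical implementation of that matrix test; it is checked on a three-way split of the SSN/PNUMBER relation (with the assumed FD PNUMBER -> {PNAME, PLOCATION}) and on TEACH decompositions 1 and 3.

```python
def is_lossless(attrs, schemas, fds):
    """Matrix (chase) test for the nonadditive join property of a decomposition."""
    attrs = sorted(attrs)
    # Row i holds ('a', A) where A is in schemas[i], else a row-specific ('b', i, A).
    table = [
        {a: (("a", a) if a in ri else ("b", i, a)) for a in attrs}
        for i, ri in enumerate(schemas)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    if any(table[i][x] != table[j][x] for x in lhs):
                        continue  # rows disagree on the LHS
                    for y in rhs:
                        s, t = table[i][y], table[j][y]
                        if s == t:
                            continue
                        keep = s if s[0] == "a" else t  # prefer the distinguished symbol
                        drop = t if keep == s else s
                        for row in table:  # rename drop to keep throughout column y
                            if row[y] == drop:
                                row[y] = keep
                        changed = True
    return any(all(row[a] == ("a", a) for a in attrs) for row in table)


emp_proj_fds = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),  # assumed FD
]
emp_proj_split = [
    frozenset({"SSN", "ENAME"}),
    frozenset({"PNUMBER", "PNAME", "PLOCATION"}),
    frozenset({"SSN", "PNUMBER", "HOURS"}),
]
all_attrs = frozenset().union(*emp_proj_split)
print(is_lossless(all_attrs, emp_proj_split, emp_proj_fds))  # True

teach = {"student", "course", "instructor"}
teach_fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),
    (frozenset({"instructor"}), frozenset({"course"})),
]
teach_split_1 = [frozenset({"student", "instructor"}), frozenset({"student", "course"})]
teach_split_3 = [frozenset({"instructor", "course"}), frozenset({"instructor", "student"})]
print(is_lossless(teach, teach_split_1, teach_fds))  # False
print(is_lossless(teach, teach_split_3, teach_fds))  # True
```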

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A U B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
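The independence described here is usually formalized as follows: X ->> Y holds in a relation state r if, whenever two tuples t1 and t2 agree on X, r also contains a tuple t3 that takes its X and Y values from t1 and all remaining attributes from t2. The Python sketch below checks that condition on a hypothetical EMP(ENAME, PNAME, DNAME) state and also applies the triviality test above; names and data are illustrative. A single relation state can only refute an MVD, never prove that it holds for all legal states.

```python
def mvd_holds(schema, rows, x, y):
    """Check X ->> Y on one relation state (rows are tuples ordered like schema)."""
    pos = {a: i for i, a in enumerate(schema)}
    present = set(rows)
    for t1 in rows:
        for t2 in rows:
            if all(t1[pos[a]] == t2[pos[a]] for a in x):
                # t3: X and Y values from t1, every other attribute from t2
                t3 = tuple(t1[pos[a]] if (a in x or a in y) else t2[pos[a]] for a in schema)
                if t3 not in present:
                    return False
    return True


def is_trivial(schema, x, y):
    """X ->> Y is trivial if Y is a subset of X or X U Y is the whole schema."""
    return set(y) <= set(x) or set(x) | set(y) == set(schema)


schema = ("ENAME", "PNAME", "DNAME")
emp = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]

print(mvd_holds(schema, emp, {"ENAME"}, {"PNAME"}))       # True
print(mvd_holds(schema, emp[:3], {"ENAME"}, {"PNAME"}))   # False: (Smith, Y, John) is missing
print(is_trivial(schema, {"ENAME"}, {"PNAME"}))           # False
print(is_trivial(schema, {"ENAME", "PNAME"}, {"DNAME"}))  # True
```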

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation has two MVDs: ENAME ->> PNAME and ENAME ->> DNAME. Decomposing the EMP relation gives two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
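The decomposition can be sketched with Python sets on hypothetical EMP data: project onto (ENAME, PNAME) and (ENAME, DNAME), then natural-join on ENAME. Because ENAME ->> PNAME holds, the join reproduces EMP exactly (for a two-way split, R1 ∩ R2 ->> R1 - R2 is the known condition for a lossless join). Names and data are illustrative.

```python
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}  # (ENAME, PNAME, DNAME)

emp_projects = {(e, p) for e, p, _ in emp}    # EMP_PROJECTS(ENAME, PNAME)
emp_dependents = {(e, d) for e, _, d in emp}  # EMP_DEPENDENTS(ENAME, DNAME)

# Natural join of the two projections on ENAME
rejoined = {
    (e, p, d)
    for e, p in emp_projects
    for e2, d in emp_dependents
    if e == e2
}

print(sorted(emp_projects))  # [('Smith', 'X'), ('Smith', 'Y')]
print(rejoined == emp)       # True
```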

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, R has no nontrivial join dependency whose components are not all superkeys of R.

Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
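A join dependency can be observed on a relation state by projecting and re-joining. The Python sketch below uses hypothetical SUPPLY(SNAME, PARTNAME, PROJNAME) data and assumes R1 = (SNAME, PARTNAME), R2 = (SNAME, PROJNAME), R3 = (PARTNAME, PROJNAME); names and data are illustrative. Joining all three projections returns SUPPLY exactly, while joining only R1 and R2 already produces spurious tuples.

```python
supply = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}  # (SNAME, PARTNAME, PROJNAME)

r1 = {(s, p) for s, p, _ in supply}  # R1(SNAME, PARTNAME)
r2 = {(s, j) for s, _, j in supply}  # R2(SNAME, PROJNAME)
r3 = {(p, j) for _, p, j in supply}  # R3(PARTNAME, PROJNAME)

# R1 natural-joined with R2 on SNAME
r1_r2 = {(s, p, j) for s, p in r1 for s2, j in r2 if s == s2}
# ... then joined with R3 on (PARTNAME, PROJNAME)
r1_r2_r3 = {(s, p, j) for s, p, j in r1_r2 if (p, j) in r3}

print(r1_r2_r3 == supply)      # True: JD(R1, R2, R3) holds on this state
print(sorted(r1_r2 - supply))  # [('Adamsky', 'Nail', 'ProjY'), ('Smith', 'Nut', 'ProjX')]
```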

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence, 5NF is rarely used in practice.

Page 35: Module 6

Module 6 35042023

Example of Closure Department has one manager (DEPT_NO -gt

MGR_SSN) Manager has a unique phone number (MGR_SSN-gtMGR_PHONE) then these two

dependencies together imply that (DEPT_NO-gtMGR_PHONE)

This defines a concept called as closure that includes all possible dependencies that can be inferred from the given set F

The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F denoted by (F+)

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 36: Module 6

Module 6 36042023

Example

F=SSN ENAME BDATE ADDRESS DNUMBERDNUMBER DNAME DMGRSSN The inferred functional dependencies are

SSN DNAME DMGRSSN SSN SSN DNUMBER DNAME

To determine a systematic way to infer dependencies a set of inference rules has to be discovered that can be used to infer new dependencies from a given set of dependencies This is denoted by F|=X Y

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Minimal Functional Dependencies

A minimal set of FDs (minimal cover), also called an irreducible set of F, is useful in eliminating unnecessary functional dependencies. F is transformed so that each FD in it that has more than one attribute on the RHS is replaced by a set of FDs that each have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if it satisfies the following conditions:

(a) every dependency in F has a single attribute on its RHS;

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The FDs of F can be reduced further in size by removing extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 be in F. Then A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) U {A2 -> B1B2}.
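In practice, an extraneous LHS attribute is usually tested with one closure rather than by comparing whole sets of FDs: A1 is extraneous in A1A2 -> B1B2 when B1B2 is contained in A2+ computed under F, a standard equivalent form of the condition above. A sketch under that assumption, with hypothetical helper names:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper: fd("XZ", "R") encodes XZ -> R (single-letter attributes)."""
    return (frozenset(lhs), frozenset(rhs))


def is_extraneous_left(attr, dep, fds):
    """attr is extraneous in the LHS of X -> Y iff Y is contained in (X - {attr})+ under F."""
    lhs, rhs = dep
    if attr not in lhs:
        raise ValueError(f"{attr!r} is not on the left-hand side")
    return rhs <= closure(lhs - {attr}, fds)


F = [fd("X", "Z"), fd("XZ", "R")]
print(is_extraneous_left("Z", fd("XZ", "R"), F))  # True: X -> Z, so X alone gives R
print(is_extraneous_left("X", fd("XZ", "R"), F))  # False: Z alone determines nothing else
```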

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has exactly one attribute.

2. FC is left-reduced: no FD in FC has an extraneous attribute on its LHS.

3. FC is nonredundant: no FD in FC can be derived from the others.

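Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce XZ -> R: Z is extraneous because X -> Z and XZ -> R give X -> R, so FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}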
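These steps can be automated. The Python sketch below (our own, with hypothetical helper names) splits right-hand sides, left-reduces, and then removes redundant FDs; left-reducing before the redundancy check is the order commonly used in minimal-cover algorithms. On this F it left-reduces XZ -> R to X -> R, because X -> Z makes Z extraneous and a canonical cover must be left-reduced.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Hypothetical helper: fd("XY", "WP") encodes XY -> WP (single-letter attributes)."""
    return (frozenset(lhs), frozenset(rhs))


def canonical_cover(fds):
    # 1. Split right-hand sides into single attributes, dropping exact duplicates.
    simple = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            dep = (lhs, frozenset({a}))
            if dep not in simple:
                simple.append(dep)

    # 2. Left-reduce: b is extraneous in X -> A if A is already in (X - {b})+.
    reduced = []
    for lhs, rhs in simple:
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, simple):
                lhs = lhs - {b}
        if (lhs, rhs) not in reduced:
            reduced.append((lhs, rhs))

    # 3. Remove FDs that the remaining ones already imply.
    result = list(reduced)
    for dep in list(result):
        rest = [f for f in result if f != dep]
        if dep[1] <= closure(dep[0], rest):
            result.remove(dep)
    return result


def show(fds):
    """Render FDs as strings such as 'XY -> W'."""
    return ["".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in fds]


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
print(show(canonical_cover(F)))
# ['X -> Z', 'XY -> P', 'XY -> W', 'XY -> Q', 'X -> R']
```

For other FD sets, a different processing order can yield a different but equally valid canonical cover.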

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF and BCNF are based on keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multivalued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization is the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing insertion, deletion and update anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (in practice they usually go up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S, a subset of R, with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
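Given the FDs, candidate keys and prime attributes can be found with closures. The sketch below is a brute-force search over attribute subsets (exponential, so only suitable for small schemas), applied to the TEACH relation used later in this module with fd1: {student, course} -> instructor and fd2: instructor -> course. The encoding is our own assumption.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal superkeys, found by trying attribute subsets from smallest to largest."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, but not a key
            if closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
keys = candidate_keys(TEACH, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['course', 'student'], ['instructor', 'student']]
print(sorted(TEACH - prime))      # []: every attribute of TEACH is prime
```

Finding both keys explains why TEACH is in 3NF: course is prime because it belongs to the key {student, course}.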

First Normal Form

Disallows composite attributes, multivalued attributes and nested relations, i.e., attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

There are three techniques to achieve 1NF:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key of the original relation.

2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.

3. If the maximum number of values is known for an attribute, replace that attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
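Technique 1 can be sketched in Python as follows. The department rows, the attribute names Dnumber, Dname, Dlocations and Dlocation, and the helper split_multivalued are all hypothetical; each value of the multivalued attribute becomes its own tuple in a new relation that carries the primary key.

```python
def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: move multivalued `attr` into its own relation, paired with `key`."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], new_attr: value} for row in rows for value in row[attr]]
    return base, side


# Hypothetical non-1NF data: Dlocations holds several values in one tuple.
department = [
    {"Dnumber": 1, "Dname": "Headquarters", "Dlocations": ["Houston"]},
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
]

dept, dept_locations = split_multivalued(department, "Dnumber", "Dlocations", "Dlocation")
print(dept)
# [{'Dnumber': 1, 'Dname': 'Headquarters'}, {'Dnumber': 5, 'Dname': 'Research'}]
print(len(dept_locations))  # 4
```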

Normalizing nested relations into 1NF

Additional problems from Schaum's series: pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and the primary key.

Definitions:

Prime attribute: an attribute that is a member of the primary key K.

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.


Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
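A sketch of a 2NF check (our own illustration): for every proper subset of the primary key, compute its closure and report any nonprime attributes it already determines. The EMP_PROJ attributes PNAME and PLOCATION, and the FD PNUMBER -> {PNAME, PLOCATION}, are assumptions made for the example; "prime" follows the primary-key wording used above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """Map each proper subset of the primary key to the nonprime attributes it determines.

    Any entry in the result means the relation is not in 2NF.
    """
    nonprime = set(schema) - set(primary_key)
    found = {}
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            dependents = closure(subset, fds) & nonprime
            if dependents:
                found[subset] = sorted(dependents)
    return found


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
]
print(partial_dependencies(EMP_PROJ, {"SSN", "PNUMBER"}, F))
# {('PNUMBER',): ['PLOCATION', 'PNAME'], ('SSN',): ['ENAME']}
```

Each reported subset suggests one 2NF decomposition: the subset plus its dependents becomes a new relation, and those dependents are removed from the original.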

Normalizing into 2NF

Conversion to 2NF (figure): a relation with attributes A, B, C, D is decomposed into one relation with A, B, C and another with A, D.

Additional problem on second normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of Prog_task?

2. Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
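The sketch below tests the general form of 3NF, commonly used in place of the transitive-dependency wording: every nontrivial X -> A must have X a superkey or A prime. It is our own illustration; the EMP_DEPT attributes and FDs are assumptions modelled on the SSN, DNUMBER and DMGRSSN example above, and the candidate keys are passed in rather than computed.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(schema, keys, fds):
    """List (X, A) pairs where X -> A is nontrivial, X is not a superkey and A is nonprime."""
    prime = set().union(*keys)
    found = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= set(schema)
        for a in sorted(rhs - lhs):  # attributes already in X are trivial
            if not lhs_is_superkey and a not in prime:
                found.append((sorted(lhs), a))
    return found


EMP_DEPT = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
F = [
    (frozenset({"SSN"}), frozenset({"ENAME", "BDATE", "ADDRESS", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DNAME", "DMGRSSN"})),
]
print(violations_3nf(EMP_DEPT, [{"SSN"}], F))
# [(['DNUMBER'], 'DMGRSSN'), (['DNUMBER'], 'DNAME')]
```

Splitting off DNUMBER together with DNAME and DMGRSSN removes both violations.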

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation even when X is not a candidate key (or superkey), provided A is a prime attribute.

BCNF makes no such exception: for a relation to be in BCNF, X must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.
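A BCNF check needs only one closure per FD. This sketch (our own; it checks just the FDs listed for the relation, not every FD in F+ or in projections onto sub-relations) flags fd2 of TEACH, whose left side instructor is not a superkey.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return (X, Y) for each nontrivial X -> Y whose LHS X is not a superkey of schema."""
    found = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FDs never violate BCNF
        if not closure(lhs, fds) >= set(schema):
            found.append((sorted(lhs), sorted(rhs - lhs)))
    return found


TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(bcnf_violations(TEACH, F))  # [(['instructor'], ['course'])]
```

A 3NF test would accept fd2, because course is a prime attribute of TEACH.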


A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 satisfies 3NF because course is a prime attribute, but instructor is not a superkey, so this relation is in 3NF but not in BCNF.

A relation that is not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Achieving the BCNF by Decomposition

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing the preservation of this functional dependency, but we cannot sacrifice the nonadditivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the nonadditive join property).
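For a decomposition into two relations there is a well-known shortcut: {R1, R2} has the nonadditive join property with respect to F iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A sketch applying it to the three TEACH decompositions, with our own encoding of fd1 and fd2:

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True iff the common attributes functionally determine R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([lossless_binary(a, b, F) for a, b in decompositions])  # [False, False, True]
```

Only in the third case does the shared attribute (instructor) determine the rest of one side (course), via fd2.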

Lossless or lossy decompositions: when we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
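The slides refer to a lossless join algorithm without reproducing it. The sketch below implements the general tableau (chase) test, which we assume is the algorithm meant: build one row per Ri with the distinguished symbol "a" in the columns of Ri, repeatedly equate the symbols forced by the FDs, and report lossless if some row becomes all "a". The EMP_PROJ schema, FDs and decompositions are illustrative assumptions.

```python
def lossless_chase(schema, decomposition, fds):
    """Tableau (chase) test for the nonadditive join property of R1, ..., Rm."""
    attrs = sorted(schema)
    # Row i, column A holds "a" if A is in Ri, else a unique symbol ("b", i, A).
    rows = [{a: "a" if a in ri else ("b", i, a) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every LHS column.
            groups = {}
            for row in rows:
                groups.setdefault(tuple(row[a] for a in sorted(lhs)), []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    if len(symbols) > 1:
                        # Equate these symbols in column a, preferring the distinguished "a".
                        target = "a" if "a" in symbols else min(symbols)
                        for row in rows:
                            if row[a] in symbols:
                                row[a] = target
                        changed = True
    # Lossless iff some row consists entirely of distinguished symbols.
    return any(all(v == "a" for v in row.values()) for row in rows)


EMP_PROJ = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(lossless_chase(EMP_PROJ, good, F))  # True
print(lossless_chase(EMP_PROJ, bad, F))   # False
```

Every change merges two or more symbols into one, so the loop always terminates.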

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that, for each value of A, there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A U B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
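An MVD can be checked on a concrete relation state by looking for the "swapped" tuples it requires. This sketch (our own; the EMP rows are hypothetical) tests X ->> Y by requiring that, for every pair t1, t2 agreeing on X, the tuple taking X and Y from t1 and the remaining attributes from t2 is also present.

```python
def mvd_holds(rows, attrs, x, y):
    """Check whether X ->> Y holds in one relation state.

    rows: a set of tuples; attrs: attribute names in tuple order.
    For every t1, t2 that agree on X, the tuple t3 taking X and Y from t1 and
    the remaining attributes (R - (X U Y)) from t2 must also be in rows.
    """
    pos = {a: i for i, a in enumerate(attrs)}
    from_t1 = set(x) | set(y)
    for t1 in rows:
        for t2 in rows:
            if all(t1[pos[a]] == t2[pos[a]] for a in x):
                t3 = tuple(t1[i] if a in from_t1 else t2[i] for i, a in enumerate(attrs))
                if t3 not in rows:
                    return False
    return True


# Hypothetical EMP state: Smith works on projects X and Y and has dependents John and Anna.
ATTRS = ("ENAME", "PNAME", "DNAME")
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}
print(mvd_holds(EMP, ATTRS, ["ENAME"], ["PNAME"]))  # True
print(mvd_holds(EMP - {("Smith", "X", "Anna")}, ATTRS, ["ENAME"], ["PNAME"]))  # False
```

Passing on one state does not prove that the MVD is a constraint of the schema; failing on one state does show that it is not.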

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
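The 4NF decomposition can be illustrated on a hypothetical EMP state: project onto EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME), then natural-join them back on ENAME. The rows and the helper project are assumptions made for the example.

```python
ATTRS = ["ENAME", "PNAME", "DNAME"]
# Hypothetical EMP state satisfying ENAME ->> PNAME and ENAME ->> DNAME.
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}


def project(rows, attrs, keep):
    """Relational projection: keep only the named columns, removing duplicate tuples."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(t[i] for i in positions) for t in rows}


emp_projects = project(EMP, ATTRS, ["ENAME", "PNAME"])
emp_dependents = project(EMP, ATTRS, ["ENAME", "DNAME"])

# Natural join on ENAME; it must reproduce EMP exactly if the decomposition is lossless.
rejoined = {(e1, p, d) for (e1, p) in emp_projects for (e2, d) in emp_dependents if e1 == e2}

print(sorted(emp_projects))    # [('Smith', 'X'), ('Smith', 'Y')]
print(sorted(emp_dependents))  # [('Smith', 'Anna'), ('Smith', 'John')]
print(rejoined == EMP)         # True
```

After the split, each project and each dependent is recorded once per employee instead of once per project-dependent combination.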

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency whose components are not all superkeys.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.
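A join dependency can be demonstrated on a hypothetical SUPPLY state. The attribute order (Sname, Part_name, Proj_name) and the rows are assumptions, and R1, R2 and R3 are taken to be the three pairwise projections. Holding on one state does not prove that a JD is a constraint, but the example shows that the three-way join recovers SUPPLY while joining only R1 and R2 does not.

```python
# Hypothetical SUPPLY state; tuple order is (Sname, Part_name, Proj_name).
SUPPLY = {
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}


def project(rows, positions):
    """Projection onto the given column positions (duplicates removed)."""
    return {tuple(t[i] for i in positions) for t in rows}


r1 = project(SUPPLY, (0, 1))  # R1(Sname, Part_name)
r2 = project(SUPPLY, (0, 2))  # R2(Sname, Proj_name)
r3 = project(SUPPLY, (1, 2))  # R3(Part_name, Proj_name)

# Natural join R1 * R2 on Sname, then keep only combinations that also appear in R3.
two_way = {(s, p, j) for (s, p) in r1 for (s2, j) in r2 if s == s2}
three_way = {t for t in two_way if (t[1], t[2]) in r3}

print(three_way == SUPPLY)       # True: JD(R1, R2, R3) holds in this state
print(sorted(two_way - SUPPLY))  # [('Adamsky', 'Nail', 'ProjY'), ('Smith', 'Nut', 'ProjX')]
```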

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence 5NF is rarely used in practice.

Page 37: Module 6

Module 6 37042023

Inference Rules for FDs Given a set of FDs F we can infer additional FDs that hold

whenever the FDs in F hold Armstrongs inference rules

IR1 (Reflexive) If Y subset-of X then X -gt Y IR2 (Augmentation) If X -gt Y then XZ -gt YZ

(Notation XZ stands for X U Z) IR3 (Transitive) If X -gt Y and Y -gt Z then X -gt Z

IR1 IR2 IR3 form a sound and complete set of inference rules By sound we meanany dependency that we can infer

from F by using IR1 through IR3 satisfies the dependencies in F(or) [if axioms are correctly applied they cannot derive false dependencies]

By complete we mean that by using IR1 through IR3 repeatedly to infer dependencies until no more dependencies can be inferred results in complete set of all possible dependencies that can be inferred from F

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 38: Module 6

Module 6 38042023

Inference Rules for FDs Some additional inference rules that are useful

Decomposition If X -gt YZ then X -gt Y and X -gt Z Union or additive If X -gt Y and X -gt Z then X -gt YZ Psuedotransitive If X -gt Y and WY -gt Z then WX -gt Z

The last three inference rules as well as any other inference rules can be deduced from IR1 IR2 and IR3 (completeness property)

Module 6 39042023

Examples

1 Given the set F=ABCX BXZ derive ACZ using the inference axioms

2 Given F=AB CD with C subset of B show that F|=AD

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more
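Both definitions can be tested mechanically with the attribute-closure algorithm: S is a superkey exactly when every attribute of R lies in S+. The following is a minimal Python sketch, assuming FDs are written as (left-hand side, right-hand side) pairs of attribute-name sets; the helper names `closure`, `is_superkey` and `is_key` are hypothetical, and the small schema reuses the SSN/PNUMBER attributes from the 2NF examples in this module.

```python
def closure(attrs, fds):
    """Return attrs+ : every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Once the LHS is determined, the RHS is determined too.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(s, schema, fds):
    # S is a superkey iff S+ contains every attribute of R.
    return closure(s, fds) >= set(schema)


def is_key(k, schema, fds):
    # K is a key iff it is a superkey and dropping any single attribute breaks that.
    # Single-attribute removals suffice: any superset of a superkey is a superkey.
    return is_superkey(k, schema, fds) and all(
        not is_superkey(set(k) - {a}, schema, fds) for a in k
    )


R = {"SSN", "PNUMBER", "HOURS", "ENAME"}
F = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]

print(is_superkey({"SSN", "PNUMBER", "ENAME"}, R, F))  # True
print(is_key({"SSN", "PNUMBER", "ENAME"}, R, F))       # False: ENAME is removable
print(is_key({"SSN", "PNUMBER"}, R, F))                # True
```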


Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A prime attribute is a member of some candidate key

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key
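To find the prime attributes, one first needs all candidate keys. A brute-force sketch (exponential in the number of attributes, so only suitable for small schemas) tries attribute subsets in increasing size and keeps the minimal superkeys. It assumes the same (lhs, rhs) FD encoding; the function names are illustrative, and the example uses the TEACH relation with fd1 and fd2 from this module's BCNF slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Enumerate all candidate keys by brute force (small schemas only)."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            # A superset of a key already found cannot be minimal.
            if any(k <= s for k in keys):
                continue
            if closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    # Prime = member of at least one candidate key.
    return set().union(*candidate_keys(schema, fds))


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]

keys = candidate_keys(TEACH, F)
print(keys)
print(TEACH - prime_attributes(TEACH, F))  # nonprime attributes: set()
```

Here both {STUDENT, COURSE} and {STUDENT, INSTRUCTOR} are candidate keys, so every attribute of TEACH is prime.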


First Normal Form

Disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations, or relations as attribute values within tuples

Considered to be part of the definition of a relation


Normalization into 1NF


Normalization into 1NF

To achieve 1NF there are 3 techniques:

1 Remove the attribute that violates 1NF and place it in a separate relation along with the primary key

2 Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy

3 If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values
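Technique 1 can be sketched in a few lines of Python. The DEPARTMENT rows and the DLOCATIONS attribute are hypothetical sample data, not taken from the slides, and `split_multivalued` is an illustrative name. Each value of the multivalued attribute becomes its own atomic tuple in a new relation, paired with the original primary key.

```python
# Hypothetical DEPARTMENT rows: DLOCATIONS holds a set of values, violating 1NF.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": {"Stafford"}},
]


def split_multivalued(rows, key, attr):
    """Technique 1: drop attr from the base relation and move its values,
    one atomic tuple per value, into a new relation together with the key."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    extracted = sorted((row[key], value) for row in rows for value in row[attr])
    return base, extracted


dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS")
print(dept)            # every attribute value is now atomic
print(dept_locations)  # (DNUMBER, DLOCATION) pairs
```

The key of the new relation is the pair {DNUMBER, DLOCATION}; nothing is repeated in the base relation and the number of locations per department is unbounded, which matches the reasons given for preferring technique 1.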


Normalizing nested relations into 1NF

Additional problems: Schaum's series, pg. 178, 5.1


Second Normal Form

Uses the concepts of FDs and the primary key

Definitions:

Prime attribute: an attribute that is a member of the primary key K

Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more

Examples:

- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds

- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds
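Whether an FD is full can be checked with attribute closures: the FD must hold, and no one-attribute-smaller left-hand side may still determine the right-hand side. This is a sketch assuming the FD set contains only the two FDs used in the examples above; `is_full_fd` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and removing any one attribute of Y breaks it.
    (If a smaller subset of Y determined Z, some one-attribute-smaller subset would too.)"""
    if not rhs <= closure(lhs, fds):
        return False  # Y -> Z does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


F = [({"SSN", "PNUMBER"}, {"HOURS"}), ({"SSN"}, {"ENAME"})]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, F))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, F))  # False: partial, since SSN -> ENAME
```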


Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
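One way to sketch 2NF normalization, assuming the primary key is the only candidate key (so every non-key attribute is nonprime): each nonprime attribute that depends on a proper subset of the key moves into a new relation keyed by that subset, and the fully dependent attributes stay with the whole key. The schema EMP_PROJ and its attributes PNAME and PLOCATION are assumptions that extend the SSN/PNUMBER example, and `decompose_2nf` is a hypothetical helper rather than an algorithm given in the slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def decompose_2nf(schema, pk, fds):
    """Split off partial dependencies on proper subsets of the primary key.
    Assumes pk is the only candidate key, so nonprime = schema - pk."""
    remaining = set(schema) - set(pk)
    relations = []
    # Smaller key subsets first, so each attribute goes to the smallest determinant.
    for size in range(1, len(pk)):
        for part in combinations(sorted(pk), size):
            dependents = closure(set(part), fds) & remaining
            if dependents:
                relations.append(set(part) | dependents)
                remaining -= dependents
    # Whatever is left is fully dependent on the whole key.
    relations.append(set(pk) | remaining)
    return relations


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
for r in decompose_2nf(EMP_PROJ, {"SSN", "PNUMBER"}, F):
    print(sorted(r))
```

Attributes reached through a transitive chain (for example a hypothetical SSN -> DNUMBER -> DNAME) would stay together in the SSN relation; removing those is the job of 3NF, not 2NF.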


Normalizing into 2NF

Conversion to 2NF

[Figure: conversion to 2NF of a relation with attributes A, B, C, D]


Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1 What is the highest normal form?

2 Transform it into the next higher normal form


Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z

Examples:

- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold

- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME


Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key
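The 3NF condition can be checked with attribute closures. This sketch uses the general textbook form of 3NF (for every nontrivial X -> A, X is a superkey or A is prime), which also rules out the partial and transitive dependencies described above, and it checks only the FDs in the given set F. The FD encoding and helper names are assumptions; EMP_DEPT is an assumed schema built from the SSN, ENAME, DNUMBER and DMGRSSN attributes of the transitive-dependency example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force enumeration of minimal superkeys (small schemas only)."""
    attrs = sorted(schema)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def violations_3nf(schema, fds):
    """List (X, A) pairs from fds where X is not a superkey and A is nonprime."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= schema
        for a in sorted(rhs - lhs):  # skip the trivial part of the FD
            if not is_superkey and a not in prime:
                bad.append((lhs, a))
    return bad


EMP_DEPT = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
F1 = [({"SSN"}, {"ENAME", "DNUMBER"}), ({"DNUMBER"}, {"DMGRSSN"})]
print(violations_3nf(EMP_DEPT, F1))  # [({'DNUMBER'}, 'DMGRSSN')]

TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F2 = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
print(violations_3nf(TEACH, F2))     # []
```

The transitive dependency SSN -> DNUMBER -> DMGRSSN shows up as a violation, while TEACH passes because COURSE is prime.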


Normalization into 3NF


Normalizing into 2NF and 3NF


SUMMARY


Normalize the following relation


Normalization into 2NF


Normalization into 3NF

Additional problems: pg. 186, 5.13


Boyce-Codd normal form


BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)
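A BCNF check only needs to ask, for each nontrivial FD, whether its left-hand side is a superkey. In this sketch, FDs are assumed to be (lhs, rhs) attribute sets and `violations_bcnf` is a hypothetical name. Testing only the given FDs is adequate when a schema is checked against its own FD set; checking a relation produced by decomposition would first require projecting the FDs onto it, which this sketch does not do. The example is the TEACH relation with fd1 and fd2 from this module.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_bcnf(schema, fds):
    """List the FDs X -> Y (nontrivial part only) whose LHS X is not a superkey."""
    schema = set(schema)
    return [
        (lhs, rhs - lhs)
        for lhs, rhs in fds
        if rhs - lhs and not closure(lhs, fds) >= schema
    ]


TEACH = {"STUDENT", "COURSE", "INSTRUCTOR"}
F = [({"STUDENT", "COURSE"}, {"INSTRUCTOR"}), ({"INSTRUCTOR"}, {"COURSE"})]
print(violations_bcnf(TEACH, F))  # [({'INSTRUCTOR'}, {'COURSE'})]

# The relation {INSTRUCTOR, COURSE} with only fd2 has no violation.
print(violations_bcnf({"INSTRUCTOR", "COURSE"}, [F[1]]))  # []
```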


How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if A is a prime attribute (a member of some candidate key), even when X is not a superkey

To be in BCNF, X must be a superkey for every nontrivial FD X -> A, regardless of whether A is prime

So a relation in BCNF will definitely be in 3NF, but not the other way around


A relation TEACH that is in 3NF but not in BCNF


Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor

fd2: instructor -> course

{student, course} is a candidate key for this relation, so course is a prime attribute. fd2 is therefore allowed by 3NF, but it violates BCNF because instructor is not a superkey. So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations


Achieving the BCNF by Decomposition

Three possible decompositions for relation TEACH:

1 {student, instructor} and {student, course}

2 {course, instructor} and {course, student}

3 {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition

Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property), because the common attribute instructor determines course and is therefore a key of {instructor, course}
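For a decomposition into two relations there is a simple test: {R1, R2} is lossless if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is implied by F. A minimal sketch applying that test to the three TEACH decompositions, assuming the (lhs, rhs) FD encoding used in this module's other sketches; `is_lossless_binary` is a hypothetical name.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 ∩ R2) determines R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for r1, r2 in decompositions:
    print(sorted(r1), sorted(r2), is_lossless_binary(r1, r2, F))
```

Only the third pair passes, since {instructor}+ = {instructor, course} covers R1 - R2 = {course}.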

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy

Example 5.11, pg. 162


Testing for lossless joins

Lossless join algorithm: Example 5.12
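The slides name the lossless join algorithm without reproducing it here. The sketch below implements the standard matrix (chase) test for a decomposition into any number of relations: row i stands for Ri, distinguished "a" symbols mark the attributes of Ri, FDs force rows that agree on a left-hand side to share symbols, and the decomposition is lossless if some row ends up with only "a" symbols. The symbol encoding and the name `is_lossless` are assumptions, and the EMP_PROJ schema in the second example is an assumed extension of the SSN/PNUMBER example.

```python
def is_lossless(schema, decomposition, fds):
    """Matrix (chase) test for the nonadditive join property."""
    attrs = sorted(schema)
    # ('a', A) marks an attribute of Ri; ('b', i, A) is a unique placeholder.
    rows = [
        {a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every LHS column.
            groups = {}
            for row in rows:
                key = tuple(row[a] for a in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for group in groups.values():
                for a in rhs:
                    symbols = {row[a] for row in group}
                    if len(symbols) < 2:
                        continue
                    # Prefer the distinguished 'a' symbol; otherwise pick one 'b'.
                    target = ("a", a) if ("a", a) in symbols else min(symbols)
                    # Rename the equated symbols everywhere in column a.
                    for row in rows:
                        if row[a] in symbols:
                            row[a] = target
                    changed = True
    return any(all(row[a] == ("a", a) for a in attrs) for row in rows)


TEACH = {"student", "course", "instructor"}
F_TEACH = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(is_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F_TEACH))    # False
print(is_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F_TEACH))  # True

EMP_PROJ = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F_EP = [({"SSN"}, {"ENAME"}), ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
        ({"SSN", "PNUMBER"}, {"HOURS"})]
D = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(is_lossless(EMP_PROJ, D, F_EP))  # True
```

Each renaming step strictly reduces the number of distinct symbols in a column, so the loop always terminates.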


Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

An MVD A ->> B in relation R is defined as being trivial if (a) B is a subset of A or (b) A ∪ B = R

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied
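The independence described above has a precise form on a relation instance: X ->> Y holds if, whenever two tuples t1 and t2 agree on X, the tuple that takes its X and Y values from t1 and its remaining values from t2 is also present. A sketch of that check; the EMP rows (Smith, projects X and Y, dependents John and Anna) are hypothetical sample data and `mvd_holds` is an illustrative name.

```python
def mvd_holds(rows, attrs, x, y):
    """Check X ->> Y on one relation instance. For every t1, t2 agreeing on X,
    the tuple t3 taking X and Y from t1 and the rest (Z = R - X - Y) from t2
    must also be in the instance."""
    pos = {a: i for i, a in enumerate(attrs)}
    instance = set(rows)
    for t1 in instance:
        for t2 in instance:
            if all(t1[pos[a]] == t2[pos[a]] for a in x):
                t3 = tuple(t1[pos[a]] if a in x or a in y else t2[pos[a]] for a in attrs)
                if t3 not in instance:
                    return False
    return True


attrs = ("ENAME", "PNAME", "DNAME")
emp = [
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
]
print(mvd_holds(emp, attrs, {"ENAME"}, {"PNAME"}))      # True
print(mvd_holds(emp[:3], attrs, {"ENAME"}, {"PNAME"}))  # False: (Smith, Y, John) missing
```

Note that a check on one instance can only refute an MVD; whether it holds for all legal states is a semantic decision by the designer.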


Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used to remove multivalued dependencies

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key


Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME

Decomposing the EMP relation into two 4NF relations EMP_PROJECTS and EMP_DEPENDENTS
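The 4NF decomposition can be checked on a sample instance by projecting EMP onto (ENAME, PNAME) and (ENAME, DNAME) and joining the results back. The rows are hypothetical sample data, and `project` and `natural_join` are small illustrative helpers over sets of tuples, not a library API.

```python
def project(rows, attrs, cols):
    """Relational projection: keep the listed columns and drop duplicate tuples."""
    idx = [attrs.index(c) for c in cols]
    return {tuple(t[i] for i in idx) for t in rows}


def natural_join(r1, a1, r2, a2):
    """Natural join of r1 (columns a1) and r2 (columns a2) on their common columns."""
    common = [c for c in a1 if c in a2]
    extra = [c for c in a2 if c not in a1]
    out = set()
    for t in r1:
        for s in r2:
            if all(t[a1.index(c)] == s[a2.index(c)] for c in common):
                out.add(t + tuple(s[a2.index(c)] for c in extra))
    return out, list(a1) + extra


attrs = ["ENAME", "PNAME", "DNAME"]
emp = {("Smith", "X", "John"), ("Smith", "Y", "Anna"),
       ("Smith", "X", "Anna"), ("Smith", "Y", "John")}

emp_projects = project(emp, attrs, ["ENAME", "PNAME"])
emp_dependents = project(emp, attrs, ["ENAME", "DNAME"])
rejoined, rejoined_attrs = natural_join(emp_projects, ["ENAME", "PNAME"],
                                        emp_dependents, ["ENAME", "DNAME"])
print(len(emp_projects), len(emp_dependents))       # 2 2
print(rejoined == emp and rejoined_attrs == attrs)  # True: no tuples lost or added
```

Each project and each dependent is now stored once per employee rather than once per (project, dependent) combination.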


Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation


Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency other than those whose components are all superkeys


Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3
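Whether an instance satisfies a join dependency can be tested directly from the definition: project onto each component, join the projections, and compare with the original. The SUPPLY rows below are illustrative sample data chosen so that JD(R1, R2, R3) holds; the attribute names SNAME, PART_NAME and PROJ_NAME and the components R1 = (SNAME, PART_NAME), R2 = (SNAME, PROJ_NAME), R3 = (PART_NAME, PROJ_NAME) are assumptions, as are the helper names.

```python
def project(rows, attrs, cols):
    """Relational projection: keep the listed columns and drop duplicate tuples."""
    idx = [attrs.index(c) for c in cols]
    return {tuple(t[i] for i in idx) for t in rows}


def natural_join(r1, a1, r2, a2):
    """Natural join of r1 (columns a1) and r2 (columns a2) on their common columns."""
    common = [c for c in a1 if c in a2]
    extra = [c for c in a2 if c not in a1]
    out = set()
    for t in r1:
        for s in r2:
            if all(t[a1.index(c)] == s[a2.index(c)] for c in common):
                out.add(t + tuple(s[a2.index(c)] for c in extra))
    return out, list(a1) + extra


def satisfies_jd(rows, attrs, components):
    """True if the instance equals the natural join of its projections.
    Assumes the components together cover every attribute."""
    parts = [(project(rows, attrs, c), list(c)) for c in components]
    joined, joined_attrs = parts[0]
    for r, a in parts[1:]:
        joined, joined_attrs = natural_join(joined, joined_attrs, r, a)
    # Reorder the joined columns to match the original attribute order.
    return project(joined, joined_attrs, attrs) == set(rows)


attrs = ["SNAME", "PART_NAME", "PROJ_NAME"]
supply = {
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}
R1 = ["SNAME", "PART_NAME"]
R2 = ["SNAME", "PROJ_NAME"]
R3 = ["PART_NAME", "PROJ_NAME"]
print(satisfies_jd(supply, attrs, [R1, R2, R3]))  # True
print(satisfies_jd(supply, attrs, [R1, R2]))      # False: spurious tuples appear
```

On this instance the three-way decomposition is lossless while the two-way join of R1 and R2 adds spurious tuples such as (Smith, Nut, ProjX), which is why SUPPLY is decomposed into three relations.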


Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice


Examples

1 Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms

2 Given F = {A -> B, C -> D} with C ⊆ B, show that F |= A -> D
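One possible derivation for each exercise, using Armstrong's axioms (reflexivity, augmentation, transitivity) on the FD sets given in the exercises. Other orderings of the same steps are equally valid.

```latex
\begin{align*}
&\text{Exercise 1: } F = \{A \to B,\; C \to X,\; BX \to Z\} \\
&1.\; A \to B   && \text{given} \\
&2.\; AC \to BC && \text{augmentation of (1) by } C \\
&3.\; C \to X   && \text{given} \\
&4.\; BC \to BX && \text{augmentation of (3) by } B \\
&5.\; AC \to BX && \text{transitivity of (2), (4)} \\
&6.\; BX \to Z  && \text{given} \\
&7.\; AC \to Z  && \text{transitivity of (5), (6)} \\[6pt]
&\text{Exercise 2: } F = \{A \to B,\; C \to D\},\; C \subseteq B \\
&1.\; B \to C   && \text{reflexivity, since } C \subseteq B \\
&2.\; A \to B   && \text{given} \\
&3.\; A \to C   && \text{transitivity of (2), (1)} \\
&4.\; C \to D   && \text{given} \\
&5.\; A \to D   && \text{transitivity of (3), (4)}
\end{align*}
```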


Redundant functional dependencies

Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs
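The redundancy test follows directly from the definition: remove A -> B from F and check whether B is still in A+ under the remaining FDs. A minimal sketch with a hypothetical FD set; the FD encoding and the name `is_redundant` are assumptions.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_redundant(i, fds):
    """fds[i] = (A, B) is redundant iff B is contained in A+ computed from F - {A -> B}."""
    lhs, rhs = fds[i]
    others = fds[:i] + fds[i + 1:]
    return rhs <= closure(lhs, others)


F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]
print([is_redundant(i, F) for i in range(len(F))])  # [False, False, True]
```

A -> C is redundant because it follows from A -> B and B -> C by transitivity. After removing one redundant FD, the others must be re-tested against the smaller set.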


Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is said to be covered by F

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E; that is, E is equivalent to F if both conditions hold: E covers F and F covers E

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain a smaller number of FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets
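Covering and equivalence can be tested without computing F+ explicitly: F covers E when, for every X -> Y in E, Y is contained in X+ computed under F. The sketch below assumes the (lhs, rhs) FD encoding used elsewhere in this module; the sets E and F are a hypothetical example.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E iff every X -> Y in E satisfies Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    # E and F are equivalent iff each covers the other.
    return covers(e, f) and covers(f, e)


E = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
F = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
print(equivalent(E, F))                  # True
print(equivalent(E, [({"A"}, {"C"})]))   # False
```

Here F has fewer FDs than E but generates the same closure, which is exactly the kind of smaller equivalent set the slide describes.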


Minimal Functional Dependencies (minimal cover): useful in eliminating unnecessary functional dependencies. Also called an irreducible set of F

F is transformed so that each FD in it that has more than one attribute on the RHS is reduced to a set of FDs that each have only one attribute on the RHS


Minimal cover

A set of FDs F is minimal if:

(a) every RHS of each dependency is a single attribute

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies)

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side)


Extraneous Attributes

Further reduce the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

Let F be a set of FDs over schema R and let A1A2 -> B1B2 ∈ F. Then A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) ∪ {A2 -> B1B2}
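The equivalence in this definition reduces to a single closure computation: because the shortened FD (X - {A1}) -> Y always implies X -> Y, the two sets are equivalent exactly when Y ⊆ (X - {A1})+ under the original F. A sketch assuming the (lhs, rhs) encoding; `is_extraneous_lhs` is a hypothetical name, and the FD set is the one from this module's canonical cover problem.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_extraneous_lhs(attr, fd, fds):
    """attr is extraneous in X -> Y of fds iff Y ⊆ (X - {attr})+ under fds.
    Equivalent to F ≡ (F - {X -> Y}) ∪ {X - {attr} -> Y}, because the
    shortened FD always implies the original one."""
    lhs, rhs = fd
    return attr in lhs and rhs <= closure(lhs - {attr}, fds)


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
print(is_extraneous_lhs("Z", F[3], F))  # True: X -> Z, so X alone yields R
print(is_extraneous_lhs("X", F[3], F))  # False
print(is_extraneous_lhs("Y", F[1], F))  # False
```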


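CANONICAL COVER (FC)

1 Every FD of FC is simple: its RHS has one attribute

2 FC is left-reduced: no FD in FC has an extraneous attribute on its LHS

3 FC is nonredundant: no FD in FC can be derived from the others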


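Problem

Given a set F of FDs, find a canonical cover for F

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1 Split the right-hand sides: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2 Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3 Left-reduce: Z is extraneous in XZ -> R because X -> Z, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}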
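The steps of this problem can be automated. The sketch below follows the usual minimal-cover procedure: split right-hand sides, remove extraneous left-hand-side attributes, then drop redundant FDs. The function name and FD encoding are assumptions, and processing the FDs in a different order can give a different but equivalent cover. Applied to this problem's F, it returns X -> Z, XY -> P, XY -> W, XY -> Q and X -> R; in particular, Z turns out to be extraneous in XZ -> R.

```python
def closure(attrs, fds):
    """Return attrs+ under fds, where fds is a list of (lhs, rhs) attribute sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: split every FD so its RHS has a single attribute; skip duplicates.
    g = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            fd = (frozenset(lhs), frozenset({a}))
            if fd not in g:
                g.append(fd)
    # Step 2: left-reduce. b is extraneous in X -> A when A is in (X - {b})+ under g.
    for i in range(len(g)):
        lhs, rhs = g[i]
        for b in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {b}, g):
                lhs = lhs - {b}
        g[i] = (lhs, rhs)
    g = list(dict.fromkeys(g))  # left-reduction may create duplicates
    # Step 3: drop every FD that the remaining FDs already imply.
    for fd in list(g):
        others = [f for f in g if f != fd]
        if fd[1] <= closure(fd[0], others):
            g.remove(fd)
    return g


F = [({"X"}, {"Z"}), ({"X", "Y"}, {"W", "P"}),
     ({"X", "Y"}, {"Z", "W", "Q"}), ({"X", "Z"}, {"R"})]
cover = minimal_cover(F)
for lhs, rhs in cover:
    print("".join(sorted(lhs)), "->", "".join(rhs))
```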


Normal Forms Based on Primary Keys

1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 40: Module 6

Module 6 40042023

Redundant functional dependencies Given a set F of FDs a FD AB of F is said to

be redundant with respect to the FDs of F iff AB can be derived from the set of FDs F-AB

Redundant FDs are extra and unnecessary and can be safely removed from the set F

Eliminating redundant FDs allows us to minimize the set of FDs

Module 6 41042023

Equivalence of Sets of Functional Dependencies

A set of FD F is said to cover another set of FDs E if every FD in E is also in F+ If every dependency in E can be inferred from F alternatively E is covered by F

Two sets of FDs E and F are equivalent if E+ = F+ Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E E is equivalent to F if both the conditions E covers F and F covers E hold

For a given set F of FDs the set F+ may contain a large number of FDs It is desirable to find sets that contain smaller number of FDs than F and still generate all the FDs of F+ Sets of FDs that satisfy this condition are said to be equivalent sets

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:

fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but 3NF allows it because course is a prime attribute (it belongs to the candidate key {student, course}). So this relation is in 3NF but not in BCNF.
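
To make the TEACH argument concrete, the Python sketch below (an illustration, not from the slides; all helper names are hypothetical) finds the candidate keys of TEACH and applies both tests. As assumed here, checking only the FDs as given is enough when testing a whole schema R; it would not be enough for relations produced by decomposition, whose projected FDs must be considered.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build X -> Y from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by brute force over attribute subsets."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def is_bcnf(schema, fds):
    """BCNF: the LHS of every nontrivial FD is a superkey."""
    return all(rhs <= lhs or closure(lhs, fds) >= schema for lhs, rhs in fds)


def is_3nf(schema, fds):
    """3NF: for every FD X -> A, X is a superkey or every A outside X is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(closure(lhs, fds) >= schema or (rhs - lhs) <= prime for lhs, rhs in fds)


TEACH = {"student", "course", "instructor"}
F_TEACH = [
    fd("student course", "instructor"),  # fd1
    fd("instructor", "course"),          # fd2
]

print([sorted(k) for k in candidate_keys(TEACH, F_TEACH)])
# [['course', 'student'], ['instructor', 'student']]
print(is_3nf(TEACH, F_TEACH), is_bcnf(TEACH, F_TEACH))  # True False
```

Because course appears in a candidate key, fd2 passes the 3NF test but fails the BCNF test.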

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Of the three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additive join property): its common attribute, instructor, functionally determines course by fd2.
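
A well-known test for a decomposition of R into exactly two relations R1 and R2 says it is lossless if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A minimal Python sketch of that test on the three TEACH decompositions (the helper names are hypothetical):

```python
def fd(lhs, rhs):
    """Hypothetical helper: build X -> Y from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff the common attributes determine all of R1 - R2 or all of R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


F_TEACH = [fd("student course", "instructor"), fd("instructor", "course")]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([is_lossless_binary(r1, r2, F_TEACH) for r1, r2 in decompositions])
# [False, False, True]
```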

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
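
The lossless join algorithm is usually presented as a matrix (tableau) test. The Python sketch below follows that general idea under stated assumptions: one row per Ri, a distinguished symbol in each column whose attribute belongs to Ri and a row-specific symbol elsewhere, and FDs applied until no symbol changes. The EMP_PROJ decomposition and all helper names are illustrative assumptions, not necessarily the textbook example cited above.

```python
def fd(lhs, rhs):
    """Hypothetical helper: build X -> Y from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def chase_is_lossless(schema, decomposition, fds):
    """Matrix test: lossless iff some row ends up with only distinguished symbols."""
    attrs = sorted(schema)
    # ('a', A) is the distinguished symbol for column A; ('b', i, A) is row-specific.
    rows = [
        {a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    if any(rows[i][x] != rows[j][x] for x in lhs):
                        continue  # rows disagree on X, so this FD does not apply
                    for y in rhs:
                        vi, vj = rows[i][y], rows[j][y]
                        if vi == vj:
                            continue
                        # Equate the two symbols, preferring the distinguished one.
                        keep, drop = (vi, vj) if vi[0] == "a" else (vj, vi)
                        for row in rows:
                            if row[y] == drop:
                                row[y] = keep
                        changed = True
    return any(all(row[a][0] == "a" for a in attrs) for row in rows)


EMP_PROJ = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
F_EMP_PROJ = [fd("SSN", "ENAME"), fd("PNUMBER", "PNAME PLOCATION"), fd("SSN PNUMBER", "HOURS")]
D_EMP_PROJ = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(chase_is_lossless(EMP_PROJ, D_EMP_PROJ, F_EMP_PROJ))  # True

TEACH = {"student", "course", "instructor"}
F_TEACH = [fd("student course", "instructor"), fd("instructor", "course")]
print(chase_is_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F_TEACH))  # False
```

Unlike the two-relation test, this version also handles decompositions into three or more relations, such as EMP_PROJ above.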

Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further classified as trivial or nontrivial.

An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.

An MVD is nontrivial if neither of the above two conditions is satisfied.
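
The trivial/nontrivial rule is a plain set test. A Python sketch, using the EMP attributes (ENAME, PNAME, DNAME) from this module's 4NF example; the function name classify_mvd is hypothetical:

```python
def classify_mvd(lhs, rhs, schema):
    """A ->> B in R is trivial if B is a subset of A or A ∪ B = R; otherwise nontrivial."""
    lhs, rhs, schema = set(lhs), set(rhs), set(schema)
    if rhs <= lhs or (lhs | rhs) == schema:
        return "trivial"
    return "nontrivial"


EMP = {"ENAME", "PNAME", "DNAME"}
EMP_PROJECTS = {"ENAME", "PNAME"}
print(classify_mvd({"ENAME"}, {"PNAME"}, EMP))           # nontrivial
print(classify_mvd({"ENAME"}, {"PNAME"}, EMP_PROJECTS))  # trivial
```

The same MVD is nontrivial in EMP but trivial once it lives in a relation containing only ENAME and PNAME.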

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and in which every nontrivial multi-valued dependency X ->> Y has a superkey X; informally, it contains no nontrivial MVDs other than those implied by its keys.

It is used to remove multi-valued dependencies.

In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME, is decomposed into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
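
On a relation instance, an MVD X ->> Y can be checked tuple by tuple: whenever two tuples agree on X, swapping their Y values must give tuples that are already present. The Python sketch below applies that check to hypothetical EMP rows (one employee with two projects and two dependents, invented for illustration) and then verifies that joining the two 4NF projections reproduces EMP. All helper names are hypothetical, and checking one instance can only refute an MVD, not prove it holds in every legal state.

```python
def as_relation(attrs, tuples):
    """A relation instance as a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rel, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(r, s):
    """Merge every pair of rows that agree on their common attributes."""
    out = set()
    for x in r:
        dx = dict(x)
        for y in s:
            dy = dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out


def mvd_holds(rel, lhs, rhs):
    """X ->> Y holds in this instance iff for all t1, t2 agreeing on X, the tuple
    taking X and Y from t1 and every other attribute from t2 is also present."""
    for t1 in rel:
        d1 = dict(t1)
        for t2 in rel:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in lhs):
                t3 = {a: (d1[a] if (a in lhs or a in rhs) else d2[a]) for a in d1}
                if frozenset(t3.items()) not in rel:
                    return False
    return True


ATTRS = ("ENAME", "PNAME", "DNAME")
EMP = as_relation(ATTRS, [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
])
print(mvd_holds(EMP, {"ENAME"}, {"PNAME"}))  # True

EMP_PROJECTS = project(EMP, {"ENAME", "PNAME"})
EMP_DEPENDENTS = project(EMP, {"ENAME", "DNAME"})
print(natural_join(EMP_PROJECTS, EMP_DEPENDENTS) == EMP)  # True

# Removing one row breaks the MVD: (Smith, Y, John) would be needed.
broken = EMP - as_relation(ATTRS, [("Smith", "Y", "John")])
print(mvd_holds(broken, {"ENAME"}, {"PNAME"}))  # False
```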

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For a relation R with subsets of the attributes of R denoted A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency other than those in which every component is a superkey.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
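
A join dependency can be checked on an instance by projecting onto each component and joining the projections back together. The Python sketch below uses hypothetical SUPPLY rows (invented for illustration) with the assumed components R1 = (SNAME, PART_NAME), R2 = (SNAME, PROJ_NAME) and R3 = (PART_NAME, PROJ_NAME); helper names are hypothetical. The three-way join reproduces SUPPLY exactly, while joining only two of the projections produces spurious tuples. As with MVDs, one instance cannot prove that the JD holds in general.

```python
from functools import reduce


def as_relation(attrs, tuples):
    """A relation instance as a set of rows; each row is a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rel, attrs):
    """Projection onto attrs (set semantics, so duplicates disappear)."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(r, s):
    """Merge every pair of rows that agree on their common attributes."""
    out = set()
    for x in r:
        dx = dict(x)
        for y in s:
            dy = dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out


def jd_holds(rel, components):
    """JD(R1, ..., Rn) holds in this instance iff joining all projections gives back rel."""
    joined = reduce(natural_join, (project(rel, c) for c in components))
    return joined == rel


ATTRS = ("SNAME", "PART_NAME", "PROJ_NAME")
SUPPLY = as_relation(ATTRS, [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
])
R1 = {"SNAME", "PART_NAME"}
R2 = {"SNAME", "PROJ_NAME"}
R3 = {"PART_NAME", "PROJ_NAME"}

print(jd_holds(SUPPLY, [R1, R2, R3]))  # True
print(jd_holds(SUPPLY, [R1, R2]))      # False
print(len(natural_join(project(SUPPLY, R1), project(SUPPLY, R2))))  # 9 rows instead of 7
```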

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence 5NF is rarely used in practice.

Equivalence of Sets of Functional Dependencies

A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F; alternatively, E is covered by F.

Two sets of FDs E and F are equivalent if E+ = F+. Hence equivalence means that every FD in E can be inferred from F and every FD in F can be inferred from E: E is equivalent to F if both conditions, E covers F and F covers E, hold.

For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
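
Covering and equivalence reduce to attribute closures: F covers E exactly when, for every X -> Y in E, Y is contained in X+ computed under F. A minimal Python sketch; the FD sets F and G and the helper names are illustrative assumptions, and attributes are single letters so "AC" stands for {A, C}.

```python
def fd(lhs, rhs):
    """Hypothetical helper with single-letter attributes: fd("AC", "D") is AC -> D."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def covers(f, e):
    """F covers E iff every X -> Y in E satisfies Y ⊆ X+ computed under F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)


def equivalent(e, f):
    """E and F are equivalent iff each covers the other."""
    return covers(e, f) and covers(f, e)


F = [fd("A", "C"), fd("AC", "D"), fd("E", "AD"), fd("E", "H")]
G = [fd("A", "CD"), fd("E", "AH")]
print(covers(F, G), covers(G, F), equivalent(F, G))  # True True True
print(equivalent(F, [fd("A", "C")]))                 # False: A -> C alone cannot derive AC -> D
```

G has fewer FDs than F yet generates the same closure, which is exactly the kind of smaller equivalent set described above.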

Minimal Functional Dependencies (minimal cover)

A minimal cover is useful in eliminating unnecessary functional dependencies. It is also called an irreducible set of F. F is transformed so that each FD in it that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover

A set of FDs F is minimal if:

(a) every RHS of each dependency is a single attribute;

(b) for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);

(c) for no X -> A in F and proper subset Z of X is (F - {X -> A}) ∪ {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of its left-hand side).

Extraneous Attributes

The FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.

Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F. A1 is extraneous iff

$$F \equiv \left(F - \{A_1A_2 \rightarrow B_1B_2\}\right) \cup \{A_2 \rightarrow B_1B_2\}$$
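
In practice this equivalence is usually tested with a closure: A1 is extraneous in A1A2 -> B1B2 exactly when B1B2 is contained in (A2)+ computed under F. The sketch below is a hypothetical helper illustrating that test on two FDs taken from the canonical-cover problem in this module (single-letter attributes).

```python
def fd(lhs, rhs):
    """Hypothetical helper with single-letter attributes: fd("XZ", "R") is XZ -> R."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_extraneous_lhs(attr, dependency, fds):
    """attr is extraneous in X -> Y iff Y ⊆ (X - {attr})+ under F."""
    lhs, rhs = dependency
    return attr in lhs and rhs <= closure(lhs - {attr}, fds)


F = [fd("X", "Z"), fd("XZ", "R")]
print(is_extraneous_lhs("Z", fd("XZ", "R"), F))  # True: X -> Z already supplies Z
print(is_extraneous_lhs("X", fd("XZ", "R"), F))  # False: Z+ is just {Z}
```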

CANONICAL COVER (FC)

1. Every FD of FC is simple: its RHS has one attribute.

2. FC is left-reduced: no FD in FC has an extraneous left-hand-side attribute.

3. FC is nonredundant: no FD in FC can be inferred from the others.

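Problem

Given a set F of FDs, find a canonical cover for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split each RHS into single attributes: FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Left-reduce: Z is extraneous in XZ -> R, since X+ = {X, Z, R} already contains R, so XZ -> R becomes X -> R: FC = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}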
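
The canonical-cover properties listed above can be reached in three mechanical steps. The Python sketch below (helper names are hypothetical; attributes are single letters) splits right-hand sides, left-reduces each FD, then drops FDs implied by the rest, removing extraneous attributes before redundant FDs as is commonly recommended. Its output shows that XZ -> R reduces to X -> R, because X -> Z already supplies Z.

```python
def fd(lhs, rhs):
    """Hypothetical helper with single-letter attributes: fd("XY", "WP") is XY -> WP."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: a single attribute on every RHS; exact duplicates are dropped.
    cover = []
    for lhs, rhs in fds:
        for a in sorted(rhs):
            item = (lhs, frozenset({a}))
            if item not in cover:
                cover.append(item)
    # Step 2: left-reduce each FD by dropping extraneous LHS attributes.
    for i, (lhs, rhs) in enumerate(cover):
        for a in sorted(lhs):
            if len(lhs) > 1 and rhs <= closure(lhs - {a}, cover):
                lhs = lhs - {a}
        cover[i] = (lhs, rhs)
    cover = list(dict.fromkeys(cover))  # left-reduction can create duplicates
    # Step 3: drop every FD that the remaining FDs already imply.
    for item in list(cover):
        rest = [g for g in cover if g != item]
        if item[1] <= closure(item[0], rest):
            cover = rest
    return cover


def show(fds):
    """Render FDs as readable strings such as 'XY -> W'."""
    return ["".join(sorted(lhs)) + " -> " + "".join(sorted(rhs)) for lhs, rhs in fds]


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
print(show(canonical_cover(F)))
# ['X -> Z', 'XY -> P', 'XY -> W', 'XY -> Q', 'X -> R']
```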

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis).

4NF: based on keys and multi-valued dependencies (MVDs); 5NF: based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of:
- minimizing redundancies
- minimizing anomalies

Provides the database designer with:
- a formal framework for analyzing relation schemas based on their keys and FDs
- a series of normal form tests

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join or non-additive join property: it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: it ensures that each functional dependency is represented in some individual relation resulting after decomposition.
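
Dependency preservation can be tested without computing the projected FD sets: start from X, repeatedly add what each Ri can derive from the attributes it shares with the current result, and check whether Y is reached. This is a standard polynomial-time test; the Python sketch below (hypothetical helper names) shows that the lossless TEACH decomposition {instructor, course}, {instructor, student} loses fd1, while an assumed 3NF decomposition of EMP_DEPT preserves both of its FDs.

```python
def fd(lhs, rhs):
    """Hypothetical helper: build X -> Y from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(dependency, decomposition, fds):
    """X -> Y is preserved iff Y is reached using only closures restricted to each Ri."""
    lhs, rhs = dependency
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


F_TEACH = [fd("student course", "instructor"), fd("instructor", "course")]
D3 = [{"instructor", "course"}, {"instructor", "student"}]
print([preserves(f, D3, F_TEACH) for f in F_TEACH])  # [False, True]: fd1 is lost

F_EMP_DEPT = [fd("SSN", "ENAME DNUMBER"), fd("DNUMBER", "DMGRSSN")]
D_EMP_DEPT = [{"SSN", "ENAME", "DNUMBER"}, {"DNUMBER", "DMGRSSN"}]
print(all(preserves(f, D_EMP_DEPT, F_EMP_DEPT) for f in F_EMP_DEPT))  # True
```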

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually only up to 3NF, BCNF, or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
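
Candidate keys and prime attributes can be derived from the FDs with attribute closures: S is a superkey when S+ contains every attribute, and a key when no smaller subset of it is a superkey. A brute-force Python sketch, suitable only for small schemas; the EMP_PROJ attributes and FDs are assumptions based on the SSN/PNUMBER examples used for 2NF, and the helper names are hypothetical.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build X -> Y from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """S is a superkey iff S+ is the whole schema."""
    return closure(attrs, fds) >= schema


def candidate_keys(schema, fds):
    """Superkeys with no smaller key inside them, searched smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and is_superkey(s, schema, fds):
                keys.append(s)
    return keys


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME"}
F = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME")]
keys = candidate_keys(EMP_PROJ, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])                # [['PNUMBER', 'SSN']]
print(sorted(prime), sorted(EMP_PROJ - prime))  # ['PNUMBER', 'SSN'] ['ENAME', 'HOURS']
print(is_superkey({"SSN", "PNUMBER", "HOURS"}, EMP_PROJ, F))  # True, but not a key
```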

First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF there are 3 techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
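
Technique 1 can be sketched in Python on a hypothetical DEPARTMENT relation whose DLOCATIONS value is a list, exactly the kind of non-atomic value 1NF disallows. The relation, its rows and the helper split_multivalued are invented for illustration, and a single-attribute primary key is assumed.

```python
def split_multivalued(rows, key, attr, new_attr):
    """Technique 1: drop the multivalued attribute from the base relation and
    store one (key, value) row per value in a new relation."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    detail = [{key: row[key], new_attr: value} for row in rows for value in row[attr]]
    return base, detail


DEPARTMENT = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]
dept, dept_locations = split_multivalued(DEPARTMENT, "DNUMBER", "DLOCATIONS", "DLOCATION")
print(dept)
# [{'DNUMBER': 5, 'DNAME': 'Research'}, {'DNUMBER': 4, 'DNAME': 'Administration'}]
print(dept_locations[0], len(dept_locations))  # {'DNUMBER': 5, 'DLOCATION': 'Bellaire'} 4
```

No DNAME is repeated and no NULLs are needed, which is why the text prefers this technique over the other two.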

Normalizing nested relations into 1NF

Additional problems from the Schaum series: Pg 178, 5.1

Second Normal Form

Uses the concepts of FDs and the primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
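
The full-dependency test needs only attribute closures: Y -> Z is full when it holds and no Y - {a} still determines Z. A minimal Python sketch using the SSN/PNUMBER examples; the FD list is an assumption containing just {SSN, PNUMBER} -> HOURS and SSN -> ENAME, and the helper names are hypothetical.

```python
def fd(lhs, rhs):
    """Hypothetical helper: build X -> Y from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full iff it holds and no Y - {a} still determines Z."""
    if not rhs <= closure(lhs, fds):
        return False  # Y -> Z does not hold at all
    # Dropping one attribute at a time is enough: closure is monotone, so if a
    # smaller subset determined Z, some Y - {a} containing it would as well.
    return not any(rhs <= closure(lhs - {a}, fds) for a in lhs)


F = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME")]
print(is_full_fd(*fd("SSN PNUMBER", "HOURS"), F))  # True
print(is_full_fd(*fd("SSN PNUMBER", "ENAME"), F))  # False: partial, SSN -> ENAME
```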

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
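
A simplified Python sketch of the 2NF test and decomposition, following the slide's definition, which looks only at the primary key. It reports each non-prime attribute that depends on a proper subset of the key, then moves that attribute into a relation with the smallest such subset. The EMP_PROJ schema, its FDs and the function names are assumptions; the sketch deliberately ignores transitive dependencies, which are a 3NF concern.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Hypothetical helper: build X -> Y from space-separated attribute names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Attribute closure attrs+ under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, key, fds):
    """Map each non-prime attribute to the smallest proper subset of the key that determines it."""
    found = {}
    for a in sorted(schema - key):
        for size in range(1, len(key)):
            subsets = [set(c) for c in combinations(sorted(key), size) if a in closure(set(c), fds)]
            if subsets:
                found[a] = subsets[0]
                break
    return found


def decompose_2nf(schema, key, fds):
    """One relation per determining key subset, plus the rest with the full key."""
    groups = {}
    for a, subset in partial_dependencies(schema, key, fds).items():
        groups.setdefault(frozenset(subset), set()).add(a)
    moved = set().union(*groups.values())
    return [set(s) | attrs for s, attrs in groups.items()] + [schema - moved]


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME"}
KEY = {"SSN", "PNUMBER"}
F = [fd("SSN PNUMBER", "HOURS"), fd("SSN", "ENAME")]
print(partial_dependencies(EMP_PROJ, KEY, F))                  # {'ENAME': {'SSN'}}
print([sorted(r) for r in decompose_2nf(EMP_PROJ, KEY, F)])    # [['ENAME', 'SSN'], ['HOURS', 'PNUMBER', 'SSN']]
```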

Normalizing into 2NF

Conversion to 2NF

[Figure: relation (A, B, C, D) converted into the 2NF relations (A, B, C) and (A, D)]

Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> prog_Pac_name

1. What is the highest normal form of Prog_task?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: a transitive functional dependency is an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 42: Module 6

Module 6 42042023

Minimal Functional Dependencies (minimal cover ) is useful in eliminating unnecessary

functional dependencies Also called as Irreducibe Set of F F is transformed such that each FD in it

that has more than one attribute in the RHS is reduced to a set of FDs that have only one attribute on the RHS

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 43: Module 6

Module 6 43042023

Minimal cover

(a) every RHS of each dependency is a single attribute

(b) for no X -gt A in F is the set F - X -gt A equivalent to F

(c) for no X -gt A in F and proper subset Z of X is F - X -gt A U Z -gt A equivalent to F

no redundanc

ies

no dependencies may be replaced by a dependency

that involves a subset of the left hand side

Module 6 44042023

Extraneous Attributes

Further reduction of the size of the FDs of F by removing either extraneous left attributes with respect to F or extraneous right attributes with respect to F

F be a set of FDs over schema R and let A1A2B1B2

A1 is extraneous iff FΞF-A1A2B1B2UA2B1B2

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is an attribute that is not prime, that is, it is not a member of any candidate key.
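When the FDs are known, these definitions can be tested with attribute closure. S is a superkey when S+ contains every attribute of R, assuming F captures all the constraints. The minimal sketch below finds candidate keys by trying attribute subsets from smallest to largest, which is exponential but fine for classroom-sized schemas. All function names are illustrative. The sample data is the TEACH relation from later in this module.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+, every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """S is a superkey when S+ contains every attribute of the schema."""
    return closure(attrs, fds) >= schema


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute subsets smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # already contains a smaller key, so not minimal
            if is_superkey(attrs, schema, fds):
                keys.append(attrs)
    return keys


def prime_attributes(schema, fds):
    """Union of all candidate keys."""
    return set().union(*candidate_keys(schema, fds))


teach = {"student", "course", "instructor"}
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print([sorted(k) for k in candidate_keys(teach, teach_fds)])
# [['course', 'student'], ['instructor', 'student']]
```

In TEACH every attribute turns out to be prime, so the relation has no nonprime attributes.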

First Normal Form

First normal form (1NF) disallows composite attributes, multivalued attributes and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

1NF is considered to be part of the formal definition of a relation.

Normalization into 1NF

Normalization into 1NF

There are three techniques to achieve 1NF:

1. Remove the attribute that violates 1NF and place it in a separate relation, along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for the attribute, replace it with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
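Techniques 1 and 2 can be contrasted on a small example. The sketch below uses hypothetical DEPARTMENT data in which Dlocations is multivalued and is stored as a Python set. The helper names `split_multivalued` and `expand_key` are illustrative. Technique 1 yields a flat base relation plus a separate (Dnumber, Dlocation) relation. Technique 2 yields one wide tuple per location and repeats Dname.

```python
# Illustrative, hypothetical data: Dlocations is multivalued (a Python set).
department = [
    {"Dnumber": 1, "Dname": "Headquarters", "Dlocations": {"Houston"}},
    {"Dnumber": 5, "Dname": "Research", "Dlocations": {"Bellaire", "Sugarland", "Houston"}},
]


def split_multivalued(rows, key, attr):
    """Technique 1: move attr into a new relation together with the key."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    detail = sorted((row[key], value) for row in rows for value in row[attr])
    return base, detail


def expand_key(rows, attr):
    """Technique 2: one flat tuple per value; the key grows to include attr."""
    return [{**row, attr: value} for row in rows for value in sorted(row[attr])]


dept, dept_locations = split_multivalued(department, "Dnumber", "Dlocations")
print(dept_locations)
# [(1, 'Houston'), (5, 'Bellaire'), (5, 'Houston'), (5, 'Sugarland')]
print(len(expand_key(department, "Dlocations")))  # 4 rows, "Research" stored 3 times
```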

Normalizing nested relations into 1NF

Additional problems from the Schaum series: Pg 178 51

Second Normal Form

Second normal form uses the concepts of FDs and the primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
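A full FD can be tested with attribute closure. The FD must hold, and no attribute can be dropped from its left side. Because closure is monotone, it is enough to try removing one attribute at a time. The FD set below is an assumed EMP_PROJ-style example chosen to match the two examples above. The function name `is_full_fd` is illustrative.

```python
def closure(attrs, fds):
    """Return attrs+, every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and no attribute can be dropped from Y."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # Y -> Z does not hold at all
    # Checking Y minus one attribute suffices, because closure is monotone.
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# Assumed FDs for an EMP_PROJ-style relation (hypothetical, for illustration).
emp_proj_fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, emp_proj_fds))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, emp_proj_fds))  # False (partial)
```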

Second Normal Form

A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
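The 2NF test above can be sketched as a search for partial dependencies. Each proper subset of the primary key is tried, and the sketch reports which nonprime attributes that subset already determines. It follows the slide's primary-key-based definition. With several candidate keys, the general definition would check every candidate key. The FDs are the same assumed EMP_PROJ-style set as before, and the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+, every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """Map proper subsets of the primary key to the nonprime attributes they determine."""
    nonprime = schema - primary_key
    found = {}
    for size in range(1, len(primary_key)):
        for subset in combinations(sorted(primary_key), size):
            dependent = closure(set(subset), fds) & nonprime
            if dependent:
                found[subset] = dependent
    return found


def is_2nf(schema, primary_key, fds):
    return not partial_dependencies(schema, primary_key, fds)


emp_proj = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
emp_proj_fds = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
pk = {"SSN", "PNUMBER"}
print(is_2nf(emp_proj, pk, emp_proj_fds))                     # False
print(is_2nf({"SSN", "PNUMBER", "HOURS"}, pk, emp_proj_fds))  # True
```

The reported partial dependencies show how to split the relation. SSN with ENAME and PNUMBER with PNAME and PLOCATION move to their own relations, and HOURS stays with the full key.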

Normalizing into 2NF

Conversion to 2NF: the relation (A, B, C, D) is decomposed into (A, B, C) and (A, D).

Additional problem on second normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form of Prog_task?
2. Transform it into the next higher normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no nonprime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. For example, consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
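A common general form of the 3NF condition, consistent with the NOTE above, says that every nontrivial X -> A must have X as a superkey or A as a prime attribute. For a whole relation schema, it is enough to check the given FDs. The sketch below assumes FDs for an EMP_DEPT-style relation and for the NOTE's EMP example, where Emp is a candidate key and so Emp -> {SSN, Salary} is assumed. The function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+, every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute subsets smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if not any(k <= attrs for k in keys) and closure(attrs, fds) >= schema:
                keys.append(attrs)
    return keys


def violations_3nf(schema, fds):
    """(X, A) pairs where X -> A has X not a superkey and A nonprime."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey: allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((tuple(sorted(lhs)), a))
    return bad


emp_dept = {"ENAME", "SSN", "BDATE", "ADDRESS", "DNUMBER", "DNAME", "DMGRSSN"}
emp_dept_fds = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(violations_3nf(emp_dept, emp_dept_fds))
# [(('DNUMBER',), 'DMGRSSN'), (('DNUMBER',), 'DNAME')]

# The NOTE's EMP(SSN, Emp, Salary): Emp is itself a candidate key, so no violation.
emp_fds = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(violations_3nf({"SSN", "Emp", "Salary"}, emp_fds))  # []
```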

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems

Pg 186513

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.

In BCNF, 'X' must be a superkey (for example, a candidate key); there is no exception for prime attributes.

So a relation in BCNF is always in 3NF, but not the other way around.
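The difference shows up clearly on the TEACH relation. The sketch below flags every given FD whose left side is not a superkey. Checking only the given FDs is enough for the whole schema R, but not for subschemas produced by a decomposition. The prime attributes are entered by hand here; the candidate keys are {student, course} and {student, instructor}. The function name `bcnf_violations` is illustrative.

```python
def closure(attrs, fds):
    """Return attrs+, every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Given nontrivial FDs X -> Y whose X is not a superkey of schema."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


teach = {"student", "course", "instructor"}
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(bcnf_violations(teach, teach_fds))  # [(['instructor'], ['course'])]

# 3NF still tolerates fd2, because its right-hand side (course) is prime;
# the prime attributes are worked out by hand from the candidate keys.
prime = {"student", "course", "instructor"}
print(all(set(rhs) <= prime for _, rhs in bcnf_violations(teach, teach_fds)))  # True
```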

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 does not violate 3NF, because course is a prime attribute. It does violate BCNF, because instructor is not a superkey. So this relation is in 3NF but not in BCNF.

A relation not in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
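A common way to reach BCNF is to split repeatedly on a violating FD. For a relation S and a violating X, S is replaced by X+ ∩ S and S minus (X+ − X). The two parts share exactly X, and X is a key of the first part, so each split is lossless. The minimal Python sketch below finds violations by exhaustive subset search. That search is exponential and only meant for small schemas. The function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return attrs+, every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(sub, fds):
    """Some X inside sub that determines more of sub but is not a superkey of sub."""
    for size in range(1, len(sub)):
        for combo in combinations(sorted(sub), size):
            x = set(combo)
            determined = closure(x, fds) & sub
            if determined != x and determined != sub:
                return x, determined
    return None


def bcnf_decompose(schema, fds):
    result, todo = [], [set(schema)]
    while todo:
        sub = todo.pop()
        violation = find_violation(sub, fds)
        if violation is None:
            result.append(sub)
            continue
        x, determined = violation
        # The two parts share exactly X, and X is a key of the first: lossless.
        todo.append(determined)
        todo.append(sub - (determined - x))
    return result


teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
parts = bcnf_decompose({"student", "course", "instructor"}, teach_fds)
print(sorted(sorted(p) for p in parts))  # [['course', 'instructor'], ['instructor', 'student']]
```

For TEACH, the result matches the third decomposition discussed below. As expected, it loses fd1.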

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive join property after decomposition.

Of the three, only the third decomposition does not generate spurious tuples after a join (and hence has the nonadditive join property).
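For a split into two relations there is a standard shortcut. {R1, R2} is lossless if and only if R1 ∩ R2 functionally determines R1 − R2 or R2 − R1. The sketch below applies this test to the three TEACH decompositions. The function name `is_lossless_binary` is illustrative.

```python
def closure(attrs, fds):
    """Return attrs+, every attribute determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """{R1, R2} is lossless iff R1 & R2 determines R1 - R2 or R2 - R1."""
    common = closure(r1 & r2, fds)
    return r1 - r2 <= common or r2 - r1 <= common


teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
candidates = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
print([is_lossless_binary(a, b, teach_fds) for a, b in candidates])  # [False, False, True]
```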

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 511 pg 162

Testing for lossless joins

Lossless join algorithm Example 512
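For decompositions into more than two relations, the lossless join test is usually presented as a tableau, often called the "chase". The tableau has one row per Ri, with the distinguished symbol in the columns of Ri and a row-specific symbol elsewhere. FDs are applied to equate symbols until nothing changes. The decomposition is lossless if some row becomes entirely distinguished. The Python sketch below follows that idea. The names are illustrative: "a" is the distinguished symbol and "b<i>" is private to row i. The test data is an EMP_PROJ-style schema with assumed FDs.

```python
def chase_lossless(schema, decomposition, fds):
    """Tableau ("chase") test for the nonadditive join property of R1..Rn."""
    cols = sorted(schema)
    # Row i holds the distinguished symbol "a" in the columns of Ri and a
    # symbol "b<i>" private to that row everywhere else.
    rows = [{c: "a" if c in ri else f"b{i}" for c in cols}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[c] != r2[c] for c in lhs):
                        continue
                    for c in rhs:
                        if r1[c] == r2[c]:
                            continue
                        # Equate the two symbols in column c, preferring "a".
                        keep = "a" if "a" in (r1[c], r2[c]) else min(r1[c], r2[c])
                        drop = r2[c] if keep == r1[c] else r1[c]
                        for row in rows:
                            if row[c] == drop:
                                row[c] = keep
                        changed = True
    # Lossless iff some row now consists only of distinguished symbols.
    return any(all(v == "a" for v in row.values()) for row in rows)


emp_proj = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
fds = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
three_way = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(chase_lossless(emp_proj, three_way, fds))  # True
```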

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further classified as trivial or nontrivial.

An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.

An MVD is nontrivial if neither of the above two conditions is satisfied.
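An MVD X ->> Y can be checked on a concrete relation instance. Whenever two tuples t1 and t2 agree on X, the tuple that takes X and Y from t1 and the remaining attributes Z from t2 must also be present. Checking every ordered pair covers both tuples required by the formal definition. The sketch below uses an illustrative EMP(ENAME, PNAME, DNAME) instance, and the function names are our own.

```python
def mvd_holds(rows, x, y, schema):
    """Check the MVD X ->> Y on a relation instance given as a list of dicts."""
    z = schema - x - y

    def proj(t, attrs):
        return tuple(t[a] for a in sorted(attrs))

    present = {(proj(t, x), proj(t, y), proj(t, z)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if proj(t1, x) != proj(t2, x):
                continue
            # t1 and t2 agree on X: the tuple mixing t1's Y with t2's Z must exist.
            if (proj(t1, x), proj(t1, y), proj(t2, z)) not in present:
                return False
    return True


def is_trivial_mvd(x, y, schema):
    """X ->> Y is trivial if Y is a subset of X or X | Y is the whole schema."""
    return y <= x or x | y == schema


schema = {"ENAME", "PNAME", "DNAME"}
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(emp, {"ENAME"}, {"PNAME"}, schema))      # True
print(mvd_holds(emp[:3], {"ENAME"}, {"PNAME"}, schema))  # False
print(is_trivial_mvd({"ENAME"}, {"PNAME"}, schema))      # False
```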

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies.

It is used to remove multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
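The EMP decomposition can be sketched with plain Python sets. The sketch uses an illustrative instance and assumes ENAME ->> PNAME and ENAME ->> DNAME hold. Projecting onto the two 4NF relations and joining them back on ENAME gives EMP exactly. The last lines show the redundancy 4NF removes: adding one project for Smith costs one tuple per dependent in EMP.

```python
# Illustrative EMP(ENAME, PNAME, DNAME) instance satisfying both MVDs.
emp = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

# Project onto the two 4NF relations.
emp_projects = {(ename, pname) for ename, pname, _ in emp}
emp_dependents = {(ename, dname) for ename, _, dname in emp}

# Natural join on ENAME recovers EMP exactly, so the split is lossless.
rejoined = {
    (e1, pname, dname)
    for e1, pname in emp_projects
    for e2, dname in emp_dependents
    if e1 == e2
}
print(len(emp), len(emp_projects), len(emp_dependents), rejoined == emp)  # 4 2 2 True

# Adding project "Z" for Smith: one tuple in EMP_PROJECTS, one per dependent in EMP.
new_emp_rows = {("Smith", "Z", dname) for ename, dname in emp_dependents if ename == "Smith"}
print(len(new_emp_rows))  # 2
```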

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
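The definition of a join dependency can be checked directly on an instance. Project R onto each component, natural-join the projections, and compare the result with R. The sketch below addresses columns by position. It assumes the components together cover every column. The SUPPLY(Sname, Part_name, Proj_name) tuples are illustrative, and the function name `satisfies_jd` is our own.

```python
def satisfies_jd(rows, components):
    """True if rows equals the natural join of its projections onto components.

    rows: set of equal-length tuples; components: tuples of column positions
    that together cover every column.
    """
    if not rows:
        return True
    joined = [{}]  # partial tuples as {column position: value}
    for comp in components:
        projection = {tuple(t[i] for i in comp) for t in rows}
        joined = [
            {**partial, **dict(zip(comp, values))}
            for partial in joined
            for values in projection
            # keep only combinations that agree on shared columns
            if all(partial.get(i, v) == v for i, v in zip(comp, values))
        ]
    width = len(next(iter(rows)))
    return {tuple(p[i] for i in range(width)) for p in joined} == set(rows)


# Illustrative SUPPLY(Sname, Part_name, Proj_name) instance (positions 0, 1, 2).
supply = {
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
}
R1, R2, R3 = (0, 1), (0, 2), (1, 2)
print(satisfies_jd(supply, [R1, R2, R3]))  # True: JD(R1, R2, R3) holds
print(satisfies_jd(supply, [R1, R2]))      # False: the binary join adds spurious tuples
```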

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than those in which every component is a superkey.

Relation SUPPLY with Join Dependency and Conversion to Fifth Normal Form

(c) The relation SUPPLY, with no MVDs, is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence, 5NF is rarely used in practice.

Page 45: Module 6

Module 6 46042023

CANONICAL COVER (FC)

1 Every FD of FC is simple RHS has one attribute

2 FC is left-reduced

3 FC is nonredudant

Module 6 47042023

Problem

Given a set F of FDs find a cononical cover for F

FC = XZ XYWP XYZWQ XZR

1 FC= XZ XYW XYP XYZ XYW XYQ XZR

2 FC = XZ XYW XYP XYQ XZR

Module 6 48042023

Normal Forms Based on Primary Keys 1 Normalization of Relations

2 Practical Use of Normal Forms

3 Definitions of Keys and Attributes participating in Keys

4 First Normal Form

5 Second Normal Form

6 Third Normal Form

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence, 5NF is rarely used in practice.

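Problem

Given a set F of FDs, find a canonical cover Fc for F.

F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}

1. Split each right-hand side into single attributes: Fc = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}

2. Remove the duplicate XY -> W and the redundant XY -> Z (implied by X -> Z): Fc = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}

3. Z is extraneous in XZ -> R, because X -> Z gives X+ = {X, Z, R}: Fc = {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q}, or with combined right-hand sides, Fc = {X -> ZR, XY -> WPQ}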
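The canonical-cover answer can be checked mechanically with attribute closure. This Python sketch uses hypothetical helper names. It shows that X+ already contains R, so Z is extraneous in XZ -> R. It also shows that the reduced set {X -> Z, X -> R, XY -> W, XY -> P, XY -> Q} is equivalent to F, with no redundant FD and no extraneous left-hand attribute. Attributes are single letters, so a string such as "XY" is treated as a set of attributes.

```python
def fd(lhs, rhs):
    """Build an FD from single-letter attribute strings, e.g. fd("XY", "WP")."""
    return frozenset(lhs), frozenset(rhs)


def closure(attrs, fds):
    """Attribute closure attrs+ under fds: fire every FD whose LHS is already covered."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implies(fds, lhs, rhs):
    return rhs <= closure(lhs, fds)


def equivalent(f, g):
    """F and G are equivalent if each one implies every FD of the other."""
    return all(implies(g, l, r) for l, r in f) and all(implies(f, l, r) for l, r in g)


def no_redundant_fd(fds):
    """No FD can be derived from the remaining ones."""
    return all(not implies(fds[:i] + fds[i + 1:], l, r) for i, (l, r) in enumerate(fds))


def no_extraneous_lhs(fds):
    """No attribute can be dropped from a multi-attribute left-hand side."""
    return all(not implies(fds, l - {a}, r) for l, r in fds if len(l) > 1 for a in l)


F = [fd("X", "Z"), fd("XY", "WP"), fd("XY", "ZWQ"), fd("XZ", "R")]
step2 = [fd("X", "Z"), fd("XY", "W"), fd("XY", "P"), fd("XY", "Q"), fd("XZ", "R")]
Fc = [fd("X", "Z"), fd("X", "R"), fd("XY", "W"), fd("XY", "P"), fd("XY", "Q")]

print(sorted(closure("X", F)))  # ['R', 'X', 'Z']
print(equivalent(F, Fc), no_redundant_fd(Fc), no_extraneous_lhs(Fc))  # True True True
print(no_extraneous_lhs(step2))  # False: Z is still extraneous in XZ -> R
```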

Normal Forms Based on Primary Keys

1. Normalization of Relations
2. Practical Use of Normal Forms
3. Definitions of Keys and Attributes Participating in Keys
4. First Normal Form
5. Second Normal Form
6. Third Normal Form

Normalization of Relations

2NF, 3NF, and BCNF are based on keys and FDs of a relation schema (relational design by analysis).

4NF is based on keys and multi-valued dependencies (MVDs); 5NF is based on keys and join dependencies (JDs) (relational design by synthesis).

Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation.

Normalization of Relations

Normalization (proposed by Codd): the process of analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing the insertion, deletion, and update anomalies.

It provides the database designer with a formal framework for analyzing relation schemas based on their keys and FDs, and a series of normal form tests.

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Normalization of Relations

Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
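The definitions above are stated over legal relation states. Given a set of FDs, there is an equivalent schema-level test: S is a superkey exactly when its attribute closure S+ contains every attribute of R. A key is then a superkey that stops being one if any attribute is dropped. Here is a Python sketch. The schema and FDs are hypothetical, built from the SSN, PNUMBER, HOURS and ENAME attributes used in this module's examples.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """S is a superkey if S+ contains every attribute of R."""
    return closure(attrs, fds) >= set(schema)


def is_key(attrs, schema, fds):
    """A key is a superkey that is no longer one after removing any single attribute."""
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and all(
        not is_superkey(attrs - {a}, schema, fds) for a in attrs)


def candidate_keys(schema, fds):
    """Enumerate keys by increasing size, skipping supersets of keys already found."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and is_superkey(s, schema, fds):
                keys.append(s)
    return keys


# Hypothetical schema and FDs, for illustration only.
schema = {"SSN", "PNUMBER", "HOURS", "ENAME"}
fds = [
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
    (frozenset({"SSN"}), frozenset({"ENAME"})),
]

print(is_superkey({"SSN", "PNUMBER", "HOURS"}, schema, fds))  # True
print(is_key({"SSN", "PNUMBER", "HOURS"}, schema, fds))       # False: HOURS can be dropped
print(candidate_keys(schema, fds))
```

Checking single-attribute removals is enough for minimality, because any superset of a superkey is also a superkey.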

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.

First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.

Hence, 1NF disallows relations within relations, or relations as attribute values within tuples.

1NF is considered to be part of the definition of a relation.

Normalization into 1NF

To achieve 1NF, there are three techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for the attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
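Technique 1 can be sketched as a small transformation over in-memory rows. The DEPARTMENT rows and the DLOCATIONS/DLOCATION names are hypothetical. The function moves the multivalued attribute into a separate relation keyed by the primary key, producing one atomic tuple per value.

```python
def split_multivalued(rows, key, attr, value_name):
    """Technique 1: move a multivalued attribute into its own relation.

    Returns (base, dependent): base keeps every attribute except `attr`;
    dependent holds one {key, value_name} tuple per value of `attr`.
    """
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    dependent = [{key: row[key], value_name: value}
                 for row in rows for value in row[attr]]
    return base, dependent


# Hypothetical DEPARTMENT rows in which DLOCATIONS holds a list (violates 1NF).
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNUMBER": 4, "DNAME": "Administration", "DLOCATIONS": ["Stafford"]},
]

dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS", "DLOCATION")
for row in dept_locations:
    print(row)
```

No location is repeated per department attribute, and no NULL columns are needed, which matches the reasons given above for preferring the first technique.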

Normalizing nested relations into 1NF

Additional problems: Schaum series, pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:

- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:

- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
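Here is a sketch of the 2NF test in Python. For each proper subset of the primary key, it computes the subset's closure and reports any non-prime attribute that subset already determines, which is a partial dependency. It follows this module's 2NF definition, where prime means "member of the primary key". The EMP_PROJ schema extends the SSN/PNUMBER example with PNAME and PLOCATION. Those two attributes and their FD are assumptions for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, pk, fds):
    """Return (subset, attr) pairs: a non-prime attr determined by a proper subset of pk."""
    nonprime = [a for a in schema if a not in pk]
    found = []
    for size in range(1, len(pk)):
        for subset in combinations(sorted(pk), size):
            reach = closure(subset, fds)
            found += [(frozenset(subset), a) for a in nonprime if a in reach]
    return found


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


# EMP_PROJ with assumed attributes PNAME and PLOCATION added for illustration.
schema = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
pk = {"SSN", "PNUMBER"}
fds = [
    fd({"SSN", "PNUMBER"}, {"HOURS"}),
    fd({"SSN"}, {"ENAME"}),
    fd({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

for subset, attr in partial_dependencies(schema, pk, fds):
    print(sorted(subset), "->", attr)  # each line is a 2NF violation

# After 2NF normalization, the relation keyed on {SSN, PNUMBER} keeps only HOURS.
print(partial_dependencies(["SSN", "PNUMBER", "HOURS"], pk, fds))  # []
```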

Normalizing into 2NF

Conversion to 2NF (figure): R(A, B, C, D) is decomposed into (A, B, C) and (A, D).

Additional problem on 2nd normal form: Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor), with Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?
2. Transform it into the next highest normal form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:

- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
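The 3NF condition can also be tested FD by FD. X -> A is acceptable if X is a superkey or A is prime (a member of some candidate key). This Python sketch relies on the usual result that checking the given FDs is enough when F is stated over the whole schema, provided right-hand sides are split into single attributes. It reports the transitive dependency in EMP_DEPT and accepts the EMP example from the NOTE. Both FD sets are assumptions reconstructed from the examples above.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys, found by increasing size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and is_superkey(s, schema, fds):
                keys.append(s)
    return keys


def third_nf_violations(schema, fds):
    """(X, A) pairs where X -> A breaks 3NF: X is not a superkey and A is non-prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, a)
            for lhs, rhs in fds
            for a in sorted(rhs - lhs)  # ignore the trivial part of each FD
            if not is_superkey(lhs, schema, fds) and a not in prime]


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


# EMP_DEPT: SSN -> DMGRSSN is transitive through DNUMBER, which is not a key.
emp_dept = ["SSN", "ENAME", "DNUMBER", "DMGRSSN"]
emp_dept_fds = [fd({"SSN"}, {"ENAME", "DNUMBER"}), fd({"DNUMBER"}, {"DMGRSSN"})]

# EMP from the NOTE: Emp is a candidate key, so Emp -> SSN and Emp -> Salary.
emp = ["SSN", "Emp", "Salary"]
emp_fds = [fd({"SSN"}, {"Emp"}), fd({"Emp"}, {"SSN", "Salary"})]

print(third_nf_violations(emp_dept, emp_dept_fds))
print(third_nf_violations(emp, emp_fds))  # []
```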

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd Normal Form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.

BCNF makes no such exception: 'X' must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:

- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. The relation is in 3NF because the right-hand side of fd2, course, is a prime attribute. It is not in BCNF because the determinant of fd2, instructor, is not a superkey.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
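Running both tests on TEACH shows the gap between the two normal forms. In this Python sketch (function names are hypothetical), the BCNF test flags every nontrivial FD whose left side is not a superkey. The 3NF test also excuses FDs whose right-hand attribute is prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= set(schema)


def candidate_keys(schema, fds):
    """Minimal superkeys, found by increasing size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = frozenset(combo)
            if not any(k <= s for k in keys) and is_superkey(s, schema, fds):
                keys.append(s)
    return keys


def bcnf_violations(schema, fds):
    """Nontrivial X -> A where X is not a superkey."""
    return [(lhs, a) for lhs, rhs in fds for a in sorted(rhs - lhs)
            if not is_superkey(lhs, schema, fds)]


def third_nf_violations(schema, fds):
    """Like BCNF, except X -> A is also allowed when A is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return [(lhs, a) for lhs, a in bcnf_violations(schema, fds) if a not in prime]


teach = ["student", "course", "instructor"]
fds = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

print(candidate_keys(teach, fds))
print(third_nf_violations(teach, fds))  # []: course is prime
print(bcnf_violations(teach, fds))      # fd2: instructor is not a superkey
```

The sketch also reports {student, instructor} as a candidate key, since instructor -> course. Course is prime either way, which is why fd2 passes the 3NF test.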

Achieving the BCNF by Decomposition

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions will lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Out of the above three, only the 3rd decomposition will not generate spurious tuples after the join (and hence has the non-additivity property).
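For a decomposition into two relations, the lossless test reduces to a closure check. {R1, R2} is lossless if R1 ∩ R2 functionally determines R1 - R2 or R2 - R1. The sketch below applies this test to the three TEACH decompositions. It also applies the standard dependency-preservation test, which grows a closure using only what each Ri can enforce on its own. Function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds (list of (lhs, rhs) frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def binary_lossless(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 & R2) -> (R1 - R2) or (R1 & R2) -> (R2 - R1) is in F+."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common


def preserves(fd, decomposition, fds):
    """Check whether lhs -> rhs can be enforced by the decomposed relations alone.

    `result` grows only through closures restricted to each Ri, so no FD
    is applied across relations.
    """
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


fd1 = (frozenset({"student", "course"}), frozenset({"instructor"}))
fd2 = (frozenset({"instructor"}), frozenset({"course"}))
F = [fd1, fd2]

decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

# Only decomposition 3 is lossless, and none of the three preserves fd1.
for r1, r2 in decompositions:
    print(binary_lossless(r1, r2, F), preserves(fd1, (r1, r2), F))
```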

Lossless or lossy decompositions: When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
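The general test for n relations is usually presented as a matrix (chase) algorithm. The sketch below is one Python rendering of that idea, and its notation may differ from the textbook example cited above. There is one row per Ri, with a distinguished symbol wherever Ri contains the attribute. The FDs repeatedly force symbols to agree, and the decomposition is lossless if some row ends up all distinguished. The EMP_PROJ-style schema and decompositions are illustrative assumptions.

```python
def is_lossless(schema, decomposition, fds):
    """Matrix (chase) test for the lossless / nonadditive join property.

    Row i stands for R_i. Cell (i, j) starts as the distinguished symbol
    ("a", j) when attribute j is in R_i, otherwise as a unique ("b", i, j).
    """
    col = {attr: j for j, attr in enumerate(schema)}
    matrix = [[("a", j) if attr in ri else ("b", i, j) for j, attr in enumerate(schema)]
              for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that agree on every LHS column.
            groups = {}
            for row in matrix:
                groups.setdefault(tuple(row[col[a]] for a in lhs), []).append(row)
            for group in groups.values():
                for a in rhs:
                    c = col[a]
                    symbols = {row[c] for row in group}
                    if len(symbols) > 1:
                        # An "a" symbol sorts before any "b" symbol, so min() prefers it.
                        target = min(symbols)
                        for row in matrix:  # rename the merged symbols everywhere in column c
                            if row[c] in symbols:
                                row[c] = target
                        changed = True
    # Lossless iff some row consists only of distinguished symbols.
    return any(all(cell[0] == "a" for cell in row) for row in matrix)


schema = ["SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"]
fds = [
    (frozenset({"SSN"}), frozenset({"ENAME"})),
    (frozenset({"PNUMBER"}), frozenset({"PNAME", "PLOCATION"})),
    (frozenset({"SSN", "PNUMBER"}), frozenset({"HOURS"})),
]
good = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
bad = [{"ENAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]

print(is_lossless(schema, good, fds))  # True
print(is_lossless(schema, bad, fds))   # False
```

Merged symbols are renamed across the whole column, not only in the matching rows. This keeps earlier equalities intact, so later FDs can still fire on them.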

Page 48: Module 6

Module 6 49042023

Normalization of Relations

2NF 3NF BCNF based on keys and FDs of a relation schema (relational design by analysis)

4NF based on keys multi-valued dependencies MVDs 5NF based on keys join dependencies JDs (relational design by synthesis)

Additional properties may be needed to ensure a good relational design lossless join and dependency preservation

Module 6 50042023

Normalization of Relations

Proposed by Codd Normalizationanalysing the given relation based on their FDs and

primary keys to achieve the desirable properties of Minimizing redundancies Minimizing anomalies

Provides the database designer with Formal framework for analyzing relation schemas based on keys

and FD Series of normal form tests

Normal form refers to the highest normal form condition that it meets and indicates the degree to which it can be normalized

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property it guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition

Dependency preservation property it ensures that each functional dependency is represented in some individual relation resulting after decomposition

Module 6 52042023

Practical Use of Normal Forms Normalization is carried out in practice so that

the resulting designs are of high quality and meet the desirable properties

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect

The database designers need not normalize to the highest possible normal form (usually up to 3NF BCNF or 4NF)

Denormalization the process of storing the join of higher normal form relations as a base relationmdashwhich is in a lower normal form

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations: EMP_PROJECTS and EMP_DEPENDENTS.
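A small sketch of the EMP decomposition, using hypothetical Smith rows: projecting onto EMP_PROJECTS and EMP_DEPENDENTS and taking the natural join again should reproduce EMP exactly, which is what makes this 4NF decomposition lossless. Relations are modelled as sets of frozensets of (attribute, value) pairs, an assumption made to get set semantics; the helper names are illustrative.

```python
def rel(attrs, *tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(r, attrs):
    """Relational projection with duplicate elimination."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, s):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

EMP = rel(("ENAME", "PNAME", "DNAME"),
          ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
          ("Smith", "X", "Anna"), ("Smith", "Y", "John"))

EMP_PROJECTS = project(EMP, {"ENAME", "PNAME"})
EMP_DEPENDENTS = project(EMP, {"ENAME", "DNAME"})

print(len(EMP), len(EMP_PROJECTS), len(EMP_DEPENDENTS))  # 4 2 2
print(natural_join(EMP_PROJECTS, EMP_DEPENDENTS) == EMP)  # True
```

With m projects and n dependents, EMP needs m × n tuples per employee while the two 4NF relations need only m + n.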

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Module 6 94042023

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency other than those in which every component is a superkey.

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
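A join dependency can be tested on an instance by projecting onto the components and joining the projections back together. The SUPPLY rows below are hypothetical, chosen so that JD(R1, R2, R3) holds, and the component layout R1(SNAME, PART_NAME), R2(PART_NAME, PROJ_NAME), R3(PROJ_NAME, SNAME) is an assumption for illustration.

```python
from functools import reduce

def rel(attrs, *tuples):
    """Build a relation as a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(r, attrs):
    """Relational projection with duplicate elimination."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in r}

def natural_join(r, s):
    """Combine every pair of tuples that agree on their shared attributes."""
    out = set()
    for t in r:
        for u in s:
            dt, du = dict(t), dict(u)
            if all(dt[a] == du[a] for a in dt.keys() & du.keys()):
                out.add(frozenset({**dt, **du}.items()))
    return out

def jd_holds(r, components):
    """True iff r equals the natural join of its projections on components."""
    return reduce(natural_join, (project(r, c) for c in components)) == r

ATTRS = ("SNAME", "PART_NAME", "PROJ_NAME")
SUPPLY = rel(ATTRS,
             ("Smith", "Bolt", "ProjY"), ("Smith", "Nut", "ProjX"),
             ("Adamsky", "Bolt", "ProjX"), ("Smith", "Bolt", "ProjX"))
R1 = {"SNAME", "PART_NAME"}
R2 = {"PART_NAME", "PROJ_NAME"}
R3 = {"PROJ_NAME", "SNAME"}

print(jd_holds(SUPPLY, [R1, R2, R3]))  # True
print(jd_holds(SUPPLY, [R1, R2]))      # False
print(len(natural_join(project(SUPPLY, R1), project(SUPPLY, R2))))  # 5
```

The two-way join of R1 and R2 produces the spurious tuple (Adamsky, Bolt, ProjY), which the third projection filters out. This is why SUPPLY can only be decomposed losslessly into all three relations at once.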

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence 5NF is rarely used in practice.

Page 49: Module 6

Module 6 50042023

Normalization of Relations

Proposed by Codd. Normalization: analysing the given relations based on their FDs and primary keys to achieve the desirable properties of minimizing redundancy and minimizing anomalies.

Provides the database designer with:
- a formal framework for analyzing relation schemas based on keys and FDs
- a series of normal form tests

The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized.

Module 6 51042023

Normalization of Relations

Lossless join or nonadditive join property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.

Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.
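Dependency preservation can be checked without computing F+ by growing the left-hand side of each FD using closures restricted to each decomposed relation, a standard textbook algorithm. This Python sketch applies it to the 3rd TEACH decomposition; the FD encoding and function names are illustrative assumptions.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_preserved(fd, decomposition, fds):
    """Check X -> Y against the union of the projections of F onto each Ri
    by repeatedly adding what each Ri lets us derive from the current set."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result

F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2
D3 = [{"instructor", "course"}, {"instructor", "student"}]

print([is_preserved(fd, D3, F) for fd in F])  # [False, True]
```

fd1 is reported as lost and fd2 as preserved, consistent with the TEACH discussion.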

Module 6 52042023

Practical Use of Normal Forms

Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.

The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.

Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF or 4NF).

Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = {A1, A2, …, An} is a set of attributes S ⊆ R with the property that no two tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.
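The superkey and key definitions translate into an attribute-closure test: S is a superkey if its closure under the FDs contains all of R, and a key if, in addition, dropping any single attribute breaks that. The EMP_PROJ attributes and FD set below (including PNUMBER -> {PNAME, PLOCATION}) are assumed for illustration.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(s, r, fds):
    """S is a superkey iff its closure contains every attribute of R."""
    return closure(s, fds) >= set(r)

def is_key(k, r, fds):
    """A key is a superkey from which no single attribute can be removed."""
    return is_superkey(k, r, fds) and not any(
        is_superkey(k - {a}, r, fds) for a in k)

R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [({"SSN", "PNUMBER"}, {"HOURS"}),
     ({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"})]

print(is_superkey({"SSN", "PNUMBER", "HOURS"}, R, F))  # True
print(is_key({"SSN", "PNUMBER", "HOURS"}, R, F))       # False
print(is_key({"SSN", "PNUMBER"}, R, F))                # True
print(is_superkey({"SSN"}, R, F))                      # False
```

Checking single-attribute removals is enough, because any superset of a superkey is itself a superkey.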

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is a member of some candidate key.

A nonprime attribute is not a prime attribute, that is, it is not a member of any candidate key.
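For small schemas, all candidate keys, and hence the prime attributes, can be found by brute force over attribute subsets in order of increasing size. This sketch uses the TEACH relation and its two FDs from the BCNF discussion; the function name is an assumption.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(r, fds):
    """Test subsets by increasing size; a superkey that contains no
    smaller key already found must be minimal, i.e. a candidate key."""
    keys = []
    attrs = sorted(r)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue
            if closure(s, fds) >= set(r):
                keys.append(s)
    return keys

R = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),
     ({"instructor"}, {"course"})]

keys = candidate_keys(R, F)
prime = set().union(*keys)
print([sorted(k) for k in keys])  # [['course', 'student'], ['instructor', 'student']]
print(R - prime)                  # set()
```

Both {student, course} and {student, instructor} are candidate keys, so every attribute of TEACH is prime. The search is exponential in the number of attributes and is only meant for textbook-sized examples.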

Module 6 55042023

First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations: attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

Considered to be part of the definition of a relation.

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF

To achieve 1NF there are 3 techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the multivalued attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
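The three 1NF techniques can be contrasted on a small multivalued DLOCATIONS attribute. The DEPARTMENT rows, the column names DLOCATION1 to DLOCATION3, and the maximum of 3 locations are all hypothetical; Python lists of dicts stand in for relations.

```python
# Hypothetical unnormalized DEPARTMENT rows; DLOCATIONS is multivalued.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5,
     "DLOCATIONS": ["Bellaire", "Sugarland", "Houston"]},
    {"DNAME": "Administration", "DNUMBER": 4, "DLOCATIONS": ["Stafford"]},
]

# Technique 1: move the multivalued attribute into its own relation,
# together with the primary key DNUMBER.
department = [{"DNAME": d["DNAME"], "DNUMBER": d["DNUMBER"]}
              for d in DEPARTMENT]
dept_locations = [{"DNUMBER": d["DNUMBER"], "DLOCATION": loc}
                  for d in DEPARTMENT for loc in d["DLOCATIONS"]]

# Technique 2: one tuple per location; DNAME is repeated for every
# location (redundancy) and the key grows to {DNUMBER, DLOCATION}.
expanded = [{"DNAME": d["DNAME"], "DNUMBER": d["DNUMBER"], "DLOCATION": loc}
            for d in DEPARTMENT for loc in d["DLOCATIONS"]]

# Technique 3: a fixed number of location columns, padded with None (NULL).
MAX_LOCATIONS = 3  # assumed known maximum
fixed = [
    {"DNAME": d["DNAME"], "DNUMBER": d["DNUMBER"],
     **{f"DLOCATION{i + 1}": d["DLOCATIONS"][i] if i < len(d["DLOCATIONS"]) else None
        for i in range(MAX_LOCATIONS)}}
    for d in DEPARTMENT
]

print(len(department), len(dept_locations))                   # 2 4
print(sum(row["DNAME"] == "Research" for row in expanded))    # 3
print(sum(v is None for row in fixed for v in row.values()))  # 2
```

Technique 1 stores each department name once, technique 2 repeats "Research" three times, and technique 3 leaves two NULL cells for Administration.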

Module 6 58042023

Module 6 59042023

Normalizing nested relations into 1NF

Additional problems from the Schaum series: pg. 178, 5.1

Module 6 60042023

Module 6 61042023

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds.
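Whether Y -> Z is full can be tested with attribute closure: it must hold, and removing any one attribute from Y must make it fail. The FD set for EMP_PROJ below is an assumption consistent with the examples above; the function name is illustrative.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and no attribute can be dropped from Y."""
    if not rhs <= closure(lhs, fds):
        return False  # the FD does not hold at all
    return not any(rhs <= closure(lhs - {a}, fds) for a in lhs)

F = [({"SSN", "PNUMBER"}, {"HOURS"}),
     ({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"})]

print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, F))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, F))  # False (partial)
```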

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
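2NF normalization can be sketched as moving every non-prime attribute that depends on part of the primary key into a new relation keyed by that part. This is a simplified illustration with the same assumed EMP_PROJ FDs and a hypothetical function name; it does not handle transitive dependencies, which 3NF addresses.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def normalize_2nf(r, key, fds):
    """Move each non-prime attribute that depends on a proper part of the
    primary key into a new relation keyed by the smallest such part."""
    moved = {}  # proper subset of the key -> attributes moved with it
    for attr in sorted(set(r) - set(key)):
        found = False
        for size in range(1, len(key)):
            for sub in combinations(sorted(key), size):
                if attr in closure(set(sub), fds):
                    moved.setdefault(frozenset(sub), set()).add(attr)
                    found = True
                    break
            if found:
                break
    remaining = set(r) - set().union(*moved.values())
    return [remaining] + [set(sub) | attrs for sub, attrs in moved.items()]

R = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
F = [({"SSN", "PNUMBER"}, {"HOURS"}),
     ({"SSN"}, {"ENAME"}),
     ({"PNUMBER"}, {"PNAME", "PLOCATION"})]

print(sorted(sorted(ri) for ri in normalize_2nf(R, {"SSN", "PNUMBER"}, F)))
# [['ENAME', 'SSN'], ['HOURS', 'PNUMBER', 'SSN'], ['PLOCATION', 'PNAME', 'PNUMBER']]
```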

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

[Figure: the relation (A, B, C, D) is decomposed into (A, B, C) and (A, D)]

Module 6 64042023


Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?
2. Transform it into the next highest form.

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME.
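A transitive dependency X -> Z can be detected by searching for an intermediate set Y with X -> Y and Y -> Z, where Y does not determine X (so Y is not a candidate key, the exception discussed in the 3NF NOTE). The EMP_DEPT attributes and FDs below are assumed, following the SSN/DNUMBER/DMGRSSN examples; the function name is illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_witness(x, z, fds):
    """Return a set Y (disjoint from X and Z) with X -> Y, Y -> Z and
    Y not determining X, or None if X -> Z is non-transitive."""
    candidates = sorted(closure(x, fds) - set(x) - set(z))
    for size in range(1, len(candidates) + 1):
        for combo in combinations(candidates, size):
            y = set(combo)  # X -> Y holds since Y is inside closure(X)
            y_plus = closure(y, fds)
            if set(z) <= y_plus and not set(x) <= y_plus:
                return y
    return None

F = [({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
     ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]

print(transitive_witness({"SSN"}, {"DMGRSSN"}, F))  # {'DNUMBER'}
print(transitive_witness({"SSN"}, {"ENAME"}, F))    # None
```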

Module 6 67042023

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP (SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
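The NOTE's exception falls out of the general 3NF test: for every nontrivial FD X -> A, either X is a superkey or A is prime. This sketch checks only the FDs as given (not all of F+) and finds prime attributes by brute force. The EMP and EMP_DEPT FD sets are assumptions for illustration, with Emp -> SSN added because the NOTE states that Emp is a candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(r, fds):
    """Brute-force candidate keys, smallest subsets first."""
    keys = []
    attrs = sorted(r)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(r):
                keys.append(s)
    return keys

def is_3nf(r, fds):
    """For every nontrivial X -> A: X is a superkey or A is prime."""
    prime = set().union(*candidate_keys(r, fds))
    for lhs, rhs in fds:
        for a in rhs - lhs:  # skip the trivial part of the FD
            if not closure(lhs, fds) >= set(r) and a not in prime:
                return False
    return True

# The NOTE's EMP example: Emp is a candidate key, so Emp -> Salary is fine.
EMP = {"SSN", "Emp", "Salary"}
F_EMP = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]

# EMP_DEPT: DNUMBER -> DNAME, DNUMBER is not a key and DNAME is non-prime.
EMP_DEPT = {"SSN", "ENAME", "DNUMBER", "DNAME", "DMGRSSN"}
F_DEPT = [({"SSN"}, {"ENAME", "DNUMBER"}),
          ({"DNUMBER"}, {"DNAME", "DMGRSSN"})]

print(is_3nf(EMP, F_EMP))        # True
print(is_3nf(EMP_DEPT, F_DEPT))  # False
```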

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems: pg. 186, 5.13

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

Module 6 80042023

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation if 'A' is a prime attribute (a member of some candidate key), even when 'X' is not a superkey.

To be in BCNF, 'X' must be a superkey, whether or not 'A' is prime.

So a relation in BCNF will definitely be in 3NF, but not the other way around.
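The difference between the two tests is a single extra condition, which this sketch makes explicit on TEACH. Both functions check only the listed FDs, a common shortcut when testing a schema against its own FD set; the names and FD encoding are illustrative assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(r, fds):
    """Brute-force candidate keys, smallest subsets first."""
    keys = []
    attrs = sorted(r)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(r):
                keys.append(s)
    return keys

def is_superkey(s, r, fds):
    return closure(s, fds) >= set(r)

def is_bcnf(r, fds):
    """Every nontrivial X -> A must have X a superkey."""
    return all(is_superkey(lhs, r, fds)
               for lhs, rhs in fds if not rhs <= lhs)

def is_3nf(r, fds):
    """3NF relaxes BCNF: X -> A is also allowed when A is prime."""
    prime = set().union(*candidate_keys(r, fds))
    return all(is_superkey(lhs, r, fds) or (rhs - lhs) <= prime
               for lhs, rhs in fds if not rhs <= lhs)

TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2

print(is_3nf(TEACH, F))   # True: course is prime
print(is_bcnf(TEACH, F))  # False: instructor is not a superkey
```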

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but 3NF allows it because course is a prime attribute. So this relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
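The decomposition the slides arrive at for TEACH can be reproduced with the usual BCNF decomposition algorithm: find an X whose closure within Ri adds attributes Y without covering Ri, then replace Ri by X | Y and Ri - Y. This sketch projects FDs by brute force over subsets, so it only suits small schemas; the function names are assumptions.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(ri, fds):
    """Find X inside Ri whose closure (restricted to Ri) adds attributes
    but does not cover Ri, i.e. an FD X -> Y on Ri with X not a superkey."""
    attrs = sorted(ri)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            implied = closure(x, fds) & ri
            if implied != x and implied != ri:
                return x, implied - x
    return None

def bcnf_decompose(r, fds):
    """Repeatedly split a non-BCNF relation Ri into X | Y and Ri - Y."""
    pending, done = [set(r)], []
    while pending:
        ri = pending.pop()
        violation = find_violation(ri, fds)
        if violation is None:
            done.append(ri)
        else:
            x, y = violation
            pending.append(x | y)
            pending.append(ri - y)
    return done

TEACH = {"student", "course", "instructor"}
F = [({"student", "course"}, {"instructor"}),  # fd1
     ({"instructor"}, {"course"})]             # fd2

print(sorted(sorted(ri) for ri in bcnf_decompose(TEACH, F)))
# [['course', 'instructor'], ['instructor', 'student']]
```

The result is {instructor, course} and {instructor, student}, the 3rd decomposition: each split is lossless by construction, but fd1 is not preserved.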

Module 6 83042023

Page 52: Module 6

Module 6 53042023

Definitions of Keys and Attributes Participating in Keys

A superkey of a relation schema R = A1 A2

An is a set of attributes S subset-of R with the

property that no two tuples t1 and t2 in any legal

relation state r of R will have t1[S] = t2[S]

A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more

If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.

A prime attribute is an attribute that is a member of some candidate key.

A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
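Given a closure helper, you can enumerate candidate keys for small schemas by trying attribute sets in order of size. This also yields the prime and nonprime attributes. The brute-force sketch below is exponential in the number of attributes and only meant for textbook-sized examples. It reuses the TEACH FDs (fd1, fd2) given with the BCNF slides, and the function names are hypothetical. For TEACH it finds two candidate keys, {course, student} and {instructor, student}, so every attribute is prime.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute sets in order of size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def prime_attributes(schema, fds):
    """Attributes that belong to at least one candidate key."""
    return set().union(*candidate_keys(schema, fds))


TEACH = {"student", "course", "instructor"}
FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

print(candidate_keys(TEACH, FDS))
print(TEACH - prime_attributes(TEACH, FDS))  # nonprime attributes: set()
```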

First Normal Form

Disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic.

Hence 1NF disallows relations within relations, or relations as attribute values within tuples.

It is considered to be part of the definition of a relation.

Normalization into 1NF

Normalization into 1NF

To achieve 1NF there are 3 techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the violating attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace that attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The 1st solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.
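A tiny example contrasts techniques 1 and 2. The DEPARTMENT rows and the multivalued DLOCATIONS attribute below are hypothetical, as are the function names. The sketch only shows where the multivalued values end up under each technique.

```python
# Hypothetical DEPARTMENT data that violates 1NF: DLOCATIONS holds several values.
department = [
    {"DNUMBER": 5, "DNAME": "Research", "DLOCATIONS": ["Bellaire", "Houston"]},
    {"DNUMBER": 1, "DNAME": "Headquarters", "DLOCATIONS": ["Houston"]},
]


def split_multivalued(rows, key, attr):
    """Technique 1: move attr into its own relation, paired with the key."""
    base = [{a: v for a, v in row.items() if a != attr} for row in rows]
    moved = sorted({(row[key], value) for row in rows for value in row[attr]})
    return base, moved


def expand_key(rows, attr):
    """Technique 2: one tuple per value of attr; other attributes repeat."""
    return [
        {**{a: v for a, v in row.items() if a != attr}, attr: value}
        for row in rows
        for value in row[attr]
    ]


dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS")
print(dept_locations)  # [(1, 'Houston'), (5, 'Bellaire'), (5, 'Houston')]
print(len(expand_key(department, "DLOCATIONS")))  # 3
```

Technique 1 yields a separate (DNUMBER, location) relation with no repeated department names. Technique 2 instead stores DNAME once per location, so "Research" appears twice.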

Normalizing nested relations into 1NF

Additional problems from the Schaum series: Pg 178, 5.1

Second Normal Form

Uses the concepts of FDs and primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K.
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
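The full-FD test follows directly from the definition. Y -> Z must hold, and it must stop holding once any single attribute is dropped from Y. The sketch below checks this with attribute closures. The FD set for EMP_PROJ is an assumption for illustration, in particular PNUMBER -> {PNAME, PLOCATION}, and `is_full_fd` is a hypothetical name.

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds but fails once any attribute of Y is removed.

    Testing single-attribute removals suffices: if a smaller subset of Y
    determined Z, so would every superset of that subset inside Y.
    """
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        raise ValueError("Y -> Z does not hold under the given FDs")
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# Hypothetical FDs for EMP_PROJ, for illustration only.
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, FDS))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, FDS))  # False (partial: SSN -> ENAME)
```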

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

R can be decomposed into 2NF relations via the process of 2NF normalization.
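To list 2NF violations, check each attribute outside the primary key and ask whether some proper subset of the key already determines it. The sketch below uses the same assumed EMP_PROJ FDs. Non-prime is taken relative to the primary key, as in the 2NF definition. FDs are pairs of sets, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(schema, primary_key, fds):
    """(key subset, attribute) pairs where a non-prime attribute depends on
    a proper subset of the primary key; an empty list means 2NF holds."""
    key = sorted(primary_key)
    found = []
    for attr in sorted(set(schema) - set(key)):
        for size in range(1, len(key)):  # proper, non-empty subsets of the key
            hits = [set(c) for c in combinations(key, size)
                    if attr in closure(set(c), fds)]
            if hits:
                found.append((hits[0], attr))
                break
    return found


# Hypothetical EMP_PROJ schema and FDs, for illustration only.
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]

print(partial_dependencies(EMP_PROJ, {"SSN", "PNUMBER"}, FDS))
print(partial_dependencies({"SSN", "PNUMBER", "HOURS"}, {"SSN", "PNUMBER"}, FDS))  # []
```

The empty result for {SSN, PNUMBER, HOURS} shows why 2NF normalization works. Once ENAME, PNAME, and PLOCATION are moved into relations keyed by SSN and by PNUMBER, no partial dependency is left.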

Normalizing into 2NF

Conversion to 2NF (figure): a relation with attributes A, B, C, D is split into relations with attributes (A, B, C) and (A, D).

Additional problem on 2nd normal form:

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?
2. Transform it into the next highest form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE:

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
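The NOTE matches what the general, candidate-key-based 3NF test captures. Under that test, an FD X -> A is acceptable if X is a superkey or A is prime. The sketch below swaps in this general test, a standard textbook formulation rather than the primary-key wording on the slide. It checks only the FDs as given, which textbooks note is sufficient when F is stated on R itself. The EMP_DEPT schema is trimmed for brevity. For the NOTE's EMP, the FD Emp -> SSN is assumed because Emp is stated to be a candidate key. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute sets in order of size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def violates_3nf(schema, fds):
    """(X, A) for each given FD X -> A with X not a superkey and A not prime."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # X is a superkey, so X -> A is always allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((set(lhs), a))
    return bad


# Slide example SSN -> DNUMBER -> DMGRSSN (schema trimmed, illustrative).
EMP_DEPT = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
EMP_DEPT_FDS = [({"SSN"}, {"ENAME", "DNUMBER"}), ({"DNUMBER"}, {"DMGRSSN"})]
print(violates_3nf(EMP_DEPT, EMP_DEPT_FDS))  # [({'DNUMBER'}, 'DMGRSSN')]

# NOTE example: Emp is a candidate key, so Emp -> SSN is assumed as well.
EMP = {"SSN", "Emp", "Salary"}
EMP_FDS = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
print(violates_3nf(EMP, EMP_FDS))  # []
```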

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: Pg 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one:
- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation even when 'X' is not a superkey, provided 'A' is a prime attribute (a member of some candidate key).

BCNF makes no such exception: for the relation to be in BCNF, 'X' must be a superkey.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving BCNF by Decomposition

Two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but 3NF still allows it because course is a prime attribute. So this relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
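The difference between 3NF and BCNF on TEACH can be made concrete in code. The BCNF violations are the given FDs whose left side is not a superkey. 3NF tolerates such a violation only when its right-hand attribute is prime. This is a minimal sketch with hypothetical function names, checking only the FDs as given. The last line assumes that instructor -> course is the only FD on the decomposed relation {instructor, course}.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute sets in order of size."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= set(schema):
                keys.append(s)
    return keys


def bcnf_violations(schema, fds):
    """Given FDs X -> A (A not in X) whose left side X is not a superkey."""
    schema = set(schema)
    return [(set(lhs), a)
            for lhs, rhs in fds
            for a in sorted(rhs - lhs)
            if not closure(lhs, fds) >= schema]


def is_bcnf(schema, fds):
    return not bcnf_violations(schema, fds)


def is_3nf(schema, fds):
    """3NF tolerates a BCNF violation X -> A only when A is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return all(a in prime for _, a in bcnf_violations(schema, fds))


TEACH = {"student", "course", "instructor"}
FDS = [({"student", "course"}, {"instructor"}),  # fd1
       ({"instructor"}, {"course"})]             # fd2

print(bcnf_violations(TEACH, FDS))  # [({'instructor'}, 'course')]
print(is_3nf(TEACH, FDS), is_bcnf(TEACH, FDS))  # True False
print(is_bcnf({"instructor", "course"}, [({"instructor"}, {"course"})]))  # True
```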

Achieving BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions will lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property). Its common attribute, instructor, functionally determines course (fd2), which is the rest of {instructor, course}.
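For a split into exactly two relations there is a standard shortcut, often called the nonadditive join test for binary decompositions. {R1, R2} is lossless with respect to F if and only if R1 ∩ R2 -> R1 − R2 or R1 ∩ R2 -> R2 − R1 is in F+. A minimal sketch applying it to the three TEACH decompositions (function names are hypothetical):

```python
def closure(attrs, fds):
    """Attribute closure X+ under FDs given as (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary test: the shared attributes must determine R1 - R2 or R2 - R1."""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined


FDS = [({"student", "course"}, {"instructor"}),  # fd1
       ({"instructor"}, {"course"})]             # fd2
candidates = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for i, (r1, r2) in enumerate(candidates, start=1):
    print(i, is_lossless_binary(r1, r2, FDS))
# 1 False
# 2 False
# 3 True
```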

Lossless or lossy decompositions: When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, then the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
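Decompositions into any number of relations can be tested with the matrix (tableau) method, often presented as the chase. The sketch below is one common formulation and not necessarily the exact variant used in the referenced example. It builds one row per Ri, with a distinguished symbol "a" in the columns Ri contains and a unique "b" symbol elsewhere. It then applies every FD repeatedly, equating symbols in rows that agree on the left side. The decomposition is lossless if and only if some row becomes all "a". The EMP_PROJ FDs and both decompositions are assumptions for illustration, and `is_lossless` is a hypothetical name.

```python
def is_lossless(schema, decomposition, fds):
    """Chase test: True if joining the projections on decomposition is lossless."""
    attrs = sorted(schema)
    col = {a: j for j, a in enumerate(attrs)}
    # Row i holds ("a", j) where R_i contains attribute j, else a unique ("b", i, j).
    table = [[("a", j) if a in part else ("b", i, j) for j, a in enumerate(attrs)]
             for i, part in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            lhs_cols = [col[a] for a in sorted(lhs)]
            groups = {}
            for row in table:  # rows that agree on every column of X
                groups.setdefault(tuple(row[j] for j in lhs_cols), []).append(row)
            for rows in groups.values():
                for a in sorted(rhs - lhs):
                    j = col[a]
                    symbols = {row[j] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # ("a", j) sorts before any ("b", i, j), so min() prefers it.
                    target = min(symbols)
                    for row in table:  # rename the equated symbols in the whole column
                        if row[j] in symbols:
                            row[j] = target
                    changed = True
    return any(all(sym[0] == "a" for sym in row) for row in table)


# Hypothetical EMP_PROJ schema, FDs, and decompositions, for illustration only.
EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
FDS = [
    ({"SSN", "PNUMBER"}, {"HOURS"}),
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
]
three_way = [{"SSN", "PNUMBER", "HOURS"}, {"SSN", "ENAME"},
             {"PNUMBER", "PNAME", "PLOCATION"}]
lossy = [{"ENAME", "PLOCATION"},
         {"SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"}]
print(is_lossless(EMP_PROJ, three_way, FDS))  # True
print(is_lossless(EMP_PROJ, lossy, FDS))      # False
```

Renaming across the whole column, not just inside the matching rows, keeps the table consistent: two cells holding the same symbol always stay equal.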

Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
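The "independent sets of values" wording has a precise instance-level form. For X ->> Y, whenever two tuples t1 and t2 agree on X, there must be a tuple t3 that takes its X and Y values from t1 and all remaining attributes from t2. The sketch below checks that condition on one state and also implements the triviality test stated above. The EMP rows are hypothetical, and the function names are illustrative only.

```python
def make_rows(attrs, tuples):
    """Relation state: a set of rows, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def is_trivial_mvd(lhs, rhs, schema):
    """X ->> Y is trivial if Y is a subset of X or X ∪ Y is the whole schema."""
    return set(rhs) <= set(lhs) or set(lhs) | set(rhs) == set(schema)


def satisfies_mvd(rows, lhs, rhs, schema):
    """Instance check of X ->> Y: for every t1, t2 agreeing on X, some t3 takes
    X and Y from t1 and every remaining attribute from t2."""
    rest = set(schema) - set(lhs) - set(rhs)
    for t1 in rows:
        d1 = dict(t1)
        for t2 in rows:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in lhs):
                t3 = {a: d1[a] for a in set(lhs) | set(rhs)}
                t3.update({a: d2[a] for a in rest})
                if frozenset(t3.items()) not in rows:
                    return False
    return True


# Hypothetical EMP state, for illustration only.
SCHEMA = ("ENAME", "PNAME", "DNAME")
emp = make_rows(SCHEMA, [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
])
print(satisfies_mvd(emp, {"ENAME"}, {"PNAME"}, SCHEMA))  # True
broken = emp - make_rows(SCHEMA, [("Smith", "Y", "John")])
print(satisfies_mvd(broken, {"ENAME"}, {"PNAME"}, SCHEMA))  # False
print(is_trivial_mvd({"ENAME"}, {"PNAME", "DNAME"}, SCHEMA))  # True
```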

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies (more precisely, no nontrivial MVD X ->> Y in which X is not a superkey).

It is used for removing multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form: the EMP relation with two MVDs, ENAME ->> PNAME and ENAME ->> DNAME. Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
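A standard result (Fagin's theorem) explains why this decomposition is safe. R splits losslessly into (X ∪ Y) and (R − Y) exactly when X ->> Y holds. With X = ENAME and Y = PNAME, those two pieces are EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME). The sketch below projects a hypothetical EMP state and joins it back. The rows and helper names are illustrative only.

```python
def make_rows(attrs, tuples):
    """Relation state: a set of rows, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}


def project(rows, attrs):
    """Projection onto attrs; duplicates vanish because the result is a set."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rows}


def natural_join(left, right):
    """Nested-loop natural join on all shared attribute names."""
    result = set()
    for left_row in left:
        lv = dict(left_row)
        for right_row in right:
            rv = dict(right_row)
            if all(lv[a] == rv[a] for a in lv.keys() & rv.keys()):
                result.add(frozenset({**lv, **rv}.items()))
    return result


# Hypothetical EMP state, for illustration only.
emp = make_rows(("ENAME", "PNAME", "DNAME"), [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
])
emp_projects = project(emp, {"ENAME", "PNAME"})
emp_dependents = project(emp, {"ENAME", "DNAME"})

print(len(emp), len(emp_projects), len(emp_dependents))  # 4 2 2
print(natural_join(emp_projects, emp_dependents) == emp)  # True
```

The 4NF relations store each project and each dependent once per employee. In EMP, every project-dependent combination has to be spelled out.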

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 53: Module 6

Module 6 54042023

Definitions of Keys and Attributes Participating in Keys

If a relation schema has more than one key each is called a candidate key One of the candidate keys is arbitrarily designated to be the primary key and the others are called secondary keys

A Prime attribute must be a member of some candidate key

A Nonprime attribute is not a prime attributemdashthat is it is not a member of any candidate key

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 54: Module 6

Module 6 55042023

First Normal Form

Disallows composite attributes multivalued attributes and nested relations attributes whose values for an individual tuple are non-atomic

Hence 1NF disallows relations within relations or relations as attribute values within tuples

Considered to be part of the definition of relation

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 55: Module 6

Module 6 56042023

Normalization into 1NF

Module 6 57042023

Normalization into 1NF To achieve 1NF there are 3 techniques1 Remove the attribute that violates 1NF and place it in

a separate relation along with the primary key2 Expand the key so that there will be a separate tuple

in the original relation It has disadvantage of introducing redundancy

3 If max no of values is known for an attribute than replace each attribute with that many no of atomic attributes It has disadvantage of introducing NULL values

1st solution is considered the best because it does not suffer from redundancy and it is completely general having no limit placed on a max no of values

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, let R be a relation whose attribute subsets are denoted A, B, ..., Z. R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
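The definition above can be tested directly on an instance: project onto each component, join all the projections, and compare the result with the original relation. The Python sketch below does this for a three-way join dependency JD(R1, R2, R3) on SUPPLY-like attributes. The rows and the lowercase attribute names are hypothetical.

```python
from functools import reduce


def make_relation(header, rows):
    """Represent each tuple as a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(header, row)) for row in rows}


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def natural_join(r, s):
    out = set()
    for t1 in r:
        d1 = dict(t1)
        for t2 in s:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out


def satisfies_jd(rel, components):
    """rel satisfies JD(R1, ..., Rn) iff it equals the join of its projections."""
    return reduce(natural_join, (project(rel, c) for c in components)) == rel


HEADER = ("sname", "part_name", "proj_name")
JD = [{"sname", "part_name"}, {"part_name", "proj_name"}, {"sname", "proj_name"}]

# Hypothetical SUPPLY rows.
supply = make_relation(HEADER, [
    ("s1", "p1", "j2"), ("s1", "p2", "j1"), ("s2", "p1", "j1"), ("s1", "p1", "j1"),
])
print(satisfies_jd(supply, JD))  # True
print(satisfies_jd(supply - make_relation(HEADER, [("s1", "p1", "j1")]), JD))  # False
```

Without the row (s1, p1, j1), the three projections are unchanged, but their join still produces that row. It is therefore spurious, and the join dependency fails.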

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), also called Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued and join dependencies if the following holds: for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a 5NF relation has no nontrivial join dependency other than those implied by its keys.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Normalization into 1NF

To achieve 1NF there are 3 techniques:

1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value of the violating attribute. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.

The 1st solution is considered the best because it does not suffer from redundancy. It is also completely general, placing no limit on the maximum number of values.
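Technique 1 can be sketched in a few lines of Python. The DEPARTMENT rows, the multivalued attribute `dlocations`, and the helper `split_multivalued` below are all hypothetical.

```python
# Hypothetical DEPARTMENT rows; dlocations is the multivalued (non-atomic) attribute.
department = [
    {"dnumber": 1, "dname": "Research", "dlocations": ["CityA", "CityB"]},
    {"dnumber": 2, "dname": "Admin", "dlocations": ["CityC"]},
]


def split_multivalued(rows, key, attr):
    """Technique 1: move attr into its own relation together with the key."""
    base = [{k: v for k, v in row.items() if k != attr} for row in rows]
    side = [{key: row[key], attr: value} for row in rows for value in row[attr]]
    return base, side


dept, dept_locations = split_multivalued(department, "dnumber", "dlocations")
print(dept)
print(dept_locations)
```

Each location becomes its own row, keyed by (dnumber, location). This is why technique 1 places no limit on the number of values.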

Normalizing nested relations into 1NF

Additional problems: Schaum's series, pg. 178, 5.1

Second Normal Form

Uses the concepts of FDs and the primary key.

Definitions:
- Prime attribute: an attribute that is a member of the primary key K (in the general definition, a member of any candidate key).
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more.

Examples:
- {SSN, PNUMBER} -> HOURS is a full FD, since neither SSN -> HOURS nor PNUMBER -> HOURS holds.
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency), since SSN -> ENAME also holds.
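Full functional dependency can be tested with attribute closures. An FD Y -> Z is full if it holds and, after dropping any single attribute from Y, the remaining attributes no longer determine Z. Checking single-attribute removals is enough because closure is monotone. In this Python sketch, the FD set is an assumption modeled on the EMP_PROJ example; PNAME and PLOCATION are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """Y -> Z is full if it holds and dropping any attribute from Y breaks it."""
    lhs, rhs = set(lhs), set(rhs)
    if not rhs <= closure(lhs, fds):
        return False  # the FD does not hold at all
    return all(not rhs <= closure(lhs - {a}, fds) for a in lhs)


# Assumed FDs for the EMP_PROJ example.
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(is_full_fd({"SSN", "PNUMBER"}, {"HOURS"}, FDS))  # True
print(is_full_fd({"SSN", "PNUMBER"}, {"ENAME"}, FDS))  # False (partial dependency)
```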

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization
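One simple way to carry out 2NF normalization is to put each non-prime attribute in a relation with the smallest part of the primary key that determines it. Attributes that depend on the whole key stay with the key. The Python sketch below follows that approach. The FDs are assumed from the EMP_PROJ example, and `decompose_2nf` is a hypothetical helper. It removes only partial dependencies; transitive dependencies are left for 3NF.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_determinant(attr, key, fds):
    """Smallest subset of the primary key that functionally determines attr."""
    for size in range(1, len(key) + 1):
        for subset in combinations(sorted(key), size):
            if attr in closure(subset, fds):
                return frozenset(subset)
    return frozenset(key)


def decompose_2nf(schema, key, fds):
    """Group each non-prime attribute with the part of the key it fully depends on."""
    groups = {frozenset(key): set(key)}
    for attr in schema:
        if attr not in key:
            det = minimal_determinant(attr, key, fds)
            groups.setdefault(det, set(det)).add(attr)
    return list(groups.values())


EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
for rel in decompose_2nf(EMP_PROJ, {"SSN", "PNUMBER"}, FDS):
    print(sorted(rel))
```

The result is three relations: one on {SSN, PNUMBER, HOURS}, one on {SSN, ENAME}, and one on {PNUMBER, PNAME, PLOCATION}.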

Normalizing into 2NF

Conversion to 2NF

[Figure: a relation with attributes A, B, C, D converted into 2NF relations with attributes (A, B, C) and (A, D)]

Additional problem on 2nd normal form

Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)

Prog_Pack_ID -> Prog_Pac_name

1. What is the highest normal form?
2. Transform it into the next highest form.

Third Normal Form

Definition: Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z.

Examples:
- SSN -> DMGRSSN is a transitive FD, since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold.
- SSN -> ENAME is non-transitive, since there is no set of attributes X where SSN -> X and X -> ENAME.
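A brute-force Python sketch of the transitivity check follows. It searches for a set Y of attributes outside the key such that key -> Y and Y -> A hold, and Y does not itself determine the key. The last condition reflects the NOTE on the next slide: if Y is a (super)key, the chain does not count as a problem. The FDs are assumed from the EMP_DEPT example. The subset search is exponential, which is acceptable only for small schemas like these.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_via(key, attr, fds):
    """Return a non-key attribute set Y with key -> Y and Y -> attr, or None."""
    key = set(key)
    candidates = sorted(closure(key, fds) - key - {attr})
    for size in range(1, len(candidates) + 1):
        for y in combinations(candidates, size):
            reach = closure(y, fds)
            # Y must not determine the key; otherwise Y is a (super)key itself.
            if attr in reach and not key <= reach:
                return set(y)
    return None


# Assumed FDs for the EMP_DEPT example.
FDS = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]
print(transitive_via({"SSN"}, "DMGRSSN", FDS))  # {'DNUMBER'}
print(transitive_via({"SSN"}, "ENAME", FDS))    # None
```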

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
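The Python sketch below uses the general, candidate-key-based form of the 3NF test. For every FD X -> A, either X is a superkey or A is a prime attribute (a member of some candidate key). This general form captures the NOTE: a chain through a candidate key is not a violation. The slide's primary-key definition gives the same answers on these two examples. The FD sets are assumptions; in particular, the sketch assumes Emp -> SSN, which is what makes Emp a candidate key. The test checks the given FDs (right-hand sides are handled attribute by attribute), and the brute-force key search suits only small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force minimal superkeys (fine for small schemas)."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(s, fds) >= schema:
                keys.append(s)
    return keys


def is_3nf(schema, fds):
    """General 3NF test: for each X -> A, X is a superkey or A is prime."""
    schema = set(schema)
    prime = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:
            continue  # lhs is a superkey
        if any(a not in prime for a in rhs - lhs):
            return False
    return True


# EMP(SSN, Emp, Salary) from the NOTE, assuming Emp also determines SSN.
EMP_FDS = [({"SSN"}, {"Emp"}), ({"Emp"}, {"SSN", "Salary"})]
# Part of EMP_DEPT, where DNUMBER (not a key) determines DMGRSSN.
DEPT_FDS = [({"SSN"}, {"ENAME", "DNUMBER"}), ({"DNUMBER"}, {"DMGRSSN"})]

print(is_3nf({"SSN", "Emp", "Salary"}, EMP_FDS))                  # True
print(is_3nf({"SSN", "ENAME", "DNUMBER", "DMGRSSN"}, DEPT_FDS))  # False
```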

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems: pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).
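The BCNF definition becomes a direct check: the left side of every nontrivial FD must be a superkey. The Python sketch below checks only the given FDs. That is sufficient when testing the schema the FDs were stated on. A relation produced by decomposition would instead need its projected FDs. The FD set is an assumption modeled on the EMP_PROJ example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(schema, fds):
    """BCNF: the left side of every nontrivial FD must be a superkey."""
    schema = set(schema)
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD
        if not closure(lhs, fds) >= schema:
            return False
    return True


EMP_PROJ = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]
print(is_bcnf(EMP_PROJ, FDS))  # False: SSN -> ENAME, but SSN is not a superkey
print(is_bcnf({"PNUMBER", "PNAME", "PLOCATION"}, [FDS[1]]))  # True
```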

How is BCNF different from 3NF?

Consider an FD X -> A in which X is not a superkey. 3NF still allows this dependency in a relation if A is a prime attribute (a member of some candidate key).

In BCNF, X must be a superkey for every nontrivial FD X -> A; there is no exception for prime attributes.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

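Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation ({student, instructor} is another). fd2 violates BCNF because instructor is not a superkey. However, 3NF permits it because course is a prime attribute. So this relation is in 3NF but not in BCNF.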
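The 3NF and BCNF tests can be placed side by side on TEACH, with fd1 and fd2 assumed to be the only FDs. Both candidate keys come out of a brute-force search. Because every attribute of TEACH is prime, 3NF accepts fd2, while BCNF rejects it. The helper names in this Python sketch are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force minimal superkeys (fine for small schemas)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            if not any(k <= s for k in keys) and closure(s, fds) >= schema:
                keys.append(s)
    return keys


def is_superkey(attrs, schema, fds):
    return closure(attrs, fds) >= schema


def is_3nf(schema, fds):
    prime = set().union(*candidate_keys(schema, fds))
    return all(is_superkey(lhs, schema, fds) or rhs - lhs <= prime for lhs, rhs in fds)


def is_bcnf(schema, fds):
    return all(is_superkey(lhs, schema, fds) or rhs <= lhs for lhs, rhs in fds)


TEACH = {"student", "course", "instructor"}
TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
print(candidate_keys(TEACH, TEACH_FDS))
print(is_3nf(TEACH, TEACH_FDS), is_bcnf(TEACH, TEACH_FDS))  # True False
```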

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations
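A common way to carry out such a decomposition is the following. Find a nontrivial X -> Y that violates BCNF in some relation Q, then replace Q with (Q - Y) and (X ∪ Y). Each split is lossless because the shared attributes X determine Y. Dependency preservation, however, is not guaranteed. In the Python sketch below, violations inside a smaller relation are found by computing closures under the original FDs and intersecting them with that relation. The subset search is exponential, so the sketch is only meant for small examples like TEACH.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, Y) where X -> Y holds on rel, Y is nonempty, X is not a superkey of rel."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            reach = closure(x, fds) & rel
            if reach != rel and reach - x:
                return x, reach - x
    return None


def bcnf_decompose(schema, fds):
    """Split on violating X -> Y into (R - Y) and (X ∪ Y) until none remain."""
    result = [set(schema)]
    while True:
        for i, rel in enumerate(result):
            violation = find_violation(rel, fds)
            if violation:
                x, y = violation
                result[i:i + 1] = [rel - y, x | y]
                break
        else:
            return result


TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
for rel in bcnf_decompose({"student", "course", "instructor"}, TEACH_FDS):
    print(sorted(rel))
```

On TEACH the split happens on fd2. The result is {student, instructor} and {instructor, course}, which is the 3rd decomposition discussed below.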

Achieving the BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.
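Whether an FD survives a decomposition can be checked without computing F+. The standard test starts from X and repeatedly adds whatever each Ri can derive from the attributes it shares with the current result. The FD is preserved if its right side is eventually reached. This Python sketch assumes fd1 and fd2 are the only FDs on TEACH. It confirms that fd1 is lost in all three decompositions and that fd2 survives in the 3rd.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, decomposition, fds):
    """Check whether X -> Y is enforceable from the decomposed relations alone."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(result & ri, fds) & ri
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


TEACH_FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]
fd1, fd2 = TEACH_FDS
DECOMPOSITIONS = [
    [{"student", "instructor"}, {"student", "course"}],
    [{"course", "instructor"}, {"course", "student"}],
    [{"instructor", "course"}, {"instructor", "student"}],
]
for d in DECOMPOSITIONS:
    print(is_preserved(fd1, d, TEACH_FDS))  # False for all three
print(is_preserved(fd2, DECOMPOSITIONS[2], TEACH_FDS))  # True
```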

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 57: Module 6

Module 6 58042023

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 58: Module 6

Module 6 59042023

Normalization nested relations into 1NF

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 59: Module 6

Additional problems from schaum series Pg 178 51

Module 6 60042023

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 60: Module 6

Module 6 61042023

Second Normal Form Uses the concepts of FDs primary key

Definitions Prime attribute - attribute that is member of the

primary key K Full functional dependency - a FD Y -gt Z

where removal of any attribute from Y means the FD does not hold any moreExamples - SSN PNUMBER -gt HOURS is a full FD since neither SSN -gt HOURS nor PNUMBER -gt HOURS hold

- SSN PNUMBER -gt ENAME is not a full FD (it is called a partial dependency ) since SSN -gt ENAME also holds

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Third Normal Form

A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key.

R can be decomposed into 3NF relations via the process of 3NF normalization.

NOTE

In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp, Salary). Here SSN -> Emp -> Salary, and Emp is a candidate key.
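The 3NF test can be sketched in Python using the general form of the definition: an FD X -> A violates 3NF when X is not a superkey and A is not a prime attribute. This form also covers the NOTE, because a candidate key Y is a superkey. The reduced EMP_DEPT schema and the FD Emp -> {SSN, Salary} (assumed so that Emp is a candidate key) are illustrative assumptions; the helper names are hypothetical, and the brute-force key search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Brute-force search for minimal superkeys (fine for small textbook schemas)."""
    attributes = set(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            c = set(combo)
            if closure(c, fds) >= attributes and not any(k <= c for k in keys):
                keys.append(c)
    return keys


def violations_3nf(attributes, fds):
    """Return (X, A) pairs where X -> A breaks 3NF: X is not a superkey and A is not prime."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys(attributes, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # X is a superkey, so X -> A is allowed
        for a in sorted(rhs - lhs):
            if a not in prime:
                bad.append((set(lhs), a))
    return bad


# Hypothetical, reduced EMP_DEPT schema using only attributes named in the examples.
EMP_DEPT = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
EMP_DEPT_FDS = [
    (frozenset({"SSN"}), frozenset({"ENAME", "DNUMBER"})),
    (frozenset({"DNUMBER"}), frozenset({"DMGRSSN"})),
]
print(violations_3nf(EMP_DEPT, EMP_DEPT_FDS))  # [({'DNUMBER'}, 'DMGRSSN')]

# EMP(SSN, Emp, Salary) from the NOTE: Emp is a candidate key, so nothing is flagged.
EMP = {"SSN", "Emp", "Salary"}
EMP_FDS = [
    (frozenset({"SSN"}), frozenset({"Emp"})),
    (frozenset({"Emp"}), frozenset({"SSN", "Salary"})),  # assumed
]
print(violations_3nf(EMP, EMP_FDS))  # []
```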

Normalization into 3NF

Normalizing into 2NF and 3NF

SUMMARY

Normalize the following relation

Normalization into 2NF

Normalization into 3NF

Additional problems

Pg. 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R.

Each normal form is strictly stronger than the previous one:

- Every 2NF relation is in 1NF.
- Every 3NF relation is in 2NF.
- Every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation even when 'X' is not a superkey, as long as 'A' is a prime attribute (a member of some candidate key).

To be in BCNF, 'X' must be a superkey for every nontrivial FD X -> A, with no exception for prime attributes.

So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:

- fd1: {student, course} -> instructor
- fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 violates BCNF because instructor is not a superkey, but 3NF permits it because course is a prime attribute. So this relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
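The reasoning about TEACH can be checked with a short Python sketch (hypothetical helper names; the brute-force key search is only suitable for small schemas). The BCNF check flags every nontrivial FD whose left side is not a superkey; the 3NF check then drops the violations whose right side is a prime attribute.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(x, attributes, fds):
    return closure(x, fds) >= set(attributes)


def candidate_keys(attributes, fds):
    """Minimal superkeys, found by trying attribute sets in increasing size."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            c = set(combo)
            if is_superkey(c, attributes, fds) and not any(k <= c for k in keys):
                keys.append(c)
    return keys


def bcnf_violations(attributes, fds):
    """Nontrivial FDs X -> A whose left side X is not a superkey."""
    return [(set(lhs), a) for lhs, rhs in fds
            for a in sorted(rhs - lhs)
            if not is_superkey(lhs, attributes, fds)]


def third_nf_violations(attributes, fds):
    """Same as BCNF, except X -> A is also allowed when A is a prime attribute."""
    prime = set().union(*candidate_keys(attributes, fds))
    return [(x, a) for x, a in bcnf_violations(attributes, fds) if a not in prime]


TEACH = {"student", "course", "instructor"}
TEACH_FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

print(candidate_keys(TEACH, TEACH_FDS))
print(third_nf_violations(TEACH, TEACH_FDS))  # []
print(bcnf_violations(TEACH, TEACH_FDS))      # [({'instructor'}, 'course')]
```

The search finds two candidate keys, {student, course} and {student, instructor}, so every attribute of TEACH is prime; that is why fd2 passes the 3NF test while failing BCNF.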

Achieving the BCNF by Decomposition

Three possible decompositions for relation TEACH:

1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions will lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the non-additivity property after decomposition.

Out of the above three, only the 3rd decomposition will not generate spurious tuples after a join (and hence has the non-additivity property).
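For a decomposition into exactly two relations, a standard test is that R1 ∩ R2 must functionally determine R1 - R2 or R2 - R1. A minimal Python sketch applying that test to the three decompositions above, using fd1 and fd2 (helper names are hypothetical):

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1)."""
    r1, r2 = set(r1), set(r2)
    common_closure = closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


TEACH_FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

DECOMPOSITIONS = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for n, (r1, r2) in enumerate(DECOMPOSITIONS, start=1):
    print(n, is_lossless_binary(r1, r2, TEACH_FDS))
# 1 False
# 2 False
# 3 True
```

Only in the 3rd decomposition is the shared attribute (instructor) enough to determine the rest of one side (course, via fd2).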

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162
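The lossless/lossy distinction can be illustrated by projecting a relation instance and joining the projections back. The Python sketch below uses a small hypothetical TEACH instance (the values s1, DB, i1 and so on are made up, chosen to satisfy fd1 and fd2). A single instance can only show that a decomposition is lossy; it cannot prove that a decomposition is lossless for every instance.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join: combine tuples that agree on every shared attribute."""
    result = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                result.add(frozenset({**ld, **rd}.items()))
    return result


def spurious_tuples(rows, schemas):
    """Tuples produced by joining the projections that were not in the original instance."""
    original = {frozenset(row.items()) for row in rows}
    joined = project(rows, schemas[0])
    for schema in schemas[1:]:
        joined = natural_join(joined, project(rows, schema))
    return joined - original


# Hypothetical TEACH instance that satisfies fd1 and fd2.
TEACH_ROWS = [
    {"student": "s1", "course": "DB", "instructor": "i1"},
    {"student": "s2", "course": "DB", "instructor": "i2"},
    {"student": "s2", "course": "OS", "instructor": "i3"},
]

first = [{"student", "instructor"}, {"student", "course"}]
third = [{"instructor", "course"}, {"instructor", "student"}]
print(len(spurious_tuples(TEACH_ROWS, first)))  # 2
print(len(spurious_tuples(TEACH_ROWS, third)))  # 0
```

With the 1st decomposition, s2's two instructors and two courses are recombined freely, producing (s2, OS, i2) and (s2, DB, i3), which were never in TEACH.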

Testing for lossless joins

Lossless join algorithm: Example 5.12
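For a decomposition into any number of relations, the lossless-join test is usually carried out on a matrix (tableau) with one row per Ri that is repeatedly updated using the FDs, a procedure often called the chase. A minimal Python sketch of that general technique, not necessarily identical to the textbook example referenced above; the symbol encoding and function name are assumptions.

```python
def chase_is_lossless(attributes, decomposition, fds):
    """Tableau (chase) test for the non-additive join property of R1, ..., Rn."""
    attrs = sorted(attributes)
    # Row i holds the distinguished symbol ("a", A) where A is in Ri, else ("b", i, A).
    rows = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
            for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(rows)):
                for j in range(i + 1, len(rows)):
                    if any(rows[i][x] != rows[j][x] for x in lhs):
                        continue  # rows must agree on the whole left side
                    for y in rhs:
                        s, t = rows[i][y], rows[j][y]
                        if s == t:
                            continue
                        # Equate the two symbols, preferring the distinguished "a".
                        if s[0] == "a" or (t[0] != "a" and s < t):
                            keep, drop = s, t
                        else:
                            keep, drop = t, s
                        for row in rows:  # rename every occurrence in column y
                            if row[y] == drop:
                                row[y] = keep
                        changed = True
    # Lossless iff some row consists entirely of distinguished symbols.
    return any(all(row[a][0] == "a" for a in attrs) for row in rows)


TEACH = {"student", "course", "instructor"}
TEACH_FDS = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]

print(chase_is_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], TEACH_FDS))  # True
print(chase_is_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], TEACH_FDS))     # False
```

Unlike the two-relation shortcut, this procedure also handles decompositions into three or more relations.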


Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A ->> B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
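An MVD can be tested on a relation instance: whenever two tuples agree on X, the tuple that takes its X and Y values from the first and its remaining values from the second must also be present. A Python sketch with a hypothetical EMP instance (the names and values are made up); an instance can refute an MVD but cannot prove that it holds in general. Function names are hypothetical.

```python
def mvd_holds(rows, attributes, x, y):
    """Check X ->> Y on an instance: for t1, t2 agreeing on X, the swapped tuple t3 must exist."""
    x = set(x)
    y = set(y) - x  # X ->> Y is equivalent to X ->> (Y - X)
    z = set(attributes) - x - y
    present = {frozenset(r.items()) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                t3 = {a: t1[a] for a in x | y}
                t3.update({a: t2[a] for a in z})
                if frozenset(t3.items()) not in present:
                    return False
    return True


def is_trivial_mvd(attributes, x, y):
    """X ->> Y is trivial if Y is a subset of X or X ∪ Y is all of R."""
    return set(y) <= set(x) or set(x) | set(y) == set(attributes)


EMP_ATTRS = {"ENAME", "PNAME", "DNAME"}
EMP_ROWS = [  # hypothetical: every project of Smith paired with every dependent
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

print(mvd_holds(EMP_ROWS, EMP_ATTRS, {"ENAME"}, {"PNAME"}))      # True
print(mvd_holds(EMP_ROWS[:3], EMP_ATTRS, {"ENAME"}, {"PNAME"}))  # False
print(is_trivial_mvd(EMP_ATTRS, {"ENAME"}, {"PNAME"}))           # False
```

Dropping the last tuple breaks the MVD because (Smith, Y, John) is then missing even though Smith works on Y and has dependent John.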

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

It is used to remove multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
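A short Python sketch of the 4NF decomposition on a hypothetical EMP instance (values made up), projecting onto EMP_PROJECTS(ENAME, PNAME) and EMP_DEPENDENTS(ENAME, DNAME) and checking that the natural join restores EMP. Helper names are hypothetical.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; tuples are frozensets of (attribute, value) pairs."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Combine tuples that agree on every shared attribute."""
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


EMP_ROWS = [  # hypothetical instance satisfying ENAME ->> PNAME and ENAME ->> DNAME
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]

emp = {frozenset(r.items()) for r in EMP_ROWS}
emp_projects = project(EMP_ROWS, {"ENAME", "PNAME"})
emp_dependents = project(EMP_ROWS, {"ENAME", "DNAME"})

print(len(emp), len(emp_projects), len(emp_dependents))  # 4 2 2
print(natural_join(emp_projects, emp_dependents) == emp)  # True
```

EMP stores every project/dependent combination (four tuples here) while the two projections store two tuples each; that difference is the redundancy the MVDs cause.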

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
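Checking a join dependency on an instance amounts to joining the projections and comparing with the original. The Python sketch below assumes the attribute names Sname, Part_name and Proj_name for a SUPPLY-style relation and uses a made-up instance; that instance satisfies the three-way JD but is not recoverable from only two of the projections. Helper names are hypothetical, and as before an instance can refute a JD but not prove it.

```python
from functools import reduce


def project(rows, attrs):
    """Projection with duplicate elimination; tuples are frozensets of (attribute, value) pairs."""
    return {frozenset((a, r[a]) for a in attrs) for r in rows}


def natural_join(left, right):
    """Combine tuples that agree on every shared attribute."""
    out = set()
    for l in left:
        ld = dict(l)
        for r in right:
            rd = dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def jd_holds(rows, schemas):
    """JD(R1, ..., Rn) holds on this instance iff joining its projections gives it back exactly."""
    original = {frozenset(r.items()) for r in rows}
    return reduce(natural_join, (project(rows, s) for s in schemas)) == original


SUPPLY = [dict(zip(("Sname", "Part_name", "Proj_name"), t)) for t in [
    ("Smith", "Bolt", "ProjX"),
    ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"),
    ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"),
    ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]]

R1, R2, R3 = {"Sname", "Part_name"}, {"Sname", "Proj_name"}, {"Part_name", "Proj_name"}
print(jd_holds(SUPPLY, [R1, R2, R3]))  # True
print(jd_holds(SUPPLY, [R1, R2]))      # False
```

Joining only R1 and R2 invents tuples such as (Smith, Nut, ProjX); the third projection R3 filters them out again.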

Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF) (or Project-Join Normal Form (PJNF)) with respect to a set F of functional, multivalued and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R. In other words, a relation in 5NF has no nontrivial join dependency other than those whose components are all superkeys.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2 and R3.

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes. Hence, 5NF is rarely used in practice.

Page 61: Module 6

Module 6 62042023

Second Normal Form

A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key

R can be decomposed into 2NF relations via the process of 2NF normalization

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 62: Module 6

Module 6 63042023

Normalizing into 2NF

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 63: Module 6

Conversion to 2NF

A A A

B B D

C C

D

Module 6 64042023

Convert to

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 64: Module 6

Additional problem on 2nd normal formProg_task(prog_ID Prog_Pack_ID

prog_Pac_name Tot-Hours-wor)

Prog_Pack_IDProg_Pac_name

1 What is the highest normal form

2 Transform into next highest form

Module 6 65042023

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 65: Module 6

Module 6 66042023

Third Normal Form

Definition Transitive functional dependency - a FD X -gt

Z that can be derived from two FDs X -gt Y and Y -gt Z Examples

- SSN -gt DMGRSSN is a transitive FD sinceSSN -gt DNUMBER and DNUMBER -gt DMGRSSN hold - SSN -gt ENAME is non-transitive since there is no set of attributes X where SSN -gt X and X -gt ENAME

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Normalization into 3NF

Additional problems

Pg 186513

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)

A relation schema R is in Boyce-Codd Normal Form (BCNF) if, whenever a nontrivial FD X -> A holds in R, X is a superkey of R.

Each normal form is strictly stronger than the previous one: every 2NF relation is in 1NF, every 3NF relation is in 2NF, and every BCNF relation is in 3NF.

There exist relations that are in 3NF but not in BCNF.

The goal is to have each relation in BCNF (or 3NF).
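
Here is a sketch of the BCNF test as stated: every nontrivial FD must have a superkey on its left-hand side. It makes these assumptions:
- FDs are given as (lhs, rhs) pairs of attribute sets.
- The EMP_DEPT FDs from the 3NF examples serve as illustrative input.
- For the schema the FDs were given on, checking F itself (rather than F+) is enough.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds` ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Nontrivial FDs in `fds` whose left-hand side is not a superkey."""
    return [(sorted(lhs), sorted(rhs)) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= relation]

emp_dept = {"SSN", "ENAME", "DNUMBER", "DMGRSSN"}
fds = [({"SSN"}, {"ENAME", "DNUMBER"}), ({"DNUMBER"}, {"DMGRSSN"})]
print(bcnf_violations(emp_dept, fds))   # [(['DNUMBER'], ['DMGRSSN'])]

# After splitting on the violating FD, each piece is in BCNF.
print(bcnf_violations({"SSN", "ENAME", "DNUMBER"}, [fds[0]]))   # []
print(bcnf_violations({"DNUMBER", "DMGRSSN"}, [fds[1]]))        # []
```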

How is BCNF different from 3NF?

For an FD X -> A, 3NF allows this dependency in a relation even when X is not a superkey, provided A is a prime attribute (part of some candidate key).

BCNF makes no such exception: for the relation to be in BCNF, X must be a superkey.

So a relation in BCNF is always in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition

Two FDs exist in the relation TEACH:
fd1: {student, course} -> instructor
fd2: instructor -> course

{student, course} is a candidate key for this relation. fd2 is allowed in 3NF because course is a prime attribute, but instructor is not a superkey. So this relation is in 3NF but not in BCNF.

A relation NOT in BCNF should be decomposed so as to meet the nonadditive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
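
To see why TEACH passes 3NF but fails BCNF, the sketch below enumerates candidate keys by brute force (fine for a handful of attributes) and applies both rules to fd1 and fd2. The function names are hypothetical. It tests only the listed FDs rather than all of F+, which is the usual shortcut when testing a schema against its own FD set.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds` ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(relation, fds):
    """Smallest attribute sets first, skipping supersets of keys already found."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            cand = set(combo)
            if closure(cand, fds) >= relation and not any(k <= cand for k in keys):
                keys.append(cand)
    return keys

def check_3nf_bcnf(relation, fds):
    """Return (in_3nf, in_bcnf) by testing each FD in `fds`."""
    prime = set().union(*candidate_keys(relation, fds))
    in_3nf = in_bcnf = True
    for lhs, rhs in fds:
        if closure(lhs, fds) >= relation:
            continue                  # superkey on the left: fine for both forms
        for a in rhs - lhs:           # nontrivial part of the FD
            in_bcnf = False           # BCNF: X must be a superkey
            if a not in prime:
                in_3nf = False        # 3NF: tolerated only if A is prime
    return in_3nf, in_bcnf

teach = {"student", "course", "instructor"}
teach_fds = [
    ({"student", "course"}, {"instructor"}),   # fd1
    ({"instructor"}, {"course"}),              # fd2
]
print([sorted(k) for k in candidate_keys(teach, teach_fds)])
# [['course', 'student'], ['instructor', 'student']]
print(check_3nf_bcnf(teach, teach_fds))        # (True, False)
```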

Achieving the BCNF by Decomposition

Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}

All three decompositions lose fd1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditivity property after decomposition.

Of the three, only the third decomposition will not generate spurious tuples after a join (and hence has the nonadditive join property).
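
For two-way splits the standard test is simple. {R1, R2} is lossless with respect to F exactly when R1 ∩ R2 functionally determines R1 - R2 or R2 - R1. Below is a sketch applying it to the three TEACH decompositions. The helper names are illustrative.

```python
def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds` ((lhs, rhs) set pairs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def binary_lossless(r1, r2, fds):
    """{R1, R2} is lossless iff (R1 & R2) -> (R1 - R2) or (R1 & R2) -> (R2 - R1)."""
    common = closure(r1 & r2, fds)
    return (r1 - r2) <= common or (r2 - r1) <= common

teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]
for r1, r2 in decompositions:
    print(sorted(r1), sorted(r2), binary_lossless(r1, r2, teach_fds))
# Only the third decomposition prints True: instructor -> course.
```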

Lossless or lossy decompositions: When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 5.11, pg. 162
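
The definition can also be exercised on data: project, rejoin, and count the extra tuples. Two caveats apply:
- The TEACH rows below are hypothetical, chosen to satisfy fd1 and fd2.
- A single instance can only prove a decomposition lossy. Showing it lossless for all instances needs an FD-based test.

```python
def make(attrs, rows):
    """Build a relation: a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r1, r2):
    """Combine tuples that agree on every shared attribute."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def spurious(rel, a, b):
    """Tuples produced by projecting onto a and b and rejoining that were not in rel."""
    return natural_join(project(rel, a), project(rel, b)) - rel

# Hypothetical TEACH rows (student, course, instructor).
teach = make(("student", "course", "instructor"), [
    ("s1", "DB", "i1"),
    ("s2", "DB", "i2"),
    ("s2", "OS", "i3"),
    ("s3", "DB", "i1"),
])

print(len(spurious(teach, {"student", "instructor"}, {"student", "course"})))     # 2: lossy
print(len(spurious(teach, {"course", "instructor"}, {"course", "student"})))      # 3: lossy
print(len(spurious(teach, {"instructor", "course"}, {"instructor", "student"})))  # 0
```

The join of projections always contains the original relation, so any extra tuple it produces is spurious.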

Testing for lossless joins

Lossless join algorithm: Example 5.12
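
The matrix (chase) test generalizes the lossless check to n relations. This sketch follows the usual formulation:
1. Create one row per Ri, with distinguished "a" symbols in Ri's columns and unique "b" symbols elsewhere.
2. Equate symbols under each FD until nothing changes.
3. Declare the decomposition lossless if some row becomes all "a" symbols.

It is not necessarily identical to the textbook's Example 5.12, and the three-way EMP/PROJECT/WORKS_ON attribute names are assumptions.

```python
def chase_is_lossless(relation, decomposition, fds):
    """Matrix (chase) test for the nonadditive join property.

    relation: set of attribute names of R
    decomposition: list of attribute sets R1..Rn
    fds: list of (lhs, rhs) attribute sets
    """
    attrs = sorted(relation)
    # Row i: ("a", A) in the columns of Ri, a unique ("b", i, A) elsewhere.
    table = [{a: ("a", a) if a in ri else ("b", i, a) for a in attrs}
             for i, ri in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i, ti in enumerate(table):
                for tj in table[i + 1:]:
                    if any(ti[x] != tj[x] for x in lhs):
                        continue
                    for y in rhs:
                        if ti[y] == tj[y]:
                            continue
                        # Equate the two symbols, preferring the "a" symbol.
                        keep, drop = (ti[y], tj[y]) if ti[y][0] == "a" else (tj[y], ti[y])
                        for row in table:
                            if row[y] == drop:
                                row[y] = keep
                        changed = True
    # Lossless iff some row consists entirely of "a" symbols.
    return any(all(row[a] == ("a", a) for a in attrs) for row in table)

teach = {"student", "course", "instructor"}
teach_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(chase_is_lossless(teach, [{"instructor", "course"}, {"instructor", "student"}], teach_fds))  # True
print(chase_is_lossless(teach, [{"student", "instructor"}, {"student", "course"}], teach_fds))     # False

# A three-way split (attribute names assumed for illustration).
emp_proj = {"SSN", "ENAME", "PNUMBER", "PNAME", "PLOCATION", "HOURS"}
emp_proj_fds = [({"SSN"}, {"ENAME"}),
                ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
                ({"SSN", "PNUMBER"}, {"HOURS"})]
pieces = [{"SSN", "ENAME"}, {"PNUMBER", "PNAME", "PLOCATION"}, {"SSN", "PNUMBER", "HOURS"}]
print(chase_is_lossless(emp_proj, pieces, emp_proj_fds))  # True
```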

Fourth Normal Form (4NF)

Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further classified as trivial or nontrivial.

An MVD A ->> B in relation R is trivial if B is a subset of A or A ∪ B = R.

An MVD is nontrivial if neither of the above two conditions is satisfied.
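
Here is a sketch that tests an MVD on a concrete instance using the standard tuple-swapping condition. Whenever two tuples agree on A, the tuple that takes A and B from one and the remaining attributes from the other must also be present. The rows and helper names are hypothetical.

```python
def mvd_holds(rows, x, y):
    """Check X ->> Y on an instance given as a list of dicts with the same keys."""
    if not rows:
        return True
    attrs = set(rows[0])
    z = attrs - x - y                     # the remaining attributes
    present = {frozenset(r.items()) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                # t3 takes X and Y from t1 and the remaining attributes from t2.
                t3 = {a: (t2[a] if a in z else t1[a]) for a in attrs}
                if frozenset(t3.items()) not in present:
                    return False
    return True

def is_trivial_mvd(x, y, relation):
    """X ->> Y is trivial if Y is a subset of X or X ∪ Y = R."""
    return (y <= x) or ((x | y) == relation)

rows = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
    {"A": "a1", "B": "b2", "C": "c1"},
    {"A": "a1", "B": "b2", "C": "c2"},
]
R = {"A", "B", "C"}
print(mvd_holds(rows, {"A"}, {"B"}))        # True: B and C vary independently
print(mvd_holds(rows[:3], {"A"}, {"B"}))    # False: (a1, b2, c2) is missing
print(is_trivial_mvd({"A"}, {"B"}, R))       # False
print(is_trivial_mvd({"A"}, {"B", "C"}, R))  # True: A ∪ B = R
```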

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies (other than those whose left-hand side is a superkey).

It is used to remove multivalued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
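
Here is a sketch of the EMP decomposition on hypothetical rows: one employee, two projects and two dependents. The projections store 2 + 2 tuples instead of the 4 cross-product rows, and their natural join recovers EMP. The helper functions are illustrative.

```python
def make(attrs, rows):
    """Build a relation: a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r1, r2):
    """Combine tuples that agree on every shared attribute."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

# Hypothetical EMP rows: every project paired with every dependent.
emp = make(("ENAME", "PNAME", "DNAME"), [
    ("e1", "p1", "d1"),
    ("e1", "p2", "d2"),
    ("e1", "p1", "d2"),
    ("e1", "p2", "d1"),
])
emp_projects = project(emp, {"ENAME", "PNAME"})
emp_dependents = project(emp, {"ENAME", "DNAME"})

print(len(emp), len(emp_projects), len(emp_dependents))   # 4 2 2
print(natural_join(emp_projects, emp_dependents) == emp)  # True
```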

Fifth Normal Form (5NF)

Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, ..., Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, ..., Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
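
Here is a sketch of the join-dependency definition applied to one instance: the JD holds if joining the projections gives back exactly the original rows. A JD must hold on every legal instance, so a check like this can refute a JD but not establish it. The data and names are hypothetical.

```python
from functools import reduce

def make(attrs, rows):
    """Build a relation: a set of frozensets of (attribute, value) pairs."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}

def natural_join(r1, r2):
    """Combine tuples that agree on every shared attribute."""
    out = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

def jd_holds(rel, components):
    """JD(R1, ..., Rn) holds on this instance iff the join of the
    projections onto every Ri gives back exactly `rel`."""
    return reduce(natural_join, (project(rel, c) for c in components)) == rel

r = make(("A", "B", "C"), [("a1", "b1", "c1"), ("a1", "b2", "c2")])
print(jd_holds(r, [{"A", "B"}, {"A", "C"}]))  # False: adds (a1, b1, c2) and (a1, b2, c1)
print(jd_holds(r, [{"A", "B"}, {"B", "C"}]))  # True
```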

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 66: Module 6

Module 6 67042023

Third Normal Form A relation schema R is in third normal form (3NF) if it is

in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

R can be decomposed into 3NF relations via the process of 3NF normalization

NOTE

In X -gt Y and Y -gt Z with X as the primary key we consider this a problem only if Y is not a candidate key When Y is a candidate key there is no problem with the transitive dependency Eg Consider EMP (SSN Emp Salary ) Here SSN -gt Emp -gt Salary and Emp is a candidate key

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 67: Module 6

Module 6 68042023

Normalization into 3NF

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 68: Module 6

Module 6 69042023

Normalizing into 2NF and 3NF

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 69: Module 6

Module 6 70042023

SUMMARY

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 70: Module 6

Module 6 71042023

Normalize the following relation

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 71: Module 6

Module 6 73042023

Normalization into 2NF

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Achieving BCNF by Decomposition
Three possible decompositions for relation TEACH:

1. {Student, Instructor} and {Student, Course}
2. {Course, Instructor} and {Course, Student}
3. {Instructor, Course} and {Instructor, Student}

All three decompositions lose FD1. We have to settle for sacrificing functional dependency preservation, but we cannot sacrifice the nonadditive (lossless) join property after decomposition.

Of the three, only the third decomposition does not generate spurious tuples after a join, and hence only it has the nonadditive join property.
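For a decomposition into two relations, the nonadditive join property can be tested directly from the FDs: {R1, R2} is lossless if and only if (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1) is in F+. A minimal Python sketch of that test on the three TEACH decompositions follows; `closure` and `binary_lossless` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Attribute closure X+ under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def binary_lossless(r1, r2, fds):
    """{R1, R2} is lossless iff R1 ∩ R2 determines R1 - R2 or R2 - R1."""
    common_plus = closure(r1 & r2, fds)
    return (r1 - r2) <= common_plus or (r2 - r1) <= common_plus


fds = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),  # FD1
    (frozenset({"Instructor"}), frozenset({"Course"})),             # FD2
]
decompositions = [
    (frozenset({"Student", "Instructor"}), frozenset({"Student", "Course"})),
    (frozenset({"Course", "Instructor"}), frozenset({"Course", "Student"})),
    (frozenset({"Instructor", "Course"}), frozenset({"Instructor", "Student"})),
]
for number, (r1, r2) in enumerate(decompositions, start=1):
    print(number, binary_lossless(r1, r2, fds))
# Prints "1 False", "2 False", "3 True" on separate lines
```

Only the third split has a shared attribute (Instructor) whose closure covers one side's remaining attribute (Course), which is why it alone avoids spurious tuples.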

Lossless or lossy decompositions
When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise, it is lossy.
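The lossless/lossy distinction can be seen on a small instance by projecting and re-joining. The minimal Python sketch below uses an invented four-row TEACH instance that satisfies FD1 and FD2 (the student, course, and instructor values are hypothetical) and compares the split {Student, Instructor} / {Student, Course} with {Instructor, Course} / {Instructor, Student}. Tuples are represented as frozensets of (attribute, value) pairs so that a natural join is just a union of compatible tuples.

```python
def project(rows, attrs):
    """Projection with set semantics; each tuple becomes a frozenset of (attribute, value) pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Combine every pair of tuples that agree on all attributes they share."""
    out = set()
    for lt in left:
        for rt in right:
            ld, rd = dict(lt), dict(rt)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                out.add(lt | rt)
    return out


# Hypothetical TEACH instance that satisfies FD1 and FD2 (values invented).
teach = [
    {"Student": "st1", "Course": "db", "Instructor": "ins_a"},
    {"Student": "st2", "Course": "db", "Instructor": "ins_b"},
    {"Student": "st1", "Course": "os", "Instructor": "ins_c"},
    {"Student": "st2", "Course": "os", "Instructor": "ins_c"},
]
original = {frozenset(row.items()) for row in teach}

lossy = natural_join(project(teach, ["Student", "Instructor"]),
                     project(teach, ["Student", "Course"]))
lossless = natural_join(project(teach, ["Instructor", "Course"]),
                        project(teach, ["Instructor", "Student"]))
print(len(original), len(lossy), len(lossless))  # 4 8 4
print(lossless == original)                     # True
```

The first split rejoins to 8 rows, 4 of them spurious (for example, st1 paired with ins_a for os); the second returns exactly the original 4 rows.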

Example 5.11, pg. 162

Testing for lossless joins

Lossless join algorithm: Example 5.12
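A lossless join algorithm is commonly presented as a matrix (tableau) test that handles any number of relations: build one row per Ri, put a distinguished symbol in the columns of Ri, repeatedly equate symbols in rows that agree on the left side of some FD, and declare the decomposition lossless if some row becomes entirely distinguished. The Python below is one possible minimal rendering, assuming FDs are (left side, right side) pairs of attribute sets; the EMPLOYEE/PROJECT attribute names form an illustrative schema chosen here, not the module's own example, and `has_lossless_join` is a hypothetical name.

```python
def has_lossless_join(schema, decomposition, fds):
    """Matrix (tableau) test for the nonadditive join property of an n-way decomposition."""
    attrs = sorted(schema)
    # One row per Ri. ("a", A) is the distinguished symbol for column A;
    # ("b", i, A) is a distinct placeholder for row i, column A.
    table = [
        {A: ("a", A) if A in ri else ("b", i, A) for A in attrs}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group rows that currently agree on every attribute of the left side.
            groups = {}
            for row in table:
                key = tuple(row[A] for A in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in rows}
                    if len(symbols) > 1:
                        # ("a", ...) sorts before ("b", ...), so min prefers the distinguished symbol.
                        target = min(symbols)
                        for row in table:  # rename the equated symbols everywhere in column A
                            if row[A] in symbols:
                                row[A] = target
                        changed = True
    # Lossless iff some row has become all distinguished symbols.
    return any(all(row[A][0] == "a" for A in attrs) for row in table)


schema = {"Ssn", "Ename", "Pnumber", "Pname", "Plocation", "Hours"}
fds = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]
d1 = [{"Ename", "Plocation"}, {"Ssn", "Pnumber", "Hours", "Pname", "Plocation"}]
d2 = [{"Ssn", "Ename"}, {"Pnumber", "Pname", "Plocation"}, {"Ssn", "Pnumber", "Hours"}]
print(has_lossless_join(schema, d1, fds))  # False
print(has_lossless_join(schema, d2, fds))  # True
```

Renaming a symbol in every row of its column (rather than only in the matched rows) keeps the test sound when the same placeholder has already spread to several rows.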

Fourth Normal Form (4NF)
Multivalued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multivalued dependency can be further classified as trivial or nontrivial.

An MVD A ->> B in relation R is trivial if (a) B is a subset of A, or (b) A ∪ B = R.

An MVD is nontrivial if neither of these two conditions is satisfied.
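The MVD definition can be tested on a relation instance: X ->> Y holds if, whenever two tuples t1 and t2 agree on X, the tuple that takes its X and Y values from t1 and its remaining values from t2 is also present. The minimal Python sketch below implements that check plus the trivial/nontrivial test; the EMP rows (e1, P1, D1, and so on) are hypothetical, the function names are assumptions, and as with any instance check, a True result only shows that these particular rows do not violate the MVD.

```python
def mvd_holds(header, rows, x, y):
    """Check X ->> Y on an instance: for any t1, t2 agreeing on X, the tuple taking
    X and Y from t1 and the remaining attributes from t2 must also be present."""
    idx = {a: i for i, a in enumerate(header)}
    row_set = set(rows)
    for t1 in rows:
        for t2 in rows:
            if any(t1[idx[a]] != t2[idx[a]] for a in x):
                continue
            t3 = tuple(t1[idx[a]] if a in x or a in y else t2[idx[a]] for a in header)
            if t3 not in row_set:
                return False
    return True


def is_trivial_mvd(header, x, y):
    """X ->> Y is trivial if Y is a subset of X or X ∪ Y covers every attribute of R."""
    return set(y) <= set(x) or set(x) | set(y) == set(header)


# Hypothetical EMP instance: an employee's projects and dependents vary independently.
header = ("Ename", "Pname", "Dname")
emp = [
    ("e1", "P1", "D1"), ("e1", "P1", "D2"),
    ("e1", "P2", "D1"), ("e1", "P2", "D2"),
]
print(mvd_holds(header, emp, ["Ename"], ["Pname"]))      # True
print(mvd_holds(header, emp[:3], ["Ename"], ["Pname"]))  # False: (e1, P2, D2) is missing
print(is_trivial_mvd(header, ["Ename"], ["Pname"]))      # False: nontrivial in EMP
print(is_trivial_mvd(("Ename", "Pname"), ["Ename"], ["Pname"]))  # True
```

The last call shows why splitting EMP into two-attribute relations such as EMP_PROJECTS removes the problem: there, Ename ∪ Pname covers the whole schema, so the MVD becomes trivial.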

Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multivalued dependencies (more precisely, for every nontrivial MVD X ->> Y that holds in the relation, X is a superkey).

4NF is used to remove nontrivial multivalued dependencies.

In 4NF, no table should contain two or more independent one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form
The EMP relation with two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.

Fifth Normal Form (5NF)
Join dependency: describes a type of dependency. For example, for a relation R with subsets of the attributes of R denoted as A, B, …, Z, R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 72: Module 6

Module 6 75042023

Normalization into 3NF

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 73: Module 6

Additional problems

Pg 186513

Module 6 76042023

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 74: Module 6

Module 6 78042023

Boyce-Codd normal form

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 75: Module 6

Module 6 79042023

BCNF (Boyce-Codd Normal Form) A relation schema R is in Boyce-Codd Normal

Form (BCNF) if whenever an FD X -gt A holds in R then X is a superkey of R

Each normal form is strictly stronger than the previous one Every 2NF relation is in 1NF Every 3NF relation is in 2NF Every BCNF relation is in 3NF

There exist relations that are in 3NF but not in BCNF

The goal is to have each relation in BCNF (or 3NF)

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 76: Module 6

Module 6 80042023

How is BCNF different from 3NF

For a FD XA the 3NF allows this dependency in a relation if lsquoArsquo is a primary-key attribute and lsquoXrsquo is not a candidate key

To test whether a relation is in BCNF lsquoXrsquo must be a candidate key

So relation in BCNF will definitely be in 3NF but not the other way around

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 77: Module 6

Module 6 81042023

A relation TEACH that is in 3NF but not in BCNF

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions When we decompose a relation we need to

make sure that we can recover the original relation from the new relations that have replaced it

If we can recover the original relation then the decomposition is lossless else it is lossy

Example 511 pg 162

Module 6 86042023

Testing for lossless joins

Lossless join algorithm Example 512

Module 6 87042023

Module 6 89042023

Fourth Normal Form (4NF) Multi-valued dependency (MVD) Represents a dependency between attributes (for

example AB and C) in a relation such that for each value of A there is a set of values for B and a set of value for C However the set of values for B and C are independent of each other

A multi-valued dependency can be further defined as being trivial or nontrivial

A MVD A-gt-gt B in relation R is defined as being trivial if B is a subset of A or A U B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied

Module 6 90042023

Fourth Normal Form (4NF) Fourth normal form (4NF)

A relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies

It is used for removing multivalued dependency

In 4NF no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key

Module 6 91042023

Multivalued Dependencies and Fourth Normal Form The EMP relation with two MVDs ENAME mdashgtgt PNAME and

ENAME mdashgtgt DNAME Decomposing the EMP relation into two 4NF relations

EMP_PROJECTS and EMP_DEPENDENTS

Module 6 92042023

Module 6 93042023

Fifth Normal Form (5NF) Join dependency

Describes a type of dependency For example for a relation R with subsets of the attributes of R denoted as A B hellip Z a relation R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A B hellip Z

Lossless-join dependency A property of decomposition which ensures that

no spurious tuples are generated when relations are reunited through a natural join operation

Module 6 94042023

Fifth Normal Form (5NF)

Definition1048698 A relation schema R is in fifth normal form (5NF) (orProject-Join Normal Form (PJNF)) with respect to aset F of functional multivalued and join dependenciesif for every nontrivial join dependency JD(R1 R2 Rn) in F+ (that is implied by F) every Ri is a superkeyof R In other words A relation that has no join

dependency

Module 6 95042023

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has

the JD(R1 R2 R3)

(d) Decomposing the relation SUPPLY into the 5NF relations R1 R2 and R3

Module 6 96042023

Fifth Normal Form (5NF)

Join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes

Hence 5NF is rarely used in practice

Page 78: Module 6

Module 6 82042023

Achieving the BCNF by Decomposition Two FDs exist in the relation TEACH

fd1 student course -gt instructor fd2 instructor -gt course

student course is a candidate key for this relation So this relation is in 3NF but not in BCNF

A relation NOT in BCNF should be decomposed so as to meet the non-additive (lossless) join property while possibly forgoing the preservation of all functional dependencies in the decomposed relations

Module 6 83042023

Achieving the BCNF by Decomposition Three possible decompositions for relation TEACH

student instructor and student course course instructor and course student instructor course and instructor student

All three decompositions will lose fd1 We have to settle for sacrificing the functional dependency

preservation But we cannot sacrifice the non-additivity property after decomposition

Out of the above three only the 3rd decomposition will not generate spurious tuples after join(and hence has the non-additivity property)

Lossless or lossy decompositions

When we decompose a relation, we need to make sure that we can recover the original relation, by a natural join, from the new relations that have replaced it.

If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.

Example 511, pg. 162
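The sketch below makes the difference concrete by projecting a small, hypothetical TEACH instance (the names are invented for illustration) onto decompositions 1 and 3 and rejoining the pieces. Rows are modelled as Python dicts; project, natural_join, and as_set are illustrative helpers, not part of any library.

```python
# Minimal sketch: project a hypothetical TEACH instance and rejoin the pieces.
# Rows are dicts mapping attribute name -> value.


def project(rows, attrs):
    """Projection onto the attribute tuple attrs, with duplicates removed."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Merge every pair of rows that agree on all shared attributes."""
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in lrow.keys() & rrow.keys())]


def as_set(rows):
    """Order-independent form of a relation, used for comparisons."""
    return {frozenset(row.items()) for row in rows}


ATTRS = ("student", "course", "instructor")
teach = [dict(zip(ATTRS, t)) for t in [
    ("Ann", "Database", "Ivan"),
    ("Ann", "OS", "Jones"),
    ("Bob", "Database", "Kumar"),
    ("Chen", "Database", "Ivan"),
]]  # this instance satisfies fd1 and fd2

# Decomposition 1: {student, instructor} and {student, course}
lossy = natural_join(project(teach, ("student", "instructor")),
                     project(teach, ("student", "course")))
# Decomposition 3: {instructor, course} and {instructor, student}
lossless = natural_join(project(teach, ("instructor", "course")),
                        project(teach, ("instructor", "student")))

print(len(as_set(lossy) - as_set(teach)))  # 2 spurious tuples
print(as_set(lossless) == as_set(teach))   # True
```

The two spurious tuples pair Ann's Database course with Jones and her OS course with Ivan: joining on student alone cannot tell which instructor taught which course.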


Testing for lossless joins

Lossless join algorithm: Example 512
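Example 512 is not reproduced in this transcript. As a hedged illustration, one widely used formulation of the test is the tableau ("chase") algorithm: build one row per decomposed relation, repeatedly equate the symbols that each FD forces to be equal, and declare the decomposition lossless if some row ends up consisting only of distinguished symbols. The sketch assumes FDs encoded as (lhs, rhs) frozenset pairs; chase_is_lossless is a hypothetical name, and details may differ from the algorithm in the course text.

```python
# Minimal sketch of the tableau ("chase") test for the lossless join
# property. FDs are (lhs, rhs) pairs of frozensets; this encoding and the
# function name are illustrative assumptions.


def chase_is_lossless(schema, decomposition, fds):
    """True if decomposing schema into decomposition is lossless under fds."""
    cols = sorted(schema)
    # One row per R_i: the distinguished symbol ("a", A) where A is in R_i,
    # otherwise a symbol ("b", i, A) that is unique to that row.
    table = [{A: ("a", A) if A in r_i else ("b", i, A) for A in cols}
             for i, r_i in enumerate(decomposition)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Group the rows that agree on every attribute of the left side.
            groups = {}
            for row in table:
                key = tuple(row[A] for A in sorted(lhs))
                groups.setdefault(key, []).append(row)
            for rows in groups.values():
                for A in rhs:
                    symbols = {row[A] for row in rows}
                    if len(symbols) < 2:
                        continue
                    # Equate the symbols, preferring the distinguished one,
                    # and apply the renaming throughout column A.
                    keep = ("a", A) if ("a", A) in symbols else min(symbols)
                    for row in table:
                        if row[A] in symbols:
                            row[A] = keep
                    changed = True
    # Lossless iff some row now consists only of distinguished symbols.
    return any(all(row[A] == ("a", A) for A in cols) for row in table)


TEACH = {"student", "course", "instructor"}
F = [
    (frozenset({"student", "course"}), frozenset({"instructor"})),  # fd1
    (frozenset({"instructor"}), frozenset({"course"})),             # fd2
]
print(chase_is_lossless(TEACH, [{"instructor", "course"}, {"instructor", "student"}], F))  # True
print(chase_is_lossless(TEACH, [{"student", "instructor"}, {"student", "course"}], F))     # False
```

Unlike the binary test, the tableau works for decompositions into any number of relations.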


Fourth Normal Form (4NF)

Multi-valued dependency (MVD): represents a dependency between attributes (for example, A, B, and C) in a relation such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial.

An MVD A →→ B in relation R is defined as being trivial if B is a subset of A or A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
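An MVD can be checked against a particular relation instance by looking for the tuple it requires: whenever two tuples t1 and t2 agree on X, the tuple that takes its X and Y values from t1 and its remaining values from t2 must also be present. This is a minimal sketch, assuming rows are Python dicts and attribute sets are frozensets; the function names and the A, B, C instance are hypothetical. Note that a single instance can refute an MVD but cannot prove it, since an MVD constrains every legal state of R.

```python
# Minimal sketch: does X ->> Y hold in a given instance, and is it trivial?


def is_trivial_mvd(x, y, schema):
    """X ->> Y is trivial if Y is a subset of X or X | Y is all of R."""
    return y <= x or (x | y) == schema


def mvd_holds(rows, x, y, schema):
    """For every t1, t2 agreeing on X, the tuple taking X and Y from t1
    and the remaining attributes Z = R - X - Y from t2 must be present."""
    z = schema - x - y
    present = {frozenset(r.items()) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in x):
                t3 = {a: t1[a] for a in x | y}
                t3.update({a: t2[a] for a in z})
                if frozenset(t3.items()) not in present:
                    return False
    return True


R = frozenset({"A", "B", "C"})
A, B = frozenset({"A"}), frozenset({"B"})
# Every B value of a1 is combined with every C value of a1.
rows = [dict(A="a1", B=b, C=c) for b in ("b1", "b2") for c in ("c1", "c2")]

print(mvd_holds(rows, A, B, R))       # True
print(mvd_holds(rows[:-1], A, B, R))  # False: (a1, b2, c2) is missing
print(is_trivial_mvd(A, frozenset({"B", "C"}), R))  # True: A | {B, C} = R
```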


Fourth Normal Form (4NF)

Fourth normal form (4NF): a relation that is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies. Stated precisely, a relation schema R is in 4NF if, for every nontrivial MVD X →→ Y that holds on R, X is a superkey of R.

It is used to remove multi-valued dependencies.

In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.


Multivalued Dependencies and Fourth Normal Form

The EMP relation with two MVDs: ENAME →→ PNAME and ENAME →→ DNAME.

Decomposing the EMP relation into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.
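The EMP example can be reproduced with a small, hypothetical instance (the employee, project, and dependent values are invented). Assuming EMP has no nontrivial FDs, its only key is {ENAME, PNAME, DNAME}, so ENAME is not a superkey and the nontrivial MVD ENAME →→ PNAME violates 4NF. Because that MVD holds, projecting onto {ENAME, PNAME} and {ENAME, DNAME} is lossless, which the sketch checks by rejoining; project, natural_join, and as_set are illustrative helpers.

```python
# Sketch: EMP(ENAME, PNAME, DNAME) with ENAME ->> PNAME and ENAME ->> DNAME.
# Every project of an employee is paired with every dependent.
from itertools import product


def project(rows, attrs):
    """Projection onto the attribute tuple attrs, duplicates removed."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Merge every pair of rows that agree on all shared attributes."""
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in lrow.keys() & rrow.keys())]


def as_set(rows):
    """Order-independent form of a relation, used for comparisons."""
    return {frozenset(row.items()) for row in rows}


emp = [dict(ENAME="Smith", PNAME=p, DNAME=d)
       for p, d in product(("ProjX", "ProjY"), ("John", "Anna"))]

emp_projects = project(emp, ("ENAME", "PNAME"))
emp_dependents = project(emp, ("ENAME", "DNAME"))

print(len(emp), len(emp_projects), len(emp_dependents))  # 4 2 2
print(as_set(natural_join(emp_projects, emp_dependents)) == as_set(emp))  # True
```

In EMP, giving Smith one more project would require one new row per dependent; in the 4NF design it is a single EMP_PROJECTS row.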


Fifth Normal Form (5NF)

Join dependency: for a relation R and subsets of its attributes denoted A, B, …, Z (which together cover all attributes of R), R satisfies a join dependency if and only if every legal value of R is equal to the join of its projections on A, B, …, Z.

Lossless-join dependency: a property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.
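The join-dependency definition translates directly into a check on an instance: project R onto each component, join the projections, and compare the result with R. The sketch below assumes the components together cover all of R's attributes; the helper names and the two-row instance are hypothetical.

```python
# Minimal sketch: does an instance satisfy JD(R1, ..., Rn)? Rows are dicts;
# the components are assumed to cover every attribute of the relation.
from functools import reduce


def project(rows, attrs):
    """Projection onto the attribute tuple attrs, duplicates removed."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Merge every pair of rows that agree on all shared attributes."""
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in lrow.keys() & rrow.keys())]


def as_set(rows):
    """Order-independent form of a relation, used for comparisons."""
    return {frozenset(row.items()) for row in rows}


def satisfies_jd(rows, components):
    """True if the relation equals the natural join of its projections."""
    joined = reduce(natural_join, [project(rows, c) for c in components])
    return as_set(joined) == as_set(rows)


r = [dict(A="a1", B="b1", C="c1"), dict(A="a2", B="b1", C="c2")]
print(satisfies_jd(r, [("A", "B"), ("B", "C")]))  # False: the join adds spurious rows
print(satisfies_jd(r, [("A", "B"), ("A", "C")]))  # True
```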


Fifth Normal Form (5NF)

Definition: A relation schema R is in fifth normal form (5NF), or Project-Join Normal Form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, …, Rn) in F+ (that is, implied by F), every Ri is a superkey of R.

In other words, a 5NF relation has no nontrivial join dependency other than those whose components are all superkeys.


Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form

(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).

(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.
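The SUPPLY figure is not reproduced in this transcript, so the sketch below uses a hypothetical instance of SUPPLY(SNAME, PART_NAME, PROJ_NAME) and assumes R1 = {SNAME, PART_NAME}, R2 = {SNAME, PROJ_NAME}, and R3 = {PART_NAME, PROJ_NAME}; the attribute names and data are illustrative. The three-way join of the projections reproduces SUPPLY, so JD(R1, R2, R3) holds, while joining only R1 and R2 adds spurious tuples, so SNAME →→ PART_NAME does not hold. If the JD is a constraint on SUPPLY, it is nontrivial and none of its components is a superkey, so SUPPLY is not in 5NF.

```python
# Sketch: a hypothetical SUPPLY instance that satisfies JD(R1, R2, R3) but
# has no nontrivial MVD, so no two-way split is lossless.
from functools import reduce


def project(rows, attrs):
    """Projection onto the attribute tuple attrs, duplicates removed."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(left, right):
    """Merge every pair of rows that agree on all shared attributes."""
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in lrow.keys() & rrow.keys())]


def as_set(rows):
    """Order-independent form of a relation, used for comparisons."""
    return {frozenset(row.items()) for row in rows}


ATTRS = ("SNAME", "PART_NAME", "PROJ_NAME")
supply = [dict(zip(ATTRS, t)) for t in [
    ("Smith", "Bolt", "ProjX"), ("Smith", "Nut", "ProjY"),
    ("Adamsky", "Bolt", "ProjY"), ("Walton", "Nut", "ProjZ"),
    ("Adamsky", "Nail", "ProjX"), ("Adamsky", "Bolt", "ProjX"),
    ("Smith", "Bolt", "ProjY"),
]]

R1 = project(supply, ("SNAME", "PART_NAME"))
R2 = project(supply, ("SNAME", "PROJ_NAME"))
R3 = project(supply, ("PART_NAME", "PROJ_NAME"))

three_way = reduce(natural_join, [R1, R2, R3])
two_way = natural_join(R1, R2)

print(as_set(three_way) == as_set(supply))    # True: JD(R1, R2, R3) holds
print(len(as_set(two_way) - as_set(supply)))  # 2 spurious tuples
```

Here R3 acts as a filter: it removes exactly the combinations (Smith, Nut, ProjX) and (Adamsky, Nail, ProjY) that the R1 ⋈ R2 join invents.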


Fifth Normal Form (5NF)

A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes.

Hence, 5NF is rarely used in practice.
