ISOM MIS710 Module 1b: Relational Model and Normalization

Arijit Sengupta

May 23, 2018

The Entity-Relationship Model

Entity Integrity Rule

Referential Integrity Rule

Unnormalization


Understand the notion of keys

Understand the use and importance of referential integrity

Provide an alternative way to design relations using semantics rather than concepts

Take an existing “flat file” design and create a relational design from it through the process of Normalization

Identify sources of problems (or anomalies) within a given relational design

Argue about improvements to designs created by others


Based on mathematical set theory

Tuples

Attributes

Attribute

Values

A tuple is a set of attribute-value properties (relations)

Ordering of attributes is immaterial

Ordering of Tuples is immaterial

Tuples are distinct from one another

Attributes contain atomic values only

Emp#   Name   Address
E1


Attributes

Attribute domain

Domain (GPA) =

Domain (name) =

Domain (DateOfBirth) =

Domain (year)


Tuples

Cardinality: Number of tuples in a relation

What is the difference between the cardinality and the degree?

ID   Name   Age   Address   GPA
S1   Jose   21

Primary Keys

Superkey (SK): a subset of the attributes of R that satisfies uniqueness, that is, no two tuples have the same combination of values for these attributes.

Candidate key (K): a superkey that satisfies minimality, that is, no component of K can be eliminated without destroying the uniqueness property.

Primary key (PK): the candidate key selected to identify tuples.

Can a primary key be composed of multiple attributes?

Can a relation have multiple primary keys?
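The key definitions above can be sketched in a few lines of Python. The sample rows and the helper names `is_superkey` and `is_candidate_key` are illustrative assumptions, not part of the slides. The check only looks at the rows supplied, whereas a real key must stay unique in every legal state of the relation.

```python
from itertools import combinations

# Hypothetical sample relation (not from the slides): each dict is one tuple.
ROWS = [
    {"SID": "s1", "Name": "Joseph", "Major": "CIS"},
    {"SID": "s2", "Name": "Alice", "Major": "CS"},
    {"SID": "s3", "Name": "Tom", "Major": "Acct"},
    {"SID": "s4", "Name": "Tom", "Major": "CIS"},
]

def is_superkey(rows, attrs):
    """Uniqueness: no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_candidate_key(rows, attrs):
    """Minimality: a superkey none of whose proper subsets is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )

print(is_superkey(ROWS, ("SID", "Name")))       # unique, but SID alone is enough
print(is_candidate_key(ROWS, ("SID", "Name")))  # so this pair is not minimal
print(is_candidate_key(ROWS, ("SID",)))
```

Because a composite set of attributes can pass both tests, a primary key may span several attributes; only one candidate key is then chosen as the primary key.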


Entity Integrity Rule

The primary key of a base relation cannot contain a NULL value.

Enforcement of the rule:

An update which results in a NULL value in the primary key must be rejected.
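As a rough illustration of the enforcement rule, the sketch below uses Python's built-in `sqlite3` module. The `student` table and the helper `try_sql` are assumptions made for the example. `NOT NULL` is written out explicitly because SQLite does not imply it for most `PRIMARY KEY` declarations.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; NOT NULL is spelled out because SQLite, unlike the
# SQL standard, does not imply it for most PRIMARY KEY declarations.
conn.execute("CREATE TABLE student (sid TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO student VALUES ('s1', 'Joseph')")

def try_sql(statement):
    """Run one statement; True if accepted, False if a constraint rejects it."""
    try:
        conn.execute(statement)
        return True
    except sqlite3.IntegrityError:
        return False

# Both operations would leave a NULL in the primary key, so both are rejected.
insert_ok = try_sql("INSERT INTO student VALUES (NULL, 'Alice')")
update_ok = try_sql("UPDATE student SET sid = NULL WHERE sid = 's1'")
row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(insert_ok, update_ok, row_count)
```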

Are the following ok?

Club (ID, Name, …) Player (ID, Name, ?*, …)

Order (OrdID, Date, …, ?*) Customer (ID, Name, …, ?*)

Dept (DeptID, Name, …, ?*) Employee (EID, Name, …, ?*)

Foreign key (FK): attribute(s) of one relation that reference(s) the PK of another relation

FK may or may not be (a part of) the PK of this relation

Course (CourseID, Name, …, ?*) Class (ClassID, Meets, …, ?*)

Student (SID, Name, …, ?*) Registration (?)

Can an FK refer to a part of the PK of another relation?

Can an FK refer to a PK of the same relation?


FK and referenced PK may have different names

The values of FK must draw from the value set of PK

How do we define the Domain of an FK?

Can an FK have a NULL value?

What can we enforce with PKs and FKs?

Domain

Referential Integrity Rule

If FK is the foreign key of a relation R2, which matches the primary key PK of the relation R1, then:

the FK value must match the PK value in some tuple of R1, or

the FK value may be NULL, but only if the FK is not (a part of) the PK of R2.

Enforcement of the Rule

An update on either a referenced PK or an FK must satisfy the rule. Otherwise, the operation is rejected.

Which operation on the primary key may violate this rule?

Which operation on the foreign key may violate this rule?
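One way to explore these two questions is to try each operation against a small database. The sketch below uses Python's `sqlite3` module with a hypothetical `dept`/`employee` pair loosely modeled on the Dept and Employee relations mentioned earlier; the column names and the helper `accepted` are assumptions. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.execute("CREATE TABLE dept (dept_id TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        eid     TEXT NOT NULL PRIMARY KEY,
        name    TEXT,
        dept_id TEXT REFERENCES dept (dept_id)  -- FK outside the PK, so NULL is allowed
    )
""")
conn.execute("INSERT INTO dept VALUES ('d1', 'ISOM')")

def accepted(statement):
    """True if the statement satisfies the integrity rules, False if rejected."""
    try:
        conn.execute(statement)
        return True
    except sqlite3.IntegrityError:
        return False

results = {
    "insert matching FK": accepted("INSERT INTO employee VALUES ('e1', 'Ann', 'd1')"),
    "insert NULL FK": accepted("INSERT INTO employee VALUES ('e2', 'Bob', NULL)"),
    "insert dangling FK": accepted("INSERT INTO employee VALUES ('e3', 'Cy', 'd9')"),
    "delete referenced PK": accepted("DELETE FROM dept WHERE dept_id = 'd1'"),
}
for label, ok in results.items():
    print(label, "accepted" if ok else "rejected")
```

On the referenced side, deleting or changing a PK value can strand dependent rows; on the referencing side, inserting or changing an FK value can point at a PK that does not exist.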


Restrict

Cascade

try to propagate the operation to all dependent FK values; if that is not possible, reject the operation

Nullify (or Default)

set all dependent FK values to NULL (or a default value); if that is not possible, reject the operation

Cases for each of the above situations?
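The three options map onto the SQL `ON DELETE` actions `RESTRICT`, `CASCADE`, and `SET NULL`. The sketch below, again using `sqlite3`, deletes a referenced row under each action; the tables and the helper `delete_parent` are hypothetical and exist only for this illustration.

```python
import sqlite3

def delete_parent(on_delete):
    """Delete a referenced dept row under the given ON DELETE action.

    Returns the surviving employee rows, or None if the delete was rejected.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute("CREATE TABLE dept (dept_id TEXT NOT NULL PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE employee (eid TEXT NOT NULL PRIMARY KEY, "
        f"dept_id TEXT REFERENCES dept (dept_id) ON DELETE {on_delete})"
    )
    conn.execute("INSERT INTO dept VALUES ('d1')")
    conn.execute("INSERT INTO employee VALUES ('e1', 'd1')")
    try:
        conn.execute("DELETE FROM dept WHERE dept_id = 'd1'")
    except sqlite3.IntegrityError:
        return None
    return conn.execute("SELECT eid, dept_id FROM employee").fetchall()

for action in ("RESTRICT", "CASCADE", "SET NULL"):
    print(action, delete_parent(action))
```

Restrict refuses the delete while dependents exist, Cascade removes the dependent row along with its parent, and Set Null keeps the dependent row but clears its FK.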


Name char(30) not null,

coursename char(30) not null,

Can we insert a new course# with a new textbook?

What should be done if ‘CIS’ is changed to ‘MIS’?

What would happen if we remove all CIS 800 students?

SID   Name     Grade   Course#   Text   Major   Dept
s1    Joseph   A       CIS800    b1     CIS     CIS
s1    Joseph   B       CIS820    b2     CIS     CIS
s1    Joseph   A       CIS872    b5     CIS     CIS
s2    Alice    A       CIS800    b1     CS      MCS
s2    Alice    A       CIS872    b5     CS      MCS
s3    Tom      B       CIS800    b1     Acct    Acct
s3    Tom      B       CIS872    b5     Acct    Acct
s3    Tom      A       CIS860    b1     Acct    Acct


Poor Relation Design causes Anomalies

Insertion anomalies - Insertion of some piece of information cannot be performed unless other irrelevant information is added to it.

Update anomalies - Update of a single piece of information requires updates to multiple tuples.

Deletion anomalies - Deletion of a piece of information removes other unrelated but necessary information.

Normalization improves the design to remove these anomalies
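To make the anomalies concrete, the sketch below loads the sample student table shown earlier into Python tuples and counts what an update or a deletion would touch. The variable names are illustrative, and reading Course# as determining Text is an assumption based on the sample rows.

```python
# Rows follow the sample table: (SID, Name, Grade, Course#, Text, Major, Dept).
ROWS = [
    ("s1", "Joseph", "A", "CIS800", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS820", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS872", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS800", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS872", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS800", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS872", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS860", "b1", "Acct", "Acct"),
]
COURSE, TEXT = 3, 4  # column positions

# Update anomaly: the text for CIS800 is stored once per enrolled student,
# so changing that single fact means touching every one of these rows.
rows_to_touch = sum(1 for r in ROWS if r[COURSE] == "CIS800")

# Deletion anomaly: removing every CIS820 enrollment also erases the only
# record of which text CIS820 uses.
remaining = [r for r in ROWS if r[COURSE] != "CIS820"]
facts_before = {(r[COURSE], r[TEXT]) for r in ROWS}
facts_after = {(r[COURSE], r[TEXT]) for r in remaining}
lost_facts = facts_before - facts_after

print(rows_to_touch, lost_facts)
```

The insertion anomaly is the mirror image: a new course and its text cannot be recorded in this table until some student row exists to carry them.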


contain minimum amount of redundancy

allow users to insert, delete and modify tuples in the relation without errors or inconsistencies.

improve quality of information in the database

decrease storage space for the database

Costs

may require more storage in some cases


Do you see any problems in the definition?

Do you see any anomalies in the data?

STUDENT ID   STUDENT NAME   COURSE ID   COURSE NAME   INSTR NAME   ROOM   CREDITS   GRADE
224          Waters         CIS20

Functional Dependency (FD)

Consider two attributes, X and Y, and two arbitrary tuples r1 and r2 of a relation R.

Y is functionally dependent on X iff:

value of X in r1 = value of X in r2

implies

value of Y in r1 = value of Y in r2

Also stated as: R.X → R.Y or X → Y
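The definition translates almost directly into a check over the rows of a table. The sketch below is a minimal version, assuming the sample student table from earlier in the deck; `fd_holds` is a hypothetical helper name. A sample can only refute an FD: passing the check on these rows does not prove the dependency holds by the meaning of the attributes.

```python
COLUMNS = ("SID", "Name", "Grade", "Course#", "Text", "Major", "Dept")
DATA = [
    ("s1", "Joseph", "A", "CIS800", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS820", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS872", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS800", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS872", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS800", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS872", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS860", "b1", "Acct", "Acct"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in DATA]

def fd_holds(rows, lhs, rhs):
    """True if rows that agree on all of lhs also agree on all of rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

print(fd_holds(ROWS, ["SID"], ["Name"]))
print(fd_holds(ROWS, ["SID"], ["Grade"]))
print(fd_holds(ROWS, ["SID", "Course#"], ["Grade"]))
```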


X is called the determinant of Y.

X may or may not be the key attribute of R.

An FD changes with its semantic meaning

Name → Address?

X and Y may be mutually dependent on each other

Husband → Wife, Wife → Husband

Course# → Text

When is X → Y an FFD?

When Y is not functionally dependent on any proper subset of X

X → Y is a fully functional dependency (FFD)

(SID, Course#) → Name?   (SID, Course#) → Grade?

(SID, Name) → Major?   (SID, Name) → SID?

By default, the term FD refers to FFD


Given attributes X, Y, and Z of a relation R,

Z is transitively dependent on X (X → Z)

iff X → Y and Y → Z

For example:


Some Inference Rules for FDs

An FD is redundant if it can be derived from other FDs based on a set of inference rules. Some of these rules are:

Reflexive rule: If X ⊇ Y, then X → Y

X always determines a subset of itself.

Augmentation rule: If X → Y, then XZ → YZ

Adding the same attribute(s) to both sides preserves the FD.

Transitive rule: If X → Y and Y → Z, then X → Z

Functional dependencies can be ‘chained’.

Decomposition rule: If X → YZ, then X → Y and X → Z

Given: { SID → Name, SID → Major, Major → Dept }, which of the following are redundant?

SID → School, SID → Dept, Dept → School

SID → (Name, Major), (SID, Name) → (Major, Name)

SID → SID, SID → (Name, SID)
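Redundancy can be tested mechanically with the attribute closure: an FD X → Y follows from a set of FDs exactly when Y is contained in the closure of X under that set. The sketch below is one common way to compute it; the function names `closure` and `implied` are assumptions, and the candidate FDs are the ones listed in the exercise above.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD once its whole left side is already determined.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implied(fd, fds):
    """True if fd can be derived from fds, i.e. fd is redundant given fds."""
    lhs, rhs = fd
    return rhs <= closure(lhs, fds)

GIVEN = [({"SID"}, {"Name"}), ({"SID"}, {"Major"}), ({"Major"}, {"Dept"})]

CANDIDATES = {
    "SID -> School": ({"SID"}, {"School"}),
    "SID -> Dept": ({"SID"}, {"Dept"}),
    "Dept -> School": ({"Dept"}, {"School"}),
    "SID -> (Name, Major)": ({"SID"}, {"Name", "Major"}),
    "(SID, Name) -> (Major, Name)": ({"SID", "Name"}, {"Major", "Name"}),
    "SID -> SID": ({"SID"}, {"SID"}),
    "SID -> (Name, SID)": ({"SID"}, {"Name", "SID"}),
}

results = {label: implied(fd, GIVEN) for label, fd in CANDIDATES.items()}
for label, redundant in results.items():
    print(label, "redundant" if redundant else "not derivable")
```

The closure loop applies the reflexive, augmentation, and transitive rules implicitly, so no rule has to be chosen by hand.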


DEFINITION

A relation R is in first normal form (1NF) if and only if all underlying domains contain atomic values only.

Translation

To be in first normal form the table must not contain any repeating attributes.
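A small sketch of what removing a repeating attribute looks like in practice. The records, the `courses` field, and the helper `to_1nf` are all hypothetical; the idea is simply to emit one row per value of the repeating group so that every attribute holds a single atomic value.

```python
# Hypothetical unnormalized records: "courses" holds a repeating group.
UNNORMALIZED = [
    {"SID": "s1", "Name": "Joseph", "courses": ["CIS800", "CIS820"]},
    {"SID": "s2", "Name": "Alice", "courses": ["CIS800"]},
]

def to_1nf(records, group, atomic_name):
    """Emit one row per value of the repeating group, stored under atomic_name."""
    rows = []
    for rec in records:
        base = {k: v for k, v in rec.items() if k != group}
        for value in rec[group]:
            rows.append({**base, atomic_name: value})
    return rows

flat = to_1nf(UNNORMALIZED, "courses", "Course#")
for row in flat:
    print(row)
```

Flattening repeats SID and Name on every row, which is exactly the redundancy that the later normal forms then remove.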

Implication


What are the PKs?

Insertion Anomaly

A new course cannot be inserted in the database (relation Student-Course) until a student registers for that course.

Update Anomaly

If the instructor of a course is changed, this fact would have to be noted at many places in the database (many tuples of the relation Student-Course).

Deletion Anomaly

Withdrawal of all students from an existing course (that is, deletion of related tuples from the relation Student-Course) will result in unwarranted removal of that course from the database.


1NF Relations have anomalies

DEFINITION

A relation R is in second normal form (2NF) if and only if it is in 1NF and every nonkey attribute is dependent on the full primary key.

Translation

A table is in second normal form if there are no partial dependencies.

Implication

What kinds of primary keys may lead to a violation of the Second Normal Form (2NF) ?
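Partial dependencies can be found by taking the closure of each proper subset of the key and seeing which nonkey attributes it reaches. The sketch below assumes a composite key (SID, Course#) and a set of FDs read off the earlier sample student table; both are assumptions for illustration, as are the function names. It treats every attribute outside the primary key as nonkey, which is accurate only when the relation has a single candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """Map each proper subset of the key to the nonkey attributes it determines."""
    nonkey = set(attributes) - set(key)
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) & nonkey
            if determined:
                found[subset] = determined
    return found

# Assumed FDs for the flat student table (not stated explicitly in the slides).
ATTRIBUTES = {"SID", "Name", "Grade", "Course#", "Text", "Major", "Dept"}
KEY = {"SID", "Course#"}
FDS = [
    ({"SID"}, {"Name"}),
    ({"SID"}, {"Major"}),
    ({"Major"}, {"Dept"}),
    ({"Course#"}, {"Text"}),
    ({"SID", "Course#"}, {"Grade"}),
]

violations = partial_dependencies(KEY, ATTRIBUTES, FDS)
for subset, attrs in violations.items():
    print(subset, "->", sorted(attrs))
```

A single-attribute key has no proper non-empty subset, so the loop finds nothing: only composite keys can give rise to a 2NF violation.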


Insertion anomaly

Information about a faculty (potential advisor) cannot be added to the database unless a student is assigned to him/her.

Update anomaly

If the advisor’s office location or phone were changed, many tuples would need to be changed.

Deletion anomaly

If all students assigned to an advisor graduate, information about the advisor will disappear from the database.

STUDENT ID   STUDENT NAME   STATUS   ADVISOR   ADVISOR OFFICE   ADVISOR PHONE   TOTAL CREDITS
224          Waters         Junior   Young     CBA221           726104          105
351          Byron          Soph     Greene    CBA215           718434          77
421          Smith          Junior   Young     CBA221           726104          97


DEFINITION

A relation R is in third normal form (3NF) if and only if it is in 2NF and every nonkey attribute is non-transitively dependent on the primary key.

Translation

A table is in Third Normal Form if every non-key attribute is determined by the key, and nothing else.

Implication

How many total attributes must the relation have for a possible violation of the Third Normal Form (3NF) ?
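As a sketch of how a transitive dependency is removed, the code below splits the student-advisor table shown earlier into two projections and joins them back. It assumes that the advisor determines the advisor's office and phone, which matches the sample rows and the update anomaly described for that table but is not stated as an FD in the slides. The variable names are illustrative.

```python
# Rows follow the student-advisor table:
# (StudentId, StudentName, Status, Advisor, AdvisorOffice, AdvisorPhone, TotalCredits)
ROWS = [
    (224, "Waters", "Junior", "Young", "CBA221", "726104", 105),
    (351, "Byron", "Soph", "Greene", "CBA215", "718434", 77),
    (421, "Smith", "Junior", "Young", "CBA221", "726104", 97),
]

# Project onto two relations; building sets drops the repeated advisor facts.
student = {
    (sid, name, status, adv, total)
    for sid, name, status, adv, _office, _phone, total in ROWS
}
advisor = {
    (adv, office, phone)
    for _sid, _name, _status, adv, office, phone, _total in ROWS
}

# A natural join on Advisor rebuilds the original rows, so nothing was lost.
rejoined = {
    (sid, name, status, adv, office, phone, total)
    for sid, name, status, adv, total in student
    for adv2, office, phone in advisor
    if adv2 == adv
}

print(len(advisor), rejoined == set(ROWS))
```

After the split, an advisor's office and phone are stored once, and an advisor row can exist even when no student is assigned.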


[Figure: functional dependency diagram over StudentId, StudentName, Status, TotalCredits, Advisor, AdvisorOffice, AdvisorPhone]


R has multiple candidate keys,

Those candidate keys are composite, and

The candidate keys overlap.

Computer-Lab (SID, Account, Class, Hours)

A relation R is in BCNF iff every determinant is a candidate key.
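The BCNF condition can be checked with the attribute closure: every non-trivial FD must have a determinant whose closure is the whole relation. The slides do not list the FDs of Computer-Lab, so the set below is purely an assumption for illustration (each student has one account and each account one student, and a student's hours are recorded per class). The sketch tests for a superkey rather than a minimal candidate key, which gives the same verdict when the FDs are fully functional.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Non-trivial FDs whose determinant does not determine every attribute."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != all_attrs
    ]

# Hypothetical FDs for Computer-Lab (SID, Account, Class, Hours).
ATTRIBUTES = {"SID", "Account", "Class", "Hours"}
FDS = [
    ({"SID"}, {"Account"}),
    ({"Account"}, {"SID"}),
    ({"SID", "Class"}, {"Hours"}),
]

violations = bcnf_violations(ATTRIBUTES, FDS)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Under these assumed FDs the candidate keys are (SID, Class) and (Account, Class): multiple, composite, and overlapping on Class, which is the situation the three conditions above describe.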


Flatten the Table Completely (no composite columns)

Find the Key and “all” FDs (well, as many as you can possibly detect)

Find Partial Dependencies and decompose relation using them (2NF)

Find Transitive dependencies and decompose using them (3NF)

Remember: this is not a deterministic method. The result depends on the order in which FDs are chosen, so the same relation with the same set of FDs can lead to different decompositions!


In a good decomposition

Decomposed relations can be maintained independently

Rissanen’s rule for non-loss decomposition: Two projections R1 and R2 of a relation R are independent iff:

Every FD in R can be logically deduced from those in R1 and R2, and

The common attributes of R1 and R2 form a candidate key for at least one of the pair.
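Both conditions can be tested with attribute closures. The sketch below is one possible formulation under stated assumptions: the relation R(SID, Major, Dept) with SID → Major and Major → Dept is a hypothetical example, the function names are invented for it, and the key condition is checked as "the common attributes determine all of R1 or all of R2" without testing minimality. The preservation test uses the standard technique of growing a closure through each projection in turn.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserved(fd, parts, fds):
    """Can fd be deduced using only FDs that hold inside the projections?"""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for part in parts:
            # What the attributes of z inside this projection determine there.
            gained = (closure(z & part, fds) & part) - z
            if gained:
                z |= gained
                changed = True
    return rhs <= z

def independent(r1, r2, fds):
    """Both conditions of the rule, with the key test relaxed to a superkey test."""
    common_closure = closure(r1 & r2, fds)
    common_is_key = r1 <= common_closure or r2 <= common_closure
    return common_is_key and all(preserved(fd, (r1, r2), fds) for fd in fds)

# Hypothetical relation R(SID, Major, Dept) with SID -> Major and Major -> Dept.
FDS = [({"SID"}, {"Major"}), ({"Major"}, {"Dept"})]

split_on_major = independent({"SID", "Major"}, {"Major", "Dept"}, FDS)
split_on_sid = independent({"SID", "Major"}, {"SID", "Dept"}, FDS)
print(split_on_major, split_on_sid)
```

Splitting on SID still joins back without loss, but Major → Dept can no longer be checked inside either projection, so the two relations cannot be maintained independently.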


Course   Section   Meets   Enrolled
201      1         MW      20
201      NULL      TTh     25
NULL     NULL      MWF     18
