Normalization is a formal method that applies a series of tests to help the database designer identify the optimal grouping of attributes for each relation in a relational schema. Normalization can be applied to individual relations, so that a database can be normalized to a specific normal form to prevent update anomalies.
Normalization
Data Redundancy and Update Anomalies
The main purpose of database design is to identify the optimal grouping of attributes so as to minimize data redundancy, which in turn saves storage space.
Data redundancy can cause update anomalies, which are classified into three types:
Insertion anomalies
Deletion anomalies
Modification anomalies
Insertion Anomalies
To insert the details of a new student into the Class_Info relation, we must also include the details of a lecturer and a subject in order to avoid null values.
Deletion Anomalies
If we delete a lecturer from the Class_Info relation, the details of the students and subjects that appear only in that lecturer's rows are also lost from the database.
Modification Anomalies
If we want to change the value of one of the attributes of a particular student in the Class_Info relation, we must update all rows associated with that student. If the modification is not carried out on all the appropriate rows of the Class_Info relation, the database will become inconsistent.
Class_Info
LID    Lname    Salary  Dept  Subject         Credit  SID  Sname    GPA
E5001  Dusit    28700   EE    Electronic 1    3       S4   Panita   3.35
E9001  Pattara  18500   CPE   Data Structure  3       S1   Preeda   2.85
E9001  Pattara  18500   CPE   Data Structure  3       S2   Panu     2.45
E9001  Pattara  18500   CPE   Data Structure  3       S3   Vallapa  3.02
E9001  Pattara  18500   CPE   Web Services    4       S3   Vallapa  3.02
E9001  Pattara  18500   CPE   Web Services    4       S1   Preeda   2.85
E9001  Pattara  18500   CPE   Web Services    4       S2   Panu     2.45
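The deletion anomaly can be reproduced with a few lines of Python. This is only an illustrative sketch: the rows are copied from Class_Info with some columns left out, and the helper name students is made up for the example.

```python
# Rows of Class_Info as (LID, Lname, Subject, SID, Sname); other columns omitted.
class_info = [
    ("E5001", "Dusit", "Electronic 1", "S4", "Panita"),
    ("E9001", "Pattara", "Data Structure", "S1", "Preeda"),
    ("E9001", "Pattara", "Data Structure", "S2", "Panu"),
    ("E9001", "Pattara", "Data Structure", "S3", "Vallapa"),
    ("E9001", "Pattara", "Web Services", "S3", "Vallapa"),
    ("E9001", "Pattara", "Web Services", "S1", "Preeda"),
    ("E9001", "Pattara", "Web Services", "S2", "Panu"),
]


def students(rows):
    """Return the set of student IDs that still appear in the relation."""
    return {sid for (_lid, _lname, _subject, sid, _sname) in rows}


# Deleting lecturer E5001 removes every row that mentions that lecturer.
after_delete = [row for row in class_info if row[0] != "E5001"]

# Students that were recorded before the delete but are gone afterwards.
lost_students = students(class_info) - students(after_delete)
print(lost_students)
```

Student S4 is recorded only in the row of lecturer E5001, so removing the lecturer also removes the only trace of that student.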
Modification Anomaly
If we want to change the value of one of the attributes of a particular entity in the relation, we must update all rows that relate to this entity. If this modification is not carried out on all the appropriate rows, the database will become inconsistent.
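A minimal sketch of a modification anomaly, assuming an update that stops after the first matching row. The function names update_salary and salaries_of and the limit parameter are hypothetical, introduced only to simulate the incomplete update.

```python
# A cut-down Class_Info: one dict per row, only the columns needed here.
class_info = [
    {"LID": "E5001", "Salary": 28700, "Subject": "Electronic 1", "SID": "S4"},
    {"LID": "E9001", "Salary": 18500, "Subject": "Data Structure", "SID": "S1"},
    {"LID": "E9001", "Salary": 18500, "Subject": "Data Structure", "SID": "S2"},
    {"LID": "E9001", "Salary": 18500, "Subject": "Data Structure", "SID": "S3"},
    {"LID": "E9001", "Salary": 18500, "Subject": "Web Services", "SID": "S3"},
    {"LID": "E9001", "Salary": 18500, "Subject": "Web Services", "SID": "S1"},
    {"LID": "E9001", "Salary": 18500, "Subject": "Web Services", "SID": "S2"},
]


def update_salary(rows, lid, new_salary, limit=None):
    """Set Salary on rows of lecturer `lid`; `limit` simulates an update that stops early."""
    changed = 0
    for row in rows:
        if row["LID"] == lid and (limit is None or changed < limit):
            row["Salary"] = new_salary
            changed += 1
    return changed


def salaries_of(rows, lid):
    """Return the distinct salary values stored for one lecturer."""
    return {row["Salary"] for row in rows if row["LID"] == lid}


# Only one of the six E9001 rows is changed, so two salaries now coexist.
update_salary(class_info, "E9001", 25000, limit=1)
inconsistent = salaries_of(class_info, "E9001")
print(inconsistent)
```

Because the salary is stored once per row rather than once per lecturer, the relation cannot tell which of the two values is correct.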
[Figure: Class_Info after incomplete updates. Values shown include Pattara with salaries 18500, 21000, and 25000, and Panu with GPAs 2.45 and 2.67.]
To eliminate update anomalies, a relation must be normalized: the normalization process removes the existing data redundancy.
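One way to see what removing the redundancy means is to split Class_Info by projection and check that a natural join gives the original rows back. The sketch below is an assumption, not a decomposition stated in the slides: the target relations lecturer, subject, student, and teaching, and the helpers project and natural_join, are hypothetical names.

```python
COLUMNS = ("LID", "Lname", "Salary", "Dept", "Subject", "Credit", "SID", "Sname", "GPA")
DATA = [
    ("E5001", "Dusit", 28700, "EE", "Electronic 1", 3, "S4", "Panita", 3.35),
    ("E9001", "Pattara", 18500, "CPE", "Data Structure", 3, "S1", "Preeda", 2.85),
    ("E9001", "Pattara", 18500, "CPE", "Data Structure", 3, "S2", "Panu", 2.45),
    ("E9001", "Pattara", 18500, "CPE", "Data Structure", 3, "S3", "Vallapa", 3.02),
    ("E9001", "Pattara", 18500, "CPE", "Web Services", 4, "S3", "Vallapa", 3.02),
    ("E9001", "Pattara", 18500, "CPE", "Web Services", 4, "S1", "Preeda", 2.85),
    ("E9001", "Pattara", 18500, "CPE", "Web Services", 4, "S2", "Panu", 2.45),
]
class_info = [dict(zip(COLUMNS, values)) for values in DATA]


def project(rows, attrs):
    """Relational projection: keep only `attrs` and drop duplicate rows."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in sorted(seen)]


def natural_join(left, right):
    """Join two relations on every attribute name they have in common."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


def as_set(rows):
    """Order-independent view of a relation over the full column list."""
    return {tuple(row[c] for c in COLUMNS) for row in rows}


# Each fact is now stored once: one row per lecturer, subject, and student.
lecturer = project(class_info, ("LID", "Lname", "Salary", "Dept"))
subject = project(class_info, ("Subject", "Credit"))
student = project(class_info, ("SID", "Sname", "GPA"))
teaching = project(class_info, ("LID", "Subject", "SID"))

rejoined = natural_join(natural_join(natural_join(teaching, lecturer), subject), student)
print(len(lecturer), len(subject), len(student), len(teaching))
print(as_set(rejoined) == as_set(class_info))
```

In this decomposition a lecturer's salary appears in exactly one row, so the partial update shown earlier can no longer leave two values behind.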
Functional Dependency
One of the main concepts associated with normalization is functional dependency, which describes the relationship between attributes.
If A and B are attributes (or sets of attributes) of relation R, B is functionally dependent on A (denoted A → B) if each value of A is associated with exactly one value of B.
The notation A → B can be read in any of the following ways: B is functionally dependent on A, A determines B, or B depends on A.
If the functional dependency α → β holds on schema R, then in any legal relation r, for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], it is also the case that t1[β] = t2[β].
Given a relation r, attribute y of r is functionally dependent on attribute x if and only if, whenever two tuples of r agree on their x-value, they necessarily agree on their y-value.
In other words, for every pair of tuples in the relation r, if the values of attribute α are the same, the DBMS guarantees that the values of attribute β in those tuples are also the same. That is:
If α → β holds on R and t1[α] = t2[α], the DBMS must guarantee that t1[β] = t2[β].
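The tuple-pair definition translates directly into a check over the rows of a relation. The following is a minimal sketch; fd_holds is a hypothetical helper name, and the rows are a cut-down copy of Class_Info. Note that passing the check on one set of rows only shows that the data do not contradict the dependency; whether the dependency holds on the schema is a design decision.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


rows = [
    {"LID": "E5001", "Lname": "Dusit", "Subject": "Electronic 1", "Credit": 3},
    {"LID": "E9001", "Lname": "Pattara", "Subject": "Data Structure", "Credit": 3},
    {"LID": "E9001", "Lname": "Pattara", "Subject": "Web Services", "Credit": 4},
]

lid_to_lname = fd_holds(rows, ["LID"], ["Lname"])          # LID -> Lname
subject_to_credit = fd_holds(rows, ["Subject"], ["Credit"])  # Subject -> Credit
lid_to_subject = fd_holds(rows, ["LID"], ["Subject"])      # LID -> Subject
print(lid_to_lname, subject_to_credit, lid_to_subject)
```

Lecturer E9001 appears with two different subjects, so LID → Subject is rejected, while the other two dependencies are consistent with the sample rows.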
A → B: B is functionally dependent on A.
When a functional dependency exists, the attribute or group of attributes on the left-hand side of the arrow is called the determinant.
Staff_No → Position: Position is functionally dependent on Staff_No (for example, SL21 → System Engineer).
Position → Staff_No does not hold: Staff_No is not functionally dependent on Position (for example, System Engineer is associated with both SL21 and SG5).
Normalization applies a series of tests, each corresponding to a normal form:
Unnormalized Form
1st Normal Form
2nd Normal Form
3rd Normal Form
Boyce-Codd Normal Form
The process of normalization is a formal method that identifies relations based on their primary key (or candidate keys, in the case of BCNF) and the functional dependencies among their attributes.
Relationships of Normal Forms
[Figure: the normal forms in increasing order of strictness: 1NF, 2NF, 3NF/BCNF, 4NF, 5NF, DKNF (higher normal forms).]
Case Study
The DreamHome company manages property on behalf of the owners, and as part of this service, the company takes care of the property’s rental. To simplify this example, we assume that a customer rents a given property only once, and cannot rent more than one property at any one time.
Unnormalized form (UNF): A table that contains one or more repeating groups.
A repeating group is an attribute or group of attributes within a table that occurs with multiple values for a single occurrence of the key attribute(s) of that table. The term key refers to the attribute(s) that uniquely identify each row within the unnormalized table.
An unnormalized table is converted to first normal form by removing its repeating groups, so that the data fit the relational model (data conceptually structured as tables).
First normal form (1NF): A relation in which the intersection of each row and column contains one and only one value.
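Removing a repeating group amounts to writing one flat row per repeated item and copying the non-repeating attributes into each row. The sketch below assumes an unnormalized table held as nested Python records; the sample data, the field names, and the helper to_1nf are all hypothetical and are not taken from the DreamHome case study.

```python
# Hypothetical unnormalized records: "rentals" is the repeating group.
unnormalized = [
    {"client_no": "C1", "name": "Client One", "rentals": [
        {"property_no": "P1", "rent": 350},
        {"property_no": "P2", "rent": 450},
    ]},
    {"client_no": "C2", "name": "Client Two", "rentals": [
        {"property_no": "P3", "rent": 375},
    ]},
]


def to_1nf(records, group):
    """Flatten the repeating group `group` into one row per repeated item."""
    flat = []
    for record in records:
        # Attributes outside the repeating group are copied into every row.
        fixed = {k: v for k, v in record.items() if k != group}
        for item in record[group]:
            flat.append({**fixed, **item})
    return flat


first_nf = to_1nf(unnormalized, "rentals")
for row in first_nf:
    print(row)
```

Every cell of the result holds a single value, which is what 1NF requires, but the client name is now repeated on several rows. That redundancy is what the later normal forms deal with.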
For the relational data model, it is important to recognize that only first normal form (1NF) is critical in creating appropriate relations. All the subsequent normal forms are optional. However, to avoid update anomalies, it is recommended that we proceed to at least 3NF.
Second Normal Form (2NF): A relation that is in first normal form and in which every non-primary-key attribute is fully functionally dependent on the primary key.
Full functional dependency: Indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A.
If B is a non-key attribute that is functionally dependent on only part of the primary key A, we say that B is partially dependent on A. Partial dependencies must be removed by moving the attributes involved into a new table.
2NF applies to relations with composite keys, that is, relations with a primary key that is composed of two or more attributes. A relation with a single-attribute primary key is automatically in at least 2NF.
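Given a composite key and a list of functional dependencies, partial dependencies can be listed mechanically. The sketch below makes two assumptions that are not stated in the slides: that Class_Info has the composite key (LID, Subject, SID), and that the dependencies LID → Lname, Salary, Dept; Subject → Credit; and SID → Sname, GPA hold. The helper name partial_dependencies is hypothetical, and the function inspects only the dependencies it is given, not ones implied by them.

```python
def partial_dependencies(primary_key, fds):
    """List FDs whose determinant is a proper subset of the key and
    whose right-hand side contains non-key attributes."""
    key = frozenset(primary_key)
    found = []
    for lhs, rhs in fds:
        lhs, rhs = frozenset(lhs), frozenset(rhs)
        non_key = rhs - key
        # `lhs < key` is a proper-subset test: part of the key, not all of it.
        if lhs < key and non_key:
            found.append((sorted(lhs), sorted(non_key)))
    return found


# Assumed key and dependencies for Class_Info (illustrative only).
KEY = ("LID", "Subject", "SID")
FDS = [
    (("LID",), ("Lname", "Salary", "Dept")),
    (("Subject",), ("Credit",)),
    (("SID",), ("Sname", "GPA")),
]

violations = partial_dependencies(KEY, FDS)
for determinant, dependents in violations:
    print(determinant, "->", dependents)
```

Under these assumptions every non-key attribute depends on only one part of the key, so moving to 2NF would place each group of attributes in its own relation keyed by that part.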
Transitive dependency: A condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Definition of Third Normal Form:
A relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key.
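A transitive dependency can be spotted by looking for a non-key determinant that the key itself determines. The sketch below is a simplified check over an explicit list of dependencies, not a full 3NF test. The relation with attributes Staff_No, Name, Branch_No, and Branch_Address, its dependencies, and the helper transitive_dependencies are hypothetical and chosen only for illustration.

```python
def transitive_dependencies(primary_key, fds):
    """Return (A, B, C) triples where A is the key, A -> B, B -> C,
    and B does not itself determine the key."""
    key = frozenset(primary_key)
    fds = [(frozenset(lhs), frozenset(rhs)) for lhs, rhs in fds]
    # Attributes that the key determines directly.
    direct = set()
    for lhs, rhs in fds:
        if lhs == key:
            direct |= rhs
    found = []
    for lhs, rhs in fds:
        determines_key = any(l == lhs and key <= r for l, r in fds)
        if lhs != key and lhs <= direct and not determines_key:
            for attr in sorted(rhs - key - lhs):
                found.append((sorted(key), sorted(lhs), attr))
    return found


# Hypothetical dependencies: Staff_No -> Name, Branch_No and Branch_No -> Branch_Address.
FDS = [
    (("Staff_No",), ("Name", "Branch_No")),
    (("Branch_No",), ("Branch_Address",)),
]

transitive = transitive_dependencies(("Staff_No",), FDS)
print(transitive)
```

Here Branch_Address depends on Staff_No only through Branch_No, so a 3NF design would move Branch_No and Branch_Address into a relation of their own.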
The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key. Therefore, BCNF is a stronger form of 3NF, such that every relation in BCNF is also in 3NF.
BCNF is based on functional dependencies that take into account all candidate keys in a relation. For a relation with only one candidate key, 3NF and BCNF are equivalent.
Boyce-Codd normal form (BCNF): A relation is in BCNF if and only if every determinant is a candidate key.
Violation of BCNF is quite rare, since it may only happen under specific conditions. The potential to violate BCNF may occur in a relation that:
• contains two (or more) composite candidate keys, and
• whose candidate keys overlap, that is, share at least one attribute in common.
Case Study
In this example, Client_Interview relation is presented. It contains details of the arrangements for interviews of clients by members of staff of the DreamHome company. The members of staff involved in interviewing clients are allocated to a specific room on the day of interview. However, a room may be allocated to several members of staff as required throughout a working day. A client is only interviewed once on a given date, but may be requested to attend further interviews at later dates. This relation has three candidate keys:
(Client_No, Interview_Date), (Staff_No, Interview_Date, Interview_Time), and (Room_No, Interview_Date, Interview_Time).
Therefore the Client_Interview relation has three composite candidate keys, which overlap by sharing the common attribute Interview_Date. We select (Client_No, Interview_Date) to act as the primary key for this relation.
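The BCNF test (every determinant must be a candidate key) can be sketched with an attribute-closure computation. The dependency list below is an assumption: the first three dependencies are inferred from the three candidate keys, and the fourth, (Staff_No, Interview_Date) → Room_No, from the statement that staff are allocated to a specific room on the day of interview. The helper names closure and bcnf_violations are hypothetical.

```python
ATTRS = {"Client_No", "Interview_Date", "Interview_Time", "Staff_No", "Room_No"}

# Assumed functional dependencies for Client_Interview (illustrative only).
FDS = [
    (("Client_No", "Interview_Date"), ("Interview_Time", "Staff_No", "Room_No")),
    (("Staff_No", "Interview_Date", "Interview_Time"), ("Client_No",)),
    (("Room_No", "Interview_Date", "Interview_Time"), ("Staff_No", "Client_No")),
    (("Staff_No", "Interview_Date"), ("Room_No",)),
]


def closure(attrs, fds):
    """Return every attribute determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Return the FDs whose determinant does not determine every attribute.
    All FDs passed in are assumed to be non-trivial."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(all_attrs)]


violations = bcnf_violations(ATTRS, FDS)
print(violations)
```

Under these assumptions the only determinant that is not a candidate key is (Staff_No, Interview_Date), so that dependency is the one that keeps the relation out of BCNF.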
The DreamHome company manages property on behalf of the owners, and as part of this service the company undertakes regular inspections of the property by members of staff. When staff are required to undertake these inspections, they are allocated a company car for use on the day of the inspections. However, a car may be allocated to several members of staff, as required throughout the working day. A member of staff may inspect several properties on a given date, but a property is only inspected once on a given date.
Although BCNF removes any anomalies due to functional dependencies, further research led to the identification of another type of dependency called multi-valued dependency (MVD), which can cause similar design problems for relations in terms of data redundancy.
Multi-valued dependency (MVD): Represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A →→ B
A →→ C
Lecturer →→ Subject
Lecturer →→ Research
Lecturer_Name  Subject         Research
Yuen           Data Structure  Natural Language Processing
Yuen           Data Structure  Protocol Analyzer
Yuen           Discrete Math   Natural Language Processing
Yuen           Discrete Math   Protocol Analyzer
Yuen           Data Base       Natural Language Processing
Yuen           Data Base       Protocol Analyzer
Chalerrmsak    Data Structure  Protocol Analyzer
Chalerrmsak    Data Structure  Compiler Utilities
Chalerrmsak    Data Structure  Natural Language Processing
Lec_Sub_Research Relation
Lecturer_Name  Subject
Yuen           Data Structure
Yuen           Discrete Math
Yuen           Data Base
Chalerrmsak    Data Structure
Lec_Sub Relation
Lecturer_Name  Research
Yuen           Natural Language Processing
Yuen           Protocol Analyzer
Chalerrmsak    Protocol Analyzer
Chalerrmsak    Compiler Utilities
Chalerrmsak    Natural Language Processing
Lec_Research Relation
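Because a lecturer's subjects and research topics are independent of each other, the two smaller relations carry the same information as the original one. A short sketch, using the rows from the tables above, checks that joining Lec_Sub and Lec_Research on the lecturer name rebuilds Lec_Sub_Research exactly. The variable names are chosen for the example.

```python
NLP = "Natural Language Processing"
PA = "Protocol Analyzer"
CU = "Compiler Utilities"

lec_sub_research = {
    ("Yuen", "Data Structure", NLP), ("Yuen", "Data Structure", PA),
    ("Yuen", "Discrete Math", NLP), ("Yuen", "Discrete Math", PA),
    ("Yuen", "Data Base", NLP), ("Yuen", "Data Base", PA),
    ("Chalerrmsak", "Data Structure", PA),
    ("Chalerrmsak", "Data Structure", CU),
    ("Chalerrmsak", "Data Structure", NLP),
}

lec_sub = {
    ("Yuen", "Data Structure"), ("Yuen", "Discrete Math"), ("Yuen", "Data Base"),
    ("Chalerrmsak", "Data Structure"),
}

lec_research = {
    ("Yuen", NLP), ("Yuen", PA),
    ("Chalerrmsak", PA), ("Chalerrmsak", CU), ("Chalerrmsak", NLP),
}

# Natural join on the lecturer name: pair every subject with every research topic.
rejoined = {
    (lecturer, subject, research)
    for (lecturer, subject) in lec_sub
    for (other, research) in lec_research
    if lecturer == other
}

print(len(lec_sub) + len(lec_research), "rows replace", len(lec_sub_research))
print(rejoined == lec_sub_research)
```

The nine rows of Lec_Sub_Research are replaced by four plus five shorter rows here, and the saving grows as a lecturer gains more subjects and research topics, since the original relation stores every combination.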
Unnormalized form (UNF)
  Remove repeating groups
First normal form (1NF)
  Remove partial dependencies
Second normal form (2NF)
  Remove transitive dependencies
Third normal form (3NF)
  Remove remaining anomalies from functional dependencies
Boyce-Codd normal form (BCNF)
  Remove multi-valued dependencies
Fourth normal form (4NF)