NORMALIZATION Prof. Sridhar Vaithianathan
Jun 15, 2015
Entities, Attributes and Relationships
- Strong Entity vs Weak Entity (EMPLOYEE & DEPENDENT)
- Simple vs Composite Attributes
- Single-Valued vs Multi-Valued Attributes
- Stored vs Derived Attributes
- Identifier Attribute – Primary Key
- Composite Identifier
- Foreign Key
- Sub-Type vs Super-Type Relationship
Properties of Relations
1. Each relation (or table) in a database has a unique name.
2. An entry at the intersection of each row and column is atomic (single valued); there can be no multivalued attributes in a relation.
3. Each row (record) is unique; no two rows in a relation are identical.
4. Each attribute (or column) within a table has a unique name.
5. The sequence of columns/rows (left to right/top to bottom) is insignificant.
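The properties above can be checked mechanically on sample data. The following is a minimal sketch, assuming a table is held as a list of column names plus a list of row tuples; the function name `check_relation` and the sample employee rows are hypothetical and not part of the lecture material.

```python
def check_relation(columns, rows):
    """Return a list of violations of the relation properties listed above."""
    problems = []
    # Property 4: every column name is unique within the table.
    if len(set(columns)) != len(columns):
        problems.append("duplicate column name")
    seen = set()
    for row in rows:
        # Property 2: every entry is atomic (no lists, sets, or nested records).
        if any(isinstance(value, (list, tuple, set, dict)) for value in row):
            problems.append(f"non-atomic value in row {row!r}")
        # Property 3: no two rows are identical.
        key = repr(row)
        if key in seen:
            problems.append(f"duplicate row {row!r}")
        seen.add(key)
    return problems


# Hypothetical sample data for illustration only.
columns = ["Emp_No", "Emp_Name", "Skills"]
bad_rows = [
    (1, "Asha", ["SQL", "Java"]),   # multivalued Skills entry
    (2, "Ravi", "SQL"),
    (2, "Ravi", "SQL"),             # exact duplicate of the previous row
]
good_rows = [(1, "Asha", "SQL"), (1, "Asha", "Java"), (2, "Ravi", "SQL")]

problems = check_relation(columns, bad_rows)
print(problems)
print(check_relation(columns, good_rows))
```

Property 5 needs no check: the function never relies on the position of a row or column, which is the point of that property.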
Integrity Constraints
Domain Constraints: All of the values that appear in a column of a relation must be taken from the same domain.
– A domain is the set of values that may be assigned to an attribute. [A domain definition usually consists of: domain name, meaning, data type, size (length), and allowable values/range.]
Entity Integrity Constraint: No primary key attribute (or component of a primary key attribute) may be null.
– Null: A value that may be assigned to an attribute when no other value applies or when the applicable value is unknown.
– Null is neither numeric zero nor a string of blanks.
– In reality, null is not a value but rather the absence of a value.
Referential Integrity Constraint: Either each foreign key value must match a primary key value in another relation, or the foreign key value must be null. (E.g., a student who has not been assigned any faculty member as mentor.)
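The three constraints can be tried out with Python's built-in `sqlite3` module. This is only a sketch under stated assumptions: the `Faculty` and `Student` tables, their columns, the age range, and all sample values are hypothetical, chosen to mirror the student/mentor example above. Text keys are used because SQLite treats a null `INTEGER PRIMARY KEY` as a request for an auto-generated id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE Faculty (
    Faculty_Id TEXT PRIMARY KEY NOT NULL,           -- entity integrity
    Name       TEXT NOT NULL
);
CREATE TABLE Student (
    Student_Id TEXT PRIMARY KEY NOT NULL,           -- entity integrity
    Name       TEXT NOT NULL,
    Age        INTEGER CHECK (Age BETWEEN 15 AND 100),   -- domain constraint
    Mentor_Id  TEXT REFERENCES Faculty(Faculty_Id)  -- referential integrity, null allowed
);
""")


def try_insert(sql, params):
    """Run one INSERT and report whether the database accepted it."""
    try:
        conn.execute(sql, params)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


add_student = "INSERT INTO Student VALUES (?, ?, ?, ?)"
results = [
    try_insert("INSERT INTO Faculty VALUES (?, ?)", ("F1", "Dr. Rao")),
    try_insert(add_student, ("S1", "Anil", 20, "F1")),     # mentor exists
    try_insert(add_student, ("S2", "Bina", 21, None)),     # no mentor yet: null foreign key
    try_insert(add_student, (None, "Chitra", 22, None)),   # null primary key
    try_insert(add_student, ("S3", "Dev", 7, None)),       # age outside the domain
    try_insert(add_student, ("S4", "Esha", 20, "F9")),     # mentor F9 does not exist
]
print(results)

# Null is neither zero nor a blank string: comparing with it yields null, not true.
null_checks = conn.execute("SELECT NULL = 0, NULL = '', NULL IS NULL").fetchone()
print(null_checks)
```

The first three inserts are accepted and the last three are rejected, one for each constraint.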
Logical Database Design
1. Top-down approach > E-R modeling
2. Bottom-up approach > Normalization
Databases: Relational vs Non-Relational
What is Normalization?
It is a formal process for deciding which attributes should be grouped together in a relation.
It is a step-by-step decomposition of complex records into simple records, thereby reducing redundancy.
Why Normalize?
Normalization reduces redundancy. Redundancy is the unnecessary repetition of data.
Redundancy can lead to:
1. Inconsistencies – Errors are more likely to occur when facts are repeated.
2. Update Anomalies
- Inserting, modifying and deleting data may cause inconsistencies.
- There is a high likelihood of updating or deleting data in one relation while omitting the corresponding changes in other relations.
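A small illustration of an update anomaly, as a sketch only: the `orders` rows, their column names, and the helper `conflicting_values` are hypothetical. The customer address is repeated on every order row, so changing it in one place leaves the data contradicting itself.

```python
from collections import defaultdict


def conflicting_values(rows, key, attr):
    """Map each key value to its attr values when more than one value is stored."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[attr])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Hypothetical redundant table: the address is repeated for every order of a customer.
orders = [
    {"Order_No": 1, "Cust_Code": "C1", "Address": "Pune"},
    {"Order_No": 2, "Cust_Code": "C1", "Address": "Pune"},
    {"Order_No": 3, "Cust_Code": "C2", "Address": "Chennai"},
]

before = conflicting_values(orders, "Cust_Code", "Address")

# The customer moves, but only one of the two repeated rows is updated.
orders[0]["Address"] = "Mumbai"

after = conflicting_values(orders, "Cust_Code", "Address")
print(before)
print(after)
```

If the address were stored once, in a separate customer table, the partial update could not happen.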
A fully normalized record consists of:
1. A primary key that identifies an entity
2. A set of attributes that describe the entity
Normal forms (NF) are table structures with minimum redundancy
Functional Dependency
Normalization theory is based on the fundamental notion of functional dependency.
Given a relation R, attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B.
The functional dependency of B on A is represented as:
A → B
Example: Suppose entity CUSTOMER has the following attributes
Cust_Code, Name, Address and Phone_Number.
Cust_Code → Name, Address, Phone_Number
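Whether a dependency such as Cust_Code → Name, Address, Phone_Number is contradicted by a set of rows can be tested directly. The following is a minimal sketch; the function `holds` and the sample customer rows are hypothetical. Note that sample data can only refute a functional dependency; whether it holds in general is a statement about the business rules, not about one table instance.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on the lhs columns but differ on the rhs columns."""
    seen = {}
    for row in rows:
        determinant = tuple(row[c] for c in lhs)
        dependent = tuple(row[c] for c in rhs)
        # The first row seen for a determinant fixes the expected dependent values.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


# Hypothetical CUSTOMER rows for illustration only.
customers = [
    {"Cust_Code": "C1", "Name": "Mehta", "Address": "Pune", "Phone_Number": "111"},
    {"Cust_Code": "C2", "Name": "Mehta", "Address": "Delhi", "Phone_Number": "222"},
    {"Cust_Code": "C3", "Name": "Iyer", "Address": "Chennai", "Phone_Number": "333"},
]

code_determines_rest = holds(customers, ["Cust_Code"], ["Name", "Address", "Phone_Number"])
name_determines_code = holds(customers, ["Name"], ["Cust_Code"])
print(code_determines_rest, name_determines_code)
```

Here two customers share the name Mehta, so Name does not determine Cust_Code, while Cust_Code determines every other attribute.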
Steps in Normalization
Unnormalized Relation > 1NF > 2NF > 3NF > Boyce-Codd NF, 4NF and 5NF
1. 1NF: A relation is in 1NF if multi-valued attributes (also called repeating groups) have been removed, so there is a single value (possibly null) at the intersection of each row and column of the table.
2. 2NF: A relation is in 2NF if it is in 1NF and contains no partial dependencies.
A partial functional dependency in a relation is a functional dependency in which one or more nonkey attributes are functionally dependent on part (but not all) of the primary key.
3. 3NF: A relation is in 3NF if it is in 2NF and no transitive dependencies exist.
A transitive dependency in a relation is a functional dependency between two (or more) nonkey attributes.
Pine Valley Furniture Company Database
Invoice Data - Pine Valley Furniture Company
1 NF: A relation is in 1NF if multi-valued attributes (also called repeating groups) have been removed, so there is a single value (possibly null) at the intersection of each row and column of the table.
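Removing a repeating group can be sketched in a few lines. The invoice figures are not reproduced in this text, so the column names and values below are hypothetical stand-ins for the invoice data, and `to_1nf` is an illustrative helper rather than a prescribed method.

```python
# Hypothetical unnormalized invoice data: each order carries a repeating group of lines.
unnormalized = [
    {"Order_ID": 1006, "Customer_ID": 2, "Customer_Name": "Value Furniture",
     "Lines": [
         {"Product_ID": 7, "Description": "Dining Table", "Unit_Price": 800, "Quantity": 2},
         {"Product_ID": 5, "Description": "Writer's Desk", "Unit_Price": 325, "Quantity": 2},
     ]},
    {"Order_ID": 1007, "Customer_ID": 6, "Customer_Name": "Furniture Gallery",
     "Lines": [
         {"Product_ID": 7, "Description": "Dining Table", "Unit_Price": 800, "Quantity": 1},
     ]},
]


def to_1nf(orders):
    """Emit one flat row per order line, copying the order-level values into each row."""
    flat = []
    for order in orders:
        header = {k: v for k, v in order.items() if k != "Lines"}
        for line in order["Lines"]:
            flat.append({**header, **line})
    return flat


invoice_1nf = to_1nf(unnormalized)
for row in invoice_1nf:
    print(row)

# After flattening, the primary key becomes the combination (Order_ID, Product_ID).
keys = [(row["Order_ID"], row["Product_ID"]) for row in invoice_1nf]
```

The result has a single value in every cell, at the cost of repeating the customer and product details, which is what the next two steps address.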
Functional Dependency Diagram for Invoice
A partial functional dependency in a relation is a functional dependency in which one or more nonkey attributes are functionally dependent on part (but not all) of the primary key.
Removing Partial Dependencies
2NF: A relation is in 2NF if it is in 1NF, and contains no partial dependencies.
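Removing partial dependencies amounts to projecting the 1NF relation onto smaller relations, one per determinant. The sketch below assumes a hypothetical 1NF invoice relation with the composite key (Order_ID, Product_ID); the column names, values, and the helper `project` are illustrative only.

```python
def project(rows, columns):
    """Relational projection: keep only the given columns and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical 1NF invoice rows; primary key is (Order_ID, Product_ID).
invoice_1nf = [
    {"Order_ID": 1006, "Customer_ID": 2, "Customer_Name": "Value Furniture",
     "Product_ID": 7, "Description": "Dining Table", "Unit_Price": 800, "Quantity": 2},
    {"Order_ID": 1006, "Customer_ID": 2, "Customer_Name": "Value Furniture",
     "Product_ID": 5, "Description": "Writer's Desk", "Unit_Price": 325, "Quantity": 2},
    {"Order_ID": 1007, "Customer_ID": 6, "Customer_Name": "Furniture Gallery",
     "Product_ID": 7, "Description": "Dining Table", "Unit_Price": 800, "Quantity": 1},
]

# Attributes that depend only on Order_ID (part of the key) move to their own relation.
orders = project(invoice_1nf, ["Order_ID", "Customer_ID", "Customer_Name"])
# Attributes that depend only on Product_ID (the other part of the key) do the same.
products = project(invoice_1nf, ["Product_ID", "Description", "Unit_Price"])
# Quantity depends on the whole key, so it stays with (Order_ID, Product_ID).
order_lines = project(invoice_1nf, ["Order_ID", "Product_ID", "Quantity"])

print(len(orders), len(products), len(order_lines))
```

Each product description and each customer name is now stored once per product or order instead of once per order line.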
A transitive dependency in a relation is a functional dependency between two (or more) nonkey attributes.
Removing Transitive Dependencies
3NF: A relation is in 3NF if it is in 2NF and no transitive dependencies exist.
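A transitive dependency can be removed the same way. In the sketch below, a hypothetical order relation holds Order_ID → Customer_ID and Customer_ID → Customer_Name, so the customer name depends on the key only through another nonkey attribute. All names and values are assumptions for illustration.

```python
def project(rows, columns):
    """Relational projection: keep only the given columns and drop duplicate rows."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


# Hypothetical 2NF relation with the transitive dependency
# Order_ID -> Customer_ID -> Customer_Name.
orders_2nf = [
    {"Order_ID": 1006, "Customer_ID": 2, "Customer_Name": "Value Furniture"},
    {"Order_ID": 1007, "Customer_ID": 6, "Customer_Name": "Furniture Gallery"},
    {"Order_ID": 1008, "Customer_ID": 2, "Customer_Name": "Value Furniture"},
]

# Split so that every nonkey attribute depends directly on its relation's key.
orders = project(orders_2nf, ["Order_ID", "Customer_ID"])          # Customer_ID is a foreign key
customers = project(orders_2nf, ["Customer_ID", "Customer_Name"])

# Joining the two relations on Customer_ID rebuilds the original rows (no information lost).
name_of = {c["Customer_ID"]: c["Customer_Name"] for c in customers}
rejoined = [{**o, "Customer_Name": name_of[o["Customer_ID"]]} for o in orders]

print(customers)
print(rejoined == orders_2nf)
```

The customer name is now stored once per customer, and the join shows that the decomposition loses nothing.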
Note to Students: For drawing the ER diagram of your project, try MS Visio, an easy-to-use tool for drawing ER diagrams such as the one shown above.
Relational Scheme for INVOICE data (MS Visio)
SQL – Structured Query Language
SQL Statements
SELECT (select list)
FROM (table list)
WHERE (condition for retrieval)
ORDER BY (sort criteria)
Example:
SELECT Empno, Ename, Job, Sal
FROM EMP
WHERE Sal > 2500
ORDER BY Job, Ename
Table: EMP
Empno  Ename   Job        Sal
8756   Dravid  President  8000
5348   Raju    Manager    5000
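The query can be run against an in-memory SQLite database using Python's standard `sqlite3` module. This is a sketch under assumptions: the column types are guesses, the two rows are read from the EMP table above as (8756, Dravid, President, 8000) and (5348, Raju, Manager, 5000), and the third row (Kiran) is hypothetical, added only so that the WHERE clause filters something out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMP (Empno INTEGER PRIMARY KEY, Ename TEXT, Job TEXT, Sal INTEGER)")
conn.executemany(
    "INSERT INTO EMP VALUES (?, ?, ?, ?)",
    [
        (8756, "Dravid", "President", 8000),
        (5348, "Raju", "Manager", 5000),
        (7001, "Kiran", "Clerk", 1500),   # hypothetical row, below the salary threshold
    ],
)

rows = conn.execute(
    """
    SELECT Empno, Ename, Job, Sal
    FROM EMP
    WHERE Sal > 2500
    ORDER BY Job, Ename
    """
).fetchall()

for row in rows:
    print(row)
```

The clerk is filtered out by the WHERE clause, and the remaining rows are sorted by Job first, so the Manager row comes before the President row.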
Example:
SELECT "Order Number", "Unit Price" * Quantity AS Total
FROM "Order"
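A computed column like Total can be tried out with Python's standard `sqlite3` module. The sketch below makes several assumptions: the table layout, column types, and sample rows are hypothetical, and the identifiers are written in double quotes because names containing spaces, and the reserved word ORDER, cannot be used unquoted in SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Double quotes make names with spaces, and the reserved word Order, valid identifiers.
conn.execute('CREATE TABLE "Order" ("Order Number" INTEGER, "Unit Price" REAL, Quantity INTEGER)')
conn.executemany(
    'INSERT INTO "Order" VALUES (?, ?, ?)',
    [(1, 10.0, 3), (2, 2.5, 4)],   # hypothetical sample rows
)

cursor = conn.execute(
    'SELECT "Order Number", "Unit Price" * Quantity AS Total FROM "Order"'
)
column_names = [d[0] for d in cursor.description]
totals = cursor.fetchall()

print(column_names)
print(totals)
```

Total is not stored anywhere; it is a derived attribute computed each time the query runs, and AS only gives the result column a name.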
Normalization - Recap
1. 1NF: A relation is in 1NF if multi-valued attributes (also called repeating groups) have been removed, so there is a single value (possibly null) at the intersection of each row and column of the table.
2. 2NF: A relation is in 2NF if it is in 1NF, and contains no partial dependencies.
A partial functional dependency in a relation is a functional dependency in which one or more nonkey attributes are functionally dependent on part (but not all) of the primary key.
3. 3NF: A relation is in 3NF if it is in 2NF and no transitive dependencies exist.
A transitive dependency in a relation is a functional dependency between two (or more) nonkey attributes.
SUMMARY - Normalization – Rules – 1NF TO 3NF
First Normal Form (1NF)
First normal form (1NF) sets the very basic rules for an organized database:
• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
Second Normal Form (2NF)
Second normal form (2NF) further addresses the concept of removing duplicative data:
• Meet all the requirements of the first normal form.
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign keys.
Third Normal Form (3NF)
Third normal form (3NF) goes one large step further:
• Meet all the requirements of the second normal form.
• Remove columns that are not dependent upon the primary key.
SUMMARY - Normalization – Rules – 1NF TO 3NF
1NF Eliminate Repeating Groups - Make a separate table for each set of related attributes, and give each table a primary key.
2NF Eliminate Redundant Data - If an attribute depends on only part of a multi-attribute (composite) key, move it to a separate table.
3NF Eliminate Columns Not Dependent On Key - If attributes do not contribute to a description of the key, move them to a separate table.
Normalization - Exercises
Exercise 1: Normalize
Emp_No, Prof_Designation, Emp_Name, Dept_Code, Dept_Name, Prof_Office, Student_Name, Student_Id, Student_DOB, Student_Age
Exercise 2: Normalize
Prod No, Prod Desc, Item No, Salesperson Name, Customer Name, Quantity, Price
Normalize
Emp No, Emp Name, Dept No, Dept Name, Mgr No, Proj No, Proj Name, Start Date, Billing Rate
Normalize
Title                      Author1               Author2         ISBN        Subject           Pages  Publisher
Database System Concepts   Abraham Silberschatz  Henry F. Korth  0072958863  MySQL, Computers  1168   McGraw-Hill
Operating System Concepts  Abraham Silberschatz  Henry F. Korth  0471694665  Computers         944    McGraw-Hill