Prof.Ajinkya Raut VJTI,Matunga,Mumbai-19 Normalization
Page 1: Database normalization

Prof. Ajinkya Raut, VJTI, Matunga, Mumbai-19

Normalization

Page 2: Database normalization

Database Normalization

• Database normalization is the process of removing redundant data from your tables in order to improve storage efficiency, data integrity, and scalability.

• In the relational model, methods exist for quantifying how efficient a database is. These classifications are called normal forms (or NF), and there are algorithms for converting a given database between them.

• Normalization generally involves splitting existing tables into multiple ones, which must be re-joined or linked each time a query is issued.

Page 3: Database normalization

History

• Edgar F. Codd first proposed the process of normalization, and what came to be known as the first normal form, in his 1970 paper "A Relational Model of Data for Large Shared Data Banks". Codd stated:

“There is, in fact, a very simple elimination procedure which we shall call normalization. Through decomposition nonsimple domains are replaced by ‘domains whose elements are atomic (nondecomposable) values.’”

Page 4: Database normalization

Normal Form

• Edgar F. Codd originally established three normal forms: 1NF, 2NF and 3NF. There are now others that are generally accepted, but 3NF is widely considered to be sufficient for most applications. Most tables when reaching 3NF are also in BCNF (Boyce-Codd Normal Form).

Page 5: Database normalization

Table 1

Title | Author1 | Author2 | ISBN | Subject | Pages | Publisher
Database System Concepts | Abraham Silberschatz | Henry F. Korth | 0072958863 | MySQL, Computers | 1168 | McGraw-Hill
Operating System Concepts | Abraham Silberschatz | Henry F. Korth | 0471694665 | Computers | 944 | McGraw-Hill

Page 6: Database normalization

Table 1 problems

• This table is not very efficient with storage.

• This design does not protect data integrity.

• This table does not scale well.

Page 7: Database normalization

• A repeating group means that a table contains two or more columns that are closely related.

• For example, a table that records data on a book and its author(s) with the following columns: [Book ID], [Author 1], [Author 2], [Author 3] is not in 1NF because [Author 1], [Author 2], and [Author 3] are all repeating the same attribute.

Page 8: Database normalization

Title | Author | ISBN | Subject | Pages | Publisher
Database System Concepts | Abraham Silberschatz | 0072958863 | MySQL, Computers | 1168 | McGraw-Hill
Database System Concepts | Henry F. Korth | 0072958863 | MySQL, Computers | 1168 | McGraw-Hill
Operating System Concepts | Abraham Silberschatz | 0471694665 | Computers | 944 | McGraw-Hill
Operating System Concepts | Henry F. Korth | 0471694665 | Computers | 944 | McGraw-Hill
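The step from the Author1/Author2 layout to one row per author can be sketched in Python. This is only an illustration: the dictionary keys and the helper name `unpivot_authors` are assumptions made for the example, and only a few of the columns are carried along.

```python
# Hypothetical in-memory copy of the wide table (Author1/Author2 columns).
wide_rows = [
    {"title": "Database System Concepts", "author1": "Abraham Silberschatz",
     "author2": "Henry F. Korth", "isbn": "0072958863"},
    {"title": "Operating System Concepts", "author1": "Abraham Silberschatz",
     "author2": "Henry F. Korth", "isbn": "0471694665"},
]


def unpivot_authors(rows, author_columns=("author1", "author2")):
    """Emit one row per (book, author) pair, skipping empty author slots."""
    result = []
    for row in rows:
        # Everything except the repeating author columns is copied as-is.
        base = {k: v for k, v in row.items() if k not in author_columns}
        for col in author_columns:
            if row.get(col):
                result.append({**base, "author": row[col]})
    return result


long_rows = unpivot_authors(wide_rows)
for r in long_rows:
    print(r["isbn"], r["author"])
```

The repeating group is gone, but the title and ISBN are now repeated on every author row, which is the redundancy the later normal forms address.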

Page 9: Database normalization

FIRST NORMAL FORM (1NF)

A database is in first normal form if it satisfies the following conditions:

• It contains only atomic values.

• There are no repeating groups.

An atomic value is a value that cannot be divided. For example, in the table shown below, the values in the [Color] column in the first row can be divided into "red" and "green", hence [TABLE_PRODUCT] is not in 1NF.

Page 10: Database normalization

1st Normal Form Example

How do we bring an unnormalized table into first normal form? Consider the following example:

This table is not in first normal form because the [Color] column can contain multiple values. For example, the first row includes values "red" and "green."

Page 11: Database normalization

To bring this table to first normal form, we split the table into two tables and now we have the resulting tables:
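As an illustration of this kind of split, here is a minimal Python sketch. The sample rows, the column names `product_id` and `price`, and the function name `split_colors` are assumptions made for the example; the only detail taken from the slide text is that a [Color] cell may hold several values such as "red" and "green".

```python
# Hypothetical product rows; the first one has a non-atomic color value.
products = [
    {"product_id": 1, "color": "red, green", "price": 15.99},
    {"product_id": 2, "color": "yellow", "price": 23.99},
]


def split_colors(rows):
    """Split the table into a price table and a one-color-per-row table."""
    price_table = [
        {"product_id": r["product_id"], "price": r["price"]} for r in rows
    ]
    color_table = [
        {"product_id": r["product_id"], "color": c.strip()}
        for r in rows
        for c in r["color"].split(",")
    ]
    return price_table, color_table


product_price, product_color = split_colors(products)
for row in product_color:
    print(row)
```

Every cell in both resulting tables holds a single value, so each table satisfies the atomicity condition.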

Page 12: Database normalization

SECOND NORMAL FORM (2NF)

A database is in second normal form if it satisfies the following conditions:

• It is in first normal form.

• All non-key attributes are fully functionally dependent on the primary key.

In a table, if attribute B is functionally dependent on A, but is not functionally dependent on any proper subset of A, then B is considered fully functionally dependent on A. Hence, in a 2NF table, no non-key attribute can be dependent on a proper subset of the primary key. Note that if the primary key is not a composite key, all non-key attributes are always fully functionally dependent on the primary key. A table that is in first normal form and has a single-attribute primary key is therefore automatically in second normal form.
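The definition of full functional dependency can be probed on sample data. The sketch below is a rough check, not a proof: sample rows can refute a dependency but never establish that it holds in general. The rows and the names `fd_holds` and `partial_dependencies` are assumptions made for the illustration.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """True if equal values on lhs always come with equal values on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key.
        if seen.setdefault(key, val) != val:
            return False
    return True


def partial_dependencies(rows, key, non_key):
    """List (subset, attribute) pairs where a proper subset of the key
    already determines a non-key attribute in the sample rows."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in non_key:
                if fd_holds(rows, subset, (attr,)):
                    found.append((subset, attr))
    return found


# Hypothetical rows with composite key (customer_id, store_id).
purchases = [
    {"customer_id": 1, "store_id": 1, "purchase_location": "Los Angeles"},
    {"customer_id": 1, "store_id": 3, "purchase_location": "San Francisco"},
    {"customer_id": 2, "store_id": 1, "purchase_location": "Los Angeles"},
    {"customer_id": 3, "store_id": 2, "purchase_location": "New York"},
]

violations = partial_dependencies(
    purchases, ("customer_id", "store_id"), ("purchase_location",)
)
print(violations)
```

A non-empty result means some non-key attribute appears to depend on only part of the key, so the table is a candidate for 2NF decomposition.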

Page 13: Database normalization

Consider the following example:

This table has a composite primary key [Customer ID, Store ID]. The non-key attribute is [Purchase Location]. In this case, [Purchase Location] only depends on [Store ID], which is only part of the primary key. Therefore, this table does not satisfy second normal form.

Page 14: Database normalization

To bring this table to second normal form, we break the table into two tables, and now we have the following:

Now, in the table [TABLE_STORE], the column [Purchase Location] is fully dependent on the primary key of that table, which is [Store ID].
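The decomposition can be sketched as two projections of the original rows, followed by a re-join to confirm that nothing was lost. The sample data, the name `table_purchase`, and the helper `project` are assumptions made for the example; only [TABLE_STORE] and the column names come from the slide text.

```python
# Hypothetical rows with composite key (customer_id, store_id).
purchases = [
    {"customer_id": 1, "store_id": 1, "purchase_location": "Los Angeles"},
    {"customer_id": 1, "store_id": 3, "purchase_location": "San Francisco"},
    {"customer_id": 2, "store_id": 1, "purchase_location": "Los Angeles"},
    {"customer_id": 3, "store_id": 2, "purchase_location": "New York"},
]


def project(rows, columns):
    """Project rows onto columns, dropping duplicates but keeping order."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[c] for c in columns)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(columns, values)))
    return out


table_purchase = project(purchases, ("customer_id", "store_id"))
table_store = project(purchases, ("store_id", "purchase_location"))

# Re-join on store_id to rebuild the original rows.
location_of = {s["store_id"]: s["purchase_location"] for s in table_store}
rejoined = [
    {**p, "purchase_location": location_of[p["store_id"]]}
    for p in table_purchase
]
print(len(table_store), rejoined == purchases)
```

Each store's location is now stored once, so correcting it means changing a single row.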

Page 15: Database normalization

THIRD NORMAL FORM (3NF)

A database is in third normal form if it satisfies the following conditions:

• It is in second normal form.

• There is no transitive functional dependency.

By transitive functional dependency, we mean we have the following relationships in the table: B is functionally dependent on A, and C is functionally dependent on B. In this case, C is transitively dependent on A via B: if A → B and B → C, then A → C.

Page 16: Database normalization

Consider the following example:

In the table above, [Book ID] determines [Genre ID], and [Genre ID] determines [Genre Type]. Therefore, [Book ID] determines [Genre Type] via [Genre ID]. This is a transitive functional dependency, so the structure does not satisfy third normal form.

Page 17: Database normalization

To bring this table to third normal form, we split the table into two as follows:

Now all non-key attributes are fully functionally dependent only on the primary key. In [TABLE_BOOK], both [Genre ID] and [Price] are only dependent on [Book ID]. In [TABLE_GENRE], [Genre Type] is only dependent on [Genre ID].
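One way to picture the two resulting tables is as a small SQLite schema driven from Python's standard `sqlite3` module. The lower-case table and column names and all inserted values are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_genre (
        genre_id   INTEGER PRIMARY KEY,
        genre_type TEXT NOT NULL
    );
    CREATE TABLE table_book (
        book_id  INTEGER PRIMARY KEY,
        genre_id INTEGER NOT NULL REFERENCES table_genre(genre_id),
        price    REAL NOT NULL
    );
    -- Hypothetical sample data.
    INSERT INTO table_genre VALUES (1, 'Gardening'), (2, 'Sports');
    INSERT INTO table_book VALUES (1, 1, 25.99), (2, 2, 14.99), (3, 1, 10.0);
""")

# The genre type is looked up through a join instead of being stored per book.
rows = conn.execute("""
    SELECT b.book_id, g.genre_type, b.price
    FROM table_book AS b
    JOIN table_genre AS g ON g.genre_id = b.genre_id
    ORDER BY b.book_id
""").fetchall()

for row in rows:
    print(row)
```

Because each genre type is stored once in `table_genre`, renaming a genre is a single-row update rather than a change to every book of that genre.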

Page 18: Database normalization

Fourth Normal Form (4NF)

A multi-valued dependency (MVD) represents a dependency between attributes (for example, A, B and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is defined as being trivial if

• B is a subset of A, or

• A ∪ B = R.

An MVD is defined as being nontrivial if neither of the above two conditions is satisfied.
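The independence of the B and C value sets can be tested directly on a set of rows. The sketch below uses the usual tuple-swapping formulation of an MVD; the course/teacher/book data and the function name `mvd_holds` are assumptions made for the illustration.

```python
def mvd_holds(rows, lhs, mid):
    """Check lhs ->> mid: for any two rows agreeing on lhs, the row that
    takes its mid values from the first and everything else from the
    second must also be present."""
    attrs = sorted(rows[0])
    relation = {tuple(r[a] for a in attrs) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if any(t1[a] != t2[a] for a in lhs):
                continue
            mixed = dict(t2)
            for a in mid:
                mixed[a] = t1[a]
            if tuple(mixed[a] for a in attrs) not in relation:
                return False
    return True


# Hypothetical relation: teachers and books of a course are independent.
courses = [
    {"course": "DB", "teacher": "Ann", "book": "Silberschatz"},
    {"course": "DB", "teacher": "Ann", "book": "Date"},
    {"course": "DB", "teacher": "Bob", "book": "Silberschatz"},
    {"course": "DB", "teacher": "Bob", "book": "Date"},
]

print(mvd_holds(courses, ["course"], ["teacher"]))
print(mvd_holds(courses[:3], ["course"], ["teacher"]))
```

With all four combinations present the dependency holds; dropping one combination ties a teacher to a particular book, and the check fails.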

Page 19: Database normalization
Page 20: Database normalization

Fifth Normal Form (5NF)

Fifth normal form (5NF), also known as project-join normal form (PJ/NF), is a level of database normalization designed to reduce redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships.

A table is said to be in the 5NF if and only if every join dependency in it is implied by the candidate keys.

A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.

Page 21: Database normalization

Example:

Page 22: Database normalization

The table's predicate is: Products of the type designated by Product Type, made by the brand designated by Brand, are available from the traveling salesman designated by Traveling Salesman.

A Traveling Salesman has certain Brands and certain Product Types in his repertoire. If Brand B1 and Brand B2 are in his repertoire, and Product Type P is in his repertoire, then (assuming Brand B1 and Brand B2 both make Product Type P) the Traveling Salesman must offer products of Product Type P made by Brand B1 as well as those made by Brand B2.

In that case, it is possible to split the table into three:
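Whether a three-way split of this kind is lossless can be checked by projecting the rows onto the three attribute pairs and joining the projections back together. The sketch below does that for hypothetical data; the names, brands, product types, and function names are all assumptions made for the example.

```python
def project(rows, columns):
    """Project rows onto the given columns as a set of tuples."""
    return {tuple(r[c] for c in columns) for r in rows}


def rejoin(rows):
    """Natural join of the three pairwise projections."""
    sb = project(rows, ("salesman", "brand"))
    sp = project(rows, ("salesman", "product_type"))
    bp = project(rows, ("brand", "product_type"))
    return {
        (s, b, p)
        for (s, b) in sb
        for (s2, p) in sp
        if s2 == s and (b, p) in bp
    }


def satisfies_join_dependency(rows):
    """True if the three-way split loses nothing and adds nothing."""
    original = project(rows, ("salesman", "brand", "product_type"))
    return rejoin(rows) == original


def row(s, b, p):
    return {"salesman": s, "brand": b, "product_type": p}


# Follows the rule: every brand/type combination that exists is offered.
consistent = [
    row("Jack", "Acme", "Vacuum Cleaner"),
    row("Jack", "Acme", "Breadbox"),
    row("Jack", "Robusto", "Vacuum Cleaner"),
]
# Breaks the rule: Jack carries Acme and Breadbox, Acme makes a Breadbox,
# yet Jack does not offer the Acme Breadbox.
inconsistent = [
    row("Jack", "Acme", "Vacuum Cleaner"),
    row("Jack", "Robusto", "Breadbox"),
    row("Mary", "Acme", "Breadbox"),
]

print(satisfies_join_dependency(consistent))
print(satisfies_join_dependency(inconsistent))
```

When the rule does not hold, the re-join produces a spurious row, so the three-table design is appropriate only if the business rule is actually enforced.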

Page 23: Database normalization
Page 24: Database normalization

Boyce-Codd Normal Form (BCNF)

When a relation has more than one candidate key, anomalies may result even though the relation is in 3NF.

3NF does not deal satisfactorily with the case of a relation with overlapping candidate keys, i.e. composite candidate keys with at least one attribute in common.

BCNF is based on the concept of a determinant.

A determinant is any attribute (simple or composite) on which some other attribute is fully functionally dependent.

A relation is in BCNF if, and only if, every determinant is a candidate key.

Page 25: Database normalization

Consider the following relation and determinants.

R(a,b,c,d)
a,c -> b,d
a,d -> b

Here, the first determinant suggests that the primary key of R could be changed from a,b to a,c. If this change were made, all of the non-key attributes present in R could still be determined, and therefore this change is legal. However, the second determinant indicates that a,d determines b, but a,d could not be the key of R, as a,d does not determine all of the non-key attributes of R (it does not determine c). We would say that the first determinant is a candidate key, but the second determinant is not a candidate key, and thus this relation is not in BCNF (but is in third normal form).
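The reasoning about which determinants can serve as keys can be mechanized with an attribute-closure computation. The sketch below assumes that only the two dependencies listed for R hold, and it tests each determinant for being a superkey (its closure covers every attribute), which is the usual practical form of the BCNF test. The function names are assumptions made for the example.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given
    functional dependencies, each written as (lhs, rhs)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Determinants whose closure does not cover the whole relation."""
    return [lhs for lhs, _ in fds if closure(lhs, fds) != set(all_attrs)]


R = ("a", "b", "c", "d")
FDS = [
    (("a", "c"), ("b", "d")),  # a,c -> b,d
    (("a", "d"), ("b",)),      # a,d -> b
]

print(sorted(closure(("a", "c"), FDS)))
print(sorted(closure(("a", "d"), FDS)))
print(bcnf_violations(R, FDS))
```

The closure of a,c covers all of R, while the closure of a,d never reaches c, which matches the argument that the second determinant is not a candidate key.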

Page 26: Database normalization
Page 27: Database normalization

DB(Patno, PatName, appNo, time, doctor)

Patno -> PatName
Patno, appNo -> time, doctor
time -> appNo

Now we have to decide what the primary key of DB is going to be. From the information we have, we could choose:

DB(Patno, PatName, appNo, time, doctor) with primary key (Patno, appNo)

or

DB(Patno, PatName, appNo, time, doctor) with primary key (Patno, time)
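The possible keys of DB can be derived from the three dependencies by brute force: try attribute sets from smallest to largest and keep the minimal ones whose closure covers the relation. This is a sketch for small relations only, and the function names are assumptions made for the example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(all_attrs):
                keys.append(combo)
    return keys


ATTRS = ("Patno", "PatName", "appNo", "time", "doctor")
FDS = [
    (("Patno",), ("PatName",)),
    (("Patno", "appNo"), ("time", "doctor")),
    (("time",), ("appNo",)),
]

keys = candidate_keys(ATTRS, FDS)
print(keys)
```

The two keys found overlap on Patno, and the determinant time (from time -> appNo) is not a key by itself, which is the overlapping-candidate-key situation that BCNF is meant to catch.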

Page 28: Database normalization

Example 2:

Page 29: Database normalization
Page 30: Database normalization
Page 31: Database normalization

Problem 1) A college maintains details of its lecturers' subject area skills. These details comprise:

• Lecturer Number
• Lecturer Name
• Lecturer Grade
• Department Code
• Department Name
• Subject Code
• Subject Name
• Subject Level

Assume that each lecturer may teach many subjects but may not belong to more than one department.

Subject Code, Subject Name and Subject Level are repeating fields.

Normalise this data to Third Normal Form.

Page 32: Database normalization

UNF
Lecturer Number, Lecturer Name, Lecturer Grade, Department Code, Department Name, Subject Code, Subject Name, Subject Level

1NF
Lecturer Number, Lecturer Name, Lecturer Grade, Department Code, Department Name
Lecturer Number, Subject Code, Subject Name, Subject Level

2NF
Lecturer Number, Lecturer Name, Lecturer Grade, Department Code, Department Name
Lecturer Number, Subject Code
Subject Code, Subject Name, Subject Level

3NF
Lecturer Number, Lecturer Name, Lecturer Grade, *Department Code
Department Code, Department Name
Lecturer Number, Subject Code
Subject Code, Subject Name, Subject Level
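The four 3NF relations map naturally onto four tables. The sketch below builds them in an in-memory SQLite database and joins them back into the original flat view. The table names, column types, and sample rows are assumptions made for the illustration; only the attribute names come from the problem.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE department (
        department_code TEXT PRIMARY KEY,
        department_name TEXT NOT NULL
    );
    CREATE TABLE lecturer (
        lecturer_number INTEGER PRIMARY KEY,
        lecturer_name   TEXT NOT NULL,
        lecturer_grade  TEXT,
        department_code TEXT NOT NULL REFERENCES department(department_code)
    );
    CREATE TABLE subject (
        subject_code  TEXT PRIMARY KEY,
        subject_name  TEXT NOT NULL,
        subject_level INTEGER
    );
    CREATE TABLE lecturer_subject (
        lecturer_number INTEGER REFERENCES lecturer(lecturer_number),
        subject_code    TEXT REFERENCES subject(subject_code),
        PRIMARY KEY (lecturer_number, subject_code)
    );
    -- Hypothetical sample data.
    INSERT INTO department VALUES ('CS', 'Computer Science');
    INSERT INTO lecturer VALUES (1, 'A. Rao', 'Senior', 'CS');
    INSERT INTO subject VALUES ('DB1', 'Databases', 2),
                               ('OS1', 'Operating Systems', 3);
    INSERT INTO lecturer_subject VALUES (1, 'DB1'), (1, 'OS1');
""")

# Rebuild the flat "lecturer skills" view by joining the four tables.
rows = conn.execute("""
    SELECT l.lecturer_name, d.department_name, s.subject_name
    FROM lecturer AS l
    JOIN department AS d ON d.department_code = l.department_code
    JOIN lecturer_subject AS ls ON ls.lecturer_number = l.lecturer_number
    JOIN subject AS s ON s.subject_code = ls.subject_code
    ORDER BY s.subject_code
""").fetchall()

for row in rows:
    print(row)
```

The department name and each subject's details are stored once, and the flat view is recovered by joins at query time.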

Page 33: Database normalization

Problem 2) A software contract and consultancy firm maintains details of all the various projects in which its employees are currently involved. These details comprise:

• Employee Number
• Employee Name
• Date of Birth
• Department Code
• Department Name
• Project Code
• Project Description
• Project Supervisor

Assume the following:

• Each employee number is unique.
• Each department has a single department code.
• Each project has a single code and supervisor.
• Each employee may work on one or more projects.
• Employee names need not necessarily be unique.

Project Code, Project Description and Project Supervisor are repeating fields.

Normalise this data to Third Normal Form.

Page 34: Database normalization

UNF
Employee Number, Employee Name, Date of Birth, Department Code, Department Name, Project Code, Project Description, Project Supervisor

1NF
Employee Number, Employee Name, Date of Birth, Department Code, Department Name
Employee Number, Project Code, Project Description, Project Supervisor

2NF
Employee Number, Employee Name, Date of Birth, Department Code, Department Name
Employee Number, Project Code
Project Code, Project Description, Project Supervisor

3NF
Employee Number, Employee Name, Date of Birth, *Department Code
Department Code, Department Name
Employee Number, Project Code
Project Code, Project Description, Project Supervisor