MSc IT UFIE8K-15-M Data Management Prakash Chatterjee http://www.cems.uwe.ac.uk/~p-chatterjee/ Department of Computer Science and Creative Technologies University of the West of England Lecture 6 : Normalisation

Mar 29, 2015


Damion Toppin

UFIE8K-15-M Data Management 2008


Normalisation (1)

What is Normalisation?

Informally, normalisation can be thought of as a process, defined within relational database theory, for breaking up larger relations into several smaller ones using a set of rules. Normalisation resolves problems with data anomalies and redundancy. It is essentially a two-step process:

1. put the data into tabular form (by removing repeating groups); and
2. remove duplicated data to separate tables.

As we work through the normalisation process, we will make use of data that relates to the Bus Depots' Database, a description and E-R model of which were handed out in last week's session and are also available from the resource area.


Normalisation (2)

Un-normalised data (1)

Well-normalised databases have a design that reflects the true dependencies between entities, allowing the data to be updated quickly with little risk of introducing inconsistencies. Before discussing how to design a well-normalised database using Codd's normalisation techniques, we first consider a poor database design.

Consider, for example, a relation 'bus' which includes bus registration number, model, type number, type description and depot name (note that names have been changed slightly from the case study for the purposes of this example):

registration no   model             type number   type description   depot
A123ABC           Routemaster       1             doubledecker       Holloway
D678FGH           Volvo 8700        2             metrobus           Holloway
H2591JK           Daf SB220         3             midibus            Hornsey
P200IJK           Mercedes 709D     2             metrobus           Hornsey
P300RTY           Mercedes Citaro   4             bendy-bus          Hornsey
R678FDS           Daf SB220         1             doubledecker
W653TJH           Routemaster       1             doubledecker


Normalisation (3)

Un-normalised data (2)

There are several problems with the previous relation:

- Redundancy - the 'type description' is repeated for each 'type number' in the relation. The 'model' is also repeated for a particular 'type description'; for example, a Routemaster is always a doubledecker bus.
- Update anomalies - as a consequence of the redundancy, we could update the 'type description' in one tuple while leaving it unchanged in another.
- Deletion anomalies - if we delete all the buses of a particular type, we might lose all the information about that type.
- Insertion anomalies - the inverse of deletion anomalies: we cannot record a new type in our table unless there exists a bus of that type. For example, if there is a type 'open top' but no bus of that type, we cannot store it in our database. To get around this we might put null values in the bus components (registration number, model, depot) of a tuple for that type, but when we later enter a bus of that type, will we remember to delete the tuple with nulls?
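The anomalies above can be illustrated with a small Python sketch. It uses three rows taken from the 'bus' relation; the misspelt description 'metro bus' and the type number 5 for 'open top' are assumptions made up for the illustration, and the two-table split is one possible fix rather than the design used later in the lecture.

```python
# Rows of the un-normalised 'bus' relation:
# (registration no, model, type number, type description, depot)
bus = [
    ("D678FGH", "Volvo 8700", 2, "metrobus", "Holloway"),
    ("P200IJK", "Mercedes 709D", 2, "metrobus", "Hornsey"),
    ("P300RTY", "Mercedes Citaro", 4, "bendy-bus", "Hornsey"),
]


def descriptions_per_type(rows):
    """Map each type number to the set of descriptions stored for it."""
    found = {}
    for _reg, _model, tno, tdesc, _depot in rows:
        found.setdefault(tno, set()).add(tdesc)
    return found


# Update anomaly: the description is changed in one tuple only
# ("metro bus" is a hypothetical edit), so type 2 now has two descriptions.
updated = list(bus)
updated[0] = ("D678FGH", "Volvo 8700", 2, "metro bus", "Holloway")
inconsistent = {
    tno: descs
    for tno, descs in descriptions_per_type(updated).items()
    if len(descs) > 1
}

# One possible fix: hold each description once, keyed by type number.
bus_type = {tno: tdesc for _reg, _model, tno, tdesc, _depot in bus}
bus_slim = [(reg, model, tno, depot) for reg, model, tno, _tdesc, depot in bus]

# Insertion anomaly removed: a type can be recorded before any bus has it
# (type number 5 is an assumed value).
bus_type[5] = "open top"
```

After the split, a description is stored in exactly one place, so it cannot disagree with itself, and a type survives even if every bus of that type is deleted.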


Normalisation (4)

Functional dependencies (1)

Determinants

A formal definition for the term functional dependence is:

Given a relation which has attributes (x, y, ...), we say that an attribute y is functionally dependent on another attribute x if (and only if) each x value has associated with it precisely one y value (at any one time).

For example, examine the following relation:

Cleaner no. (cno)   Cleaner name (cname)   Cleaner salary (csalary)   Depot no. (dno)
110                 John                   2550                       101
111                 Jean                   2500                       101
112                 Betty                  2400                       102
113                 Vince                  2800                       102
114                 Jay                    3000                       102


Normalisation (5)

In the previous relation, attributes cname, csalary and dno are each functionally dependent on attribute cno: given a particular cno value, there exists precisely one corresponding value for each of cname, csalary and dno.

In general then, the same x-values may appear in many different tuples of the relation; if y is functionally dependent on x, then every one of these tuples must contain the same y value.
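The rule just stated (the same x value must always carry the same y value) can be checked mechanically. The sketch below uses the cleaner rows from the relation above; the helper name `holds` is an assumption introduced for the illustration.

```python
cleaner = [
    {"cno": 110, "cname": "John", "csalary": 2550, "dno": 101},
    {"cno": 111, "cname": "Jean", "csalary": 2500, "dno": 101},
    {"cno": 112, "cname": "Betty", "csalary": 2400, "dno": 102},
    {"cno": 113, "cname": "Vince", "csalary": 2800, "dno": 102},
    {"cno": 114, "cname": "Jay", "csalary": 3000, "dno": 102},
]


def holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        # setdefault stores the first y value seen for x and returns it afterwards.
        if seen.setdefault(x, row[dependent]) != row[dependent]:
            return False
    return True


# cno determines every other attribute in the sample data.
cno_determines_all = all(
    holds(cleaner, ["cno"], attr) for attr in ("cname", "csalary", "dno")
)

# dno does not determine cname: depot 101 has both John and Jean.
dno_determines_cname = holds(cleaner, ["dno"], "cname")
```

A check like this can only refute a dependency. In these five rows cname also happens to determine cno, but that is a coincidence of the sample (two cleaners could share a name); whether a dependency really holds is a statement about the meaning of the data, not about one set of rows.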

Going back to the cleaner example, we can represent these functional dependencies diagrammatically as:


Normalisation (6)

The previous figure is an example of a determinacy diagram. The arrowed line can be read as 'determines' (reading from left to right). So we say, for example, 'cno determines cname'. We can also 'read' the diagram from right to left. This time the arrowed line is read as 'is functionally dependent on'. So we say, for example, 'cname is functionally dependent on cno'.

The attribute or group of attributes on the left-hand side is called the determinant. A determinant is not necessarily the primary key. In the example, cno is a determinant of cname because, knowing the cleaner's number, we can determine the cleaner's name.

Recognising the functional dependencies is an essential part of understanding the meaning or semantics of the data. The fact that cname, csalary and dno are functionally dependent on cno means that each cleaner has one name, has one salary and works at precisely one depot.


Normalisation (7)

Functional dependencies (2)

Composite attributes

The notion of functional dependence can be extended to cover the case where the determinant (particularly the primary key) is composite, i.e. it consists of more than one attribute.

Full functional dependence

An attribute y is defined to be fully functionally dependent on attribute x if it is functionally dependent on x and not functionally dependent on any proper subset of the attributes of x, where x is a composite attribute.

Partial dependencies

The opposite of full functional dependence is partial dependence. Where we have data values that depend on only a part of the primary key, we have a partial dependency.

Transitive dependencies

A transitive dependency occurs when the value of an attribute is not determined directly by the primary key, but through the value of another attribute, which in turn is determined by the primary key.
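The difference between full and partial dependence on a composite key can be sketched in Python. The three rows, their dates and status values are hypothetical, and `holds` and `partial_determinants` are names assumed for the illustration.

```python
from itertools import combinations


def holds(rows, determinant, dependent):
    """Return True if no two rows agree on `determinant` but differ on `dependent`."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        if seen.setdefault(x, row[dependent]) != row[dependent]:
            return False
    return True


def partial_determinants(rows, key, attr):
    """List the proper, non-empty subsets of `key` that already determine `attr`."""
    return [
        subset
        for size in range(1, len(key))
        for subset in combinations(key, size)
        if holds(rows, subset, attr)
    ]


# Hypothetical rows with the composite key (roster_no, reg_no).
rows = [
    {"roster_no": 1, "reg_no": "D678FGH", "roster_date": "2008-03-01", "status": "done"},
    {"roster_no": 1, "reg_no": "P200IJK", "roster_date": "2008-03-01", "status": "pending"},
    {"roster_no": 2, "reg_no": "D678FGH", "roster_date": "2008-03-02", "status": "pending"},
]
key = ("roster_no", "reg_no")

# roster_date depends on roster_no alone: a partial dependency.
date_partials = partial_determinants(rows, key, "roster_date")

# status needs the whole key: it is fully functionally dependent.
status_partials = partial_determinants(rows, key, "status")
```

An empty result means that, in the sample, no part of the key is enough on its own, which is what full functional dependence requires.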


Normalisation (8)

The normal forms

A number of normal forms have been proposed, but the first five normal forms have been widely accepted.

The normal forms progress from first normal form, to second, and so on. Data in second normal form implies that it is also in first normal form - i.e. each level of normalisation implies that the previous level has been met.

Other normal forms have been proposed, such as Boyce-Codd normal form (BCNF), which is an extension of 3NF lying between 3NF and 4NF.


Normalisation (9)

Correspondence between the normal forms:


Normalisation (10)

Normal form example

Consider the following example forms that record information about cleaners at the Middlesex Depot and the buses they look after. Note that three extra attributes (roster number, roster date and job complete) have been added to the original model. The cleaner places a tick against the appropriate job after he/she has finished cleaning a particular bus.


Normalisation (11)

The un-normalised relation:


Normalisation (12)

First normal form (1NF)

The next step in the normalisation process is to remove the repeating groups from the un-normalised relation. A relation is in 1NF if, and only if, all domains contain only atomic (single) values, i.e. all repeating groups of data have been removed.

A repeating group is a group of attributes that occurs a number of times for each record in the relation. So for example, in the Roster relation, each roster record has a group of buses (roster record 104 has 6 buses).
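Removing a repeating group amounts to writing one flat row per member of the group. The sketch below assumes a nested representation of the roster records; the roster numbers, dates, registration numbers and status values are illustrative only (roster 104 is shown with two buses rather than six), and `to_first_normal_form` is a hypothetical helper name.

```python
# Un-normalised records: each roster carries a repeating group of buses.
unnormalised = [
    {
        "roster_no": 104,
        "roster_date": "2008-03-01",
        "buses": [
            {"reg_no": "D678FGH", "status": "done"},
            {"reg_no": "P200IJK", "status": "pending"},
        ],
    },
    {
        "roster_no": 105,
        "roster_date": "2008-03-02",
        "buses": [{"reg_no": "D678FGH", "status": "pending"}],
    },
]


def to_first_normal_form(records, group):
    """Emit one flat row per member of the repeating group named `group`."""
    flat = []
    for record in records:
        # Attributes outside the group are copied onto every row.
        fixed = {k: v for k, v in record.items() if k != group}
        for member in record[group]:
            flat.append({**fixed, **member})
    return flat


roster_1nf = to_first_normal_form(unnormalised, "buses")
```

Every value in `roster_1nf` is atomic, at the cost of repeating the roster attributes on each row. That repetition is the redundancy the later normal forms deal with.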

Selecting a suitable key for the table

In order to convert an un-normalised relation into first normal form, we must identify the key attribute(s) involved. From the un-normalised relation we can see that each roster has a roster_no, each cleaner a cno, each depot a dno, each bus a reg_no and each type a tno. We also have to identify a key for the whole relation, i.e. a set of attributes whose values uniquely identify each tuple once the repeating groups have been removed. On examination, the primary key of the relation is the composite (roster_no, reg_no).

We now draw the determinacy diagram for the roster relation, showing the attributes which are dependent on the primary key:


Normalisation (13)

Determinacy diagram for the first normal form:


Normalisation (14)

Roster relation in first normal form:


Normalisation (15)

The problems with 1NF are:

- Redundancy - e.g. roster date, cleaner name etc. are repeated.
- Insertion anomaly - a cleaner cannot be inserted into the database unless he/she has a bus to clean.
- Deletion anomaly - deleting a tuple might lose information from the database. For example, if a cleaner leaves the company and we delete his/her tuples, then we also lose information about the buses he/she cleaned.
- Update anomaly - e.g. a change to a cleaner's name must be made in all tuples which include that cleaner name.


Normalisation (16)

Second normal form (2NF)

We now describe the second step in the normalisation process using the relation above, which is in first normal form.

Firstly, we determine the functional dependencies on the identifying attributes, i.e. the primary key (roster_no, reg_no) and its parts.

If the key is composite, the other attributes must be functionally dependent on the whole of the key. In other words, we are looking for partial functional dependencies, which must be removed. In the example, roster_date is functionally dependent on the partial key roster_no: there is only one roster_date for a particular roster_no. Also cno, cname, dno, dname etc. are all functionally dependent on the partial key reg_no. The attribute 'status', however, is the only attribute fully functionally dependent on the whole of the primary key.
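The move to 2NF can be sketched as three projections of the 1NF relation, one per determinant. The rows below are hypothetical (only a few of the attributes are shown, and which cleaner looks after which bus is assumed), and the relation name `roster_bus` and the helper `project` are assumptions made for the illustration.

```python
def project(rows, attrs):
    """Project rows onto `attrs`, dropping duplicate tuples but keeping order."""
    result = []
    for row in rows:
        picked = {a: row[a] for a in attrs}
        if picked not in result:
            result.append(picked)
    return result


# Hypothetical 1NF rows; the primary key is (roster_no, reg_no).
roster_1nf = [
    {"roster_no": 104, "reg_no": "D678FGH", "roster_date": "2008-03-01",
     "cno": 110, "cname": "John", "status": "done"},
    {"roster_no": 104, "reg_no": "P200IJK", "roster_date": "2008-03-01",
     "cno": 111, "cname": "Jean", "status": "pending"},
    {"roster_no": 105, "reg_no": "D678FGH", "roster_date": "2008-03-02",
     "cno": 110, "cname": "John", "status": "pending"},
]

# Attributes that depend on roster_no alone.
roster = project(roster_1nf, ["roster_no", "roster_date"])

# Attributes that depend on reg_no alone.
bus = project(roster_1nf, ["reg_no", "cno", "cname"])

# Attributes that depend on the whole key.
roster_bus = project(roster_1nf, ["roster_no", "reg_no", "status"])
```

Each roster date and each bus's cleaner details are now stored once, while `roster_bus` keeps one row per original tuple and links the other two relations through its key attributes.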


Normalisation (17)

Determinacy diagram for the second normal form:


Normalisation (18)

Roster in second normal form:


Normalisation (19)

2NF has less redundancy than 1NF, as we have removed the partial dependencies. However, there are still a number of problems:

- Redundancy - for example, in the Bus relation, cleaner name is repeated for each cleaner number.
- Insertion anomaly - a cleaner cannot be inserted into the database unless he/she is responsible for at least one bus.
- Deletion anomaly - deleting a tuple might lose information from the database. For example, if we delete the tuple for a bus whose cleaner is responsible only for that one bus, then we lose the information about the cleaner.
- Update anomaly - e.g. a change to a cleaner's name means changes must be made in all tuples which include that cleaner name.


Normalisation (20)

Third normal form (3NF)

A 3NF relation is in 2NF, but it must also satisfy the non-transitive dependency rule, which states that every non-key attribute must be non-transitively dependent on the primary key. Another way of saying this is that a relation is in 3NF if all its non-key attributes are directly dependent on the primary key. Transitive dependencies are resolved by creating new relations for each entity.

There are three transitive dependencies in the Bus relation above, as illustrated by the vertical lines in the 2NF determinacy diagram. For example: cno is functionally dependent on reg_no; cname is functionally dependent on reg_no. Additionally, cname is functionally dependent on cno.

We therefore have the transitive dependency:

reg_no determines cno and cno determines cname, therefore reg_no determines cname

Two other transitive dependencies are identified involving tname and dname. The determinacy diagrams for third normal form are given below:
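Resolving the transitive dependency reg_no -> cno -> cname can be sketched as follows. Only three attributes of the Bus relation are shown, the assignment of cleaners to buses and the changed name 'Jon' are hypothetical, and the variable names are assumptions for the illustration.

```python
# Hypothetical 2NF Bus rows: cname is reached from reg_no only through cno.
bus_2nf = [
    {"reg_no": "D678FGH", "cno": 110, "cname": "John"},
    {"reg_no": "P200IJK", "cno": 111, "cname": "Jean"},
    {"reg_no": "P300RTY", "cno": 110, "cname": "John"},
]

# New relation for the cleaner entity, keyed by cno.
cleaner = {row["cno"]: row["cname"] for row in bus_2nf}

# The original relation keeps cno as a foreign key to the new relation.
bus_3nf = [{"reg_no": row["reg_no"], "cno": row["cno"]} for row in bus_2nf]


def rejoin(buses, cleaners):
    """Rebuild the 2NF rows by looking each bus's cleaner up through cno."""
    return [{**b, "cname": cleaners[b["cno"]]} for b in buses]


# The split loses nothing: joining the two relations gives back the original.
lossless = rejoin(bus_3nf, cleaner) == bus_2nf

# A name change is now a single update, seen by every bus of that cleaner.
cleaner[110] = "Jon"
after_update = rejoin(bus_3nf, cleaner)
```

The same split would apply to the other two transitive dependencies, with tname and dname each moved to a relation of their own.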


Normalisation (21)

Determinacy diagram for the third normal form:


Normalisation (22)

Roster in third normal form:


Normalisation (23)

Summary of the normal forms

1NF
What is it? A relation is in 1NF if:
- it contains scalar (atomic) values only
What does the process do?
- Removes repeating groups
How is it achieved?
- Make a separate relation for each group of related attributes
- Give each new relation a primary key

2NF
What is it? A relation is in 2NF if:
- it is in 1NF
- all non-key attributes are dependent on the whole of the primary key and not part of it
What does the process do?
- Removes redundant data
How is it achieved?
- If an attribute depends on only part of a multi-attribute key, remove it to a separate table

3NF
What is it? A relation is in 3NF if:
- it is in 2NF
- non-key attributes are dependent on the primary key and independent of each other
- i.e. each non-key attribute must be non-transitively dependent on the primary key
- if a non-key attribute is changed, that change should not affect the others
What does the process do?
- Removes attributes not dependent on the key, thereby further reducing redundancy
How is it achieved?
- Make a separate relation for attributes transitively dependent on the primary key
- Give each new relation a primary key
- The original relation will include a foreign key to link to the new relation


Bibliography / Readings / Home based activities

Bibliography
- An Introduction to Database Systems (8th ed.), C J Date, Addison Wesley 2004
- Database Management Systems, P Ward & G Dafoulas, Thomson 2006
- Database System Concepts (4th ed.), A Silberschatz, H F Korth & S Sudarshan, McGraw-Hill 2002

Readings
- 'Introduction to SQL', McGraw-Hill/Osborne (handout)

Home based activities
- Ensure you download XAMPP and install it on your home PC or laptop (if you have a slow home internet connection, download it to a data key or CD here at UWE).
- Copy the SQL Workbook onto your data key or CD.
- Import the tables from the SQL Workbook into your home MySQL DB. Begin working through some of the query examples in the workbook using phpMyAdmin.