Part 6, Chapter 15: Normalization of Relational Database
Csci455, reza@aero.und.ed

Dec 20, 2015


Objectives

• Design methodologies
• Goodness of design
• Functional dependencies
• The normalization process and normal forms
  – First, second, third, and BCNF
• Pros and cons of normalization

Design Methodology

• A database system can be designed via
  – Bottom-up design (design by synthesis)
  – Top-down design (design by analysis)

Bottom-up design

• Starts with the basic relationships between pairs of attributes
• Uses this information to construct the relations
• Not scalable or practical when there are many attributes

Top-down design

• The design process
  – Starts with one relation (the set of all attributes)
  – Decomposes it into groups
• Uses ER to model the conceptual schema
  – Maps each entity into a table schema
  – Analyzes each table schema for goodness
    » possible refinement and/or decomposition
• Relies on existing design knowledge or experience

Informal Design Guidelines for Relational Schemas

• Informal design metrics
  – Semantics of the relation attributes
  – Reducing redundant values in tuples
  – Minimizing NULL values
  – Disallowing spurious tuples

Semantics of the Relation Attributes

• Based on the semantics of the attributes, i.e., how the attribute values in a tuple relate to one another
  – A schema should capture facts about one entity type or one relationship type


[Fig. 10-2]

Guideline 1

• Design a relation schema so that it is easy to explain its meaning
  – Do not combine attributes from multiple entity types and relationship types into a single relation

[Fig. 10-3]

These are considered POOR designs! Why?

Redundant Information in Tuples and Update Anomalies

• Important objectives of schema design:
  – to minimize storage space and effort
  – to minimize problems resulting from updates
• Example
  – Compare the relations in Fig. 15.2 with those in Fig. 15.4

[Fig. 10-2]

[Fig. 10-4]

Update Anomalies

• Update anomalies
  – Insertion anomalies
  – Deletion anomalies
  – Modification anomalies

Insertion Anomalies

• Insertion anomalies, e.g., EMP_DEPT in Fig. 15.4
  – Consistency: e.g., inserting a new employee
    » requires inserting ALL attribute values of the employee's department correctly,
    » or inserting NULLs if the employee does not work for a department yet
  – Null values: e.g., inserting a new department that has no employees
    » requires a NULL SSN, which violates entity integrity because SSN is the primary key and cannot be NULL
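A minimal sketch of the NULL-key insertion anomaly, using Python's built-in sqlite3 module. The table is a cut-down, hypothetical EMP_DEPT, and the department values are made up for illustration. SSN is declared NOT NULL explicitly because SQLite, unlike the SQL standard, otherwise accepts NULL in a non-INTEGER primary-key column.

```python
import sqlite3

# Hypothetical, cut-down EMP_DEPT: employee and department facts in one table.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE EMP_DEPT (
        SSN     TEXT NOT NULL PRIMARY KEY,  -- NOT NULL spelled out for SQLite
        ENAME   TEXT,
        DNUMBER INTEGER,
        DNAME   TEXT,
        DMGRSSN TEXT
    )
    """
)

def add_department_without_employees(dnumber, dname, dmgrssn):
    """Try to record a new department that has no employees yet.

    Department facts can only be stored inside an employee tuple, so SSN
    (the primary key) would have to be NULL. The NOT NULL constraint rejects it.
    """
    try:
        conn.execute(
            "INSERT INTO EMP_DEPT (SSN, ENAME, DNUMBER, DNAME, DMGRSSN) "
            "VALUES (NULL, NULL, ?, ?, ?)",
            (dnumber, dname, dmgrssn),
        )
        return True
    except sqlite3.IntegrityError as err:
        print("rejected:", err)
        return False

ok = add_department_without_employees(9, "Marketing", "999887777")
print(ok)  # False
```

Declaring a separate DEPARTMENT table removes the anomaly, because a department tuple no longer needs an employee key.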

[Fig. 10-4]

Deletion Anomalies

• Deletion anomalies: loss of information
  – E.g., deleting the very last employee who works for dnum = 1 from EMP_DEPT also deletes the only record of department 1 itself
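The deletion anomaly can be sketched in plain Python by treating a relation as a list of tuples. The rows are hypothetical. Department 1's name and manager are stored only inside its employees' tuples, so deleting the last such employee erases the department as well.

```python
# Hypothetical EMP_DEPT rows: (ENAME, SSN, DNUMBER, DNAME, DMGRSSN)
emp_dept = [
    ("Alice", "111", 1, "Headquarters", "111"),
    ("Bob",   "222", 5, "Research",     "333"),
    ("Carol", "333", 5, "Research",     "333"),
]

def delete_employee(rows, ssn):
    """Return the table without the employee whose SSN matches."""
    return [row for row in rows if row[1] != ssn]

def known_departments(rows):
    """Department numbers that can still be recovered from the table."""
    return sorted({row[2] for row in rows})

after = delete_employee(emp_dept, "111")  # Alice is the only employee of department 1
print(known_departments(emp_dept))  # [1, 5]
print(known_departments(after))     # [5]: department 1's name and manager are lost too
```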

[Fig. 10-4]

Modification Anomalies

• Modification anomalies: change one, change all
  – E.g., changing a department's manager or department number requires updating the tuples of every employee who works in that department; missing one leaves the data inconsistent
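A small sketch of "change one, change all" with hypothetical rows. The manager of department 5 is copied into every employee tuple, so a partial update leaves two different managers on record.

```python
# Hypothetical EMP_DEPT rows: department 5's manager is stored once per employee
emp_dept = [
    {"SSN": "222", "ENAME": "Bob",   "DNUMBER": 5, "DMGRSSN": "333"},
    {"SSN": "444", "ENAME": "Carol", "DNUMBER": 5, "DMGRSSN": "333"},
    {"SSN": "555", "ENAME": "Dave",  "DNUMBER": 5, "DMGRSSN": "333"},
]

def managers_of(rows, dnumber):
    """All manager SSNs recorded for one department (should be exactly one)."""
    return sorted({row["DMGRSSN"] for row in rows if row["DNUMBER"] == dnumber})

# Careless update: change the manager in only one of the three copies.
emp_dept[0]["DMGRSSN"] = "777"
print(managers_of(emp_dept, 5))  # ['333', '777']: the data now contradicts itself

# Correct update: every tuple of the department must be changed.
for row in emp_dept:
    if row["DNUMBER"] == 5:
        row["DMGRSSN"] = "777"
print(managers_of(emp_dept, 5))  # ['777']
```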

[Fig. 10-4]

Guideline 2

• Design base relation schemas that are free of insertion, deletion, and modification anomalies
  – How? Use formal approaches to validate the design against these guidelines

Null Values in Tuples

• NULLs arise when a relation has attributes that do not apply to all tuples
  – E.g., student phone number: not every student has a cell phone or a work phone
• Guideline 3
  – Stay away from attributes with NULL values in base tables
    » NULLs waste storage and make the relation harder to understand. They also complicate aggregate functions and operations involving comparisons (e.g., the join operation).

Generation of Spurious (or Invalid) Tuples

• Refers to an undesirable decomposition of a relation
  – E.g., EMP_LOC and EMP_PROJ1: joining them back on PLOCATION, a non-key attribute, produces spurious tuples that were not in the original EMP_PROJ
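The following sketch shows how a lossy decomposition produces spurious tuples. Each relation is a set of frozensets of (attribute, value) pairs. The EMP_PROJ rows are hypothetical and cut down to four attributes. The two projections share only PLOCATION, so the natural join can match only on that non-key attribute.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs; the set removes duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}

def natural_join(r1, r2):
    """Natural join: combine tuples that agree on all attributes they share."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            common = d1.keys() & d2.keys()
            if all(d1[a] == d2[a] for a in common):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Hypothetical EMP_PROJ state, cut down to four attributes
emp_proj = [
    {"SSN": "1", "ENAME": "Smith",   "PNUMBER": 1, "PLOCATION": "Bellaire"},
    {"SSN": "2", "ENAME": "English", "PNUMBER": 1, "PLOCATION": "Bellaire"},
    {"SSN": "2", "ENAME": "English", "PNUMBER": 2, "PLOCATION": "Sugarland"},
]

original  = project(emp_proj, ["SSN", "ENAME", "PNUMBER", "PLOCATION"])
emp_loc   = project(emp_proj, ["ENAME", "PLOCATION"])
emp_proj1 = project(emp_proj, ["SSN", "PNUMBER", "PLOCATION"])

rejoined = natural_join(emp_loc, emp_proj1)
spurious = rejoined - original
print(len(original), len(rejoined), len(spurious))  # 3 5 2
```

The two spurious tuples pair SSN 1 with English and SSN 2 with Smith, because both employees share the Bellaire location.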

[Fig. 10-5]

[Fig. 10-6]

Guideline 4

• Design relation schemas so that they can be JOINed with equality conditions on attributes that are either primary keys (PKs) or foreign keys (FKs), so that no spurious tuples are generated

Summary and discussion of design guidelines

• The guidelines above help avoid the following problems:
  1. Anomalies that cause redundant work during insertion, deletion, and modification
  2. Waste of storage space due to NULLs
  3. Generation of invalid and spurious data during joins on base relations that use non-key attributes

Functional Dependencies

• A constraint between two sets of attributes X and Y of R such that, for any two tuples t1 and t2 in r(R):
  – if $t_1[X] = t_2[X]$, then $t_1[Y] = t_2[Y]$
• Used to define normal forms
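The condition above can be checked mechanically against a single relation state. The sketch below assumes that `fd_holds` and the TEACHER/COURSE/TEXT rows are hypothetical. A relation state can only refute an FD, so a True result means "not violated by these rows". It does not mean the FD holds for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Check X -> Y on one relation state (a list of dicts).

    Returns False as soon as two tuples agree on lhs but differ on rhs.
    True only means this particular state does not violate the FD.
    """
    seen = {}  # maps each X value to the first Y value observed with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical TEACH-like state
teach = [
    {"TEACHER": "Adams", "COURSE": "Databases", "TEXT": "Text A"},
    {"TEACHER": "Adams", "COURSE": "Compilers", "TEXT": "Text B"},
    {"TEACHER": "Baker", "COURSE": "OS",        "TEXT": "Text C"},
    {"TEACHER": "Clark", "COURSE": "Databases", "TEXT": "Text D"},
]

print(fd_holds(teach, ["TEACHER"], ["COURSE"]))  # False: Adams teaches two courses
print(fd_holds(teach, ["COURSE"], ["TEACHER"]))  # False: Databases has two teachers
print(fd_holds(teach, ["TEXT"], ["COURSE"]))     # True here, but not proven for the schema
```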

Functional Dependencies (FD): Formal definition

• Represented by $X \to Y$
  – X functionally determines Y, or Y is functionally dependent on X
  – i.e., each X value is associated with ONLY one Y value
  – If X is a candidate key (CK) of R, then $X \to Y$ holds for every subset Y of the attributes of R
• Note: an FD is a property of the semantics or meaning of the attributes
  – It must hold in every legal relation state (legal extension) of R
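Reasoning from a schema's FDs, rather than from data, is usually done with the attribute closure X+, which is the set of all attributes that X determines. The sketch below uses the standard fixed-point closure algorithm. The FDs are the EMP_PROJ dependencies from the textbook example: {SSN, PNUMBER} → HOURS, SSN → ENAME, and PNUMBER → {PNAME, PLOCATION}. The function names are illustrative. X is a superkey exactly when X+ contains every attribute of R.

```python
def closure(attrs, fds):
    """Attribute closure X+: repeatedly apply every FD whose left side is already in X+."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """X is a superkey of R iff X+ covers all attributes of R."""
    return closure(attrs, fds) >= set(schema)

EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
FDS = [
    (["SSN", "PNUMBER"], ["HOURS"]),        # FD1
    (["SSN"], ["ENAME"]),                   # FD2
    (["PNUMBER"], ["PNAME", "PLOCATION"]),  # FD3
]

print(sorted(closure(["SSN"], FDS)))                   # ['ENAME', 'SSN']
print(is_superkey(["SSN", "PNUMBER"], EMP_PROJ, FDS))  # True
print(is_superkey(["SSN"], EMP_PROJ, FDS))             # False
```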

Properties of functional dependencies (FDs)

• Dependency is a schema-based notion
  – It is a semantic notion
  – Identifying FDs is part of the process of understanding what the data means

[Fig. 10-3]

(b) EMP_PROJ: SSN → ENAME; PNUMBER → {PNAME, PLOCATION}; {SSN, PNUMBER} → HOURS

Important Notes on FDs

• Legal extensions (or legal relation states): the extensions r(R) that satisfy the functional dependency constraints
• An FD is a property of the relation schema, not of a particular relation extension
  – A given extension can show that an FD does not hold, but cannot prove that it does

[Fig. 10-7]

FD1: TEXT → COURSE? Yes or no
FD2: TEACHER → COURSE? No
FD3: COURSE → TEACHER? No

Normalization

• Normalization theory:
  – is built around the concept of normal forms
  – is used in the design process
• A relation is in a particular normal form if it satisfies a specified set of requirements
  – E.g., 1NF: all underlying domains MUST contain atomic values only

Normal Form

• Types of normal forms
  – 1NF
  – 2NF
  – 3NF
  – BCNF
  – 4NF
  – 5NF (PJ/NF)
  – DKNF (absolute normal form)

Relationships of Normal Forms

[Diagram: nested normal forms, from outermost to innermost: 1NF, 2NF, 3NF/BCNF, 4NF, 5NF, DKNF]

First Normal Form (1NF)

• 1NF prevents
  – multivalued attributes,
  – composite attributes,
  – and combinations of the above
• See Fig. 15.8
• See Fig. 15.9
  – nested relations, or multivalued composite attributes
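A common way to bring a relation with a multivalued attribute into 1NF is to move that attribute into its own relation. The new relation is keyed by the original key plus the value. The following is a minimal sketch. It assumes a hypothetical DEPARTMENT whose DLOCATIONS attribute holds a Python set, and `split_multivalued` is an illustrative name.

```python
# Hypothetical non-1NF state: DLOCATIONS is set-valued, so its domain is not atomic
department = [
    {"DNAME": "Research", "DNUMBER": 5, "DLOCATIONS": {"Bellaire", "Sugarland", "Houston"}},
    {"DNAME": "Headquarters", "DNUMBER": 1, "DLOCATIONS": {"Houston"}},
]

def split_multivalued(rows, key, multivalued):
    """Return (base, side): base drops the set-valued attribute,
    side holds one (key, value) tuple per element of the set."""
    base = [{a: v for a, v in row.items() if a != multivalued} for row in rows]
    side = [(row[key], value) for row in rows for value in sorted(row[multivalued])]
    return base, side

dept, dept_locations = split_multivalued(department, "DNUMBER", "DLOCATIONS")
print(dept)            # [{'DNAME': 'Research', 'DNUMBER': 5}, {'DNAME': 'Headquarters', 'DNUMBER': 1}]
print(dept_locations)  # [(5, 'Bellaire'), (5, 'Houston'), (5, 'Sugarland'), (1, 'Houston')]
```

Every value in both results is atomic, and the original set can be rebuilt by grouping `dept_locations` on DNUMBER.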

[Fig. 10-8]

[Fig. 10-9]

Second Normal Form (2NF)

• Based on the concept of full functional dependency
• Analogy to the traditional courtroom oath:
  – Every non-key attribute depends on the key, the whole key, and nothing but the key
• R is in 2NF iff
  – R is in 1NF
  – Every non-key attribute is fully dependent on the PK
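The 2NF condition can be tested by asking whether any proper subset of the primary key already determines a non-key attribute. The sketch below uses attribute closure and the EMP_PROJ dependencies from the textbook example, with PK {SSN, PNUMBER}. The function names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(schema, pk, fds):
    """List (subset of PK, non-key attributes it determines); each entry violates 2NF."""
    non_key = set(schema) - set(pk)
    found = []
    for size in range(1, len(pk)):  # proper, non-empty subsets of the PK only
        for subset in combinations(sorted(pk), size):
            determined = closure(subset, fds) & non_key
            if determined:
                found.append((subset, sorted(determined)))
    return found

EMP_PROJ = ["SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"]
FDS = [
    (["SSN", "PNUMBER"], ["HOURS"]),
    (["SSN"], ["ENAME"]),
    (["PNUMBER"], ["PNAME", "PLOCATION"]),
]
print(partial_dependencies(EMP_PROJ, ["SSN", "PNUMBER"], FDS))
# [(('PNUMBER',), ['PLOCATION', 'PNAME']), (('SSN',), ['ENAME'])]
```

Splitting along these partial dependencies gives the usual 2NF relations {SSN, ENAME}, {PNUMBER, PNAME, PLOCATION}, and {SSN, PNUMBER, HOURS}.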

Normalization into 2NF and 3NF

[Fig. 10-10]

Third Normal Form

• Based on the concept of transitive dependency
• Relation R is in 3NF iff
  – R is in 2NF
  – Every non-key attribute is non-transitively dependent on the PK

[Fig. 10-10]

Interpretation of 3NF

• Formal definition
  – R is in 3NF if, whenever a nontrivial functional dependency $X \to Y$ holds in R, either
    » X is a superkey of R, or
    » Y is a prime attribute of R
• E.g.,
  – LOTS2 in Fig. 15.12(b) is in 3NF
  – LOTS1 in Fig. 15.12(b) is NOT in 3NF because of FD4
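The formal 3NF test needs the prime attributes, which are the attributes that appear in some candidate key. The sketch below finds candidate keys by brute force. That is fine for slide-sized schemas but exponential in general. It then checks each FD against the definition. The LOTS1 and LOTS2 attributes and FDs are assumed to match the textbook's LOTS example, with LOT# renamed LOT_NUM. The dependencies used are FD1 PROPERTY_ID → all, FD2 {COUNTY_NAME, LOT_NUM} → all, FD3 COUNTY_NAME → TAX_RATE, and FD4 AREA → PRICE.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute sets from smallest to largest."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            is_key = closure(combo, fds) >= set(schema)
            if is_key and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

def third_nf_violations(schema, fds):
    """FDs X -> A where A is not in X, X is not a superkey, and A is not prime."""
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= set(schema)
        for a in rhs:
            if a not in lhs and not superkey and a not in prime:
                violations.append((tuple(lhs), a))
    return violations

LOTS1 = ["PROPERTY_ID", "COUNTY_NAME", "LOT_NUM", "AREA", "PRICE"]
LOTS1_FDS = [
    (["PROPERTY_ID"], ["COUNTY_NAME", "LOT_NUM", "AREA", "PRICE"]),  # FD1
    (["COUNTY_NAME", "LOT_NUM"], ["PROPERTY_ID", "AREA", "PRICE"]),  # FD2
    (["AREA"], ["PRICE"]),                                           # FD4
]
LOTS2 = ["COUNTY_NAME", "TAX_RATE"]
LOTS2_FDS = [(["COUNTY_NAME"], ["TAX_RATE"])]                        # FD3

print(candidate_keys(LOTS1, LOTS1_FDS))       # [('PROPERTY_ID',), ('COUNTY_NAME', 'LOT_NUM')]
print(third_nf_violations(LOTS1, LOTS1_FDS))  # [(('AREA',), 'PRICE')]
print(third_nf_violations(LOTS2, LOTS2_FDS))  # []
```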


Alternative definition of 3NF

• A relation schema R is in 3NF if every non-prime attribute of R satisfies both of the following conditions:
  – It is fully functionally dependent on every key of R
  – It is non-transitively dependent on every key of R

Boyce/Codd NF

• Boyce-Codd normal form
  – A stricter normal form than 3NF
    » If R is in BCNF, then R is also in 3NF
    » R in 3NF does not imply that R is in BCNF
  – Eliminates additional redundancy that 3NF cannot detect

Example

• Suppose we have thousands of lots in the relation, but the lots are from only two counties: DeKalb and Fulton
• Let's say the lot sizes are
  – in DeKalb: 0.5, …, 1.0 acres
  – in Fulton: 1.1, 1.2, …, 1.9, 2.0 acres
• Also assume that
  – FD5: Area → County_Name (the size ranges do not overlap, so a lot's area determines its county)

[Fig. 10-12]

Boyce/Codd NF (Cont')

• A relation R is in Boyce/Codd normal form (BCNF) iff
  – every determinant of a nontrivial FD is a superkey (i.e., contains a CK)
    » (i.e., each attribute MUST describe the key, the whole key, and nothing but the key)
• Ensures no redundancy caused by functional dependencies (GOOD)
• Considered the most desirable NF

Example

• Consider a relation TEACH with
  – FD1: {Student, Course} → Instructor
  – FD2: Instructor → Course
• {Student, Course} is a candidate key
• The relation is in 3NF (Course is a prime attribute)
• Is it in BCNF? No: Instructor is a determinant, but it is not a candidate key
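The following is a minimal BCNF check for the TEACH example. It flags any nontrivial FD whose left side is not a superkey. The function name is illustrative, and closure is the same fixed-point algorithm used for superkey tests. TEACH still passes 3NF because COURSE is prime, but FD2 fails BCNF.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y whose left side X is not a superkey of the schema."""
    violations = []
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial FD, never a violation
        if not closure(lhs, fds) >= set(schema):
            violations.append((tuple(lhs), tuple(rhs)))
    return violations

TEACH = ["STUDENT", "COURSE", "INSTRUCTOR"]
TEACH_FDS = [
    (["STUDENT", "COURSE"], ["INSTRUCTOR"]),  # FD1
    (["INSTRUCTOR"], ["COURSE"]),             # FD2
]
print(bcnf_violations(TEACH, TEACH_FDS))  # [(('INSTRUCTOR',), ('COURSE',))]
```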

BCNF Example Semantics

• A student can take more than one course
• But a student has a different instructor for each course
• Each instructor (non-key) teaches only one course (partial key)

[Fig. 10-13]

More on Example

• Possible decompositions are
  1. {Student, Instructor} and {Student, Course}
  2. {Course, Instructor} and {Course, Student}
  3. {Instructor, Course} and {Instructor, Student}
• Which of the decompositions is better? Justify your answer.
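One way to compare the three candidates is the lossless (nonadditive) join test for binary decompositions. A split {R1, R2} is lossless iff the common attributes functionally determine R1 − R2 or R2 − R1. The following is a minimal sketch under FD1 and FD2. The function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary decomposition test: {R1, R2} is lossless iff the common
    attributes determine R1 - R2 or R2 - R1."""
    common = closure(set(r1) & set(r2), fds)
    return (set(r1) - set(r2)) <= common or (set(r2) - set(r1)) <= common

TEACH_FDS = [
    (["STUDENT", "COURSE"], ["INSTRUCTOR"]),  # FD1
    (["INSTRUCTOR"], ["COURSE"]),             # FD2
]
decompositions = [
    (["STUDENT", "INSTRUCTOR"], ["STUDENT", "COURSE"]),     # 1
    (["COURSE", "INSTRUCTOR"], ["COURSE", "STUDENT"]),      # 2
    (["INSTRUCTOR", "COURSE"], ["INSTRUCTOR", "STUDENT"]),  # 3
]
print([is_lossless(r1, r2, TEACH_FDS) for r1, r2 in decompositions])  # [False, False, True]
```

In this sketch only decomposition 3 passes. It is not dependency-preserving, though. FD1 spans both resulting relations, so it can no longer be enforced within a single table.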

Instructor-course Table

Instructor    Course
Mark          Database
Navathe       Database
Schulman      Theory
Ahmand        OS
Omiecinski    Database
Ammar         OS

Instructor-student Table

Instructor    Student
Mark          Narayan
Mark          Wallace
Navathe       Smith
Navathe       Zelaya
Ammar         Smith
Ammar         Narayan
Schulman      Smith
Ahmand        Wallace
Omiecinski    Wong

To decompose or Not to decompose?

• Decomposition: pros and cons
  – Makes answering complex queries less efficient, because additional joins must be performed during queries (BAD)
  – May increase storage requirements if the degree of redundancy is very low (BAD)
  – May decrease storage requirements if the degree of redundancy is very high (GOOD)
  – Makes simple update transactions more efficient (GOOD)

Multivalued Dependency / Fourth Normal Form

• We have discussed the concept of functional dependency (FD)
• Another kind of constraint, which cannot be specified as an FD, is the multivalued dependency (MVD). Fourth normal form (4NF) is defined on the basis of this dependency.
• MVDs are a direct consequence of first normal form (1NF), which disallows an attribute in a tuple from having a set of values
• An MVD arises when a relation schema has two or more independent multivalued attributes
  – i.e., when one relation combines several independent 1:N relationships

Formal Definition of Multivalued Dependency

• A multivalued dependency (MVD) $X \twoheadrightarrow Y$ on R, where $X \subseteq R$, $Y \subseteq R$, and $Z = R - (X \cup Y)$, specifies the following constraint on any relation state r(R). If two tuples t1 and t2 exist in r with $t_1[X] = t_2[X]$, then two tuples t3 and t4 must also exist in r with:
  – $t_3[X] = t_4[X] = t_1[X] = t_2[X]$
  – $t_3[Y] = t_1[Y]$ and $t_4[Y] = t_2[Y]$
  – $t_3[Z] = t_2[Z]$ and $t_4[Z] = t_1[Z]$
• As with the earlier normal forms, normalizing into 4NF typically involves eliminating MVDs by repeated binary decompositions
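The t1/t2/t3/t4 condition translates directly into a check on one relation state. This is a minimal sketch with hypothetical EMP rows, where DNAME stands for a dependent's name. As with FDs, a single state can only refute an MVD.

```python
from itertools import product

def mvd_holds(rows, x, y, attrs):
    """Check X ->> Y on one relation state (rows are dicts over attrs).

    For every ordered pair (t1, t2) agreeing on X, the tuple t3 that takes
    X and Y from t1 and Z = R - (X union Y) from t2 must be present. Tuple t4
    is the same construction with t1 and t2 swapped, which the ordered-pair
    loop also covers.
    """
    present = {tuple(row[a] for a in attrs) for row in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            t3 = tuple(t1[a] if (a in x or a in y) else t2[a] for a in attrs)
            if t3 not in present:
                return False
    return True

ATTRS = ["ENAME", "PNAME", "DNAME"]
emp = [
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "John"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "X", "DNAME": "Anna"},
    {"ENAME": "Smith", "PNAME": "Y", "DNAME": "John"},
]
print(mvd_holds(emp, ["ENAME"], ["PNAME"], ATTRS))      # True
print(mvd_holds(emp[:3], ["ENAME"], ["PNAME"], ATTRS))  # False: (Smith, Y, John) is missing
```

The redundancy is visible in the data. Every project must be paired with every dependent, which is why 4NF splits EMP into {ENAME, PNAME} and {ENAME, DNAME}.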


Join Dependencies (JD) / Fifth Normal Form (Project-Join)

• Join dependency: a constraint on the set of legal relations over a database scheme
  – A table T is subject to a join dependency if T can always be recreated by joining multiple tables, each having a subset of the attributes of T
  – The join must satisfy the lossless (or nonadditive) join property
• A very specific semantic constraint that is very difficult to detect in practice
  – There is no sound and complete axiomatization for join dependencies

Example (JD)

• Suppose that the following additional constraint always holds:
  – whenever a supplier s supplies part p,
  – and a project j uses part p,
  – and supplier s supplies at least one part to project j,
  – then supplier s will also be supplying part p to project j.
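The constraint above corresponds to a join dependency on SUPPLY(S, P, J) over its three binary projections SP, PJ, and SJ. This is a minimal sketch. The supplier, part, and project names are hypothetical. The first state breaks the constraint, and rejoining the projections produces the spurious tuple that the constraint would have required. Adding that tuple makes the JD hold.

```python
def jd_check(supply):
    """Test the join dependency JD(SP, PJ, SJ) on one SUPPLY(S, P, J) state.

    Returns (holds, spurious): holds is True when joining the three binary
    projections gives back exactly the original tuples.
    """
    supply = set(supply)
    sp = {(s, p) for s, p, _ in supply}
    pj = {(p, j) for _, p, j in supply}
    sj = {(s, j) for s, _, j in supply}
    rejoined = {(s, p, j) for s, p in sp for p2, j in pj if p == p2 and (s, j) in sj}
    return rejoined == supply, rejoined - supply

state = [
    ("Smith",   "Bolt", "ProjY"),  # Smith supplies Bolt
    ("Adamsky", "Bolt", "ProjX"),  # ProjX uses Bolt
    ("Smith",   "Nut",  "ProjX"),  # Smith supplies some part to ProjX
]
print(jd_check(state))                                 # (False, {('Smith', 'Bolt', 'ProjX')})
print(jd_check(state + [("Smith", "Bolt", "ProjX")]))  # (True, set())
```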


Quiz: March 10, 2015