Databases 6: Normalization Hossein Rahmani

Databases 6: Normalization Hossein Rahmani. Databases lecture 62 Design of a Relational DB-Schema » What is a good relational database schema? » Conceptual.

Jan 15, 2016


Beverly Wiggins
Transcript
Page 1

Databases 6: Normalization

Hossein Rahmani

Page 2

Databases lecture 6

Design of a Relational DB-Schema
» What is a good relational database schema?
  » Conceptual level (ER-schema, views)
  » Implementation level (consistency, efficiency of base relations = stored tables)
» How do you achieve a good schema?
  » Bottom-up (start with relations between attributes): design by synthesis
  » Top-down (start with an ER-schema and decompose it further): design by analysis

Page 3

Obtaining a good design

» We first give four informal guidelines
» Then we give formal requirements that incorporate these guidelines (based on functional dependencies)

Page 4

Guideline 1: Semantics

» Make sure that the semantics (meaning) of all base relations and attributes is clear
» Tuples must be easy to interpret as ‘facts’
» If possible, do not mix attributes of more than one entity or relationship type in one base relation

Page 5

Examples of poor design

Page 6

Guideline 2: Redundancy and anomalies

» Avoid redundancy: keep the space needed to store the database as small as possible
» Prevent anomalies when changing data in the database: update (insertion / deletion / modification) anomalies

Page 7

Redundancy - example

Page 8

Update Anomalies

» Cause: data stored more than once, i.e. a wrong design (example: see next slide)
» Insertion anomalies:
  » a new tuple contains an incorrect attribute value for an already stored entity
  » a new entity has a null key
» Deletion anomalies:
  » incomplete deletion of an entity
  » unwanted deletion of an entity
» Modification anomalies:
  » incomplete modification of an entity

Page 9

Guideline 3: NULL-values

» Some base relations contain many attributes that are often NULL:
  » unnecessary use of space
  » multiple meanings of NULL
  » JOIN operations can have undesired effects
  » COUNT and SUM can give wrong results
» So: place an attribute in a base relation in which it is NULL as rarely as possible
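A minimal sketch of the COUNT/SUM problem, using Python's built-in `sqlite3` module and a hypothetical `employee` table (table, column names and values are made up for illustration). Aggregates other than `COUNT(*)` silently skip NULLs, so the result depends on which of the "multiple meanings of NULL" was intended:

```python
import sqlite3

# Hypothetical table in which one employee has a NULL bonus.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, bonus INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Ann", 100), ("Bob", None), ("Cas", 200)],  # Python None is stored as NULL
)

# COUNT(*) counts rows; COUNT(bonus), SUM(bonus) and AVG(bonus) ignore NULLs.
row = conn.execute(
    "SELECT COUNT(*), COUNT(bonus), SUM(bonus), AVG(bonus) FROM employee"
).fetchone()
print(row)  # (3, 2, 300, 150.0)
conn.close()
```

If Bob's NULL means "no bonus" (i.e. 0), the intended average is 100, not 150. Placing the bonus in a base relation that only contains employees who actually have one avoids the ambiguity.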

Page 10

Guideline 4: False (Spurious) Tuples

» If base relations are chosen badly, a (NATURAL) JOIN can create tuples that have no connection with the mini-world (see next slides)
» So: choose base relations such that a JOIN on primary or foreign keys cannot produce spurious tuples. Do not JOIN on other attributes

Page 11

Wrong choice - relations

Page 12

Wrong choice: states

Page 13

Natural join -> spurious tuples (marked *)
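The effect can be reproduced on a toy instance. The sketch below assumes relations are lists of Python dicts and uses a hypothetical EMP_PROJ(ssn, pnumber, plocation) instance. It projects onto two relations that share only the non-key attribute plocation and then natural-joins them again:

```python
from itertools import product


def project(rows, attrs):
    """Projection with duplicate elimination (relations are lists of dicts)."""
    result = []
    for t in rows:
        p = {a: t[a] for a in attrs}
        if p not in result:
            result.append(p)
    return result


def natural_join(r, s):
    """Natural join: combine tuples that agree on all shared attributes."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**t1, **t2}
        for t1, t2 in product(r, s)
        if all(t1[a] == t2[a] for a in common)
    ]


# Hypothetical instance: two employees on two projects in the same city.
emp_proj = [
    {"ssn": 1, "pnumber": 10, "plocation": "Houston"},
    {"ssn": 2, "pnumber": 20, "plocation": "Houston"},
]

# Wrong choice of base relations: they only share plocation, which is no key.
r1 = project(emp_proj, ["ssn", "plocation"])
r2 = project(emp_proj, ["pnumber", "plocation"])

joined = natural_join(r1, r2)
spurious = [t for t in joined if t not in emp_proj]
print(len(joined), len(spurious))  # 4 2
```

The two extra tuples (employee 1 on project 20, employee 2 on project 10) have no counterpart in the mini-world.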

Page 14

Normalization

» Using ‘normalization’, you can adhere to these guidelines to a large extent
» In a number of steps (algorithms), you transform a given relational database schema into ever higher normal forms
» Basic concept: functional dependency

Page 15

Functional dependency

» Start with one universal relation schema R containing all attributes $A_1, \dots, A_n$
» Given two attribute sets X and Y in R, the functional dependency $X \to Y$ exists (X functionally determines Y; Y is functionally dependent on X) if:
  $\forall r(R):\ \forall t_1, t_2 \in r:\ t_1[X] = t_2[X] \Rightarrow t_1[Y] = t_2[Y]$
» i.e., the X-component of a tuple determines its Y-component
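The definition translates directly into a check on one relation instance. A minimal sketch, assuming tuples are Python dicts; the function name `satisfies_fd` and the sample rows are illustrative. An instance can only refute an FD: whether $X \to Y$ holds on R still follows from the semantics of the attributes.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if no two tuples agree on lhs but differ on rhs."""
    seen = {}  # lhs value combination -> rhs values first seen with it
    for t in rows:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
        seen.setdefault(x, y)
    return True


rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 1, "C": 3},
]
print(satisfies_fd(rows, ["A"], ["C"]))       # True
print(satisfies_fd(rows, ["A"], ["B"]))       # False
print(satisfies_fd(rows, ["A", "B"], ["C"]))  # True
```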

Page 16

Functional dependency

[Figure: functional dependency diagrams over attributes A, B and C, including $AB \to C$]

Page 17

Functional dependency
» If X is a superkey of R, then $X \to Y$ holds for every set Y of attributes in R
» If $X \to Y$ holds, nothing can be concluded about whether $Y \to X$ holds
» $X \to Y$ follows from the semantics of the attributes in X and Y (which means that the designer must identify and declare it)
» r(R) is legal if it satisfies all functional dependencies (FDs) declared on R

Page 18

Inference rules for FDs

» Six rules for deriving FDs:
  » IR1 (reflexive): if $X \supseteq Y$ then $X \to Y$ (trivial); as a special case, $X \to X$
  » IR2 (extension): $\{X \to Y\} \models XZ \to YZ$
  » IR3 (transitive): $\{X \to Y, Y \to Z\} \models X \to Z$
  » IR4 (project): $\{X \to YZ\} \models X \to Y$
  » IR5 (combine): $\{X \to Y, X \to Z\} \models X \to YZ$
  » IR6 (pseudotransitive): $\{X \to Y, WY \to Z\} \models WX \to Z$

Page 19

Closure F+ of F

» Given a set of FDs for R: F(R)
» IR1-IR3 are sound and complete (Armstrong):
  » Sound: if a new FD f can be derived from F(R) using IR1-IR3, and r(R) is legal for F, then r(R) is also legal for $F \cup \{f\}$
  » Complete: if an FD f holds in every r(R) that is legal for F, then f can be derived from F using IR1-IR3
» The set $F^+(R)$ of all FDs that can be derived from F is called the closure $F^+$ of F(R)

Page 20

Closure X+ under F

» Given a set of attributes X and a set of FDs F: the closure $X^+$ of X under F is the set of all attributes that are functionally dependent on X (i.e., all A with $X \to A \in F^+$)

Algorithm to determine $X^+$:

$X^+ := X$;
do {
  $X^+_{old} := X^+$;
  for all $Y \to Z \in F$: if $X^+ \supseteq Y$ then $X^+ := X^+ \cup Z$;
} while ($X^+_{old} \neq X^+$)
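A direct Python transcription of the loop above, as a sketch: attribute sets are Python sets, and each FD in F is assumed to be a (lhs, rhs) pair. Single-letter attribute names let strings such as "CD" stand for attribute sets; the example F is illustrative.

```python
def closure(x, fds):
    """Attribute closure X+ of x under the FDs in fds."""
    result = set(x)                  # X+ := X
    while True:
        old = set(result)            # X+_old := X+
        for lhs, rhs in fds:         # for all Y -> Z in F
            if set(lhs) <= result:   # if X+ contains Y
                result |= set(rhs)   # then X+ := X+ ∪ Z
        if result == old:            # stop once X+ no longer grows
            return result


F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", F)))   # ['A', 'B', 'C']
print(sorted(closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E']
```

X is a superkey of R exactly when $X^+$ contains all attributes of R, so this function is also the core of the normal-form tests later in the lecture.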

Page 21

Equivalence

» Two sets of FDs F and E are equivalent ($F \equiv E$) iff $F^+ = E^+$
» Semantically: if $F \equiv E$, then r(R) is legal for F iff r(R) is legal for E
» By definition: $F \models f$ iff $F \equiv F \cup \{f\}$
» For each set F there exist many equivalent sets of FDs. We prefer simplicity: a minimal cover
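Equivalence can be decided without computing $F^+$ in full: F covers E when, for every $X \to Y$ in E, Y is contained in the closure of X under F, and $F \equiv E$ when each covers the other. A sketch reusing the attribute-closure loop; FDs are (lhs, rhs) pairs of single-letter attribute strings, and the example sets are illustrative.

```python
def closure(x, fds):
    """Attribute closure of x under fds (the X+ algorithm)."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def covers(f, e):
    """True if every FD of e can be derived from f (e is contained in f+)."""
    return all(set(rhs) <= closure(lhs, f) for lhs, rhs in e)


def equivalent(f, e):
    return covers(f, e) and covers(e, f)


F = [("A", "C"), ("AC", "D"), ("E", "AD"), ("E", "H")]
E = [("A", "CD"), ("E", "AH")]
print(equivalent(F, E))             # True
print(equivalent(F, [("A", "C")]))  # False
```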

Page 22

Minimal Cover

» Every F can be translated into an equivalent minimal cover G
» A set of FDs G is a minimal cover of F if $G \equiv F$ and:
  » for all $X \to Y \in G$, Y consists of exactly one attribute (so if $X \to YZ$, split it into $X \to Y$ and $X \to Z$)
  » no $X \to Y$ can be removed from G without losing equivalence with F
  » no $X \to Y$ in G can be replaced by $W \to Y$ with $W \subset X$ without losing equivalence with F

Page 23

Algorithm for Minimal Cover

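1) Start with $G := F$;
2) Replace every $X \to Y$ with $Y = \{A_1, \dots, A_n\}$ by the FDs $X \to A_i$; (IR4)
3) For every $XY \to A$: if $(G - \{XY \to A\}) \cup \{X \to A\} \equiv G$, then replace $XY \to A$ with $X \to A$;
4) For every $X \to A$: if $G - \{X \to A\} \equiv G$, then remove $X \to A$;

Example:
1) $AB \to CD$; $C \to D$; $A \to CB$;
2) $AB \to C$; $AB \to D$; $C \to D$; $A \to C$; $A \to B$;
3) $A \to C$; $A \to D$; $C \to D$; $A \to B$;
4) $A \to C$; $C \to D$; $A \to B$;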
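A sketch of steps 2 to 4 in Python, assuming single-letter attribute names and FDs given as (lhs, rhs) string pairs. Step 3 relies on the fact that $(G - \{XY \to A\}) \cup \{X \to A\} \equiv G$ holds exactly when A is in the closure of X under G; step 4 removes $X \to A$ when A is in the closure of X under the remaining FDs. A minimal cover is not unique, so the result can depend on the order in which FDs are processed.

```python
def closure(x, fds):
    """Attribute closure of x under fds; each FD is a (lhs, rhs) pair."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(f):
    # Step 2: split right-hand sides into single attributes (IR4).
    g = []
    for lhs, rhs in f:
        for a in rhs:
            fd = (frozenset(lhs), a)
            if fd not in g:
                g.append(fd)

    # Step 3: drop left-hand-side attributes that are not needed.
    reduced = []
    for lhs, a in g:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs.discard(b)
        fd = (frozenset(lhs), a)
        if fd not in reduced:
            reduced.append(fd)
    g = reduced

    # Step 4: remove FDs that follow from the remaining ones.
    i = 0
    while i < len(g):
        lhs, a = g[i]
        rest = g[:i] + g[i + 1:]
        if a in closure(lhs, rest):
            g = rest
        else:
            i += 1
    return g


F = [("AB", "CD"), ("C", "D"), ("A", "CB")]
G = minimal_cover(F)
print("; ".join(f"{''.join(sorted(lhs))} -> {a}" for lhs, a in G))
# A -> C; C -> D; A -> B
```

On the slide's example this reproduces the result of step 4.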

Page 24

Normal forms

» Invented by Codd as tests for relational database schemas
» The tests (‘normal forms’) become increasingly strict: the stricter the test, the higher the normal form and the more robust the database
» If a schema does not pass a test, it is decomposed into partial schemas that do pass it
» It is not always necessary to reach the highest possible normal form

Page 25

1NF - First Normal Form

» Attributes can only be single-valued
» This is a basic requirement of most relational databases
» Example of a non-1NF relation: see next slide. Such relations are normally already ruled out by the definition of a relation (so using the relational database model automatically ensures 1NF)

Page 26

Non-1NF - example

Page 27

1NF - First Normal Form

» Solutions for a multi-valued attribute A in R:
  » Preferred: create a new relation S containing A and a foreign key to R
  » Extend the key of R with an index number for the values of A (redundancy!); e.g. department numbers 5A, 5B and 5C
  » Determine the maximum number of values of A per tuple (say k) and replace the attribute by k attributes (e.g. loc1, loc2 and loc3 for k = 3). This introduces NULL values!
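The preferred solution can be sketched on data held as Python dicts. The DEPARTMENT rows, the attribute names `dnumber` and `dlocations`, and the helper `split_multivalued` are illustrative assumptions; the helper moves the multi-valued attribute into a new relation that carries the key of R as a foreign key.

```python
def split_multivalued(rows, key, attr):
    """Move multi-valued attribute attr of each row into a separate relation.

    Returns (base, new_rel): base keeps all other attributes; new_rel holds
    one (key, attr) tuple per value, with key acting as foreign key to base.
    """
    base = [{k: v for k, v in t.items() if k != attr} for t in rows]
    new_rel = [{key: t[key], attr: value} for t in rows for value in t[attr]]
    return base, new_rel


# Hypothetical non-1NF relation: dlocations holds a list of values.
department = [
    {"dnumber": 5, "dname": "Research",
     "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dnumber": 4, "dname": "Administration", "dlocations": ["Stafford"]},
]

dept, dept_locations = split_multivalued(department, "dnumber", "dlocations")
print(dept[0])              # {'dnumber': 5, 'dname': 'Research'}
print(len(dept_locations))  # 4
```

The key of the new relation is the combination (dnumber, dlocations), so neither NULLs nor repeated department data are needed.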

Page 28

2NF - Second Normal form

» Definition: $X \to Y$ is a partial functional dependency if there is an attribute $A \in X$ such that $X - \{A\} \to Y$
» $X \to Y$ is total if it is not partial
» 2NF: every non-primary attribute is totally dependent on the primary key (and not on part of the primary key)
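Using the attribute closure, partial dependencies can be found mechanically: for every non-primary attribute A and every attribute B of the primary key K, check whether A is in the closure of $K - \{B\}$. A sketch under the simplifying assumption that R has a single candidate key; the EMP_PROJ attributes and FDs are a hypothetical example.

```python
def closure(x, fds):
    """Attribute closure of x under fds; each FD is a (lhs, rhs) pair of sets."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(attributes, primary_key, fds):
    """Return (key part, attribute) pairs that violate 2NF."""
    key = set(primary_key)
    found = []
    for attr in sorted(set(attributes) - key):  # non-primary attributes
        for b in sorted(key):
            part = key - {b}                     # K minus one attribute
            if attr in closure(part, fds):
                found.append((tuple(sorted(part)), attr))
    return found


attributes = {"Ssn", "Pnumber", "Hours", "Ename", "Pname", "Plocation"}
fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]
for part, attr in partial_dependencies(attributes, ["Ssn", "Pnumber"], fds):
    print(part, "->", attr)
```

Hours depends on the whole key and is not reported. Splitting off each reported key part with its dependent attributes gives (Ssn, Ename), (Pnumber, Pname, Plocation) and (Ssn, Pnumber, Hours).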

Page 29

2NF - Normalizing

» Break up the relation such that every part of the primary key, together with the attributes that depend on it, is in a separate relation. Keep in the original relation only the attributes that depend totally on the primary key
» Example: see next slide

Page 30

2NF - example

Page 31

3NF - Third Normal Form

» Definition: $X \to Y$ is a transitive dependency if there is a set of attributes Z that is not (part of) a candidate key such that $X \to Z$ and $Z \to Y$
» 3NF: no non-primary attribute is transitively dependent on the primary key

Page 32

3NF - Normalizing

» Break up the relation such that attributes that depend on non-key attributes appear in a separate table (together with the attributes on which they depend)
» Example: see next slide

Page 33

3NF - example

Page 34

General form 2NF and 3NF

» Impose the same requirements on all candidate keys instead of only the primary key, which is stricter:
  » 2NF: every non-key attribute (not part of any candidate key) is totally dependent on every key
  » 3NF: no non-key attribute is transitively dependent on any key
  » Other formulation of 3NF: for every non-trivial $X \to A$, A is prime or X is a superkey
» Example: see next slide
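The "other formulation" can be checked mechanically. The sketch below finds all candidate keys by brute force (only practical for small schemas), derives the prime attributes, and reports every listed FD $X \to A$ with A non-prime and X not a superkey. It inspects only the FDs as given for R, and the EMP_DEPT schema is a hypothetical example.

```python
from itertools import combinations


def closure(x, fds):
    """Attribute closure of x under fds; each FD is a (lhs, rhs) pair of sets."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # contains a smaller key: a superkey, not minimal
            if closure(s, fds) >= set(attrs):
                keys.append(s)
    return keys


def violations_3nf(attributes, fds):
    prime = set().union(*candidate_keys(attributes, fds))
    found = []
    for lhs, rhs in fds:
        superkey = closure(lhs, fds) >= set(attributes)
        for a in sorted(set(rhs) - set(lhs)):  # ignore trivial parts
            if not superkey and a not in prime:
                found.append((tuple(sorted(lhs)), a))
    return found


attributes = {"Ename", "Ssn", "Bdate", "Address", "Dnumber", "Dname", "Dmgr_ssn"}
fds = [
    ({"Ssn"}, {"Ename", "Bdate", "Address", "Dnumber"}),
    ({"Dnumber"}, {"Dname", "Dmgr_ssn"}),
]
print(candidate_keys(attributes, fds))  # [{'Ssn'}]
print(violations_3nf(attributes, fds))
```

Moving Dnumber, Dname and Dmgr_ssn into a separate relation (keeping Dnumber in EMP_DEPT as a foreign key) removes both reported violations.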

Page 35

General form 2NF and 3NF - example

[Figure: example relation in 1NF and its decomposition into 2NF]

Page 36

General form 2NF and 3NF - example

[Figure: the 2NF relations and their decomposition into 3NF]

Page 37

Boyce-Codd Normal Form

» Simpler, but stronger than 3NF
» BCNF: for every non-trivial dependency $X \to A$, X is a superkey
» Difference: in 3NF, X does not have to be a superkey if A is a prime attribute
» In many cases a 3NF schema is also in BCNF
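The difference with 3NF shows up when a non-superkey determines a prime attribute. A sketch using a hypothetical relation TEACH(Student, Course, Instructor) with FDs $\{Student, Course\} \to Instructor$ and $Instructor \to Course$: both {Student, Course} and {Student, Instructor} are candidate keys, so Course is prime and the schema is in 3NF, but $Instructor \to Course$ violates BCNF. Only the FDs as listed are inspected.

```python
def closure(x, fds):
    """Attribute closure of x under fds; each FD is a (lhs, rhs) pair of sets."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(x, attributes, fds):
    return closure(x, fds) >= set(attributes)


def bcnf_violations(attributes, fds):
    """Non-trivial FDs X -> A (as listed) whose left-hand side is no superkey."""
    return [
        (tuple(sorted(lhs)), a)
        for lhs, rhs in fds
        for a in sorted(set(rhs) - set(lhs))
        if not is_superkey(lhs, attributes, fds)
    ]


teach = {"Student", "Course", "Instructor"}
fds = [
    ({"Student", "Course"}, {"Instructor"}),
    ({"Instructor"}, {"Course"}),
]
# {Student, Instructor} is also a key, so Course is prime and 3NF holds.
print(is_superkey({"Student", "Instructor"}, teach, fds))  # True
print(bcnf_violations(teach, fds))  # [(('Instructor',), 'Course')]
```

Decomposing into (Instructor, Course) and (Student, Instructor) reaches BCNF, but $\{Student, Course\} \to Instructor$ can then no longer be checked within a single relation; this is one reason why the highest normal form is not always worth reaching.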

Page 38

BCNF example

Page 39

Decompositions

» Adhering to a normal form alone is not enough
» We must not lose attributes in the process!
» Non-additive join property: a natural join of the relations resulting from a decomposition should yield the original table, without spurious tuples
» There exist algorithms that automatically find good decompositions
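For a decomposition of R into two relations R1 and R2 there is a simple test for the non-additive join property: the decomposition is non-additive iff $(R_1 \cap R_2) \to (R_1 - R_2)$ or $(R_1 \cap R_2) \to (R_2 - R_1)$ is in $F^+$. A sketch of that binary test only (not one of the general decomposition algorithms), with hypothetical EMP_PROJ attributes and FDs:

```python
def closure(x, fds):
    """Attribute closure of x under fds; each FD is a (lhs, rhs) pair of sets."""
    result = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def lossless_binary(r1, r2, fds):
    """True if the shared attributes determine R1 - R2 or R2 - R1."""
    common = set(r1) & set(r2)
    c = closure(common, fds)
    return (set(r1) - set(r2)) <= c or (set(r2) - set(r1)) <= c


fds = [
    ({"Ssn", "Pnumber"}, {"Hours"}),
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
]
rest = {"Ssn", "Pnumber", "Hours", "Pname", "Plocation"}

# Shares only Plocation, which determines nothing: spurious tuples possible.
bad = lossless_binary({"Ename", "Plocation"}, rest, fds)
# Shares Ssn, which determines Ename: no spurious tuples.
good = lossless_binary({"Ssn", "Ename"}, rest, fds)
print(bad, good)  # False True
```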

Page 40

ER-schema to relational schemas

» A relational database schema mapped from an ER-schema is often in BCNF, and always in 3NF (so check whether BCNF is applicable and useful)
» Many CASE tools can automatically map an ER-schema to a good relational schema (e.g., SQL CREATE TABLE commands)
