Normal Forms

Jan 29, 2016

phong

Page 1: Normal Forms

Normal Forms

Page 2: Normal Forms

First Normal Form:

• all table cells must contain atomic values

− no sets, arrays, lists, or other collection types

− no structured objects

• all relational databases satisfy first normal form by definition

− the definition restricts attribute values to singletons from specified domains

• indeed, relations that do not satisfy first normal form are specifically called unnormalized relations

Page 3: Normal Forms

Definitions:

• $R$ denotes a relation; $r(R)$ is the corresponding body

• $X \to Y$ is a functional dependency

• The universal relation $R$ contains all application attributes; $r(R)$ contains the join of all database tables

• $r(R)$ is valid if it satisfies all FDs implied by the application constraints

• Goal of functional dependency analysis is to decompose the universal relation into components which cannot become inconsistent with respect to any functional dependency constraint

Page 4: Normal Forms

Some requirements:

Suppose $R = R_1 + R_2$ ($+$ indicates union of attributes)

For any valid $r(R)$, must have

$r(R_1) = \pi_{R_1}(r(R))$

$r(R_2) = \pi_{R_2}(r(R))$

$r(R) = r(R_1) \bowtie r(R_2)$ (lossless join)

Note: $[\pi_{R_1}(r(R))] \bowtie [\pi_{R_2}(r(R))] \supseteq r(R)$

Perhaps join contains too much: spurious tuples

Page 5: Normal Forms

Lossy join:

$r(R)$:
A  B  C
1  2  3
5  2  4

$\pi_{A,B}(r(R))$:
A  B
1  2
5  2

$\pi_{B,C}(r(R))$:
B  C
2  3
2  4

$\pi_{A,B}(r(R)) \bowtie \pi_{B,C}(r(R))$:
A  B  C
1  2  3
1  2  4   (spurious)
5  2  3   (spurious)
5  2  4
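The lossy join on this slide can be reproduced with a few lines of Python. This is only a sketch: the helper names `project` and `natural_join` are made up for illustration, and a relation is modeled as a set of tuples of (attribute, value) pairs.

```python
def project(rows, attrs):
    """Project a list of dict rows onto attrs; the set removes duplicates."""
    return {tuple((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Natural join of two relations stored as sets of (attr, value) tuples."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            # Tuples combine only if they agree on every shared attribute.
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


# The two-tuple relation r(R) over A, B, C from the slide.
r = [{"A": 1, "B": 2, "C": 3}, {"A": 5, "B": 2, "C": 4}]
original = {tuple(sorted(row.items())) for row in r}

rejoined = natural_join(project(r, ["A", "B"]), project(r, ["B", "C"]))
spurious = rejoined - original
print(len(rejoined), len(spurious))  # 4 2
```

The join returns every original tuple plus two extra ones, because B = 2 alone cannot tell which A value belongs with which C value.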

Page 6: Normal Forms

More definitions:

$F = \{X \to Y \mid (X \to Y) \text{ is imposed by application}\}$

$F^+ = \{X \to Y \mid \text{application FDs imply } X \to Y\}$

$R = R_1 + R_2 + \cdots + R_m$ is a decomposition

$(X \to Y) \in F^+$ applies to $R_i$ if $XY \subseteq R_i$

$R = R_1 + R_2 + \cdots + R_m$ is a valid decomposition if

(1) it is lossless

(2) if $r(R)$ satisfies $F$ and $(X \to Y) \in F^+$ applies to $R_i$, then $r(R_i)$ satisfies $X \to Y$

Page 7: Normal Forms

Whence functional dependencies:

• Analysis of application produces $F$

• Closure algorithm: from $F$, compute $F^+$

• Minimal cover algorithm: from $F^+$, compute $F'$

Trick: compute minimal cover $F'$ directly from application FDs $F$ without calculating closure

Page 8: Normal Forms

Yet more definitions:

Let $R$ be the universal relation

$F$ is a set of functional dependencies

We say that $F$ implies $(X \to Y)$ if the following condition holds:

$r(R)$ satisfies $F$ $\Rightarrow$ $r(R)$ satisfies $(X \to Y)$

$F^+ = \{X \to Y \mid F \text{ implies } (X \to Y)\}$ is the closure of $F$

Let $F_1$, $F_2$ be sets of functional dependencies

$F_1$, $F_2$ are equivalent, written $F_1 \equiv F_2$, if $F_1^+ = F_2^+$

Page 9: Normal Forms

Yet more definitions (continued):

$F_2$ is a minimal cover of $F_1$ if $F_2 \equiv F_1$ and

• The right side of every FD in $F_2$ contains a single attribute

• If $(X \to A) \in F_2$, then $[F_2 - \{X \to A\}] \not\equiv F_2$. That is, no FD in $F_2$ is redundant

• If $(XA \to B) \in F_2$, then $[(F_2 - \{XA \to B\}) \cup \{X \to B\}] \not\equiv F_2$. That is, no FD in $F_2$ has a redundant attribute on its left side

$F$ is minimal if it is a minimal cover of itself

Page 10: Normal Forms

Armstrong's Axioms

1. Reflexivity: If $Y \subseteq X \subseteq R$, then $(X \to Y) \in F^+$ (regardless of $F$)

2. Augmentation: If $(X \to Y) \in F^+$ and $Z \subseteq R$, then $(XZ \to YZ) \in F^+$

3. Transitivity: If $(X \to Y) \in F^+$ and $(Y \to Z) \in F^+$, then $(X \to Z) \in F^+$

4. Composition: If $(X \to Y) \in F^+$ and $(X \to Z) \in F^+$, then $(X \to YZ) \in F^+$

5. Decomposition: If $(X \to YZ) \in F^+$, then $(X \to Y) \in F^+$ and $(X \to Z) \in F^+$

6. Pseudotransitivity: If $(X \to Y) \in F^+$ and $(WY \to Z) \in F^+$, then $(WX \to Z) \in F^+$

Page 11: Normal Forms

Theorem: Armstrong's Axioms are both sound and complete.

That is, if you apply them repeatedly to a given set of FDs, then

(1) every new FD generated belongs to the closure (sound)

(2) every FD in the closure will eventually be generated (complete)
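As a small worked illustration (not taken from the slides), pseudotransitivity and composition can each be derived using only augmentation and transitivity, which is one way to check that these two rules are sound:

```latex
\begin{align*}
&\textbf{Pseudotransitivity: } \text{given } X \to Y \text{ and } WY \to Z \\
&\quad WX \to WY && \text{augment } X \to Y \text{ with } W \\
&\quad WX \to Z  && \text{transitivity with } WY \to Z \\[6pt]
&\textbf{Composition: } \text{given } X \to Y \text{ and } X \to Z \\
&\quad X \to XY  && \text{augment } X \to Y \text{ with } X \\
&\quad XY \to YZ && \text{augment } X \to Z \text{ with } Y \\
&\quad X \to YZ  && \text{transitivity}
\end{align*}
```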

Page 12: Normal Forms

Definition:

$(X; F) = \{A \mid (X \to A) \text{ follows from } F \text{ via Armstrong's Axioms}\}$

Because inferences via Armstrong's Axioms yield precisely the FD set closure, this definition is the same as:

$(X; F) = \{A \mid (X \to A) \in F^+\}$

Page 13: Normal Forms

Generating the closure is computationally expensive because of its exponential size

But, checking a given FD for inclusion in the closure is easy:

$(X \to Y) \in F^+$ iff $Y \subseteq (X; F)$

Page 14: Normal Forms

Algorithm uses only the application-driven FDs, not the closure

attributeSet maxRight (attributeSet $X$, FDSet $F$) {
    attributeSet $Z = X$;
    boolean changed = true;
    while (changed)
        if [$\exists\, (U \to V) \in F$ with $U \subseteq Z$ and $V \not\subseteq Z$]
            $Z = Z \cup V$;
        else
            changed = false;
    return $Z$;
}

boolean FDinClosure (attributeSet $X$, attributeSet $Y$, FDSet $F$) {
    /* returns true if $(X \to Y) \in F^+$ */
    return [$Y \subseteq$ maxRight($X$, $F$)];
}
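A minimal Python sketch of the two procedures, assuming attributes are single characters and an FD is stored as a pair of frozensets (the helper `fd` and the snake_case names are illustrative choices, not from the slides):

```python
def fd(lhs, rhs):
    """Build one FD as a (left side, right side) pair of frozensets."""
    return frozenset(lhs), frozenset(rhs)


def max_right(x, fds):
    """Return (X; F): all attributes functionally determined by x under fds."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire an FD whose left side is covered and that adds something new.
            if lhs <= z and not rhs <= z:
                z |= rhs
                changed = True
    return frozenset(z)


def fd_in_closure(x, y, fds):
    """True if (X -> Y) is in the closure of fds, i.e. Y lies within (X; F)."""
    return frozenset(y) <= max_right(x, fds)


# FD set used in the BCNF decomposition example of this deck.
F = [fd("A", "D"), fd("C", "B"), fd("A", "C"), fd("EF", "H"),
     fd("EF", "J"), fd("EF", "L"), fd("FL", "G"), fd("HJ", "K")]

print("".join(sorted(max_right("A", F))))  # ABCD
print(fd_in_closure("EF", "K", F))         # True
```

Each pass over the FD list either adds at least one attribute or stops, so the loop runs at most once per attribute, without ever materializing the closure of the FD set.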

Page 15: Normal Forms

Other uses of the algorithm: key determination

$X$ is a superkey for $R$ iff $(X \to A) \in F^+$ for every $A \in R$

$X$ is a key for $R$ iff $X$ is a superkey and for all $B \in X$, $(X - \{B\})$ is not a superkey

Algorithm: For each subset $X \subseteq R$, use maxRight procedure to construct $(X; F)$. If $(X; F) = R$, then $X$ is a superkey; otherwise, it is not. A superkey that does not properly contain another superkey is a key.
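The subset search described above can be sketched as follows. The function names are hypothetical; subsets are enumerated smallest first so that any superkey containing an already-found key can be skipped. The search is exponential in the number of attributes, so it is only meant for small schemas.

```python
from itertools import combinations


def closure(x, fds):
    """Attribute closure (X; F) for FDs stored as (lhs, rhs) frozenset pairs."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                changed = True
    return frozenset(z)


def candidate_keys(r, fds):
    """Return all keys of r: superkeys that contain no smaller superkey."""
    attrs = sorted(r)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            if any(k <= x for k in keys):
                continue  # contains a smaller key, so x is not minimal
            if closure(x, fds) >= frozenset(r):
                keys.append(x)
    return keys


# R = (A, B, C) with F = {AB -> C, C -> A}
F = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("A"))]
keys = candidate_keys("ABC", F)
print(["".join(sorted(k)) for k in keys])  # ['AB', 'BC']
```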

Page 16: Normal Forms

Other uses of the algorithm: equivalence of two sets of FDs

$F_1 \equiv F_2$ iff $F_1 \subseteq F_2^+$ and $F_2 \subseteq F_1^+$

Algorithm: for each $(X \to Y) \in F_1$, use maxRight to determine if $(X \to Y) \in F_2^+$. Similarly, for FDs in $F_2$.
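A short sketch of this equivalence test, under the same assumed representation (FDs as pairs of frozensets; `covers` and `equivalent` are illustrative names):

```python
def closure(x, fds):
    """Attribute closure (X; F) for FDs stored as (lhs, rhs) frozenset pairs."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                changed = True
    return frozenset(z)


def covers(f1, f2):
    """True if every FD of f2 lies in the closure of f1."""
    return all(rhs <= closure(lhs, f1) for lhs, rhs in f2)


def equivalent(f1, f2):
    """True if f1 and f2 have the same closure."""
    return covers(f1, f2) and covers(f2, f1)


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


F1 = [fd("A", "B"), fd("B", "C")]
F2 = [fd("A", "B"), fd("B", "C"), fd("A", "C")]  # A -> C is implied by F1
F3 = [fd("A", "B")]                              # loses B -> C

print(equivalent(F1, F2))  # True
print(equivalent(F1, F3))  # False
```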

Page 17: Normal Forms

Other uses of the algorithm: minimal cover computation:

mutate application-driven $F$ into an equivalent FD set $F_1$ that satisfies the minimality conditions.

Algorithm:

FDSet minCover (FDSet $F$) {
    FDSet $F_1 = \emptyset$;
    boolean change2 = true, change3 = true;
    for each $(X \to Y) \in F$
        for each $A \in Y$
            $F_1 = F_1 \cup \{X \to A\}$;
    while (change2 || change3) {
        change2 = false;
        while [$\exists\, (X \to A) \in F_1$ with $F_1 - \{X \to A\} \equiv F_1$] {
            change2 = true;
            $F_1 = F_1 - \{X \to A\}$;
        }
        change3 = false;
        while [$\exists\, (XB \to A) \in F_1$ with $(F_1 - \{XB \to A\}) \cup \{X \to A\} \equiv F_1$] {
            change3 = true;
            $F_1 = (F_1 - \{XB \to A\}) \cup \{X \to A\}$;
        }
    }
    return $F_1$;
}
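One possible Python rendering of the minimal cover procedure is sketched below. It assumes FDs are pairs of frozensets; the helper names are invented for illustration. Instead of comparing whole FD sets for equivalence, it uses the closure test: an FD is redundant when the remaining FDs still imply it, and a left-side attribute is redundant when the shortened left side already determines the right side.

```python
def closure(x, fds):
    """Attribute closure (X; F) for FDs stored as (lhs, rhs) frozenset pairs."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                changed = True
    return frozenset(z)


def _ordered(fds):
    """Fix an iteration order so the result does not depend on set ordering."""
    return sorted(fds, key=lambda f: (sorted(f[0]), sorted(f[1])))


def _drop_redundant_fd(f1):
    """Return f1 minus one FD implied by the others, or None if none exists."""
    for one in _ordered(f1):
        rest = f1 - {one}
        if one[1] <= closure(one[0], rest):
            return rest
    return None


def _shrink_left_side(f1):
    """Return f1 with one redundant left-side attribute removed, or None."""
    for lhs, rhs in _ordered(f1):
        for b in sorted(lhs):
            shorter = lhs - {b}
            if rhs <= closure(shorter, f1):
                return (f1 - {(lhs, rhs)}) | {(shorter, rhs)}
    return None


def min_cover(fds):
    # Split every right side into single attributes.
    f1 = frozenset((lhs, frozenset([a])) for lhs, rhs in fds for a in rhs)
    while True:
        nxt = _drop_redundant_fd(f1)
        if nxt is None:
            nxt = _shrink_left_side(f1)
        if nxt is None:
            return f1  # neither kind of redundancy remains
        f1 = nxt


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


F = [fd("A", "BC"), fd("B", "C"), fd("AB", "C")]
cover = min_cover(F)
print(sorted("".join(sorted(l)) + "->" + "".join(r) for l, r in cover))
```

A minimal cover is not unique in general; the fixed iteration order here only makes the choice repeatable.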

Page 18: Normal Forms

Boyce-Codd Normal Form (BCNF):

An FD $(X \to Y)$ is called trivial if $Y \subseteq X$

Let $R = R_1 + R_2 + \cdots + R_n$ be a lossless decomposition with respect to $F$

$R_i$ satisfies BCNF if, for every nontrivial FD $(X \to A) \in F^+$ that applies to $R_i$, $X$ is a superkey of $R_i$

The decomposition satisfies BCNF if every component $R_i$ satisfies BCNF

There exists an algorithm to decompose any universal relation $R$ and an associated application-driven set of FDs $F$ (or an equivalent minimal cover) into BCNF

The algorithmic decomposition is lossless, but unfortunately, it is not always dependency-preserving

Page 19: Normal Forms

Dependency preservation:

Let $R = R_1 + R_2 + \cdots + R_n$ be a lossless decomposition with respect to $F$

Define $\pi_{R_i}(F) = \{X \to Y \mid (X \to Y) \in F^+ \text{ and } (X \to Y) \text{ applies to } R_i\}$

Decomposition is dependency-preserving if $\bigcup_{i=1}^{n} \pi_{R_i}(F) \equiv F$

Certainly, $\bigcup_{i=1}^{n} \pi_{R_i}(F) \subseteq F^+$, but $F^+$ is potentially larger (stronger)

Page 20: Normal Forms

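Example: BCNF decomposition process

$F$: $A \to D$, $C \to B$, $A \to C$, $EF \to H$, $EF \to J$, $EF \to L$, $FL \to G$, $HJ \to K$

$R$: A B C D E F G H J K L

Compute $(A; F)$: $A \Rightarrow AD \Rightarrow ADC \Rightarrow ADCB \ne R$

$(A; F) \ne R$ $\Rightarrow$ $A$ is not a key $\Rightarrow$ $(A \to C)$, $(A \to D)$, $(A \to B)$ are all BCNF violators

Decompose: $(A; F)$ and $[R - (A; F)] \cup \{A\}$

$A$ is a key of $(A; F)$ $\Rightarrow$ join is lossless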

Page 21: Normal Forms

Example: BCNF decomposition process

$F$: $A \to D$, $C \to B$, $A \to C$, $EF \to H$, $EF \to J$, $EF \to L$, $FL \to G$, $HJ \to K$

$R$: A B C D E F G H J K L

$R_1$: A B C D

$R_2$: A E F G H J K L

$A \to D$ and $A \to C$ are no longer BCNF violators: $A$ is a superkey of the first relation, and these FDs no longer apply to the second

But, still have a violator here: $C \to B$ in $R_1$

Page 22: Normal Forms

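$F$: $A \to D$, $C \to B$, $A \to C$, $EF \to H$, $EF \to J$, $EF \to L$, $FL \to G$, $HJ \to K$

$R$: A B C D E F G H J K L

$R_1$: A B C D
    $R_{11}$: B C
    $R_{12}$: A C D

$R_2$: A E F G H J K L

$(EF; F) = EFHJLGK \ne R_2$, so the FDs with left side $EF$ are BCNF violators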

Page 23: Normal Forms

$F$: $A \to D$, $C \to B$, $A \to C$, $EF \to H$, $EF \to J$, $EF \to L$, $FL \to G$, $HJ \to K$

$R$: A B C D E F G H J K L

$R_1$: A B C D
    $R_{11}$: B C
    $R_{12}$: A C D

$R_2$: A E F G H J K L
    $R_{21}$: A E F
    $R_{22}$: E F G H J K L
        $R_{221}$: F G L
        $R_{222}$: E F H J K L

$(FL; F) = FLG \ne R_{22}$, so $FL \to G$ is a BCNF violator

Page 24: Normal Forms

$F$: $A \to D$, $C \to B$, $A \to C$, $EF \to H$, $EF \to J$, $EF \to L$, $FL \to G$, $HJ \to K$

$R$: A B C D E F G H J K L

$R_1$: A B C D
    $R_{11}$: B C
    $R_{12}$: A C D

$R_2$: A E F G H J K L
    $R_{21}$: A E F
    $R_{22}$: E F G H J K L
        $R_{221}$: F G L
        $R_{222}$: E F H J K L
            $R_{2221}$: H J K
            $R_{2222}$: E F H J L

$(HJ; F) = HJK \ne R_{222}$, so $HJ \to K$ is a BCNF violator

Page 25: Normal Forms

Aquarium database --- minimal cover

• sno → sname
• sno → sfood
• tno → tname
• tno → tvolume
• fno → fname
• fno → fcolor
• fno → fweight
• eno → edate
• eno → enote
• tvolume → tcolor
• eno → fno
• fno → tno
• fno → sno

Only superkeys: supersets of eno ==> all FDs are BCNF violators except those with eno on left side

BCNF:

(sno, sname, sfood)
(tno, tname, tvolume)
(tvolume, tcolor)
(fno, fname, fcolor, fweight, sno, tno)
(eno, edate, enote, fno)

Page 26: Normal Forms

BCNF lossless decomposition algorithm:

/* start with universal relation; decompose BCNF violating components */

while $(X \to Y)$ applies to a component $R_i$ and $(X; F) \not\supseteq R_i$
    decompose $R_i$ into $(X; F)$ and $[R_i - (X; F)] \cup \{X\}$

Note: it suffices to look for BCNF violators in minimal cover of original application-driven FDs

In applying Armstrong's axioms to generate FD closure, only augmentation and pseudotransitivity generate new left hand sides; none generate any new right-side attributes

Augmentation: If $(X \to Y) \in F^+$ and $Z \subseteq R$, then $(XZ \to YZ) \in F^+$

But, if $(XZ; F) \not\supseteq R_i$, then because $(X; F) \subseteq (XZ; F)$, we have $(X; F) \not\supseteq R_i$

That is, if $XZ \to YZ$ is a BCNF violator, so is $X \to Y$

Pseudotransitivity: If $(X \to Y) \in F^+$ and $(WY \to Z) \in F^+$, then $(WX \to Z) \in F^+$

But, if $(WX; F) \not\supseteq R_i$, then because $(X; F) \subseteq (WX; F)$, we have $(X; F) \not\supseteq R_i$

That is, if $WX \to Z$ is a BCNF violator, so is $X \to Y$
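The decomposition loop can be sketched in Python as follows. This is an illustrative version with invented names, not the slides' exact procedure: to stay on the safe side it looks for a violating left side among all attribute subsets of a component (exponential, so small schemas only), since an FD that applies to a component may be implied by $F$ without appearing in $F$. It also intersects the closure with the component before splitting.

```python
from itertools import combinations


def closure(x, fds):
    """Attribute closure (X; F) for FDs stored as (lhs, rhs) frozenset pairs."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                changed = True
    return frozenset(z)


def find_violator(component, fds):
    """Return a left side X that violates BCNF in component, or None."""
    attrs = sorted(component)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            determined = closure(x, fds) & component
            # X determines something beyond itself but is not a superkey.
            if determined != x and determined != component:
                return x
    return None


def bcnf_decompose(r, fds):
    todo, done = [frozenset(r)], []
    while todo:
        component = todo.pop()
        x = find_violator(component, fds)
        if x is None:
            done.append(component)
            continue
        determined = closure(x, fds) & component
        todo.append(determined)                     # X is a key of this part
        todo.append((component - determined) | x)   # the rest, plus X
    return done


# R = (A, B, C) with F = {AB -> C, C -> A}
F = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("A"))]
result = bcnf_decompose("ABC", F)
print(sorted("".join(sorted(c)) for c in result))  # ['AC', 'BC']
```

Both parts of each split are strictly smaller than the component they came from, so the loop terminates. When several violators exist, the order in which they are chosen can change the final set of components.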

Page 27: Normal Forms

BCNF decomposition is lossless, but not always dependency-preserving

Let $R = R_1 + R_2 + \cdots + R_n$ be a decomposition with respect to $F$

Define $\pi_{R_i}(F) = \{X \to Y \mid (X \to Y) \in F^+ \text{ and } (X \to Y) \text{ applies to } R_i\}$

Decomposition is dependency-preserving if $\bigcup_{i=1}^{n} \pi_{R_i}(F) \equiv F$

Example:

Let $R = (A, B, C)$, $F = \{AB \to C,\ C \to A\}$

$(AB; F) = ABC = R$ $\Rightarrow$ $(AB \to C)$ is not a BCNF violator

$(C; F) = CA \ne R$ $\Rightarrow$ $(C \to A)$ is BCNF violator $\Rightarrow$ decompose into $R_1 = (A, C)$ and $R_2 = (B, C)$

No further BCNF violators $\Rightarrow$ have BCNF decomposition

Page 28: Normal Forms

Example (continued):

Only possible non-trivial entries from $F^+$ in $\pi_{R_1}(F)$ are $(A \to C)$ or $(C \to A)$

But, $(A; F) = \{A\}$, so $C \notin (A; F)$ and $(A \to C) \notin F^+$

So, $\pi_{R_1}(F) = \{C \to A\}$ plus trivial FDs

Only possible non-trivial entries from $F^+$ in $\pi_{R_2}(F)$ are $(B \to C)$ or $(C \to B)$

But, $(B; F) = \{B\}$, and $(C; F) = \{A, C\}$, so $(B \to C) \notin F^+$ and $(C \to B) \notin F^+$

So, $\pi_{R_2}(F)$ contains only trivial FDs

Consequently, $(AB \to C) \in F$, but $(AB \to C) \notin (\pi_{R_1}(F) \cup \pi_{R_2}(F))^+$

So, decomposition is not dependency-preserving

Page 29: Normal Forms

In general:

• if decomposition does NOT split any FD from the minimal cover, then decomposition IS dependency-preserving

• if decomposition does split an FD from the minimal cover, then it may or may not be dependency-preserving (text presents an algorithm for deciding)
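One well-known way to decide the question without computing any projected FD sets is sketched below; it is not necessarily the algorithm from the cited text. For each FD $X \to Y$ in $F$, it grows a set $Z$ from $X$, repeatedly adding whatever each component can derive from the part of $Z$ it contains, and then checks whether $Y \subseteq Z$. The function names are illustrative.

```python
def closure(x, fds):
    """Attribute closure (X; F) for FDs stored as (lhs, rhs) frozenset pairs."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                changed = True
    return frozenset(z)


def fd_preserved(one_fd, components, fds):
    """True if one_fd follows from the FDs that apply to single components."""
    lhs, rhs = one_fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for comp in components:
            # What this component alone can derive from the attributes in z.
            gained = (closure(z & comp, fds) & comp) - z
            if gained:
                z |= gained
                changed = True
    return rhs <= z


def dependency_preserving(components, fds):
    return all(fd_preserved(f, components, fds) for f in fds)


def fd(lhs, rhs):
    return frozenset(lhs), frozenset(rhs)


# The BCNF decomposition of R = (A, B, C), F = {AB -> C, C -> A}
F = [fd("AB", "C"), fd("C", "A")]
split = [frozenset("AC"), frozenset("BC")]
print(dependency_preserving(split, F))  # False: AB -> C is lost

# A decomposition that splits no FD: F2 = {A -> B, B -> C}
F2 = [fd("A", "B"), fd("B", "C")]
print(dependency_preserving([frozenset("AB"), frozenset("BC")], F2))  # True
```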

Page 30: Normal Forms

Second Normal Form:

Let $R = R_1 + R_2 + \cdots + R_n$ be a decomposition

Let $F$ be a set of application-driven constraints (or minimal cover)

$A \in R_i$ is prime if $A$ is contained in a key for $R_i$

$R_i$ satisfies Second Normal Form (2NF) if

(1) $R_i$ satisfies 1NF, and

(2) $A$ non-prime and $X$ a key of $R_i$ (i.e., $R_i \subseteq (X; F)$) $\Rightarrow$ $(Y \to A) \notin F^+$ for every $Y \subseteq X$, $Y \ne X$

A 2NF violator has components: non-prime $A$, key $X$, $Y \subseteq X$ such that $Y \ne X$ and $(Y \to A) \in F^+$

The decomposition satisfies 2NF if each component satisfies 2NF

Page 31: Normal Forms

Note: BCNF implies 2NF

Suppose have 2NF violator in BCNF decomposition

That is, have non-prime $A$, key $X$, $Y \subseteq X$ such that $Y \ne X$ and $(Y \to A) \in F^+$

By BCNF criterion, $Y$ is a superkey $\Rightarrow$ there is a key $Z \subseteq Y \subset X$ $\Rightarrow$ $X$ is not a key, a contradiction.

Page 32: Normal Forms

Example:

(sno, sname, sfood, tno, tname, tcolor, tvolume)

• only key is (sno, tno), but (sno → sname) and sname is non-prime

• have several 2NF violators

• signals mixing of application entities in a single table

• decompose into: (sno, sname, sfood) and (tno, tname, tcolor, tvolume)

Note: (1) tables with single-attribute keys cannot violate 2NF; (2) tvolume → tcolor is not a 2NF violator because tvolume is not part of a key

Minimal cover, for reference:

• sno → sname
• sno → sfood
• tno → tname
• tno → tvolume
• tvolume → tcolor
• eno → fno
• fno → tno
• fno → sno
• fno → fname
• fno → fcolor
• fno → fweight
• eno → edate
• eno → enote
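The 2NF violators of the combined species/tank table can be listed mechanically. The sketch below uses hypothetical helper names and a brute-force key search, so it is only suitable for small tables: it finds the keys, collects the prime attributes, and reports every non-prime attribute determined by a proper subset of a key.

```python
from itertools import combinations


def closure(x, fds):
    """Attribute closure (X; F) for FDs stored as (lhs, rhs) frozenset pairs."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                changed = True
    return frozenset(z)


def candidate_keys(r, fds):
    """All minimal superkeys of r, found by searching subsets smallest first."""
    attrs, keys = sorted(r), []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            if not any(k <= x for k in keys) and closure(x, fds) >= r:
                keys.append(x)
    return keys


def violators_2nf(r, fds):
    """Pairs (Y, A): Y a proper subset of a key, A non-prime, Y -> A implied."""
    keys = candidate_keys(r, fds)
    prime = set().union(*keys)
    found = set()
    for key in keys:
        for size in range(1, len(key)):
            for combo in combinations(sorted(key), size):
                y = frozenset(combo)
                for a in (closure(y, fds) & r) - prime:
                    found.add((y, a))
    return found


def fd(lhs, rhs):
    return frozenset([lhs]), frozenset([rhs])


R = frozenset(["sno", "sname", "sfood", "tno", "tname", "tcolor", "tvolume"])
F = [fd("sno", "sname"), fd("sno", "sfood"), fd("tno", "tname"),
     fd("tno", "tvolume"), fd("tvolume", "tcolor")]

for y, a in sorted(violators_2nf(R, F), key=lambda p: (sorted(p[0]), p[1])):
    print(sorted(y), "->", a)
```

The report includes tno → tcolor, which holds by transitivity and has a proper subset of the key on its left side, while tvolume → tcolor never appears because tvolume is not part of the key.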

Page 33: Normal Forms

Storage anomalies (irregularities):

• classify negative consequences of 2NF violators

• insertion anomaly
− relation with a 2NF violator mixes two or more application entities
− can't insert an instance of just one of them
− e.g., in previous example, want to insert a dolphin species that is as yet unassociated with any tank ...

sno  sname    sfood       tno   tname      tcolor  tvolume
12   shark    everything  25    lagoon     blue    5000
12   shark    everything  27    deep dive  green   50000
...
17   dolphin  herring     ----  ------     -----   ------

insertion forces nulls into inappropriate attributes, such as part of the key (sno, tno)

Page 34: Normal Forms

Storage anomalies (irregularities):

• deletion anomaly
− relation with a 2NF violator mixes two or more application entities
− deleting last tank associated with shark species removes all information about the species

• update anomaly
− change sfood attribute of a shark => update several tuples
− invites inconsistency

sno  sname  sfood       tno  tname      tcolor  tvolume
12   shark  everything  25   lagoon     blue    5000
12   shark  everything  27   deep dive  green   50000
...

• 2NF decomposition removes most of the anomalies:

sno  sname    sfood
12   shark    everything
17   dolphin  herring

tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000

Page 35: Normal Forms

Storage anomalies (irregularities):

• there remains the negative effect of tvolume → tcolor
− if change all 5000 volume tanks to purple, must update several tuples
− still invites inconsistency

• in general, anomalies arise when the contents of some cells can predict the content of others

sno  sname    sfood
12   shark    everything
17   dolphin  herring

tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000
84   puddle     blue    5000

(predictable entry: the tcolor of tank 84 follows from its tvolume, as for tank 25)

Minimal cover, for reference:

• sno → sname
• sno → sfood
• tno → tname
• tno → tvolume
• tvolume → tcolor
• eno → fno
• fno → tno
• fno → sno
• fno → fname
• fno → fcolor
• fno → fweight
• eno → edate
• eno → enote

Page 36: Normal Forms

Third Normal Form:

Let $R = R_1 + R_2 + \cdots + R_n$ be a decomposition

Let $F$ be a set of application-driven constraints (or minimal cover)

$A \in R_i$ is prime if $A$ is contained in a key for $R_i$

$R_i$ satisfies Third Normal Form (3NF) if

(1) $R_i$ satisfies 1NF, and

(2) non-trivial $(X \to A) \in F^+$ implies either (a) $X$ is a superkey for $R_i$ or (b) $A$ is prime

A 3NF violator has components: non-prime $A$, non-trivial $(X \to A) \in F^+$, $R_i \not\subseteq (X; F)$, that is, $X$ is not a superkey for $R_i$

The decomposition satisfies 3NF if each component satisfies 3NF

Page 37: Normal Forms

• BCNF => 3NF => 2NF

• 3NF decomposition forces further decomposition in previous example:

sno  sname    sfood
12   shark    everything
17   dolphin  herring

tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000
84   puddle     blue    5000

The tank table is decomposed into:

tno  tname      tvolume
25   lagoon     5000
27   deep dive  50000
84   puddle     5000

tcolor  tvolume
blue    5000
green   50000

Minimal cover, for reference:

• sno → sname
• sno → sfood
• tno → tname
• tno → tvolume
• tvolume → tcolor
• eno → fno
• fno → tno
• fno → sno
• fno → fname
• fno → fcolor
• fno → fweight
• eno → edate
• eno → enote

Page 38: Normal Forms

• In practice, 3NF is usually BCNF

• only exception occurs when a non-superkey determines a prime attribute

• recall earlier example:

Let $R = (A, B, C)$, $F = \{AB \to C,\ C \to A\}$

$R$ has no 3NF violators [$C \to A$ escapes because $A$ is prime]

$(AB; F) = ABC = R$ $\Rightarrow$ $(AB \to C)$ is not a BCNF violator

$(C; F) = CA \ne R$ $\Rightarrow$ $(C \to A)$ is BCNF violator $\Rightarrow$ decompose into $R_1 = (A, C)$ and $R_2 = (B, C)$

No further BCNF violators $\Rightarrow$ have BCNF decomposition

Conclusion: original $R$ is 3NF but not BCNF

Page 39: Normal Forms

• There always exists a lossless, dependency-preserving decomposition into 3NF

• Recall that the BCNF decomposition may not be dependency-preserving

3NF algorithm:

decomposition 3NFdecomp (attributeSet $R$, FDSet $F$) {
    /* $F$ is assumed to be a minimal cover */
    partition $F$ into subsets $F_1, F_2, \ldots, F_m$ by grouping FDs that have the same left side;
    for $1 \le i \le m$, let $R_i$ be the attributes mentioned in $F_i$;
    let $R_0$ be an arbitrarily chosen key for $R$;
    return ($R_0, R_1, R_2, \ldots, R_m$);
}

• For proof of algorithm's validity, see text page 788.
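A compact Python sketch of this synthesis, assuming the input FD set is already a minimal cover. The names are illustrative, and the key is picked by greedily discarding attributes from the full attribute set, which is one of several valid ways to choose "an arbitrary key".

```python
def closure(x, fds):
    """Attribute closure (X; F) for FDs stored as (lhs, rhs) frozenset pairs."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= z and not rhs <= z:
                z |= rhs
                changed = True
    return frozenset(z)


def some_key(r, fds):
    """Shrink r to one key by dropping every attribute that is not needed."""
    key = set(r)
    for a in sorted(r):
        if closure(key - {a}, fds) >= r:
            key.discard(a)
    return frozenset(key)


def decompose_3nf(r, fds):
    """Return [R_0, R_1, ..., R_m]: a key plus one component per left side."""
    groups = {}
    for lhs, rhs in fds:
        groups.setdefault(lhs, set()).update(lhs | rhs)
    return [some_key(r, fds)] + [frozenset(g) for g in groups.values()]


def fd(lhs, rhs):
    return frozenset([lhs]), frozenset([rhs])


# Aquarium minimal cover from this deck.
F = [fd("sno", "sname"), fd("sno", "sfood"), fd("tno", "tname"),
     fd("tno", "tvolume"), fd("fno", "fname"), fd("fno", "fcolor"),
     fd("fno", "fweight"), fd("eno", "edate"), fd("eno", "enote"),
     fd("tvolume", "tcolor"), fd("eno", "fno"), fd("fno", "tno"),
     fd("fno", "sno")]
R = frozenset(a for lhs, rhs in F for a in lhs | rhs)

for component in decompose_3nf(R, F):
    print(sorted(component))
```

On this input the key component is just (eno), and the remaining components coincide with the five BCNF tables listed for the aquarium database. Since (eno) is contained in the eno component, it could be dropped in practice, although the procedure as stated returns it.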

Page 40: Normal Forms

Some limitations of functional dependency analysis:

• Cannot always achieve dependency-preserving BCNF; may have to settle for 3NF and some redundancy

• excessive decomposition is possible, e.g., decompose species into two tables: (sno, sname) and (sno, sfood)

− general idea is that each table represents one distinct application entity plus additional decomposition necessary to accommodate non-relationship constraints, such as tvolume → tcolor

• there remain redundancies that persist through BCNF, e.g., fish weight in a given tank must sum to 1000 pounds

− can predict weight of last fish in the tank ==> redundancy
− constraint is not an FD ==> redundancy is not removed by decomposition

• FD analysis misses multivalued dependencies and join dependencies

e.g.,

1 1 1 2 1 1

2 2 2 1 2 2

A B C

a b c a b ca b c a b c