Normal Forms

First Normal Form:

• all table cells must contain atomic values

− no sets, arrays, lists, or other collection types

− no structured objects

• all relational databases satisfy first normal form by definition

− the definition restricts attribute values to singletons from specified domains

• indeed, relations that do not satisfy first normal form are specifically called unnormalized relations

Definitions:

R denotes a relation; r(R) is the corresponding body

X → Y is a functional dependency (FD)

The universal relation R contains all application attributes

r(R) contains the join of all database tables

r(R) is valid if it satisfies all FDs implied by the application constraints

Goal of functional dependency analysis is to decompose the universal relation into components which cannot become inconsistent with respect to any functional dependency constraint

Some requirements:

Suppose R = R₁ + R₂ (+ indicates union of attributes)

For any valid r(R), must have

r(R₁) = π_R₁(r(R))

r(R₂) = π_R₂(r(R))

r(R) = r(R₁) ⋈ r(R₂)   (lossless join)

Note: [π_R₁(r(R))] ⋈ [π_R₂(r(R))] ⊇ r(R)

Perhaps join contains too much .... spurious tuples ...

Lossy join:

r(R):

A B C
1 2 3
5 2 4

Projection on A,B:

A B
1 2
5 2

Projection on B,C:

B C
2 3
2 4

Join of the two projections:

A B C
1 2 3
1 2 4   ← spurious tuple
5 2 3   ← spurious tuple
5 2 4
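The lossy join above can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names make, project and natural_join are invented here, and a relation body is modelled as a set of frozensets of (attribute, value) pairs.

```python
def make(attrs, rows):
    """Build a relation body: a set of tuples, each a frozenset of (attr, value)."""
    return {frozenset(zip(attrs, row)) for row in rows}

def project(relation, attrs):
    """Keep only the named attributes; duplicate tuples collapse automatically."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def natural_join(r1, r2):
    """Combine every pair of tuples that agree on their common attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined

r = make("ABC", [(1, 2, 3), (5, 2, 4)])
r1 = project(r, {"A", "B"})
r2 = project(r, {"B", "C"})
joined = natural_join(r1, r2)

# The join always contains the original body; anything extra is spurious.
spurious = joined - r
print(sorted(sorted(t) for t in spurious))
```

Because B does not determine A or C in this body, joining on B alone pairs every A value with every C value.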

More definitions:

F = {X → Y | (X → Y) is imposed by application}

F⁺ = {X → Y | application FDs imply X → Y}

R = R₁ + R₂ + ... + Rₘ is a decomposition

(X → Y) applies to Rᵢ if X ∪ Y ⊆ Rᵢ

R = R₁ + R₂ + ... + Rₘ is a valid decomposition if

(1) it is lossless

(2) if r(R) satisfies F and (X → Y) ∈ F⁺ applies to Rᵢ, then r(Rᵢ) satisfies X → Y

Whence functional dependencies:

Analysis of application yields F

Closure algorithm: F yields F⁺

Minimal cover algorithm: F⁺ yields F′

Trick: compute minimal cover F′ directly from application FDs F without calculating closure F⁺

Yet more definitions:

Let R be the universal relation

F is a set of functional dependencies

We say that F implies (X → Y) if the following condition holds:

r(R) satisfies F ⇒ r(R) satisfies (X → Y)

F⁺ = {X → Y | F implies (X → Y)} is the closure of F

Let F₁, F₂ be sets of functional dependencies

F₁, F₂ are equivalent, written F₁ ≡ F₂, if F₁⁺ = F₂⁺

F₂ is a minimal cover of F₁ if F₂ ≡ F₁ and

• The right side of every FD in F₂ contains a single attribute

• If (X → A) ∈ F₂, then [F₂ − {X → A}] ≢ F₂

  That is, no FD in F₂ is redundant

• If (XA → B) ∈ F₂, then [(F₂ − {XA → B}) ∪ {X → B}] ≢ F₂

  That is, no FD in F₂ has a redundant attribute on its left side

F is minimal if it is a minimal cover of itself

Yet more definitions (continued):

Armstrong's Axioms

1. Reflexivity: If Y ⊆ X ⊆ R, then (X → Y) ∈ F⁺ (regardless of F)

2. Augmentation: If (X → Y) ∈ F⁺ and Z ⊆ R, then (XZ → YZ) ∈ F⁺

3. Transitivity: If (X → Y) ∈ F⁺ and (Y → Z) ∈ F⁺, then (X → Z) ∈ F⁺

4. Composition: If (X → Y) ∈ F⁺ and (X → Z) ∈ F⁺, then (X → YZ) ∈ F⁺

5. Decomposition: If (X → YZ) ∈ F⁺, then (X → Y) ∈ F⁺ and (X → Z) ∈ F⁺

6. Pseudotransitivity: If (X → Y) ∈ F⁺ and (WY → Z) ∈ F⁺, then (WX → Z) ∈ F⁺

Theorem: Armstrong's Axioms are both sound and complete.

That is, if you apply them repeatedly to a given set of FDs, then

(1) every new FD generated belongs to the closure (sound)

(2) every FD in the closure will eventually be generated (complete)

Definition:

(X; F) = {A | (X → A) follows from F via Armstrong's Axioms}

Because inference via Armstrong's Axioms yields precisely the FD set closure, this definition is the same as:

(X; F) = {A | (X → A) ∈ F⁺}

Generating the closure is computationally expensive because of its exponential size

But, checking a given FD for inclusion in the closure is easy:

(X → Y) ∈ F⁺ iff Y ⊆ (X; F)

Algorithm uses only the application-driven FDs, not the closure

attributeSet maxRight (attributeSet X, FDSet F) {
    /* returns (X; F) */
    attributeSet Z = X;
    boolean changed = true;
    while (changed)
        if [∃ (U → V) ∈ F with U ⊆ Z and V ⊄ Z]
            Z = Z ∪ V;
        else
            changed = false;
    return Z;
}

boolean FDinClosure (attributeSet X, attributeSet Y, FDSet F) {
    /* returns true if (X → Y) ∈ F⁺ */
    return [Y ⊆ maxRight(X, F)];
}
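A minimal Python sketch of the two procedures above. It assumes, for brevity, that every attribute is a single character, so the string "EF" stands for the set {E, F}; the names max_right and fd_in_closure are transliterations chosen here, and the FD set is the one used in the BCNF decomposition example of these notes.

```python
def max_right(x, fds):
    """Return (X; F): every attribute determined by x under the FDs in fds."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire U -> V whenever U is already inside Z and V adds something new.
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def fd_in_closure(x, y, fds):
    """True iff (X -> Y) is in the closure of fds."""
    return set(y) <= max_right(x, fds)

F = [("A", "D"), ("C", "B"), ("A", "C"), ("EF", "H"),
     ("EF", "J"), ("EF", "L"), ("FL", "G"), ("HJ", "K")]

print("".join(sorted(max_right("A", F))))    # attributes determined by A
print(fd_in_closure("A", "B", F))            # A -> B follows via A -> C, C -> B
print(fd_in_closure("C", "A", F))            # C does not determine A
```

Each pass either adds at least one attribute or stops, so the loop runs at most once per attribute of R.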

Other uses of the algorithm: key determination

X is a superkey for R iff (X → A) ∈ F⁺ for every A ∈ R

X is a key for R iff X is a superkey and for all B ∈ X, (X − B) is not a superkey

Algorithm: For each subset X ⊆ R, use maxRight procedure to construct (X; F). If (X; F) = R, then X is a superkey; otherwise, it is not. A superkey that does not properly contain another superkey is a key.
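The subset search described above might be sketched as follows. Attributes are again assumed to be single characters, the function names are illustrative, and the enumeration is exponential in the number of attributes, so it is only practical for small schemas.

```python
from itertools import combinations

def closure(x, fds):
    """Attribute closure (X; F), as computed by maxRight."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def is_superkey(x, r, fds):
    return set(r) <= closure(x, fds)

def keys(r, fds):
    """Enumerate subsets by increasing size; a superkey that contains no
    previously found key cannot contain any smaller superkey, so it is a key."""
    attrs = sorted(set(r))
    found = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            x = frozenset(combo)
            if any(k <= x for k in found):
                continue  # properly contains a key already found
            if is_superkey(x, r, fds):
                found.append(x)
    return found

# R = (A, B, C) with F = {AB -> C, C -> A}
print([sorted(k) for k in keys("ABC", [("AB", "C"), ("C", "A")])])
```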

Other uses of the algorithm: equivalence of two sets of FDs

F₁ ≡ F₂ iff F₁ ⊆ F₂⁺ and F₂ ⊆ F₁⁺

Algorithm: for each (X → Y) ∈ F₁, use maxRight to determine if (X → Y) ∈ F₂⁺. Similarly, for FDs in F₂.
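As a rough illustration of this test, the following Python sketch checks mutual coverage of two FD sets. The names implies and equivalent are made up for the example, and attributes are single characters.

```python
def closure(x, fds):
    """Attribute closure (X; F), as computed by maxRight."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def implies(fds, lhs, rhs):
    """True iff (lhs -> rhs) is in the closure of fds."""
    return set(rhs) <= closure(lhs, fds)

def equivalent(f1, f2):
    """F1 and F2 are equivalent iff each set implies every FD of the other."""
    return (all(implies(f2, l, r) for l, r in f1) and
            all(implies(f1, l, r) for l, r in f2))

f1 = [("A", "BC")]
f2 = [("A", "B"), ("A", "C")]
f3 = [("A", "B")]

print(equivalent(f1, f2))  # splitting a right side preserves equivalence
print(equivalent(f1, f3))  # f3 has lost A -> C
```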

Other uses of the algorithm: minimal cover computation:

mutate application-driven F into an equivalent FD set F₁ that satisfies the minimality conditions.

Algorithm:

FDSet minCover (FDSet F) {
    FDSet F₁ = ∅;
    boolean change2 = true, change3 = true;
    /* split right sides into single attributes */
    for each (X → Y) ∈ F
        for each A ∈ Y
            F₁ = F₁ ∪ {X → A};
    while (change2 || change3) {
        /* remove redundant FDs */
        change2 = false;
        while [∃ (X → A) ∈ F₁ with F₁ − {X → A} ≡ F₁] {
            change2 = true;
            F₁ = F₁ − {X → A};
        }
        /* remove redundant left-side attributes */
        change3 = false;
        while [∃ (XB → A) ∈ F₁ with (F₁ − {XB → A}) ∪ {X → A} ≡ F₁] {
            change3 = true;
            F₁ = (F₁ − {XB → A}) ∪ {X → A};
        }
    }
    return F₁;
}
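One way to realise minCover in Python is sketched below. Two simplifications are assumptions of this sketch rather than part of the slide: the equivalence tests are replaced by closure checks (dropping X → A is safe exactly when A is still in (X; F₁) without it, and shrinking XB → A to X → A is safe exactly when A ∈ (X; F₁)), and attributes are single characters.

```python
def closure(x, fds):
    """Attribute closure (X; F), as computed by maxRight."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def order(fd):
    # Deterministic processing order; different orders may give different covers.
    return (sorted(fd[0]), fd[1])

def min_cover(fds):
    # Step 1: one attribute per right side.
    cover = {(frozenset(lhs), a) for lhs, rhs in fds for a in rhs}
    changed = True
    while changed:
        changed = False
        # Step 2: drop FDs that the remaining ones already imply.
        for fd in sorted(cover, key=order):
            rest = cover - {fd}
            if fd[1] in closure(fd[0], rest):
                cover, changed = rest, True
        # Step 3: drop left-side attributes that are not needed.
        for lhs, a in sorted(cover, key=order):
            reduced = lhs
            for b in sorted(lhs):
                if a in closure(reduced - {b}, cover):
                    reduced = reduced - {b}
            if reduced != lhs:
                cover = (cover - {(lhs, a)}) | {(reduced, a)}
                changed = True
    return cover

F = [("A", "BC"), ("B", "C"), ("AB", "C")]
for lhs, a in sorted(min_cover(F), key=order):
    print("".join(sorted(lhs)), "->", a)
```

A minimal cover is generally not unique; the sort order only makes this sketch reproducible.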

Boyce-Codd Normal Form (BCNF):

An FD (X → Y) is called trivial if Y ⊆ X

Let R = R₁ + R₂ + ... + Rₙ be a lossless decomposition with respect to F

Rᵢ satisfies BCNF if, for every nontrivial FD (X → A) ∈ F⁺ that applies to Rᵢ, X is a superkey of Rᵢ

The decomposition satisfies BCNF if every component Rᵢ satisfies BCNF

There exists an algorithm to decompose any universal relation R and an associated application-driven set of FDs F (or an equivalent minimal cover) into BCNF

The algorithmic decomposition is lossless, but unfortunately, it is not always dependency-preserving

Dependency preservation:

Let R = R₁ + R₂ + ... + Rₙ be a lossless decomposition with respect to F

Define πᵢ(F) = {X → Y | (X → Y) ∈ F⁺ and (X → Y) applies to Rᵢ}

Decomposition is dependency-preserving if π₁(F) ∪ π₂(F) ∪ ... ∪ πₙ(F) ≡ F

Certainly, π₁(F) ∪ π₂(F) ∪ ... ∪ πₙ(F) ⊆ F⁺, but F⁺ is potentially larger (stronger)

Example: BCNF decomposition process

R: A B C D E F G H J K L

F:
A → D     C → B
A → C     EF → H
EF → J    EF → L
FL → G    HJ → K

Step 1:

Compute (A; F): A, AD, ADC, ADCB ≠ R

(A; F) ≠ R ⇒ A is not a key ⇒ (A → C), (A → D), (A → B) are all BCNF violators

Decompose: R₁ = (A; F) and R₂ = [R − (A; F)] ∪ {A}

A is a key of (A; F) ⇒ join is lossless

R₁: A B C D
R₂: A E F G H J K L

(A → C), (A → D), (A → B) are no longer BCNF violators: A is a superkey of the first relation, and the FDs no longer apply to the second

Step 2:

But, still have a violator in R₁: (C → B), because (C; F) = CB ⊉ R₁

R₁₁: B C
R₁₂: A C D

Step 3:

(EF; F) = EFHJLGK ⊉ R₂ ⇒ (EF → H), (EF → J), (EF → L) are BCNF violators

R₂₁: A E F
R₂₂: E F G H J K L

Step 4:

(FL; F) = FLG ⊉ R₂₂ ⇒ (FL → G) is a BCNF violator

R₂₂₁: F G L
R₂₂₂: E F H J K L

Step 5:

(HJ; F) = HJK ⊉ R₂₂₂ ⇒ (HJ → K) is a BCNF violator

R₂₂₂₁: H J K
R₂₂₂₂: E F H J L

Final components: R₁₁ (B C), R₁₂ (A C D), R₂₁ (A E F), R₂₂₁ (F G L), R₂₂₂₁ (H J K), R₂₂₂₂ (E F H J L)

Aquarium database --- minimal cover

• sno → sname
• sno → sfood

• tno → tname
• tno → tvolume

• fno → fname
• fno → fcolor
• fno → fweight

• eno → edate
• eno → enote

• tvolume → tcolor

• eno → fno
• fno → tno
• fno → sno

Only superkeys: supersets of eno ==> all FDs are BCNF violators except those with eno on left side

BCNF:

(sno, sname, sfood)

(tno, tname, tvolume)

(tvolume, tcolor)

(fno, fname, fcolor, fweight, sno, tno)

(eno, edate, enote, fno)

BCNF lossless decomposition algorithm:

/* start with universal relation; decompose BCNF violating components */

while nontrivial (X → Y) applies to a component Rᵢ and (X; F) ⊉ Rᵢ

    decompose Rᵢ into (X; F) and [Rᵢ − (X; F)] ∪ X, keeping only attributes of Rᵢ in (X; F)
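The loop can be sketched in Python as below and checked against the eleven-attribute example from these notes. The function name bcnf_decompose and the recursive formulation are choices made for this sketch. It inspects only the FDs listed in F, in the order given; a fully general test would consider every FD in F⁺ that applies to a component. The result also depends on the order in which violators are picked.

```python
def closure(x, fds):
    """Attribute closure (X; F), as computed by maxRight."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def bcnf_decompose(r, fds):
    """Split r on the first violating FD found, then recurse on both parts."""
    r = frozenset(r)
    for lhs, rhs in fds:
        x, y = frozenset(lhs), frozenset(rhs)
        applies = (x | y) <= r
        nontrivial = not y <= x
        if applies and nontrivial and not r <= closure(x, fds):
            left = closure(x, fds) & r      # (X; F) restricted to this component
            right = (r - left) | x          # everything else, plus X for the join
            return bcnf_decompose(left, fds) + bcnf_decompose(right, fds)
    return [r]                              # no violator among the listed FDs

F = [("A", "D"), ("C", "B"), ("A", "C"), ("EF", "H"),
     ("EF", "J"), ("EF", "L"), ("FL", "G"), ("HJ", "K")]

for component in bcnf_decompose("ABCDEFGHJKL", F):
    print("".join(sorted(component)))
```

Each split is lossless because the shared attributes X form a superkey of the first part.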

Note: it suffices to look for BCNF violators in minimal cover of original application-driven FDs

In applying Armstrong's axioms to generate FD closure, only augmentation and pseudotransitivity generate new left hand sides; none generate any new right-side attributes

Augmentation: If (X → Y) ∈ F⁺ and Z ⊆ R, then (XZ → YZ) ∈ F⁺

But, if (XZ; F) ⊉ Rᵢ, then because (X; F) ⊆ (XZ; F), we have (X; F) ⊉ Rᵢ

That is, if (XZ → YZ) is a BCNF violator, so is (X → Y)

Pseudotransitivity: If (X → Y) ∈ F⁺ and (WY → Z) ∈ F⁺, then (WX → Z) ∈ F⁺

But, if (WX; F) ⊉ Rᵢ, then because (X; F) ⊆ (WX; F), we have (X; F) ⊉ Rᵢ

That is, if (WX → Z) is a BCNF violator, so is (X → Y)

BCNF decomposition is lossless, but not always dependency-preserving

Let R = R₁ + R₂ + ... + Rₙ be a decomposition with respect to F

Define πᵢ(F) = {X → Y | (X → Y) ∈ F⁺ and (X → Y) applies to Rᵢ}

Decomposition is dependency-preserving if π₁(F) ∪ π₂(F) ∪ ... ∪ πₙ(F) ≡ F

Example:

Let R = (A, B, C), F = (AB → C, C → A)

(AB; F) = ABC = R ⇒ (AB → C) is not a BCNF violator

(C; F) = CA ≠ R ⇒ (C → A) is BCNF violator ⇒ decompose into R₁ = (A, C) and R₂ = (B, C)

No further BCNF violators ⇒ have BCNF decomposition

Example (continued):

Only possible non-trivial entries from F⁺ in π₁(F) are (A → C) or (C → A)

But, (A; F) = {A}, so C ∉ (A; F) ⇒ (A → C) ∉ F⁺

So, π₁(F) = {C → A} plus trivial FDs

Only possible non-trivial entries from F⁺ in π₂(F) are (B → C) or (C → B)

But, (B; F) = {B}, and (C; F) = {C, A} ⇒ (B → C) ∉ F⁺ and (C → B) ∉ F⁺

So, π₂(F) contains only trivial FDs

Consequently, (AB → C) ∈ F, but (AB → C) ∉ (π₁(F) ∪ π₂(F))⁺

So, decomposition is not dependency-preserving
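The conclusion of this example can be checked mechanically. The sketch below uses a standard textbook test that avoids building the projections explicitly: starting from X, it repeatedly adds whatever (Z ∩ Rᵢ; F) contributes inside each Rᵢ, and X → Y is preserved iff Y ends up inside the result. Whether this is the algorithm the course text presents is not confirmed here, and the function names are invented for the sketch.

```python
def closure(x, fds):
    """Attribute closure (X; F), as computed by maxRight."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def preserved(fd, components, fds):
    """True iff fd follows from the FDs that apply to individual components."""
    lhs, rhs = fd
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in components:
            ri = set(ri)
            # Attributes derivable while staying inside component ri.
            gained = (closure(z & ri, fds) & ri) - z
            if gained:
                z |= gained
                changed = True
    return set(rhs) <= z

def dependency_preserving(components, fds):
    return all(preserved(fd, components, fds) for fd in fds)

F = [("AB", "C"), ("C", "A")]
print(preserved(("AB", "C"), ["AC", "BC"], F))   # AB -> C is split across R1, R2
print(preserved(("C", "A"), ["AC", "BC"], F))    # C -> A lives entirely in R1
print(dependency_preserving(["AC", "BC"], F))
```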

In general:

• if decomposition does NOT split any FD from the minimal cover, then decomposition IS dependency-preserving

• if decomposition does split an FD from the minimal cover, then it may or may not be dependency-preserving (text presents an algorithm for deciding)

Second Normal Form:

Let R = R₁ + R₂ + ... + Rₙ be a decomposition

Let F be a set of application-driven constraints (or minimal cover)

A ∈ Rᵢ is prime if A is contained in a key for Rᵢ

Rᵢ satisfies Second Normal Form (2NF) if

(1) Rᵢ satisfies 1NF, and

(2) A non-prime and X a key of Rᵢ (i.e., Rᵢ ⊆ (X; F)) ⇒ (Y → A) ∉ F⁺ for every Y ⊂ X, Y ≠ X

A 2NF violator has components: non-prime A, key X, Y ⊂ X such that Y ≠ X and (Y → A) ∈ F⁺

The decomposition satisfies 2NF if each component satisfies 2NF

Note: BCNF implies 2NF

Suppose have 2NF violator in BCNF decomposition

That is, have non-prime A, key X, Y ⊂ X such that Y ≠ X and (Y → A) ∈ F⁺

(Y → A) is nontrivial, because A is non-prime while Y lies inside a key

By BCNF criterion, Y is a superkey ⇒ key Z ⊆ Y ⊂ X ⇒ X is not a key, a contradiction.

Example:

sno sname sfood tno tname tcolor tvolume

• only key is (sno, tno), but (sno → sname) and sname is non-prime

• have several 2NF violators

• signals mixing of application entities in a single table

• decompose into: (sno, sname, sfood) and (tno, tname, tcolor, tvolume)

Note:

(1) tables with single-attribute keys cannot violate 2NF

(2) tvolume → tcolor is not a 2NF violator because tvolume is not part of a key
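A brute-force Python sketch of the 2NF test, applied to the seven-attribute table of this example. The FD list is the part of the aquarium minimal cover that mentions these attributes; the helper names and the list-of-names encoding of attribute sets are assumptions of the sketch. Note that it reports every (Y → A) in F⁺ with Y a proper part of a key, so tno → tcolor shows up even though tvolume → tcolor itself is not a 2NF violator.

```python
from itertools import combinations

def closure(x, fds):
    """Attribute closure (X; F), as computed by maxRight."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def keys(r, fds):
    """All keys of r, found by enumerating subsets in increasing size."""
    found = []
    for size in range(1, len(r) + 1):
        for combo in combinations(sorted(r), size):
            x = frozenset(combo)
            if not any(k <= x for k in found) and set(r) <= closure(x, fds):
                found.append(x)
    return found

def two_nf_violators(r, fds):
    """Pairs (Y, A): Y a proper subset of a key, A non-prime, Y -> A in F+."""
    ks = keys(r, fds)
    prime = set().union(*ks)
    out = set()
    for k in ks:
        for size in range(len(k)):                      # proper subsets only
            for combo in combinations(sorted(k), size):
                y = frozenset(combo)
                for a in (closure(y, fds) & frozenset(r)) - prime - y:
                    out.add((y, a))
    return out

F = [(["sno"], ["sname"]), (["sno"], ["sfood"]), (["tno"], ["tname"]),
     (["tno"], ["tvolume"]), (["tvolume"], ["tcolor"])]
mixed = ["sno", "sname", "sfood", "tno", "tname", "tcolor", "tvolume"]
tank = ["tno", "tname", "tcolor", "tvolume"]

for y, a in sorted(two_nf_violators(mixed, F), key=lambda v: (sorted(v[0]), v[1])):
    print(sorted(y), "->", a)
print(two_nf_violators(tank, F))   # single-attribute key: nothing to report
```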

Storage anomalies (irregularities):

• classify negative consequences of 2NF violators

• insertion anomaly
  − relation with a 2NF violator mixes two or more application entities
  − can't insert an instance of just one of them
  − e.g., in previous example, want to insert a dolphin species that is as yet unassociated with any tank ...

sno  sname    sfood       tno   tname      tcolor  tvolume
12   shark    everything  25    lagoon     blue    5000
12   shark    everything  27    deep dive  green   50000
...
17   dolphin  herring     ----  ------     -----   ------

insertion forces nulls into inappropriate attributes, such as part of the key (sno, tno)

• deletion anomaly
  − relation with a 2NF violator mixes two or more application entities
  − deleting last tank associated with shark species removes all information about the species

• update anomaly
  − change sfood attribute of a shark => update several tuples
  − invites inconsistency

sno  sname    sfood       tno   tname      tcolor  tvolume
12   shark    everything  25    lagoon     blue    5000
12   shark    everything  27    deep dive  green   50000
...

• 2NF decomposition removes most of the anomalies:

sno  sname    sfood
12   shark    everything
17   dolphin  herring

tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000

• there remains the negative effect of tvolume → tcolor
  − if change all 5000 volume tanks to purple, must update several tuples
  − still invites inconsistency

• in general, anomalies arise when the contents of some cells can predict the content of others

tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000
84   puddle     blue    5000    ← tcolor is a predictable entry

Third Normal Form:

Let R = R₁ + R₂ + ... + Rₙ be a decomposition

Let F be a set of application-driven constraints (or minimal cover)

A ∈ Rᵢ is prime if A is contained in a key for Rᵢ

Rᵢ satisfies Third Normal Form (3NF) if

(1) Rᵢ satisfies 1NF, and

(2) non-trivial (X → A) ∈ F⁺ that applies to Rᵢ implies either (a) X is a superkey for Rᵢ or (b) A is prime

A 3NF violator has components:

non-prime A, non-trivial (X → A) ∈ F⁺,

Rᵢ ⊄ (X; F), that is, X is not a superkey for Rᵢ

The decomposition satisfies 3NF if each component satisfies 3NF

• BCNF => 3NF => 2NF

• 3NF decomposition forces further decomposition in previous example:

(sno, sname, sfood) is unchanged

tno  tname      tcolor  tvolume
25   lagoon     blue    5000
27   deep dive  green   50000
84   puddle     blue    5000

is decomposed into

tno  tname      tvolume
25   lagoon     5000
27   deep dive  50000
84   puddle     5000

tcolor  tvolume
blue    5000
green   50000

• In practice, 3NF is usually BCNF

• only exception occurs when a non-superkey determines a prime attribute

• recall earlier example:

Let R = (A, B, C), F = (AB → C, C → A)

R has no 3NF violators [(C → A) escapes because A is prime]

(AB; F) = ABC = R ⇒ (AB → C) is not a BCNF violator

(C; F) = CA ≠ R ⇒ (C → A) is BCNF violator ⇒ decompose into R₁ = (A, C) and R₂ = (B, C)

No further BCNF violators ⇒ have BCNF decomposition

Conclusion: original R is 3NF but not BCNF
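The conclusion can be confirmed by brute force. The following sketch enumerates every left side X inside a relation and classifies each nontrivial implied FD X → A; the function names are illustrative, and the search is exponential, so it is meant for small examples only. The tank table from the storage anomaly discussion is included as a second check.

```python
from itertools import combinations

def closure(x, fds):
    """Attribute closure (X; F), as computed by maxRight."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def keys(r, fds):
    """All keys of r, found by enumerating subsets in increasing size."""
    found = []
    for size in range(1, len(r) + 1):
        for combo in combinations(sorted(r), size):
            x = frozenset(combo)
            if not any(k <= x for k in found) and set(r) <= closure(x, fds):
                found.append(x)
    return found

def violators(r, fds):
    """Return (bcnf, third): sets of (X, A) with X -> A nontrivial, X not a
    superkey of r; the 3NF set keeps only those whose A is non-prime."""
    r = frozenset(r)
    prime = set().union(*keys(r, fds))
    bcnf, third = set(), set()
    for size in range(1, len(r)):
        for combo in combinations(sorted(r), size):
            x = frozenset(combo)
            cx = closure(x, fds)
            if r <= cx:
                continue                    # X is a superkey: never a violator
            for a in (cx & r) - x:
                bcnf.add((x, a))
                if a not in prime:
                    third.add((x, a))
    return bcnf, third

bcnf, third = violators("ABC", [("AB", "C"), ("C", "A")])
print(bcnf, third)

tank_fds = [(["tno"], ["tname"]), (["tno"], ["tvolume"]), (["tvolume"], ["tcolor"])]
_, tank_third = violators(["tno", "tname", "tcolor", "tvolume"], tank_fds)
print((frozenset({"tvolume"}), "tcolor") in tank_third)
```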

• There always exists a lossless, dependency-preserving decomposition into 3NF

• Recall that the BCNF decomposition may not be dependency-preserving

3NF algorithm:

decomposition 3NFdecomp (attributeSet R, FDSet F) {
    partition F into subsets F₁, F₂, ..., Fₘ
        by grouping FDs that have the same left side;
    for 1 ≤ i ≤ m, let Rᵢ be the attributes mentioned in Fᵢ;
    let R₀ be an arbitrarily chosen key for R;
    return (R₀, R₁, R₂, ..., Rₘ);
}
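A short Python sketch of 3NFdecomp. It assumes that F is already a minimal cover, which the usual synthesis argument relies on; the helper some_key, which shrinks R until no attribute can be dropped, is one arbitrary way of choosing R₀ and is not taken from the slide. Components contained in other components are not pruned, matching the pseudocode.

```python
def closure(x, fds):
    """Attribute closure (X; F), as computed by maxRight."""
    z = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= z and not set(rhs) <= z:
                z |= set(rhs)
                changed = True
    return frozenset(z)

def some_key(r, fds):
    """Pick one key by discarding attributes while the rest stays a superkey."""
    key = set(r)
    for a in sorted(r):
        if set(r) <= closure(key - {a}, fds):
            key.discard(a)
    return frozenset(key)

def three_nf_decomp(r, fds):
    """Return [R0, R1, ..., Rm]: a key plus one component per distinct left side."""
    groups = {}
    for lhs, rhs in fds:
        # Group FDs by left side; collect every attribute they mention.
        groups.setdefault(frozenset(lhs), set()).update(lhs, rhs)
    return [some_key(r, fds)] + [frozenset(g) for g in groups.values()]

# R = (A, B, C) with F = {AB -> C, C -> A}, already a minimal cover
for component in three_nf_decomp("ABC", [("AB", "C"), ("C", "A")]):
    print("".join(sorted(component)))
```

Including the key component R₀ is what makes the decomposition lossless; the per-left-side components keep every FD of F inside a single table.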

• For proof of algorithm's validity, see text page 788.

• Cannot always achieve dependency-preserving BCNF; may have to settle for 3NF and some redundancy

Some limitations of functional dependency analysis:

• excessive decomposition is possible, e.g., decompose species into two tables: (sno, sname) and (sno, sfood)
  − general idea is that each table represent one distinct application entity plus additional decomposition necessary to accommodate non-relationship constraints, such as tvolume → tcolor

• there remain redundancies that persist through BCNF, e.g., fish weight in a given tank must sum to 1000 pounds
  − can predict weight of last fish in the tank ==> redundancy
  − constraint is not an FD ==> redundancy is not removed by decomposition

• FD analysis misses multivalued dependencies and join dependencies, e.g.,

A   B   C
a₁  b₁  c₁
a₂  b₁  c₁
a₂  b₂  c₂
a₁  b₂  c₂
