Design Theory for RDB Normal Forms


Jan 21, 2016


Lester Paul


Lu Chaojun, SJTU 2

Redundant, because this information can be figured out by using the FD s1 …

What’s Bad Design?

• Redundancy
  – A fact is repeated in more than one tuple.
  – E.g., we put course information into Students to represent the “take-course” relationship.

StuCourse(sno, name, age, dept, cno, title, credit)

sno  name  age  dept  cno  title  credit
s1   zhao  20   CS    c1   DB     3
s1   zhao  20   CS    c2   OS     3
s2   qian  21   CS    c2   OS     3


What’s Bad Design? (cont.)

• Anomalies
  – Update anomalies
    e.g., when ‘zhao’ gets one year older, we may change his age in one tuple and leave the others unchanged.
  – Deletion anomalies
    e.g., if ‘zhao’ is the only student taking ‘c1’ and he then quits, we lose the information about ‘c1’.
  – Insertion anomalies
    e.g., can we insert a student who has not yet selected any course?


What’s Good Design?

• Decompose into smaller relations
  – e.g., S, SC, C

• No loss of information
• No redundancy
• No anomalies
  – Update anomalies
  – Deletion anomalies
  – Insertion anomalies


Decomposing Relations

• Goal: decompose a relation into smaller ones in order to eliminate anomalies.

• Def: Decompose $R(A_1,\dots,A_n)$ into $S(B_1,\dots,B_m)$ and $T(C_1,\dots,C_k)$ such that

1. $\{A_1,\dots,A_n\} = \{B_1,\dots,B_m\} \cup \{C_1,\dots,C_k\}$

2. $S = \pi_{B_1,\dots,B_m}(R)$

3. $T = \pi_{C_1,\dots,C_k}(R)$


Example

Stud(sno,name,age,dept,cno)

S(sno,name,age,dept)

SC(sno,cno)

• Does the number of tuples change after decomposition?
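To see the answer on a small instance, here is a minimal Python sketch. It assumes a relation is stored as a list of column names plus a set of tuples (a set, so projection removes duplicates as relational projection does); the helper name `project` is illustrative, and the rows reuse the sample StuCourse data from earlier, restricted to Stud’s attributes.

```python
def project(columns, rows, attrs):
    """Relational projection onto attrs; the result set drops duplicate tuples."""
    idx = [columns.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


stud_cols = ["sno", "name", "age", "dept", "cno"]
stud = {
    ("s1", "zhao", 20, "CS", "c1"),
    ("s1", "zhao", 20, "CS", "c2"),
    ("s2", "qian", 21, "CS", "c2"),
}

S = project(stud_cols, stud, ["sno", "name", "age", "dept"])
SC = project(stud_cols, stud, ["sno", "cno"])
print(len(stud), len(S), len(SC))  # 3 2 3
```

S ends up with fewer tuples than Stud because zhao’s personal data is now stored once, while SC keeps one tuple per enrollment.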


Boyce-Codd Normal Form

• Goal: define conditions for good schemas. Intuitively, the key determines everything.

• Def.: R is in BCNF iff for every nontrivial FD X → Y, X is a superkey for R.

• BCNF violation: a nontrivial FD X → Y where X is not a superkey.

• Example: StuCourse(sno,name,age,dept,cno,title,credit) is not in BCNF, because of the FD sno → name,age,dept.
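A minimal sketch of the BCNF test, assuming FD’s are given as (LHS set, RHS set) pairs. It computes X+ with the usual closure algorithm and checks each given FD. The FD cno → title,credit is an assumption added so the example has a second violation (it is consistent with the later decomposition of StuCourse); the function names are hypothetical.

```python
def closure(attrs, fds):
    """X+: keep applying FD's whose LHS is already inside the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Given FD's that are nontrivial and whose LHS is not a superkey of schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not closure(lhs, fds) >= schema
    ]


stucourse = {"sno", "name", "age", "dept", "cno", "title", "credit"}
fds = [
    ({"sno"}, {"name", "age", "dept"}),
    ({"cno"}, {"title", "credit"}),  # assumed FD, see the lead-in
]
for lhs, rhs in bcnf_violations(stucourse, fds):
    print(sorted(lhs), "->", sorted(rhs))
```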


Decomposition into BCNF

• Any relation schema R can be decomposed into R1,…,Rn such that
  1. each Ri is in BCNF;
  2. R can be reconstructed from R1,…,Rn.

• Decomposition into BCNF strategy
  – Find a BCNF violation X → Y.
  – Compute $X^+$ to augment the RHS.
  – Decompose R into $R_1 = X^+$ and $R_2 = (R - X^+) \cup X$, or equivalently $R_2 = R - (X^+ - X)$.

[Figure: R divided into X, $X^+ - X$, and $R - X^+$]


Decomposition into BCNF (cont.)

• Repeat the decomposition strategy on any Ri that is not in BCNF, until all relations are in BCNF.
  – Use the FD’s projected onto Ri.

• Always successful? Yes!
  – Each decomposition step yields smaller relation schemas.
  – Any two-attribute relation is in BCNF.

• Given R and a set F of FD’s on R, we need only look among F for a BCNF violation, not among the FD’s that follow from F.
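The strategy can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the slides’ own code: FD’s are (LHS set, RHS set) pairs, the projection of FD’s onto Ri is computed by brute force over all subsets of Ri (exponential, so only suitable for small schemas), and the first violation found is used. The FD cno → title,credit in the usage is an assumption consistent with the example that follows.

```python
from itertools import combinations


def closure(attrs, fds):
    """X+ under fds, where each FD is a (LHS set, RHS set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def project_fds(schema, fds):
    """FD's projected onto schema: X -> (X+ ∩ schema) - X for every proper subset X."""
    attrs = sorted(schema)
    projected = []
    for k in range(1, len(attrs)):
        for combo in combinations(attrs, k):
            lhs = set(combo)
            rhs = (closure(lhs, fds) & schema) - lhs
            if rhs:
                projected.append((lhs, rhs))
    return projected


def bcnf_decompose(schema, fds):
    """Split on the first BCNF violation X -> Y into X+ and (R - X+) ∪ X, then recurse."""
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial FD
        x_plus = closure(lhs, fds) & schema
        if x_plus != schema:  # X is not a superkey: BCNF violation
            r1 = x_plus
            r2 = (schema - x_plus) | lhs
            return (bcnf_decompose(r1, project_fds(r1, fds))
                    + bcnf_decompose(r2, project_fds(r2, fds)))
    return [schema]


stucourse = {"sno", "name", "age", "dept", "cno", "title", "credit"}
fds = [({"sno"}, {"name", "age", "dept"}), ({"cno"}, {"title", "credit"})]
for r in bcnf_decompose(stucourse, fds):
    print(sorted(r))
```

With these FD’s the result is the three schemas of the worked example: (sno,name,age,dept), (cno,title,credit), and (sno,cno).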


Example

StuCourse(sno,name,age,dept,cno,title,credit)
BCNF violation: sno → dept

R1(sno,name,age,dept): in BCNF
R2(sno,cno,title,credit): not in BCNF

BCNF violation on R2: cno → title
R21(cno,title,credit): in BCNF
R22(sno,cno): in BCNF

  – Thus StuCourse is decomposed into R1, R21, and R22.
  – This is exactly what constitutes our running DB example.
  – Each Ri is about one thing!


More on the BCNF Algorithm

• What if we do not expand the RHS of a BCNF violation?
  – See Ex. 3.3.2.

• Which of several BCNF violations should we use?
  – See Ex. 3.3.3.


Issues about Decomposition

• Elimination of redundancy and anomalies

• Recoverability of information

• Preservation of dependencies


Lossless Join Decomposition

• A decomposition has a lossless join if the projections of tuples can be joined again to produce all and only the original tuples.

• Example

R(A,B,C)    R1(A,B)    R2(B,C)
a b c       a b        b c

(a,b) joins with (b,c) to recover (a,b,c).


Lossless Join Decomposition (cont.)

• Projecting and then rejoining always recovers the original tuples, but the process may produce too many tuples.

• Example

R(A,B,C)    R1(A,B)    R2(B,C)
a b c       a b        b c
d b e       d b        b e

(a,b) joins with (b,e) to give (a,b,e) ∉ R.
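A small Python check of this example, assuming relations are sets of tuples with a separate column list (the helpers `project` and `natural_join` are illustrative). Besides (a,b,e), the join also produces the spurious tuple (d,b,c).

```python
def project(cols, rows, attrs):
    """Projection onto attrs; returning a set removes duplicate tuples."""
    idx = [cols.index(a) for a in attrs]
    return {tuple(r[i] for i in idx) for r in rows}


def natural_join(cols1, rows1, cols2, rows2):
    """Natural join: combine tuples that agree on all shared columns."""
    common = [c for c in cols1 if c in cols2]
    extra = [c for c in cols2 if c not in cols1]
    out = set()
    for r1 in rows1:
        for r2 in rows2:
            if all(r1[cols1.index(c)] == r2[cols2.index(c)] for c in common):
                out.add(r1 + tuple(r2[cols2.index(c)] for c in extra))
    return cols1 + extra, out


cols = ["A", "B", "C"]
R = {("a", "b", "c"), ("d", "b", "e")}
R1 = project(cols, R, ["A", "B"])
R2 = project(cols, R, ["B", "C"])
join_cols, joined = natural_join(["A", "B"], R1, ["B", "C"], R2)
print(join_cols)           # ['A', 'B', 'C']
print(sorted(joined - R))  # [('a', 'b', 'e'), ('d', 'b', 'c')]
```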


Lossless Join Decomposition (cont.)

• The BCNF decomposition strategy has a lossless join, i.e., the original relation can be recovered exactly by the natural join.

• Why? Suppose we decompose according to the FD B → C:

R(A,B,C)    R1(A,B)    R2(B,C)
a b c       a b        b c
d b e       d b        b e

  – Both tuples agree on B, so B → C forces c to be the same as e! Hence the join produces no tuple that was not in R.

• The same is true for recursive decomposition
  – ⋈ is associative and commutative.


Testing for a Lossless Join

• If we project R onto R1, R2, …, Rk, can we recover R by rejoining?

• Any tuple in R can be recovered from its projected fragments.

• So the only question is: when we rejoin, do we ever get back something we didn’t have originally?


The Chase Test

• Suppose tuple t comes back in the join.

• Then t is the join of projections of some tuples of R, one for each Ri of the decomposition.

• Can we use the given FD’s to show that one of the tuples of R must be t ?


The Chase Test (cont.)

• Start by assuming t = abc….

• For each i, there is a tuple si of R that has a, b, c,… in the attributes of Ri.

• si can have any values in other attributes.

• We’ll use the same letter as in t, but with a subscript, for these components.


Example: The Chase

• Let R = ABCD, and the decomposition be AB, BC, and CD.

• Let the given FD’s be C → D and B → A.

• Suppose the tuple t = abcd is the join of tuples projected onto AB, BC, CD.


Example: The Tableau

The tuples of R projected onto AB, BC, CD:

A   B   C   D
a   b   c1  d1
a2  b   c   d2
a3  b3  c   d

Use C → D: rows 2 and 3 agree on C, so d2 becomes d.
Use B → A: rows 1 and 2 agree on B, so a2 becomes a.

The second row is now a b c d: we’ve proved the second tuple must be t.


Summary of the Chase

• If two rows agree on the left side of an FD, make their right sides agree too.
  – Always replace a subscripted symbol by the corresponding unsubscripted one, if possible.

• If we ever get a row with no subscripts, we know any tuple in the project-join is in the original.
  – The join is lossless.

• Otherwise, the join is not lossless.
  – The final tableau is a counterexample.
  – It’s an instance of R that satisfies the given FD’s.
  – The join produces an unsubscripted tuple not in R.
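The chase rules summarized here can be sketched in Python. Assumptions for this illustration: attribute names are single characters, each symbol is an (attribute, subscript) pair with subscript 0 meaning unsubscripted, and FD’s are (LHS set, RHS set) pairs. The usage runs the ABCD example with decomposition AB, BC, CD under C → D and B → A, and the variant with only C → D.

```python
def chase_lossless(attrs, decomposition, fds):
    """Chase test: True iff the decomposition has a lossless join under fds.

    A symbol is an (attribute, subscript) pair; subscript 0 means unsubscripted.
    """
    # Row i is unsubscripted on the attributes of Ri and gets subscript i+1 elsewhere.
    rows = [
        {a: (a, 0) if a in ri else (a, i + 1) for a in attrs}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r in rows:
                for s in rows:
                    if r is s or any(r[a] != s[a] for a in lhs):
                        continue
                    for a in rhs:
                        if r[a] != s[a]:
                            # Keep the symbol with the smaller subscript, so an
                            # unsubscripted symbol always wins.
                            keep = min(r[a], s[a], key=lambda sym: sym[1])
                            drop = s[a] if keep == r[a] else r[a]
                            for row in rows:  # rename drop everywhere in column a
                                if row[a] == drop:
                                    row[a] = keep
                            changed = True
    # Lossless iff some row has become completely unsubscripted.
    return any(all(row[a][1] == 0 for a in attrs) for row in rows)


parts = [{"A", "B"}, {"B", "C"}, {"C", "D"}]
print(chase_lossless("ABCD", parts, [({"C"}, {"D"}), ({"B"}, {"A"})]))  # True
print(chase_lossless("ABCD", parts, [({"C"}, {"D"})]))                  # False
```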


Example: Lossy Join

• Same relation R = ABCD and same decomposition.

• But with only the FD C → D.


Example: The Tableau

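A   B   C   D
a   b   c1  d1
a2  b   c   d2
a3  b3  c   d

Use C → D: rows 2 and 3 agree on C, so d2 becomes d. No other FD applies, and no row becomes abcd.

These three tuples (with d2 replaced by d) are an example R that shows the join is lossy: abcd is not in R, but we can project and rejoin to get abcd. The projections (a,b), (b,c), and (c,d) rejoin to form abcd.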


A Problem with BCNF

• One kind of FD causes problems:
  – If you decompose, you can’t check the FD within a single relation.
  – If you don’t decompose, you violate BCNF.

• An abstract example: AB → C and C → B
  – Keys: {A,B} and {A,C}
  – BCNF violation: C → B
  – Decomposition: BC and AC
  – You can’t check the FD AB → C


Example

STC(stud,course,teacher)
FD’s: stud course → teacher and teacher → course

Keys: (stud,course) and (stud,teacher)

BCNF violation: teacher → course

Decomposition: TC(teacher,course), ST(stud,teacher)

Problem: stud course → teacher may not be satisfied

TC                ST               TC ⋈ ST
course teacher    stud teacher     stud course teacher
c1     t1         s1   t1          s1   c1     t1
c1     t2         s1   t2          s1   c1     t2

  – Although no FD’s are violated in TC and ST, the FD stud course → teacher is violated by the database as a whole.
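The STC instance can be checked mechanically. A minimal sketch, assuming relations are sets of tuples with a column list; `fd_holds` is a hypothetical helper that tests an FD on one instance.

```python
def fd_holds(cols, rows, lhs, rhs):
    """Check lhs -> rhs on an instance: equal LHS values must give equal RHS values."""
    seen = {}
    for row in rows:
        key = tuple(row[cols.index(a)] for a in lhs)
        val = tuple(row[cols.index(a)] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


TC_cols, TC = ["course", "teacher"], {("c1", "t1"), ("c1", "t2")}
ST_cols, ST = ["stud", "teacher"], {("s1", "t1"), ("s1", "t2")}

# Natural join of TC and ST on the shared attribute teacher.
STC_cols = ["stud", "course", "teacher"]
STC = {(s, c, t) for (c, t) in TC for (s, t2) in ST if t == t2}

print(fd_holds(TC_cols, TC, ["teacher"], ["course"]))            # True
print(fd_holds(STC_cols, STC, ["stud", "course"], ["teacher"]))  # False
```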


3NF

• A relation R is in 3NF iff for every nontrivial FD X → Y, either
  1. X is a superkey, or
  2. each attribute A ∈ Y − X is contained in some key.

• A is said to be prime if it is a member of some key.

• We don’t decompose into BCNF in this situation, at the price of some redundancy.


Example: 3NF

• In our problem situation with FD’s AB → C and C → B
  – Keys: {A,B} and {A,C}

• Thus A, B, and C are each prime.

• Although C → B violates BCNF, it does not violate 3NF.
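A sketch of both tests on the abstract example AB → C, C → B. Keys are found by brute force over attribute subsets (exponential, fine for tiny schemas), only the given FD’s are examined, and the function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def keys(schema, fds):
    """All minimal keys, trying subsets in order of increasing size."""
    found = []
    for k in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), k):
            s = set(combo)
            if closure(s, fds) >= schema and not any(key <= s for key in found):
                found.append(s)
    return found


def is_bcnf(schema, fds):
    return all(rhs <= lhs or closure(lhs, fds) >= schema for lhs, rhs in fds)


def is_3nf(schema, fds):
    prime = set().union(*keys(schema, fds))  # attributes that belong to some key
    return all(
        rhs <= lhs or closure(lhs, fds) >= schema or (rhs - lhs) <= prime
        for lhs, rhs in fds
    )


R = {"A", "B", "C"}
F = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
print(keys(R, F))                   # the two keys {A,B} and {A,C}
print(is_bcnf(R, F), is_3nf(R, F))  # False True
```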


3NF vs BCNF

• There are two important properties of a decomposition:
  – P1 (Lossless Join). We are able to recover the data of the original relation from the decomposed relations.
  – P2 (Dependency Preservation). We are able to check that the FD's for the original relation are satisfied by checking the projections of those FD's in the decomposed relations.


3NF vs BCNF (cont.)

• It is always possible to decompose into BCNF and satisfy P1.

• It is always possible to decompose into 3NF and satisfy both P1 and P2.

• It is not always possible to decompose into BCNF and satisfy both P1 and P2.
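Checking P2 does not require computing the projected FD’s explicitly. The sketch below uses a well-known iterative test (an assumption; the slides do not give an algorithm): start from Z = X and repeatedly add $(Z \cap R_i)^+ \cap R_i$ for each $R_i$; X → Y is preserved iff Y ⊆ Z at the end. It confirms that the BCNF decomposition BC, AC under AB → C, C → B loses AB → C.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(lhs, rhs, decomposition, fds):
    """Is lhs -> rhs implied by the FD's projected onto the components?

    Grows Z using only closures taken inside one component at a time.
    """
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            t = closure(z & ri, fds) & ri
            if not t <= z:
                z |= t
                changed = True
    return rhs <= z


F = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]
decomp = [{"B", "C"}, {"A", "C"}]
print([preserved(l, r, decomp, F) for l, r in F])  # [False, True]
```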


Why no 1NF and 2NF?

• 1NF
  – atomic values for every attribute

• 2NF
  – 1NF, and there’s no partial dependency

• 3NF
  – 2NF, and there’s no transitive dependency


3NF Synthesis Algorithm

• We can always decompose a relation into 3NF relations with a lossless join and dependency preservation.

• Need a minimal basis for the FD’s:
  1. Right sides are single attributes.
  2. No FD can be removed.
  3. No attribute can be removed from a left side.
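A minimal-basis computation can be sketched as below, assuming FD’s are (LHS set, RHS set) pairs. Minimal bases are not unique, and the result can depend on the order in which FD’s and attributes are examined; this sketch splits right sides, then removes unneeded left-side attributes, then drops redundant FD’s. The function name is illustrative.

```python
def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def minimal_basis(fds):
    """Return a minimal basis as a list of (frozenset LHS, single attribute) pairs."""
    # Condition 1: split right sides into single attributes.
    basis = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]
    full = [(set(l), {a}) for l, a in basis]  # equivalent to the input FD's

    # Condition 3: drop left-side attributes that are not needed.
    reduced = []
    for lhs, a in basis:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, full):
                lhs.discard(b)
        reduced.append((frozenset(lhs), a))

    # Condition 2: drop FD's implied by the others (exact duplicates removed first).
    result = list(dict.fromkeys(reduced))
    for fd in list(result):
        others = [(set(l), {a}) for l, a in result if (l, a) != fd]
        if fd[1] in closure(fd[0], others):
            result.remove(fd)
    return result


F = [({"A"}, {"B", "C"}), ({"B"}, {"C"}), ({"A", "B"}, {"C"})]
print(minimal_basis(F))  # A -> B and B -> C remain
```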


3NF Synthesis Algorithm(cont.)

• One relation for each FD in the minimal basis.
  – For X → A, create T(X,A).

• If none of the relation schemas contains a key for R, then add one relation whose schema is some key.


Example: 3NF Synthesis

• Relation R(A,B,C,D).

• FD’s: A → B and A → C.

• Decomposition: AB and AC from the FD’s, plus AD for a key.
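The synthesis steps can be sketched as follows, assuming the input is already a minimal basis given as (LHS set, single attribute) pairs. The key search is brute force, the sketch does not drop schemas contained in other schemas, and the names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def synthesize_3nf(schema, basis):
    """3NF synthesis from a minimal basis of (LHS set, single attribute) pairs."""
    fds = [(set(lhs), {a}) for lhs, a in basis]
    relations = []
    for lhs, a in basis:  # one relation T(X,A) per FD X -> A
        rel = set(lhs) | {a}
        if rel not in relations:
            relations.append(rel)
    if not any(closure(rel, fds) >= schema for rel in relations):
        # No schema contains a key: add a smallest key (brute-force search).
        for k in range(1, len(schema) + 1):
            key = next((set(c) for c in combinations(sorted(schema), k)
                        if closure(c, fds) >= schema), None)
            if key is not None:
                relations.append(key)
                break
    return relations


R = {"A", "B", "C", "D"}
basis = [({"A"}, "B"), ({"A"}, "C")]
print([sorted(r) for r in synthesize_3nf(R, basis)])  # [['A', 'B'], ['A', 'C'], ['A', 'D']]
```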


Why It Works

• Lossless Join: use the chase to show that the row for the relation that contains a key can be made all unsubscripted.

• Preserves dependencies: each FD from a minimal basis is contained in some relation, and is thus preserved.

• 3NF: the hard part; it follows from a property of minimal bases.


MVD: Attribute Independence

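CTX(course, teacher, text)

Nested view:
course  teacher   text
DB      Li, Lu    T1, T2, T3

As a flat relation:
course  teacher  text
DB      Li       T1
DB      Li       T2
DB      Li       T3
DB      Lu       T1
DB      Lu       T2
DB      Lu       T3

CTX is in BCNF! (No nontrivial FD holds, so the only key is all three attributes.)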


MVD

• A multivalued dependency X ↠ Y holds for R if, whenever two tuples of R agree on X, we can swap their Y components and get two new tuples that are also in R.

X   Y   Z
x1  y1  z1
x1  y2  z2
x1  y2  z1
x1  y1  z2

(The last two tuples are obtained by swapping the Y components of the first two.)

  – For any fixed X value, the associated values of Y and Z appear in all possible combinations; in other words, Y and Z are independent.
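The definition translates directly into a check on a relation instance. A minimal sketch, assuming relations are sets of tuples with a column list; `mvd_holds` is a hypothetical helper. It uses the CTX instance shown earlier and then removes one tuple to break the MVD.

```python
def mvd_holds(cols, rows, x, y):
    """X ->> Y holds iff swapping the Y parts of any two tuples that agree on X
    yields tuples that are also in the relation."""
    xi = [cols.index(a) for a in x]
    yi = [cols.index(a) for a in y]
    for t in rows:
        for u in rows:
            if all(t[i] == u[i] for i in xi):
                # t with its Y components taken from u.
                swapped = tuple(u[i] if i in yi else t[i] for i in range(len(cols)))
                if swapped not in rows:
                    return False
    return True


cols = ["course", "teacher", "text"]
ctx = {("DB", t, x) for t in ("Li", "Lu") for x in ("T1", "T2", "T3")}
print(mvd_holds(cols, ctx, ["course"], ["teacher"]))                          # True
print(mvd_holds(cols, ctx - {("DB", "Lu", "T3")}, ["course"], ["teacher"]))  # False
```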


Reasoning about MVD

• Trivial MVD’s
  – X ↠ Y if Y ⊆ X
  – X ↠ Y if R = XY
  – Nontrivial MVD: X ↠ Y where no attribute of Y appears in X and XY is not all the attributes of R.

• Transitive rule
  – If X ↠ Y and Y ↠ Z, then X ↠ Z.
  – Any attribute in X ∩ Z must be deleted from Z.


Reasoning about MVD(cont.)

• FD Promotion

If X → Y, then X ↠ Y.

• Complementation Rule

If X ↠ Y, then X ↠ Z, where Z is all attributes not in X or Y.
  – Sometimes written as X ↠ Y | Z

• No splitting rule!
  – E.g., name ↠ city street | title year holds, but name ↠ city need not hold.


4NF

• Goal: eliminate the redundancy caused by MVD’s.

• R is in 4NF iff for every nontrivial MVD X ↠ Y, X is a superkey.
  – If so, every nontrivial MVD is really an FD.
  – 4NF implies BCNF, because every FD is also an MVD, so a BCNF violation is also a 4NF violation.
  – E.g., CTX: C ↠ T holds, and C is not a superkey.


Decomposition into 4NF

Algorithm: given R and its FD/MVD’s,

1. Find a 4NF violation X ↠ Y.
   – If there is none, R is in 4NF.

2. Decompose R into R1(X,Y) and R2(X,Z), where $Z = R - (X \cup Y)$.

3. Find the FD/MVD’s that hold on R1 and R2, and recursively decompose R1 and R2.


Example 1

CTX(course,teacher,text)

1. 4NF violation: course ↠ teacher (course is not a superkey).

2. Decompose into CT(course,teacher) and CX(course,text).

3. No nontrivial MVD’s remain, so CT and CX are in 4NF.
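A quick check on the CTX instance, assuming relations are sets of tuples with a column list (the helper `project` is illustrative): the two projections are smaller than CTX, and their natural join on course recovers it exactly.

```python
def project(cols, rows, attrs):
    """Projection onto attrs; the set removes duplicate tuples."""
    idx = [cols.index(a) for a in attrs]
    return {tuple(r[i] for i in idx) for r in rows}


cols = ["course", "teacher", "text"]
ctx = {("DB", t, x) for t in ("Li", "Lu") for x in ("T1", "T2", "T3")}

ct = project(cols, ctx, ["course", "teacher"])
cx = project(cols, ctx, ["course", "text"])
# Natural join of CT and CX on the shared attribute course.
rejoined = {(c, t, x) for (c, t) in ct for (c2, x) in cx if c == c2}

print(len(ctx), len(ct), len(cx))  # 6 2 3
print(rejoined == ctx)             # True
```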


Example 2

Person(name,addr,phones,hobbies)
  FD: name → addr
  Nontrivial MVD’s: name ↠ phones and name ↠ hobbies
  Only key: {name,phones,hobbies}
  All three dependencies violate 4NF.

Successive decomposition yields 4NF relations:
  P1(name,addr), P2(name,phones), P3(name,hobbies)


Relationships Among NF

• 4NF ⊆ BCNF ⊆ 3NF ⊆ 2NF ⊆ 1NF (every relation in 4NF is in BCNF, and so on)

Property                            3NF     BCNF    4NF
Eliminates redundancy due to FD     Most    Yes     Yes
Eliminates redundancy due to MVD    No      No      Yes
Preserves FD                        Yes     Maybe   Maybe
Preserves MVD                       Maybe   Maybe   Maybe


Reasoning about FD/MVD’s

• Review: the closure algorithm for inferring FD’s

• The closure algorithm can be seen as a variant of the chase.

• The chase can be extended to incorporate MVD’s as well as FD’s.
  – Inferring MVD’s
  – Projecting MVD’s


Inferring FD using the Chase

• Chase test for “X → Y follows from F”
  – Start with a tableau having two rows that agree only on X.
  – Chase the tableau using the FD’s of F to equate the columns in X+ − X.
  – If the final tableau agrees in Y, then X → Y holds; otherwise, it does not.
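Because the tableau has only two rows, this chase is essentially the closure algorithm: the rows end up agreeing exactly on X+. A minimal sketch, assuming single-character attribute names, with the subscripted symbol for attribute A written as "A1"; the function name is illustrative.

```python
def chase_implies_fd(attrs, fds, x, y):
    """Two-row chase: does X -> Y follow from fds?

    Row r0 is all unsubscripted ("A"); row r1 agrees with it only on X and
    uses subscripted symbols ("A1") elsewhere.
    """
    r0 = {a: a for a in attrs}
    r1 = {a: a if a in x else a + "1" for a in attrs}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if all(r0[a] == r1[a] for a in lhs):
                for a in rhs:
                    if r1[a] != r0[a]:
                        r1[a] = r0[a]  # equate, keeping the unsubscripted symbol
                        changed = True
    return all(r0[a] == r1[a] for a in y)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(chase_implies_fd("ABC", F, {"A"}, {"C"}))  # True
print(chase_implies_fd("ABC", F, {"C"}, {"A"}))  # False
```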


Inferring MVD using the Chase

• FD X → Y can be used to equate the values of Y for two tuples that agree on X.

• MVD X ↠ Y can be used to form new tuples by swapping Y for two tuples that agree on X.

• Given a set of FD/MVD’s, infer X ↠ Y:
  – Start with two tuples s and t that agree only on X;
  – Apply the FD’s and MVD’s;
  – If we find s[Y ← t[Y]] (s with its Y components replaced by those of t) in the tableau, then we have inferred X ↠ Y.


Problem and Solution

• Since symbols may get equated and replaced, we may not recognize the desired tuple.

• Solution:
  – Define a target row with all unsubscripted letters, and never change its symbols.
  – Let s[X], s[Y], t[X], and t[Z] have unsubscripted letters; all the other components of s and t get unique new symbols.
  – Apply the chase.
  – If the all-unsubscripted row appears in the tableau, then we have inferred the MVD.


Example

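• Given R(A,B,C,D) with A → B and B ↠ C, prove A ↠ C.

Initial tableau:
A  B   C   D
a  b1  c   d1
a  b   c2  d

After applying A → B (b1 becomes b):
A  B   C   D
a  b   c   d1
a  b   c2  d

After applying B ↠ C (swap the C components):
A  B   C   D
a  b   c   d1
a  b   c2  d
a  b   c2  d1
a  b   c   d

  – Target row is (a,b,c,d).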
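The chase with both FD’s and MVD’s can be sketched as below. Assumptions: the unsubscripted symbol for attribute A is the string "A" and subscripted ones are "A1", "A2"; FD’s and MVD’s are (LHS set, RHS set) pairs; the function name is illustrative. The run reproduces this example: A → B and B ↠ C let us infer A ↠ C, while B ↠ C alone does not.

```python
def chase_implies_mvd(attrs, fds, mvds, x, y):
    """Chase test for 'X ->> Y' given FD's and MVD's (all as (LHS set, RHS set) pairs).

    The unsubscripted symbol for attribute A is "A"; subscripted ones are "A1", "A2".
    """
    attrs = list(attrs)
    z = set(attrs) - set(x) - set(y)
    # s is unsubscripted on X and Y; t is unsubscripted on X and Z.
    s = tuple(a if a in x or a in y else a + "1" for a in attrs)
    t = tuple(a if a in x or a in z else a + "2" for a in attrs)
    target = tuple(attrs)  # the all-unsubscripted row
    rows = {s, t}

    def agree(r, u, cols):
        return all(r[attrs.index(a)] == u[attrs.index(a)] for a in cols)

    def step():
        """Apply one FD or MVD rule that changes the tableau; False if none applies."""
        nonlocal rows
        for r in rows:
            for u in rows:
                if r == u:
                    continue
                for lhs, rhs in fds:  # FD: equate the RHS symbols in the whole column
                    if agree(r, u, lhs):
                        for a in rhs:
                            i = attrs.index(a)
                            if r[i] != u[i]:
                                keep, drop = (r[i], u[i]) if r[i] == a else (u[i], r[i])
                                rows = {row[:i] + (keep,) + row[i + 1:] if row[i] == drop else row
                                        for row in rows}
                                return True
                for lhs, rhs in mvds:  # MVD: add r with its RHS part taken from u
                    if agree(r, u, lhs):
                        new = tuple(u[i] if attrs[i] in rhs else r[i] for i in range(len(attrs)))
                        if new not in rows:
                            rows = rows | {new}
                            return True
        return False

    while target not in rows and step():
        pass
    return target in rows


fds = [({"A"}, {"B"})]
mvds = [({"B"}, {"C"})]
print(chase_implies_mvd("ABCD", fds, mvds, {"A"}, {"C"}))  # True
print(chase_implies_mvd("ABCD", [], mvds, {"A"}, {"C"}))   # False
```

The loop terminates because an FD step removes a symbol from a column and an MVD step adds a row without creating new symbols.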


Why Chase Works for MVD?

• A positive conclusion of the chase is nothing but another form of the familiar proof that the concluded FD/MVD holds.

• When the chase ends in failure, the final tableau is a counterexample.

• The chase can’t possibly keep producing new rows forever, since it never creates new symbols.


Projecting MVD’s

• If R is decomposed into Ri’s, we have to test every possible FD and MVD for each Ri using the chase.
  – The chase is applied on R, but we only need to produce a row that has unsubscripted letters in all the attributes of Ri.

• Often, we don’t have to be exhaustive:
  – Don’t check trivial FD/MVD’s;
  – Consider only FD’s with a single attribute on the RHS;
  – Don’t consider an FD/MVD whose LHS doesn’t contain the LHS of any given FD/MVD.


End