Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See www.db-book.com for conditions on re-use Chapter 8: Top - down Relational Database Design: NORMALIZATION

Jul 08, 2020


dariahiddleston

Database System Concepts, 6th Ed.

©Silberschatz, Korth and Sudarshan

See www.db-book.com for conditions on re-use

Chapter 8: Top-down Relational

Database Design: NORMALIZATION


What can happen when we combine

relations/tables?

Suppose we combine the tables

instructor and department

This creates redundancy (repetition of

information):


What can happen when we combine

relations/tables?

Another problem: UPDATE

When any of the redundant info is changed, the change

has to be applied to multiple tuples!


What can happen when we combine

relations/tables?

Another problem: INSERTION

When a new department is created, there are no

instructors associated with it yet → need to use NULL

values!


Is there any reason to combine

relations/tables?

Any query that involves a natural join

between department and instructor will

execute faster on the combined table!

This is generally preferred in data mining.


The “top-down” approach

In this chapter, we look at the problem in the

opposite direction:

Starting with a large table that contains many

columns and much redundant information, how can

we split (decompose) it into tables with fewer

columns and less redundancy?


Suppose we start with the table inst_dept. How would we

get the idea to decompose it into instructor and

department?

Naïve approach: spot redundancies in data … but it

doesn’t work for two reasons!

Top-down: Decomposition


Problem 1: It’s costly

▪ Real-life DBs can hold large amounts of data (hundreds

of columns, hundreds of millions of rows)

▪ Spotting redundancies requires considering

combinations of elements from a set that is already

large

▪ → Combinatorial explosion (N-squared and worse)

Spotting redundancies in data


Problem 2: From data alone, it’s impossible to decide

whether a pattern discovered is coincidence or not

Is it the case that departments always reside in one

building and have a unique budget?

Spotting redundancies in data


Solution:

Examine not the data itself (a.k.a. syntax), but the meaning

of the data, a.k.a. the semantics!

The designer must be allowed to specify rules of the

enterprise, a.k.a. functional dependencies, e.g.

dept_name → building, budget


dept_name → building, budget

What does it mean?

“If several rows have the same value for dept_name,

then they also have the same values for building and

budget.”

or

“If there were a schema (dept_name, building, budget),

then dept_name would be a candidate key.”


Since in our table inst_dept dept_name is not a candidate key,

the building and budget of a department may have to be repeated

along with dept_name.

▪ This indicates the need to decompose inst_dept.

dept_name → building, budget

“If there were a schema (dept_name, building, budget),

then dept_name would be a candidate key.”


This example also shows how functional dependencies (FDs) are

different from keys: an FD captures a rule that is in general more

granular than a key.

A key is an FD, but an FD is not always a key!

dept_name → building, budget

“If there were a schema (dept_name, building, budget),

then dept_name would be a candidate key.”


Not all decompositions are good!

Suppose we decompose

employee(ID, name, street, city, salary)

into

employee1 (ID, name)

employee2 (name, street, city, salary)

Problem: we cannot reconstruct the original employee relation!


A lossy decomposition
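
The loss can be illustrated with a small sketch. The instance below is hypothetical (made up for illustration, with two employees who happen to share a name); it projects employee onto employee1 and employee2 and then joins the projections back on name.

```python
# Hypothetical instance of employee(ID, name, street, city, salary);
# the two employees deliberately share the name "Kim".
employee = {
    (57766, "Kim", "Main", "Perryridge", 75000),
    (98776, "Kim", "North", "Hampton", 67000),
}

# Project onto employee1(ID, name) and employee2(name, street, city, salary).
employee1 = {(i, n) for (i, n, s, c, sal) in employee}
employee2 = {(n, s, c, sal) for (i, n, s, c, sal) in employee}

# Natural join on the only shared attribute, name.
rejoined = {
    (i, n1, s, c, sal)
    for (i, n1) in employee1
    for (n2, s, c, sal) in employee2
    if n1 == n2
}

print(len(employee), len(rejoined))  # 2 4
print(employee < rejoined)           # True: the join contains spurious tuples
```

The join returns more tuples than the original relation, so we can no longer tell which Kim lives where: having more tuples means having less information.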


But there are also lossless decompositions!

▪ Technically it’s called a lossless-join decomposition

▪ Decomposition of R = (A, B, C) into

R1 = (A, B) and R2 = (B, C)

[Figure: an instance r of R, its projections Π_{A,B}(r) and Π_{B,C}(r), and the natural join of the two projections, which reproduces r]


How to avoid lossy decompositions?


Goal: Devise a theory for the following

▪ Decide whether a particular relation R is in “good” form.

▪ When the relation R is not in “good” form, decompose

it into a set of relations {R1, R2, ..., Rn} such that

• each relation is in good form

• the decomposition is a lossless-join decomposition

▪ Our theory is based on dependencies:

▪ functional dependencies

▪ multivalued dependencies

▪ The process outlined above is called NORMALIZATION


8.2 First Normal Form (1NF)

A domain is atomic if its elements are treated by the

DBMS as indivisible units.

Examples of non-atomic domains:

Names with first + middle + last

IDs that can be broken up into parts (e.g. CS401)

Phone numbers

Any composite attributes!

A relational schema R is in first normal form (1NF) if

the domains of all attributes of R are atomic.

For now, we assume all relations to be in 1NF (but see

Ch.22: Object-Based Databases)


1NF

Atomicity is actually a property of how the elements of the

domain are used!

Example: Students are given roll numbers which are strings

of the form CS0012 or EE1127

▪ Strings would normally be considered indivisible …

▪ … but if the first two characters are extracted to find the

dept., the domain of roll numbers is not atomic.

▪ Doing so is a bad idea: leads to encoding of information

in the app. program rather than in the DB.

Why is this bad?

What should the DB designer do in this case?


8.3 Functional Dependencies (FD)

▪ FDs are constraints on the set of legal relations.

▪ Require that the value for a certain set of attributes

determine uniquely the value for another set of

attributes.

▪ A FD is a generalization of the concept of key: A key

requires that the value for a certain set of attributes

determine uniquely the value for all remaining

attributes.


Functional Dependencies

Let R be a relation schema, and α, β two sets of attributes with

α ⊆ R and β ⊆ R

The functional dependency

α → β

holds on R if and only if for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β. That is,

t1[α] = t2[α] ⇒ t1[β] = t2[β]

Example: Consider r(A, B) with the following instance:

[Table: instance of r with columns A and B]

On this instance, A → B does NOT hold,

but B → A does hold.
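
The definition translates almost directly into a check on a single instance. The sketch below is a minimal illustration; the function name fd_holds and the sample rows are assumptions made for this example and are not taken from the slide.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this instance.

    rows: list of dicts mapping attribute name -> value
    lhs, rhs: lists of attribute names
    """
    seen = {}  # lhs-value tuple -> rhs-value tuple first seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Hypothetical instance of r(A, B), made up for illustration.
r = [
    {"A": 1, "B": 4},
    {"A": 1, "B": 5},
    {"A": 3, "B": 7},
]

print(fd_holds(r, ["A"], ["B"]))  # False: two tuples share A = 1 but differ on B
print(fd_holds(r, ["B"], ["A"]))  # True: each B value occurs with a single A value
```

Note that such a check only tells us whether one instance satisfies the FD. Whether the FD holds on the schema is a statement about all legal instances, which is why it has to come from the designer and cannot be read off the data.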


QUIZ: Functional Dependencies

Decide if the following FDs hold or not:

a) A → B

b) B → A

c) {A, C} → D

d) {A, B, C} → D


FD vs. key

▪ K is a superkey for relation schema R if and only if K → R

▪ K is a candidate key for R if and only if

• K → R, and

• for no α ⊂ K, α → R (minimal!)

▪ FDs allow us to express constraints that cannot be expressed

using (super)keys. Consider the schema:

inst_dept (ID, name, salary, dept_name, building, budget )

We expect these FDs to hold:

dept_name → building, ID → building

but would not expect the following FD to hold:

dept_name → salary


QUIZ: FD vs. key

Decide if the following are super/candidate keys:

a) A

b) B

c) {A, C}

d) {A, B, C}

e) {A, B, C, D}

f) D


Uses for FDs

▪ Test relations to see if they are legal under a given set of

FDs.

▪ If a relation r is legal under a set F of FDs, we say that r

satisfies F.

▪ Specify constraints on the set of legal relations

▪ We say that F holds on R if all legal relations on R satisfy

the set of FDs F.

Note: A specific instance of a relation schema may satisfy a FD

even if the FD does not hold on all legal instances.

• Example: a specific instance of instructor may, by chance,

satisfy

name → ID.


Trivial FD

A functional dependency is trivial if it is satisfied

by all instances of a relation

Example:

ID, name → ID

name → name

In general, α → β is trivial if β ⊆ α


QUIZ: Trivial FDs

Give 4 examples of trivial FDs in this relation.


solution

Trivial FDs in this relation:

• A → A

• B → B

• C → C

• D → D

• AB → A

• AB → AB

• BC → C

. . . . .

• ABCD → ABCD


The Holy Grail: The closure of a set of FDs

▪ Given a set F of FDs, there are certain other FDs

that are logically implied by F.

Example: If A → B and B → C, then we can infer that A → C

▪ The set of all FDs logically implied by F is the

closure of F.

▪ We denote the closure of F by F+.

▪ F+ is a superset of F.
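
Listing all of F+ is rarely practical, since it grows quickly with the number of attributes. A common shortcut, not covered on the slides above and shown here only as a sketch, is the attribute closure: α → β is in F+ exactly when β is contained in the set of attributes that α determines under F. The function names below are assumptions made for this example.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have all of lhs, we also get all of rhs.
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if lhs -> rhs is logically implied by fds, i.e. is in F+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [({"A"}, {"B"}), ({"B"}, {"C"})]
print(implies(F, {"A"}, {"C"}))  # True
print(implies(F, {"C"}, {"A"}))  # False
```

The first call reproduces the inference from the example above; the second shows that the implication does not run backwards.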


8.3.2 Boyce-Codd Normal Form

A relation schema R is in BCNF with respect to a set F of

FDs if, for all FDs in F+ of the form

α → β

(where α ⊆ R and β ⊆ R), at least one of the following is

true:

▪ α → β is trivial (i.e., β ⊆ α)

▪ α is a superkey for R


QUIZ: BCNF

For every FD α → β in F+, at least one of the following holds:

▪ α → β is trivial (i.e., β ⊆ α)

▪ α is a superkey for R

Is this schema in BCNF?

instr_dept (ID, name, salary, dept_name, building, budget )


solution

For every FD α → β in F+, at least one of the following holds:

▪ α → β is trivial (i.e., β ⊆ α)

▪ α is a superkey for R

Is this schema in BCNF?

instr_dept (ID, name, salary, dept_name, building, budget )

No, because

dept_name → building, budget

holds, but dept_name is not a superkey (Why?)
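
A hedged sketch of how this test could be automated: for each given FD, check whether it is trivial, and otherwise whether its left-hand side determines every attribute of the schema. The FD ID → name, salary, dept_name is an assumption added for this example (the slide states only dept_name → building, budget), and the helper names are hypothetical. The sketch inspects only the FDs in F, which is enough when testing the original schema R; it is not enough for the relations produced by a decomposition.

```python
def attribute_closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= closure and not rhs <= closure:
                closure |= rhs
                changed = True
    return closure


def bcnf_violations(schema, fds):
    """Return the FDs that are non-trivial and whose lhs is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not schema <= attribute_closure(lhs, fds)
    ]


schema = {"ID", "name", "salary", "dept_name", "building", "budget"}
F = [
    ({"ID"}, {"name", "salary", "dept_name"}),  # assumed for this example
    ({"dept_name"}, {"building", "budget"}),    # from the slide
]

violations = bcnf_violations(schema, F)
print(violations)
```

Under these assumptions the only violation reported is dept_name → building, budget: the closure of dept_name contains just dept_name, building and budget, so dept_name does not determine ID, name or salary and cannot be a superkey.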


Extra-credit QUIZ

EOL1/3


Quiz:

What does the acronym FD stand for?

What is the difference between keys and FDs?


Quiz:

What does the acronym FD stand for?

• Functional Dependency

What is the difference between keys and FDs?

• A key is a FD, but a FD is not always a key!

• In a key, the LHS implies all attributes, but in a FD it

implies only some attributes.


Quiz:

What is meant by saying that an attribute (or set of

attributes) implies another attribute (or set of

attributes) ?


Quiz:

What does the acronym BCNF stand for?

What is the definition of a table/relation being in

BCNF?


Quiz:

What does the acronym BCNF stand for?

• Boyce-Codd Normal Form

What is the definition of a table/relation being in

BCNF?


QUIZ


SOLUTION

No, because the FD B → C violates

the BCNF definition: neither is the

FD trivial, nor is B a (super)key.

The fact that B is not in general a

superkey can be proved with an

example of an instance:


QUIZ

What if the FD B → A is

added to the set F?

Is (R, F) now in BCNF?


QUIZ

What if the FD B → A is

added to the set F?

Is (R, F) now in BCNF?

A: Yes, b/c B is now a superkey.


BCNF Decomposition

Suppose we have a schema R and a non-trivial dependency α → β that causes a violation of BCNF.

We decompose R into:

• (α ∪ β)

• (R - (β - α))


Example BCNF decomposition

• (α ∪ β)

• (R - (β - α))

In our example:

α = dept_name

β = building, budget

so we decompose into:

(α ∪ β) = ( dept_name, building, budget )

(R - (β - α)) = ( ID, name, salary, dept_name )
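
The two formulas are plain set operations, so the step can be sketched in a few lines. The function name bcnf_split is an assumption made for this example; the schema and the FD are the ones used above.

```python
def bcnf_split(schema, alpha, beta):
    """Split schema on the violating FD alpha -> beta.

    Returns the pair (alpha U beta, schema - (beta - alpha)) as two sets.
    """
    alpha, beta = set(alpha), set(beta)
    return alpha | beta, set(schema) - (beta - alpha)


inst_dept = {"ID", "name", "salary", "dept_name", "building", "budget"}
r1, r2 = bcnf_split(inst_dept, {"dept_name"}, {"building", "budget"})

print(sorted(r1))  # ['budget', 'building', 'dept_name']
print(sorted(r2))  # ['ID', 'dept_name', 'name', 'salary']
```

Note that α stays in both results: it is the attribute set on which the two relations can later be joined back together.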


QUIZ 1: BCNF

We decompose R into:

• (α ∪ β)

• (R - (β - α))

Take α = {A, B, C, D}, β = {C, D, E, F}, and the entire relation is R = {A,B,C,D,E,F,G,H}

What is the decomposition?


QUIZ 2: BCNF

We decompose R into:

• (α ∪ β)

• (R - (β - α))

Take α = {A, B}, β = {E, F}, and the entire relation is R = {A,B,C,D,E,F,G,H}

What is the decomposition?


QUIZ 3: BCNF

Is this relation in BCNF?

Hint: Rename the attributes A, B, C, ….


QUIZ 3: BCNF

A: Not BCNF, b/c both FDs are violations!

Decompose it to BCNF!


QUIZ 3: BCNF

Solution:


Dependency Preservation

Constraints, including FDs, are costly to check in

practice unless they pertain to only one relation.

If it is sufficient to test only those dependencies on each

individual relation of a decomposition in order to

ensure that all functional dependencies hold, then

that decomposition is dependency preserving.


BCNF and Dependency Preservation

ER model of a bank: A

customer can have

more than 1 personal

banker, but at most

one at any given

branch.

A ternary relationship-

set is needed:


BCNF and Dependency Preservation

Implementation:

R = cust_banker_branch = (customer_id, employee_id,

branch_name, type)

FDs: FD1: employee_id → branch_name

FD2: (customer_id, branch_name) → (employee_id, type)

Is cust_banker_branch in BCNF?


Apply the decomposition algorithm!


Decomposition:

R1 = (employee_id, branch_name)

R2 = (customer_id, employee_id, type)

Problem: FD2 is now “spread” across

two relations!
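
One quick way to see the "spread" is to ask, for each FD, whether all of its attributes appear together in a single relation of the decomposition. This is only a sketch of a simplified check, with a hypothetical function name: an FD that fails it might in general still be implied by the FDs that pass, so the full dependency-preservation test is more involved.

```python
def fits_in_one_relation(fd, relations):
    """True if every attribute of the FD appears together in some relation."""
    lhs, rhs = fd
    return any((lhs | rhs) <= rel for rel in relations)


R1 = {"employee_id", "branch_name"}
R2 = {"customer_id", "employee_id", "type"}

FD1 = ({"employee_id"}, {"branch_name"})
FD2 = ({"customer_id", "branch_name"}, {"employee_id", "type"})

print(fits_in_one_relation(FD1, [R1, R2]))  # True
print(fits_in_one_relation(FD2, [R1, R2]))  # False
```

FD1 can be checked inside R1 alone. FD2 mentions branch_name, which is only in R1, together with customer_id and type, which are only in R2, so checking it requires joining the two relations.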


BCNF and Dependency Preservation

Conclusion:

BCNF is not dependency preserving (in

general)

Because it is not always possible to achieve both

BCNF and dependency preservation, we consider a

weaker normal form …


Third Normal Form = 3NF

A relation schema R is in third normal form (3NF) if for all

α → β in F+

at least one of the following holds:

▪ α → β is trivial (i.e., β ⊆ α)

▪ α is a superkey for R

▪ Each attribute A in β – α is contained in a candidate key for R.

(NOTE: each attribute may be in a different candidate key)

If a relation is in BCNF it is in 3NF (since in BCNF one of the first two

conditions above must hold).

Third condition is a minimal relaxation of BCNF to ensure dependency

preservation.


SKIP all other 3NF theory!

The only facts about 3NF we cover are

those on the previous slide!


Whatever happened with 2NF?

In a nutshell, it forbids attributes to depend on parts of keys.

It is not of practical use anymore.

See Second normal form - Wikipedia, the free encyclopedia

for more details.


Review of Normal Forms


Updated list of Normalization Goals

Let R be a relation scheme with a set F of FDs:

▪ Decide whether R is in “good” form.

▪ If R is not in “good” form, decompose it into a set of relation

schemes {R1, R2, ..., Rn} such that :

• each relation scheme is in good form

• the decomposition is a lossless-join decomposition

• Preferably, the decomposition should be dependency

preserving.

EOL 2/3


QUIZ: BCNF and 3NF

Consider the following relation:

What non-trivial FDs exist? Hint: Rename the attributes A, B, C.

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form


QUIZ continued

F1: Person, Shop Type → Nearest Shop

F2: Nearest Shop → Shop Type

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

Is the relation in BCNF?

Simplified notation:

AB → C

C → B.

A B C


No, b/c C → B is a violation: C is not a superkey.

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

Is the relation in 3NF?


No, b/c C → B is a violation: C is not a superkey.

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

Remember: 3NF has the following condition in addition to BCNF:

Each attribute A in β − α is contained in a candidate key for R. (NOTE: each attribute may be in a different candidate key)

QUIZ continued

Simplified notation:

AB → C

C → B.


Do the BCNF decomposition

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

B is part of the candidate key AB.

This shows that C → B is not a 3NF violation, so the relation is

in 3NF!

QUIZ continued

Simplified notation:

AB → C

C → B.
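The BCNF and 3NF answers for this quiz can be checked mechanically. The sketch below is only an illustration: the helper names (closure, candidate_keys, is_bcnf, is_3nf) are made up here, candidate keys are found by brute force, and only the FDs listed in F are tested, which is enough for this small relation.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(R, fds):
    """Brute force: minimal attribute sets whose closure is all of R."""
    keys = []
    for size in range(1, len(R) + 1):
        for combo in combinations(sorted(R), size):
            attrs = set(combo)
            # Smaller keys were found first, so skip supersets of a key.
            if closure(attrs, fds) == R and not any(k <= attrs for k in keys):
                keys.append(attrs)
    return keys

def is_bcnf(R, fds):
    # Every FD must be trivial or have a superkey on the left.
    return all(rhs <= lhs or closure(lhs, fds) == R for lhs, rhs in fds)

def is_3nf(R, fds):
    # Third option: every attribute of rhs - lhs lies in some candidate key.
    prime = set().union(*candidate_keys(R, fds))
    return all(
        rhs <= lhs or closure(lhs, fds) == R or (rhs - lhs) <= prime
        for lhs, rhs in fds
    )

R = {"A", "B", "C"}
F = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]   # AB -> C, C -> B

print(is_bcnf(R, F))  # False: C -> B, and C is not a superkey
print(is_3nf(R, F))   # True: B is part of the candidate key AB
```

The candidate keys found are AB and AC, so every attribute of R is part of some candidate key.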


Is this decomposition dependency-preserving?

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

R1 = {B, C} R2 = {A, C}

QUIZ continued

Simplified notation:

AB → C

C → B.


Is this decomposition dependency-preserving?

No, b/c AB → C is “spread” across the two relations.

Source: http://en.wikipedia.org/wiki/Boyce-Codd_normal_form

R1 = {B, C} R2 = {A, C}

QUIZ continued

Simplified notation:

AB → C

C → B.


8.4 Functional-Dependency Theory

This is the formal theory that tells us which FDs are

implied logically by a given set of FDs.

▪ Given a set F of FDs, there are certain other FDs that

are logically implied by F.

E.g. transitivity: If A → B and B → C, then also A → C

▪ The set of all functional dependencies logically

implied by F is the closure of F, denoted F+.


Armstrong’s Axioms

We can find F+, the closure of F, by repeatedly applying

Armstrong’s Axioms:

if β ⊆ α, then α → β (reflexivity)

if α → β, then γα → γβ (augmentation)

if α → β and β → γ, then α → γ (transitivity)

These rules are

sound (They generate only FDs that actually hold)

complete (They generate all FDs that hold).

William Ward Armstrong is a Canadian

mathematician and computer scientist.

He presented what became known as his

axioms in a 1974 paper.


Examples of use of A’s Axioms

Given the following relation: R = (A, B, C, G, H, I)

and the set of FDs F = { A → B,
A → C,
CG → H,
CG → I,
B → H }

Some other members of the closure F+ are:

A → H
by transitivity from A → B and B → H

AG → I
by augmenting A → C with G, to get AG → CG,
and then transitivity with CG → I

CG → HI
by augmenting CG → I with CG to infer CG → CGI,
and augmenting CG → H with I to infer CGI → HI,
and then transitivity
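The three derivations above can be replayed step by step. This is a minimal sketch, assuming an FD is represented as a pair of frozensets; the function names fd, augmentation and transitivity are hypothetical helpers, not from the text.

```python
def fd(lhs, rhs):
    """Build an FD as a pair of frozensets; fd("CG", "H") means CG -> H."""
    return (frozenset(lhs), frozenset(rhs))

def augmentation(f, gamma):
    """From alpha -> beta infer gamma alpha -> gamma beta."""
    lhs, rhs = f
    gamma = frozenset(gamma)
    return (lhs | gamma, rhs | gamma)

def transitivity(f1, f2):
    """From alpha -> beta and beta -> gamma infer alpha -> gamma."""
    if f1[1] != f2[0]:
        raise ValueError("RHS of the first FD must equal LHS of the second")
    return (f1[0], f2[1])

# A -> H, by transitivity from A -> B and B -> H
a_h = transitivity(fd("A", "B"), fd("B", "H"))

# AG -> I, by augmenting A -> C with G, then transitivity with CG -> I
ag_cg = augmentation(fd("A", "C"), "G")
ag_i = transitivity(ag_cg, fd("CG", "I"))

# CG -> HI, via CG -> CGI and CGI -> HI
cg_cgi = augmentation(fd("CG", "I"), "CG")
cgi_hi = augmentation(fd("CG", "H"), "I")
cg_hi = transitivity(cg_cgi, cgi_hi)

print(a_h == fd("A", "H"), ag_i == fd("AG", "I"), cg_hi == fd("CG", "HI"))
# True True True
```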


Your turn!

• if β ⊆ α, then α → β (reflexivity)

• if α → β, then γα → γβ (augmentation)

• if α → β and β → γ, then α → γ (transitivity)

Prove that

if and only if

Double implication:

L-to-R and R-to-L!


Solution

• if β ⊆ α, then α → β (reflexivity)

• if α → β, then γα → γβ (augmentation)

• if α → β and β → γ, then α → γ (transitivity)

Prove that

(use augmentation)

(use reflexivity and

transitivity)


QUIZ: Armstrong’s Axioms

Write Armstrong’s Axioms:

• (reflexivity)

• (augmentation)

• (transitivity)


Solution

Write Armstrong’s Axioms:

• if β ⊆ α, then α → β (reflexivity)

• if α → β, then γα → γβ (augmentation)

• if α → β and β → γ, then α → γ (transitivity)


More FD theorems, a.k.a. rules or results

Exercise 8.26

Exercise 8.4

Exercise 8.5


Naïve Algorithm for Computing F+

Repeatedly apply many different axioms and theorems to derive new FDs!

Can you find 3 more FDs in this manner?

Do you see a problem with this approach?


Extra-credit QUIZ


Algorithm for Computing F+

To compute the closure F+ of a set of FDs F:

Assign F+ = F
repeat
    for each FD f in F+
        apply reflexivity and augmentation rules on f
        add the resulting FDs to F+
    for each pair of FDs f1 and f2 in F+
        if f1 and f2 can be combined using transitivity,
            add the resulting FD to F+
until F+ does not change any further

Note: reflexivity is applied only to the RHS of f.
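To get a feel for how fast F+ grows, here is a brute-force sketch of the fixpoint loop. It is a variant, not the exact procedure above: it seeds F+ with every reflexive FD over R and augments with every subset of R. The schema R = (A, B, C) with F = {A → B, B → C} is a made-up example, and the approach is only usable for very small schemas.

```python
from itertools import combinations

def powerset(attrs):
    """All subsets of attrs (including the empty set) as frozensets."""
    items = sorted(attrs)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]

def fd_closure(R, F):
    subsets = powerset(R)
    # Reflexivity: alpha -> beta for every beta that is a subset of alpha.
    f_plus = {(a, b) for a in subsets for b in subsets if b <= a}
    f_plus |= {(frozenset(lhs), frozenset(rhs)) for lhs, rhs in F}
    while True:
        derived = set()
        # Augmentation: add gamma to both sides of every known FD.
        for lhs, rhs in f_plus:
            for gamma in subsets:
                derived.add((lhs | gamma, rhs | gamma))
        # Transitivity: chain FDs whose RHS and LHS match exactly.
        for l1, r1 in f_plus:
            for l2, r2 in f_plus:
                if r1 == l2:
                    derived.add((l1, r2))
        if derived <= f_plus:      # nothing new: fixpoint reached
            return f_plus
        f_plus |= derived

f_plus = fd_closure("ABC", [("A", "B"), ("B", "C")])
print((frozenset("A"), frozenset("ABC")) in f_plus)  # True
print((frozenset("B"), frozenset("A")) in f_plus)    # False
print(len(f_plus))
```

Even for three attributes the result contains dozens of FDs, most of them trivial, which hints at the problem with this approach.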


QUIZ: Apply the algorithm to

R = {A, B, C, D, E, F},

with F = {AB → C, B → D, CD → F}

Assign F+ = F
repeat
    for each FD f in F+
        apply reflexivity and augmentation rules on f
        add the resulting FDs to F+
    for each pair of FDs f1 and f2 in F+
        if f1 and f2 can be combined using transitivity,
            add the resulting FD to F+
until F+ does not change any further

Note: reflexivity is applied only to the RHS of f.


Closure of Attribute Sets

Since computing the entire closure F+ is in general a

formidable task, we set ourselves first a more modest

goal:

▪ Given a set of attributes α, define the closure of

α under F, denoted by α+: it is the set of attributes

that are functionally determined by α under F


Closure of Attribute Sets

This is the algorithm to compute α+, the closure of α under F:

result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end
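The loop above translates almost line by line into code. A minimal sketch, assuming attributes are single letters and each FD is a (LHS, RHS) pair of strings; the function name attribute_closure is our own choice.

```python
def attribute_closure(alpha, fds):
    """Return alpha+, the set of attributes determined by alpha under fds."""
    result = set(alpha)
    changed = True
    while changed:                      # while (changes to result)
        changed = False
        for beta, gamma in fds:         # for each beta -> gamma in F
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)    # result := result U gamma
                changed = True
    return result

# R = (A, B, C, G, H, I), F = {A -> B, A -> C, CG -> H, CG -> I, B -> H}
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(sorted(attribute_closure("AG", F)))  # ['A', 'B', 'C', 'G', 'H', 'I']
print(sorted(attribute_closure("A", F)))   # ['A', 'B', 'C', 'H']
```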


Example of Attribute Set Closure

R = (A, B, C, G, H, I)

F = {A → B, A → C, CG → H, CG → I, B → H}

(AG)+

1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)

Stop b/c there are no changes!

Is AG a candidate key?

1. Is AG a superkey?
    1. Does AG → R? Yes, b/c (AG)+ = R.
2. Is any proper subset of AG a superkey?
    1. Does A → R? No, b/c (A)+ ≠ R
    2. Does G → R? No, b/c (G)+ ≠ R
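The candidate-key test can be sketched in code as well. The helper names below are hypothetical. It is enough to try removing one attribute at a time: if any proper subset were a superkey, so would be some subset that is only one attribute smaller.

```python
def attribute_closure(alpha, fds):
    """alpha+: attributes functionally determined by alpha under fds."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result

def is_superkey(alpha, R, fds):
    return attribute_closure(alpha, fds) >= set(R)

def is_candidate_key(alpha, R, fds):
    alpha = set(alpha)
    if not is_superkey(alpha, R, fds):
        return False
    # Minimality: dropping any single attribute must break the superkey.
    return not any(is_superkey(alpha - {a}, R, fds) for a in alpha)

R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]
print(is_superkey("AG", R, F))                          # True
print(is_superkey("A", R, F), is_superkey("G", R, F))   # False False
print(is_candidate_key("AG", R, F))                     # True
```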

EOL 3


QUIZ: 3NF

A relation schema R is in third normal form (3NF) if for all:

α → β in F+

at least one of the following holds:

α → β is trivial (i.e., β ⊆ α)

α is a superkey for R

Each attribute A in β − α is contained in a candidate key for R.

(NOTE: each attribute may be in a different candidate key)

books (Book-Name, Editor, A-Name, A-SSN, Nr-pag)

FD: A_Name → A_SSN

Is it in BCNF? 3NF?


Uses of Attribute Closure Algorithm

Testing for superkey:

To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R

Testing if a certain FD holds:

To check if α → β holds (is in F+), just check if β ⊆ α+

Another algorithm for computing closure F+ of F:

For each γ ⊆ R, find the closure γ+

for each S ⊆ γ+, we output the FD γ → S

Still very expensive, but at least

we have a more systematic way

of doing it!
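The closure-based way of computing F+ can be sketched as follows. The function names are our own, and the sample schema R = (A, B, C, D) with F = {A → BC, B → C, A → B, AB → C, BC → D} is used purely as input data.

```python
from itertools import combinations

def attribute_closure(alpha, fds):
    """alpha+: attributes functionally determined by alpha under fds."""
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for beta, gamma in fds:
            if set(beta) <= result and not set(gamma) <= result:
                result |= set(gamma)
                changed = True
    return result

def subsets(attrs):
    """Yield every subset of attrs (including the empty set)."""
    items = sorted(attrs)
    for n in range(len(items) + 1):
        for combo in combinations(items, n):
            yield frozenset(combo)

def fd_closure(R, fds):
    """For each gamma in R and each S in gamma+, output gamma -> S."""
    f_plus = set()
    for gamma in subsets(R):
        for s in subsets(attribute_closure(gamma, fds)):
            f_plus.add((gamma, s))
    return f_plus

R = "ABCD"
F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C"), ("BC", "D")]
f_plus = fd_closure(R, F)
print((frozenset("B"), frozenset("D")) in f_plus)  # True: B+ = BCD
print((frozenset("C"), frozenset("D")) in f_plus)  # False: C+ = C
print(len(f_plus))
```

The work is still exponential in the number of attributes, because every subset of R is visited, but no FD is derived more than once per left-hand side.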


QUIZ: Uses of Attribute Closure Alg.

Testing for superkey:

To test if α is a superkey, we compute α+, and check if α+ contains all attributes of R

R = (A, B, C, D)

F = {A → BC, B → C, A → B, AB → C, BC → D}

Is AD a superkey?

Is AD a candidate key?


QUIZ: Uses of Attribute Closure Alg.

Testing if a certain FD holds:

To check if α → β holds (is in F+), just check if β ⊆ α+

R = (A, B, C, D)

F = {A → BC, B → C, A → B, AB → C, BC → D}

Does AC → D hold ?


SKIP:

-- the remainder of Section 8.4

-- 8.5


8.6 Multivalued dependencies

First let’s go back and cover the example from 8.3.5


8.3.5 The need for normal forms

beyond BCNF

There are database schemas in BCNF that still do not seem to

be sufficiently normalized!

Example: Consider the relation

inst_info (ID, child_name, phone)

where an instructor can have multiple phone nrs. and multiple

children:

ID      child_name   phone
99999   David        512-555-1234
99999   David        512-555-4321
99999   William      512-555-1234
99999   William      512-555-4321


There are no non-trivial functional dependencies and therefore

the relation is in BCNF. But we have:

▪ Redundancy

▪ Insertion anomalies: if we add a phone 981-992-3443 to

99999, we need to add two tuples:

(99999, David, 981-992-3443)

(99999, William, 981-992-3443)

ID      child_name   phone
99999   David        512-555-1234
99999   David        512-555-4321
99999   William      512-555-1234
99999   William      512-555-4321

Beyond BCNF?


It is better to decompose inst_info into:

inst_child

ID      child_name
99999   David
99999   David
99999   William
99999   William

inst_phone

ID      phone
99999   512-555-1234
99999   512-555-4321
99999   512-555-1234
99999   512-555-4321

This suggests the need for higher normal forms, such as Fourth Normal Form (4NF), later.
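The decomposition of inst_info can be tried out on the sample tuples. In this sketch relations are modeled as Python sets of tuples, so duplicate rows collapse and each projection keeps only its distinct tuples; the variable names are our own.

```python
inst_info = {
    ("99999", "David", "512-555-1234"),
    ("99999", "David", "512-555-4321"),
    ("99999", "William", "512-555-1234"),
    ("99999", "William", "512-555-4321"),
}

# Project onto (ID, child_name) and (ID, phone).
inst_child = {(i, child) for i, child, _ in inst_info}
inst_phone = {(i, phone) for i, _, phone in inst_info}
print(len(inst_child), len(inst_phone))  # 2 2

# Natural join on ID gives back exactly the original relation.
rejoined = {(i, child, phone)
            for i, child in inst_child
            for j, phone in inst_phone if i == j}
print(rejoined == inst_info)  # True

# Adding a phone is now a single insert instead of one tuple per child.
inst_phone.add(("99999", "981-992-3443"))
```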

Beyond BCNF?


Back to 8.6 Multivalued dependencies

Read the similar example

inst (ID, dept_name, name, street, city)


8.6.1 Multivalued dependencies

FDs rule out certain tuples from being in the relation:

if A → B, we cannot have tuples with the same A value but

different B values.

▪ For this reason, FDs are also referred to as equality-

generating dependencies

Multivalued dependencies (MVDs) do not rule out tuples; instead,

they require that other tuples be present in the relation.

▪ MVDs are also referred to as tuple-generating

dependencies.


Formal definition of MVDs

Example on next slide


OK, this is not so hard after all:

It simply says that the “cross” tuples must also be present!
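The “cross” tuple reading can be turned into a small check. This is a sketch using the standard definition of an MVD α →→ β: for any two tuples t1 and t2 that agree on α, the tuple taking its β values from t1 and everything else from t2 must also be in the relation. The function name satisfies_mvd and the dict-per-row representation are assumptions made for illustration.

```python
def satisfies_mvd(rows, alpha, beta):
    """Check alpha ->> beta on a list of rows (dicts keyed by attribute)."""
    present = {tuple(sorted(r.items())) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # Cross tuple: beta values from t1, the rest from t2.
                cross = dict(t2)
                for b in beta:
                    cross[b] = t1[b]
                if tuple(sorted(cross.items())) not in present:
                    return False
    return True

inst_info = [
    {"ID": "99999", "child_name": "David", "phone": "512-555-1234"},
    {"ID": "99999", "child_name": "David", "phone": "512-555-4321"},
    {"ID": "99999", "child_name": "William", "phone": "512-555-1234"},
    {"ID": "99999", "child_name": "William", "phone": "512-555-4321"},
]

print(satisfies_mvd(inst_info, ["ID"], ["child_name"]))      # True
# Drop the last tuple: (William, 512-555-4321) is now a missing cross tuple.
print(satisfies_mvd(inst_info[:3], ["ID"], ["child_name"]))  # False
```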


Our text has another intuitive interpretation:

The relationship between α and β is independent of the

relationship between α and R − β.


MVD example

An instructor can be associated with multiple departments and a

department may have several instructors, so:

• ID → dept_name does not hold

• dept_name → ID does not hold

Therefore, r2 has no non-trivial FDs, so it is in BCNF.


MVD example

Despite r2 being in BCNF, there is redundancy: we repeat the

address information of each residence of an instructor once for each

department with which the instructor is associated.

What MVD can you spot?


solution

Actually, there are two MVDs:

It can be easily proved that


Extra-credit QUIZ: MVD


SKIP the remainder of Section 8.6:

-- 8.6.2 Fourth Normal Form

-- 8.6.3 4NF Decomposition


8.8 Overall DB Design Process

We have assumed that the schema R is given, but how does R

appear in practice?

▪ R can be generated when converting E-R diagram to a set

of tables.

▪ R can be a single relation containing all attributes that are of

interest (called universal relation). Normalization then

breaks R into smaller relations.

or

▪ R can be the result of some ad hoc design of relations,

which we then test/convert to a normal form.


ER Model and Normalization

When an E-R diagram is carefully designed, identifying all entities

correctly, the tables generated from the E-R diagram should not need

further normalization.

However, in a real, imperfect design, there can be:

▪ FDs from non-key attributes of an entity set to other attributes of

the same entity set, e.g.:

• employee entity with attributes including department_name

and building, and the FD department_name → building

• Good design would have made department a separate entity

▪ FDs from non-key attributes of a relationship set to other attributes.

• It’s possible, but rare, since most relationships are binary.


Naming of Attributes and Relationships

▪ Unique-role: each attribute name has a unique meaning

in the DB

▪ Bad names: number, id, name

▪ Although technically the order of attribute names in a

schema does not matter, it is convention to list primary-

key attributes first.

▪ Naming tables that reduce ER relationship sets – two

alternatives:

1. concatenate the names of related entity sets, with

a hyphen or underscore, e.g. student_sec.


Naming of Attributes and Relationships

▪ Naming tables that reduce ER relationship sets – two

alternatives:

2. In some cases concatenation is impossible or doesn’t

make sense → choose a new, descriptive name:

Relationship between two roles of the same entity

(e.g. manager is better than employee_employee)

Multiple relationships between the same pair of

entities.


De-normalization for performance

In certain query-intensive applications (e.g. data

mining), we may want to use a non/de-normalized

schema to improve the performance of our queries

Example:

▪ if we want the course ID and title of a course,

along with the IDs of its prerequisites, we need to

join the tables course and prereq.

▪ if we run this query often, it may be more efficient

to trade off space for time


De-normalization for performance

▪ Alternative 1: Create a de-normalized relation/table

containing only the attributes we need: course_id,

title, prereq_id.

• Plus: faster lookup

• Minus: extra space and execution time for updates

• Minus: extra coding work and possibility of errors in

the added code

▪ Alternative 2: use a materialized view defined as

course ⋈ prereq

• Same as above, except we don’t have the second

minus.
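Alternative 1 can be sketched with Python's built-in sqlite3 module. The table name course_prereq and the sample rows are made up for illustration. SQLite has no materialized views, so Alternative 2 is not shown; in a system that supports them, the DBMS would take over the second update below.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE prereq (course_id TEXT, prereq_id TEXT,
                         PRIMARY KEY (course_id, prereq_id));
    INSERT INTO course VALUES ('C-101', 'Intro to CS');
    INSERT INTO course VALUES ('C-201', 'Databases');
    INSERT INTO prereq VALUES ('C-201', 'C-101');

    -- Alternative 1: de-normalized copy holding only the needed attributes.
    CREATE TABLE course_prereq AS
        SELECT course.course_id AS course_id,
               course.title     AS title,
               prereq.prereq_id AS prereq_id
        FROM course JOIN prereq ON course.course_id = prereq.course_id;
""")

# Plus: the frequent query is now a single-table lookup, no join needed.
rows = conn.execute(
    "SELECT course_id, title, prereq_id FROM course_prereq"
).fetchall()
print(rows)  # [('C-201', 'Databases', 'C-101')]

# Minus: a title change has to be applied in two places by our own code.
conn.execute("UPDATE course SET title = 'Database Systems' "
             "WHERE course_id = 'C-201'")
conn.execute("UPDATE course_prereq SET title = 'Database Systems' "
             "WHERE course_id = 'C-201'")
```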


De-normalization for Performance

“Normalize until it hurts,

then de-normalize until it works!”


Other Design Issues

Some aspects of DB design are not caught by normalization.

Examples of bad DB design, to be avoided: Instead of

earnings (company_id, year, amount ), someone has

created:

Separate tables: earnings_2004, earnings_2005,

earnings_2006, etc. All these tables are in BCNF, but:

querying across years is difficult

a new table needs to be created each year


Other Design Issues

Examples of bad DB design, to be avoided: Instead of

earnings (company_id, year, amount ), someone has

created:

One table, but with a separate column for each year:

company_year (company_id, earnings_2004,

earnings_2005, earnings_2006)

It’s also in BCNF, but also makes querying across years

difficult and requires a new column each year.

This is an example of a crosstab, where values for one

attribute become column names

Used in spreadsheets and other data analysis tools
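A crosstab like company_year can be turned back into the earnings (company_id, year, amount) shape with a few lines of code. The sample figures and the helper name unpivot are invented for this sketch.

```python
# Crosstab rows: one column per year (sample data, made up).
company_year = [
    {"company_id": "c1", "earnings_2004": 10, "earnings_2005": 12, "earnings_2006": 9},
    {"company_id": "c2", "earnings_2004": 7, "earnings_2005": 8, "earnings_2006": 11},
]

def unpivot(rows, prefix="earnings_"):
    """Turn each earnings_<year> column into its own (id, year, amount) row."""
    out = []
    for row in rows:
        for column, amount in row.items():
            if column.startswith(prefix):
                year = int(column[len(prefix):])
                out.append((row["company_id"], year, amount))
    return out

earnings = unpivot(company_year)
print(len(earnings))  # 6

# Querying across years is now a plain filter or aggregate over one table.
total = {}
for company_id, year, amount in earnings:
    total[company_id] = total.get(company_id, 0) + amount
print(total)  # {'c1': 31, 'c2': 26}
```

With this shape, a new year means new rows rather than a schema change.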


SKIP

8.9 Modeling Temporal Data


Chapter 8 - what sections we covered:

8.1 Features of Good Relational

Design

8.2 Atomic Domains and First

Normal Form

8.3 Decomposition Using Functional

Dependencies

8.4 Functional Dependency Theory

8.4.1 Closure of a Set of FDs

8.4.2 Closure of Attribute Sets

8.4.3 Canonical cover

8.4.4 Lossless decomposition

8.4.5 Dependency preservation

8.5 Algorithms for decomposition

8.6 Decomposition Using

Multivalued Dependencies

8.6.1 Multivalued Dependencies

8.6.2 Fourth normal form

8.6.3 4NF Decomposition

8.7 More normal Forms

8.8 Database Design Process

8.9 Modeling Temporal Data


Homework for Ch.8

▪ 8.4, 8.5

▪ 8.6 → Derive only 7 new FDs, using the Algorithm for

Computing F+ (Fig. 8.7 in text)

▪ 8.26

▪ 8.29 → only (a) (b), then

• Use the attribute closure algorithm for each of the

LHSs of the four FDs

• (e) Take the first FD that violates BCNF and perform

BCNF decomposition

• Is the BCNF decomposition found above dependency

preserving?

• Skip (c), (d), and (f)!

• 8.33 (Hint: Use the “cross-tuple” interpretation!)

EOL 4


The next slides are a collection of the

algorithms we need to know from this chapter


3NF

If for all FDs α → β in F+

at least one of the following holds:

1. α → β is trivial (i.e., β ⊆ α)

2. α is a superkey for R

3. Each attribute A in β − α is contained in a candidate key for R. (NOTE: each attribute may be in a different cand. key!)

BCNF

Same test, but only conditions 1 and 2 are allowed.


Decomposing a Schema into BCNF

Suppose we have a schema R and a non-trivial dependency α → β causes a violation of BCNF.

We decompose R into:

• R1 = (α ∪ β), F1 = (…)

• R2 = (R − (β − α)), F2 = (…)

Usually we also want to know if the decomposition is dependency-preserving!
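One decomposition step and the dependency-preservation check can be sketched together. The helper names are hypothetical. The projection of F onto each Ri is computed by brute force over subsets, so this is only meant for small schemas such as the quiz relation R = (A, B, C) with AB → C and C → B.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_split(R, alpha, beta):
    """One BCNF step: R1 = alpha U beta, R2 = R - (beta - alpha)."""
    return alpha | beta, R - (beta - alpha)

def project(fds, Ri):
    """FDs that hold on Ri: X -> (X+ intersect Ri) - X for non-empty X in Ri."""
    projected = []
    for size in range(1, len(Ri) + 1):
        for combo in combinations(sorted(Ri), size):
            x = frozenset(combo)
            rhs = (closure(x, fds) & Ri) - x
            if rhs:
                projected.append((x, rhs))
    return projected

def lost_dependencies(fds, parts):
    """FDs of F not implied by the union of the projected FDs."""
    local = [f for Ri in parts for f in project(fds, Ri)]
    return [(lhs, rhs) for lhs, rhs in fds if not rhs <= closure(lhs, local)]

R = frozenset("ABC")
F = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]

# C -> B violates BCNF, so split on alpha = C, beta = B.
R1, R2 = bcnf_split(R, frozenset("C"), frozenset("B"))
print(sorted(R1), sorted(R2))  # ['B', 'C'] ['A', 'C']

for lhs, rhs in lost_dependencies(F, [R1, R2]):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))  # AB -> C
```

The only FD that cannot be checked on R1 or R2 alone is AB → C, so this decomposition is not dependency preserving.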


Algorithm for F+

Assign F+ = F
repeat
    for each FD f in F+
        apply reflexivity and augmentation rules on f
        add the resulting FDs to F+
    for each pair of FDs f1 and f2 in F+
        if f1 and f2 can be combined using transitivity,
            add the resulting FD to F+
until F+ does not change any further

Note: reflexivity is applied only to the RHS of f.


Algorithm for α+

result := α;
while (changes to result) do
    for each β → γ in F do
    begin
        if β ⊆ result then result := result ∪ γ
    end