1

The Relational Data Model

Lecture 6

2

Outline

• Relational Data Model

• Functional Dependencies

• Logical Schema Design

Reading

Chapter 8

3

The Relational Data Model

Data Modeling → Relational Schema → Physical Storage

• Data modeling: E/R diagrams

• Relational schema: tables (column names: attributes; rows: tuples)

• Physical storage: complex file organization and index structures

(Slide annotations: "Have seen this in SQL", "Have seen this too", "Discuss next")

4

Terminology

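Products (table name, or relation name):

Name | Price | Category | Manufacturer
gizmo | $19.99 | gadgets | GizmoWorks
Power gizmo | $29.99 | gadgets | GizmoWorks
SingleTouch | $149.99 | photography | Canon
MultiTouch | $203.99 | household | Hitachi

Attribute names: Name, Price, Category, Manufacturer (the column headers)

Tuples (or rows, or records): the four lines below the header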

5

Schemas

Relational Schema:

– Relation name plus attribute names

– E.g. Product(Name, Price, Category, Manufacturer)

– In practice we add the domain for each attribute

Database Schema

– Set of relational schemas

– E.g. Product(Name, Price, Category, Manufacturer), Company(Name, Address, Phone), . . .

6

Instances

• Relational schema = R(A1,…,Ak)
  Instance = relation with k attributes

• Database schema = R1(…), R2(…), …, Rn(…)
  Instance = n relations, of types R1, R2, ..., Rn

This is all mathematics, not to be confused with SQL tables!

(What's the difference?)

7

Example

Relational schema: Product(Name, Price, Category, Manufacturer)

Instance:

Name | Price | Category | Manufacturer
gizmo | $19.99 | gadgets | GizmoWorks
Power gizmo | $29.99 | gadgets | GizmoWorks
SingleTouch | $149.99 | photography | Canon
MultiTouch | $203.99 | household | Hitachi

8

Design Criteria

• A relational schema should ensure:
– Data integrity
• data is consistent and satisfies integrity constraints
– Data redundancy should be avoided

• The process of creating a relational schema that follows certain rules is called normalisation
– We will learn several of these rules

9

Another “Example”

Student:

Name | GPA | Courses
Alice | 3.8 | Math, DB, OS
Bob | 3.7 | DB, OS
Carol | 3.9 | Math, OS

How can this be expressed in a relational schema?

10

Outlook: First Normal Form (1NF)

• A database schema is in First Normal Form if all tables are flat

Student (with nested Courses):

Name | GPA | Courses
Alice | 3.8 | Math, DB, OS
Bob | 3.7 | DB, OS
Carol | 3.9 | Math, OS

becomes three flat tables:

Student:

Name | GPA
Alice | 3.8
Bob | 3.7
Carol | 3.9

Takes:

Student | Course
Alice | Math
Carol | Math
Alice | DB
Bob | DB
Alice | OS
Carol | OS

Course:

Course
Math
DB
OS

11

Example cont.

Name | GPA | Course
Alice | 3.8 | Math
Bob | 3.7 | DB
Carol | 3.9 | OS
Alice | 3.8 | DB
Alice | 3.8 | OS
Bob | 3.7 | OS

Theoretically, this flattened table is also in 1NF, but it has several problems:

• Update anomalies
• Inserts etc.

Only a compound key of Name and Course is possible

12

1NF cont.

• Features:
– All values (attributes) are atomic
• For instance, no comma-separated values are allowed!
– “Conventional” SQL-based databases typically adhere to the 1NF

• Relational Rule 1 (based on Codd's 12 Rules):
– A table that has no multi-valued fields is said to be in the first normal form.

13

Normal Forms: Overview

– 1st Normal Form (1NF)

– 2nd Normal Form (2NF)

– 3rd Normal Form (3NF)

– Boyce Codd Normal Form (BCNF)

• The higher the normal form, the more redundancies are reduced (the list above is a strict ordering)

14

Outline

• Relational Data Model

• Functional Dependencies

• Logical Schema Design

15

Functional Dependencies (FD)

• A form of constraint
– hence, part of the schema

• Finding them is part of the database design

• Also used in normalising the relations

Warning: this is the most abstract, and “hardest” part of the course.

16

Functional Dependency:

Graphically

(Patrick O’Neil 1994)

17

FD cont.

• Important: the intent of the DB designer is expressed

• “Two rows cannot agree in value on attribute A and disagree on B”.

If r1(A) = r2(A) then r1(B) = r2(B)

– “A functionally determines B”

– “B is functionally dependent on A”

• In other words: each value of A is associated with exactly one value of B

18

FD Definitions

Definition:

If two tuples agree on the attributes

A1, A2, …, An

then they must also agree on the attributes

B1, B2, …, Bm

Formally:

A1, A2, …, An → B1, B2, …, Bm

19

Examples

• EmpID → Name, Phone, Position

• Position → Phone

• but not Phone → Position

EmpID | Name | Phone | Position
E0045 | Smith | 1234 | Clerk
E1847 | John | 9876 | Salesrep
E1111 | Smith | 9876 | Salesrep
E9999 | Mary | 1234 | Lawyer
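The three claims about this table can be checked mechanically. The following is a minimal sketch, assuming the instance is held as a list of Python dicts; the helper name `holds` is hypothetical and not part of the lecture.

```python
def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in this instance.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. The name `holds` is an illustrative choice.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        # Two rows that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": "1234", "Position": "Lawyer"},
]

print(holds(employees, ["EmpID"], ["Name", "Phone", "Position"]))  # True
print(holds(employees, ["Position"], ["Phone"]))                   # True
print(holds(employees, ["Phone"], ["Position"]))                   # False
```

Phone 1234 appears with both Clerk and Lawyer, which is why the last check fails. Note that such a check only says whether an FD is satisfied by one instance, not whether the designer intends it as a constraint.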

20

Example

EmpID Name Phone Position

E0045 Smith 1234 Clerk

E1847 John 9876 Salesrep

E1111 Smith 9876 Salesrep

E9999 Mary 1234 Lawyer

Position → Phone

[Figure: mapping from Position values (Clerk, Salesrep, Lawyer, ...) to Phone values (1234, 9876)]

21

In General

• To check A → B, erase all other columns

• check if the remaining relation is many-to-one (called functional in mathematics)

… A … B

X1 Y1

X2 Y2

… …

22

Typical Examples of FDs

Product: name → price, manufacturer

Person: ssn → name, age

Company: name → stockprice, president

23

Example

Product(name, category, color, department, price)

Consider these FDs:

name → color
category → department
color, category → price

What do they say?

24

Example

FDs are constraints on relations:

• On some instances they hold

• On others they don’t

name → color
category → department
color, category → price

name | category | color | department | price
Gizmo | Gadget | Green | Toys | 49
Tweaker | Gadget | Green | Toys | 99

Does this instance satisfy all the FDs?

25

Example

name → color
category → department
color, category → price

name | category | color | department | price
Gizmo | Gadget | Green | Toys | 49
Tweaker | Gadget | Black | Toys | 99
Gizmo | Stationary | Green | Office-supp. | 59

What about this one?

26

Example

If some FDs are satisfied, then others are satisfied too

If all these FDs are true:

name → color
category → department
color, category → price

Then this FD also holds: name, category → price

Why?

27

Inference Rules for FDs

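A1, A2, …, An → B1, B2, …, Bm

is equivalent to

A1, A2, …, An → B1
A1, A2, …, An → B2
. . . . .
A1, A2, …, An → Bm

(1) Splitting rule: from the single FD to the m separate FDs (split)

and

(2) Combining rule: from the m separate FDs back to the single FD (combine)

[Table illustrating the rules, with columns A1 ... An and B1 ... Bm]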

28

Inference Rules for FDs

(continued)

(3) Trivial Rule

A1, A2, …, An → Ai

where i = 1, 2, ..., n (that is, Ai is one of A1, …, An)

29

Inference Rules for FDs

(continued)

(4) Transitive Closure Rule

If

A1, A2, …, An → B1, B2, …, Bm

and

B1, B2, …, Bm → C1, C2, …, Cp

then

A1, A2, …, An → C1, C2, …, Cp

30

[Table illustrating transitivity, with columns A1 … Am, B1 … Bm, C1 ... Cp]

31

Example (continued)

Start from the following FDs:

1. name → color
2. category → department
3. color, category → price

Infer the following FDs:

Inferred FD | Which rule did we apply?
4. name, category → name | ?
5. name, category → color | ?
6. name, category → category | ?
7. name, category → color, category | ?
8. name, category → price | ?

32

Example (continued)

Answers, starting from:

1. name → color
2. category → department
3. color, category → price

Inferred FD | Which rule did we apply?
4. name, category → name | Trivial rule
5. name, category → color | Transitivity on 4, 1
6. name, category → category | Trivial rule
7. name, category → color, category | Split/combine on 5, 6
8. name, category → price | Transitivity on 3, 7

33

Another Example

• Enrollment(student, major, course, room, time)

student → major

major, course → room

course → time

34

Another Rule

(5) Augmentation Rule

If A1, A2, …, An → B

then A1, A2, …, An, C1, C2, …, Cp → B

Augmentation follows from trivial rules and transitivity

How?

35

“Solving” the Augmentation Rule

Given: A1, A2, …, An → B

Trivial Rule: A1, A2, …, An, C1, C2, …, Cp → A1, A2, …, An

Transitivity gives: A1, A2, …, An, C1, C2, …, Cp → B

36

Summary of Rules

(1) Splitting (Decomposition) Rule

If X → YZ, then X → Y and X → Z

(2) Combining (Union) Rule

If X → Y and X → Z, then X → YZ

(3) Trivial (Reflexivity) Rule

If Y ⊆ X, then X → Y

(4) Transitive Closure Rule

If X → Y and Y → Z, then X → Z

(5) Augmentation Rule

If X → Y, then XZ → Y, for every Z

37

Problem: infer ALL FDs

Given a set of FDs, infer all possible FDs

How to proceed ?

• Try all possible FDs, apply all rules
– E.g. R(A, B, C, D): how many FDs are possible?
• Answer: 2^4 subsets of attributes

• Drop trivial FDs, drop augmented FDs
– Still way too many

• Better: use the Closure Algorithm (next)

38

FDs and Closure

• Typically, a relation R has a set of defined functional dependencies F

• The closure of F (written F+) is the set of all functional dependencies that may be derived from F

– F+ contains all defined FDs and derived FDs

39

Closure of a set of Attributes

Given a set of attributes A1, …, An

The closure, {A1, …, An}+, is the set of attributes B s.t. A1, …, An → B

Example:

name → color
category → department
color, category → price

Closures:

name+ = {name, color}
{name, category}+ = {name, category, color, department, price}
color+ = {color}

40

Closure Algorithm

Start with X = {A1, …, An}.

Repeat until X doesn’t change:

  if B1, …, Bn → C is an FD and B1, …, Bn are all in X
  then add C to X.

Example:

name → color
category → department
color, category → price

{name, category}+ = {name, category, color, department, price}

41

Closure Algorithm more verbose

• Starting with the given set of attributes, repeatedly expand the set by

adding the right sides of FDs as soon as we have included their left sides.

• Eventually, we cannot expand the set any more, and the resulting set is

the closure.

1 Let X be a set of attributes that eventually will become the closure. First we initialize X to be {A1, A2, …, An}.

2 Now, repeatedly search for some FD B1 B2 … Bm → C such that all of the Bs are in the set X, but C is not. We then add C to X.

3 Repeat step 2 as many times as necessary until no more attributes can be added to X. Since X can only grow, and the number of attributes is finite, eventually nothing more can be added to X.

4 The set X, once no more attributes can be added to it, is the closure {A1, A2, …, An}+. (Alex Thomo 2006)
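The closure algorithm translates almost line by line into code. The following is a minimal sketch, assuming each FD is stored as a pair of Python sets (left side, right side); the function name `closure` and this representation are illustrative choices, not part of the lecture.

```python
def closure(attrs, fds):
    """Compute the closure of a set of attributes under a list of FDs.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    """
    x = set(attrs)          # step 1: initialise X
    changed = True
    while changed:          # step 3: repeat until X stops growing
        changed = False
        for lhs, rhs in fds:
            # step 2: the left side is inside X, the right side is not yet
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x                # step 4: X is now the closure


fds = [
    ({"name"}, {"color"}),
    ({"category"}, {"department"}),
    ({"color", "category"}, {"price"}),
]

print(sorted(closure({"name"}, fds)))              # ['color', 'name']
print(sorted(closure({"name", "category"}, fds)))  # all five attributes
print(sorted(closure({"color"}, fds)))             # ['color']
```

Because {name, category}+ contains price, the FD name, category → price follows from the three given FDs, which matches the inference made earlier with the rules.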

42

Example

R(A,B,C,D,E,F) with FDs:

A, B → C
A, D → E
B → D
A, F → B

Compute {A,B}+: X = {A, B, ? }

Compute {A, F}+: X = {A, F, ? }

43

Example (Solution)

R(A,B,C,D,E,F) with FDs:

A, B → C
A, D → E
B → D
A, F → B

Compute {A,B}+: X = {A, B, C, D, E }

Compute {A, F}+: X = {A, F, B, C, D, E }

44

Using Closure to Infer ALL FDs

Example:

A, B → C
A, D → B
B → D

A+ = A, B+ = BD, C+ = C, D+ = D

AB+ = ABCD, AC+ = AC, AD+ = ABCD

ABC+ = ABD+ = ACD+ = ABCD (no need to compute; why?)

BCD+ = BCD, ABCD+ = ABCD
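The closures listed for this example can be reproduced by brute force over all non-empty attribute subsets. This is a sketch under the assumption that attributes are single letters and FDs are pairs of sets; `closure` is a hypothetical helper implementing the closure algorithm from the earlier slides.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


attributes = "ABCD"
fds = [(set("AB"), set("C")), (set("AD"), set("B")), (set("B"), set("D"))]

# Map every non-empty subset X (written as a string) to X+.
closures = {}
for size in range(1, len(attributes) + 1):
    for combo in combinations(attributes, size):
        closures["".join(combo)] = "".join(sorted(closure(combo, fds)))

for x, x_plus in closures.items():
    print(f"{x}+ = {x_plus}")
```

Every non-trivial FD X → B with B in X+ can then be read off the printed closures. The enumeration is exponential in the number of attributes, so it is only practical for small schemas like this one.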

45

Problem: Finding FDs

• Approach 1: During Database Design
– Designer derives them from real-world knowledge of users
– Problem: knowledge might not be available

• Approach 2: From a Database Instance
– Analyze a given database instance and find all FDs satisfied by that instance
– Useful if designers don't get enough information from users
– Problem: FDs might be artificial for the given instance

46

Find All FDs

Student | Dept | Course | Room
Alice | CSE | C++ | 020
Bob | CSE | C++ | 020
Alice | EE | HW | 040
Carol | CSE | DB | 045
Dan | CSE | Java | 050
Elsa | CSE | DB | 045
Frank | EE | Circuits | 020

47

Some Answers

Course → Dept, Room

Dept, Room → Course

Student, Dept → Course, Room

Student, Course → Dept, Room

Student, Room → Dept, Course

Do all FDs make sense in practice?
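Approach 2 (finding FDs from an instance) can be sketched as a brute-force search. The code below is an illustration only: the helper names `holds` and `discover` are hypothetical, and the search reports FDs with a single attribute on the right and a minimal left side.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


def discover(rows, columns):
    """List (lhs, target) pairs with minimal lhs that hold in this instance."""
    found = []
    for target in columns:
        others = [c for c in columns if c != target]
        for size in range(1, len(others) + 1):
            for lhs in combinations(others, size):
                # Skip left sides that only augment an FD already found.
                if any(t == target and set(p) <= set(lhs) for p, t in found):
                    continue
                if holds(rows, lhs, [target]):
                    found.append((lhs, target))
    return found


columns = ("Student", "Dept", "Course", "Room")
rows = [dict(zip(columns, r)) for r in [
    ("Alice", "CSE", "C++", "020"),
    ("Bob", "CSE", "C++", "020"),
    ("Alice", "EE", "HW", "040"),
    ("Carol", "CSE", "DB", "045"),
    ("Dan", "CSE", "Java", "050"),
    ("Elsa", "CSE", "DB", "045"),
    ("Frank", "EE", "Circuits", "020"),
]]

found = discover(rows, columns)
for lhs, target in found:
    print(", ".join(lhs), "->", target)
```

The output includes Course → Dept and Course → Room, but also FDs such as Student, Room → Dept that are probably accidents of this small instance, which is exactly the "artificial FDs" problem noted for Approach 2.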

48

Keys

• A key is a set of attributes A1, ..., An s.t. for any other attribute B, we have A1, ..., An → B
– Example

• A minimal key is a set of attributes which is a key and for which no subset is a key
– Example

• Note: our course book calls them superkey and key

A1, A2, A3, A4

A1, A2, A3, A4

49

Computing Keys

• Compute X+ for all sets X

• If X+ = all attributes, then X is a key

• List only the minimal keys

Note: there can be many minimal keys !

• Example: R(A,B,C), AB → C, BC → A
  Minimal keys: AB and BC
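The key-finding procedure can be sketched as follows, assuming FDs are stored as pairs of sets; `closure` and `minimal_keys` are hypothetical helper names. Subsets are tried smallest first, so any candidate that contains an already found key can be skipped as non-minimal.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def minimal_keys(attributes, fds):
    """Return all minimal keys of a relation, smallest subsets first."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


# R(A,B,C) with AB -> C and BC -> A
keys_abc = minimal_keys("ABC", [(set("AB"), set("C")), (set("BC"), set("A"))])
print([sorted(k) for k in keys_abc])  # [['A', 'B'], ['B', 'C']]
```

Every superset of a returned set is also a key in the lecture's terminology (a superkey in the course book's terminology).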

50

Let's reuse the Closure Example

A, B → C
A, D → B
B → D

A+ = A, B+ = BD, C+ = C, D+ = D

AB+ = ABCD, AC+ = AC, AD+ = ABCD

ABC+ = ABD+ = ACD+ = ABCD (no need to compute; why?)

BCD+ = BCD, ABCD+ = ABCD

Minimal keys: AB, AD

Keys: AB, AD, ABC, ABD, ACD, ABCD

51

More Examples of Keys

• Product(name, price, category, color)

name, category → price
category → color

Keys are: {name, category} and all supersets

• Enrollment(student, address, course, room, time)

student → address
room, time → course
student, course → room, time

… done in class

52

Outline

• Relational Data Model

• Functional Dependencies

• Logical Schema Design

53

Relational Schema Design

(or Logical Schema Design)

Main idea:

• Start with some relational schema

• Find out its FDs

• Use them to design a better relational

schema

54

Data Anomalies

When a database is poorly designed we get anomalies:

Redundancy: data is repeated

Update anomalies: need to change in several places

Delete anomalies: may lose data when we don't want to

55

Relational Schema Design

Example: Persons with several phones

Name | SSN | PhoneNumber | City
Fred | 123-45-6789 | 206-555-1234 | Seattle
Fred | 123-45-6789 | 206-555-6543 | Seattle
Joe | 987-65-4321 | 908-555-2121 | Westfield

SSN → Name, City, but not SSN → PhoneNumber

Anomalies:

• Redundancy = repeat data

• Update anomalies = Fred moves to “Bellevue”

• Deletion anomalies = Joe deletes his phone number: what is his city?

56

Relation Decomposition

Name | SSN | PhoneNumber | City
Fred | 123-45-6789 | 206-555-1234 | Seattle
Fred | 123-45-6789 | 206-555-6543 | Seattle
Joe | 987-65-4321 | 908-555-2121 | Westfield

Break the relation into two:

Name | SSN | City
Fred | 123-45-6789 | Seattle
Joe | 987-65-4321 | Westfield

SSN | PhoneNumber
123-45-6789 | 206-555-1234
123-45-6789 | 206-555-6543
987-65-4321 | 908-555-2121

Anomalies have gone:

• No more repeated data

• Easy to move Fred to “Bellevue” (how?)

• Easy to delete all of Joe’s phone numbers (how?)
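One way to see the two "how?" questions is to model the decomposed tables in memory. This is a hypothetical sketch: the variable names `person` and `phone` and the dict/list layout are assumptions for illustration, not a prescribed storage format.

```python
# Person table keyed by SSN (one row per person).
person = {
    "123-45-6789": {"Name": "Fred", "City": "Seattle"},
    "987-65-4321": {"Name": "Joe", "City": "Westfield"},
}

# Phone table: one (SSN, PhoneNumber) row per phone.
phone = [
    ("123-45-6789", "206-555-1234"),
    ("123-45-6789", "206-555-6543"),
    ("987-65-4321", "908-555-2121"),
]

# Move Fred to Bellevue: exactly one row changes, however many phones he has.
person["123-45-6789"]["City"] = "Bellevue"

# Delete all of Joe's phone numbers: his name and city are not lost.
phone = [(ssn, number) for ssn, number in phone if ssn != "987-65-4321"]

print(person["123-45-6789"]["City"])  # Bellevue
print(person["987-65-4321"]["City"])  # Westfield
print(len(phone))                     # 2
```

In the original single table the city update would have touched one row per phone number, and deleting Joe's only phone row would have removed his city as well.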

57

Relational Schema Design

Conceptual Model: [E/R diagram: Person buys Product; Product has name and price, Person has name and ssn]

Relational Model: plus FDs

Normalization: Eliminates anomalies

58

Decompositions in General

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

is decomposed into

R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp)

R1 = projection of R on A1, ..., An, B1, ..., Bm

R2 = projection of R on A1, ..., An, C1, ..., Cp

59

Decomposition

• Sometimes it is correct:

Name | Price | Category
Gizmo | 19.99 | Gadget
OneClick | 24.99 | Camera
Gizmo | 19.99 | Camera

is decomposed into

Name | Price
Gizmo | 19.99
OneClick | 24.99
Gizmo | 19.99

and

Name | Category
Gizmo | Gadget
OneClick | Camera
Gizmo | Camera

Lossless decomposition

60

Incorrect Decomposition

• Sometimes it is not:

Name | Price | Category
Gizmo | 19.99 | Gadget
OneClick | 24.99 | Camera
Gizmo | 19.99 | Camera

is decomposed into

Name | Category
Gizmo | Gadget
OneClick | Camera
Gizmo | Camera

and

Price | Category
19.99 | Gadget
24.99 | Camera
19.99 | Camera

What’s incorrect?

Lossy decomposition

61

Decompositions in General

R(A1, ..., An, B1, ..., Bm, C1, ..., Cp)

is decomposed into

R1(A1, ..., An, B1, ..., Bm) and R2(A1, ..., An, C1, ..., Cp)

If A1, ..., An → B1, ..., Bm

Then the decomposition is lossless

Example: name → price, hence the first decomposition is lossless

Note: we don’t necessarily need A1, ..., An → C1, ..., Cp
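The difference between the two decompositions of the Gizmo table can be checked by projecting and joining back. The sketch below assumes relations are lists of dicts with set semantics (duplicates removed on projection); `project`, `natural_join` and `same_relation` are hypothetical helper names.

```python
def project(rows, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    out = []
    for row in rows:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r1, r2):
    """Join rows that agree on all attributes the two relations share."""
    out = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                merged = {**t1, **t2}
                if merged not in out:
                    out.append(merged)
    return out


def same_relation(r1, r2):
    """Compare two relations as sets of tuples, ignoring row order."""
    def as_set(rel):
        return {frozenset(t.items()) for t in rel}
    return as_set(r1) == as_set(r2)


product = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "Gizmo", "Price": 19.99, "Category": "Camera"},
]

# Split on Name (Name -> Price holds): joining back recovers the relation.
lossless = natural_join(project(product, ["Name", "Price"]),
                        project(product, ["Name", "Category"]))

# Split on Category (no FD from Category): the join adds spurious tuples.
lossy = natural_join(project(product, ["Name", "Category"]),
                     project(product, ["Price", "Category"]))

print(same_relation(lossless, product))  # True
print(same_relation(lossy, product))     # False
print(len(lossy))                        # 5
```

The lossy join contains tuples such as (OneClick, 19.99, Camera) that were never in the original relation, so "lossy" here means losing information by gaining wrong tuples.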

62

Reminder: Normal Forms

First Normal Form = all attributes are atomic

Second Normal Form (2NF) = in principle old/obsolete

Third Normal Form (3NF) = this lecture

Boyce Codd Normal Form (BCNF) = this lecture

Others...

63

Third Normal Form (3NF)

• A relation is in 3NF if, and only if:
– 1NF is satisfied
– Every non-key attribute is functionally dependent on the whole key
• i.e. no partial key functional dependencies
– No transitive functional dependencies:
• A non-key attribute must not be functionally dependent on another non-key attribute
– No data redundancies

64

Example

1NF but not 3NF: manager → address (http://db.grussel.org)

• Now we have reached 3NF

65

Potential Problems with 3NF

• If a relation has more than 1 candidate key, anomalies may occur

• 3NF does not deal satisfactorily with overlapping candidate keys

• Need a stricter form:
– Boyce Codd Normal Form (BCNF)

• Example see: http://www.answers.com/topic/boyce-codd-normal-form

66

“Prod Cat, Stock Cat → Range Code”

BCNF

67

Boyce-Codd Normal Form

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute,

it should determine all the attributes of R.

Every determinant (LHS of the FD) is a candidate key.

A relation R is in BCNF if:

If A1, ..., An → B is a non-trivial dependency

in R , then {A1, ..., An} is a key for R

68

BCNF Decomposition

Algorithm

Repeat
  choose A1, …, Am → B1, …, Bn that violates the BCNF condition
  split R into R1(A1, …, Am, B1, …, Bn) and R2(A1, …, Am, [others])
  continue with both R1 and R2
Until no more violations

[Figure: R drawn as three blocks (A’s, B’s, Others); R1 covers the A’s and B’s, R2 covers the A’s and the Others]
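The repeat/split loop can be sketched recursively. This is one possible reading of the algorithm, not the lecture's own code: the names `closure`, `find_violation` and `bcnf_decompose` are hypothetical, violations are searched by brute force over attribute subsets, and the right side is taken "as large as possible" by using the whole closure.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def find_violation(attrs, fds):
    """Return (X, X+ within attrs) for some X that violates BCNF, else None."""
    for size in range(1, len(attrs)):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            x_plus = closure(x, fds) & attrs
            # Non-trivial (X+ is bigger than X) but X is not a key of attrs.
            if x_plus != x and x_plus != attrs:
                return x, x_plus
    return None


def bcnf_decompose(attrs, fds):
    """Split a relation (a set of attributes) until no violations remain."""
    attrs = set(attrs)
    violation = find_violation(attrs, fds)
    if violation is None:
        return [attrs]
    x, x_plus = violation
    r1 = x_plus                 # the A's together with the B's
    r2 = x | (attrs - x_plus)   # the A's together with the others
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


fds = [({"SSN"}, {"Name", "City"})]
result = bcnf_decompose({"Name", "SSN", "PhoneNumber", "City"}, fds)
print([sorted(r) for r in result])
```

For the persons-with-phones relation this yields the two relations (SSN, Name, City) and (SSN, PhoneNumber). Different choices of violating FD can lead to different, equally valid BCNF decompositions.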

69

Example

Name | SSN | PhoneNumber | City
Fred | 123-45-6789 | 206-555-1234 | Seattle
Fred | 123-45-6789 | 206-555-6543 | Seattle
Joe | 987-65-4321 | 908-555-2121 | Westfield
Joe | 987-65-4321 | 908-555-1234 | Westfield

What are the dependencies?

SSN → Name, City

What are the keys?

{SSN, PhoneNumber}

Is it in BCNF?

70

Decompose it into BCNF

Name | SSN | City
Fred | 123-45-6789 | Seattle
Joe | 987-65-4321 | Westfield

SSN → Name, City

SSN | PhoneNumber
123-45-6789 | 206-555-1234
123-45-6789 | 206-555-6543
987-65-4321 | 908-555-2121
987-65-4321 | 908-555-1234

Let’s check anomalies:

• Redundancy?

• Update?

• Delete?

71

Summary of BCNF Decomposition

Find a dependency that violates the BCNF condition:

A1, A2, …, An → B1, B2, …, Bm

Heuristics: choose B1, B2, …, Bm “as large as possible”

Decompose:

[Figure: R drawn as three blocks (A’s, B’s, Others); R1 covers the A’s and B’s, R2 covers the A’s and the Others]

Continue until there are no BCNF violations left.

2-attribute relations are BCNF

72

Example Decomposition

Person(name, SSN, age, hairColor, phoneNumber)

SSN → name, age

age → hairColor

Decompose in BCNF (in class):

Step 1: find all keys (How? Compute S+, for various sets S)

Step 2: now decompose

73

Other Example

• R(A,B,C,D) with A → B, B → C

• Key: AD

• Violations of BCNF (need to decompose)

– A → B, A → C, A → BC

• Pick A → BC:

– split into R1(A,B,C) and R2(A,D)

• What happens if we pick A → B first?

74

Lossless Decompositions

A decomposition is lossless if we can recover:

R(A,B,C)

Decompose: R1(A,B), R2(A,C)

Recover: R’(A,B,C) should be the same as R(A,B,C)

R’ is in general larger than R. Must ensure R’ = R

75

Lossless Decompositions

• Given R(A,B,C) s.t. A → B, the decomposition into R1(A,B), R2(A,C) is lossless

76

3NF: A Problem with BCNF

R(Unit, Company, Product)

FDs: Unit → Company; Company, Product → Unit

So, there is a BCNF violation, and we decompose:

R1(Unit, Company), with FD Unit → Company

R2(Unit, Product), with no FDs

Notice: we lose the FD: Company, Product → Unit

77

So What's the Problem?

R(Unit, Company, Product) decomposed into R1(Unit, Company) and R2(Unit, Product):

Unit | Company
Galaga99 | UW
Bingo | UW

Unit | Product
Galaga99 | databases
Bingo | databases

No problem so far. All local FD’s are satisfied.

Let’s put all the data back into a single table again (anomalies?):

Unit | Company | Product
Galaga99 | UW | databases
Bingo | UW | databases

Violates the dependency: company, product → unit!
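The lost dependency can be demonstrated directly: each decomposed table passes its local check, yet the joined table fails. This is a minimal sketch with a hypothetical `holds` helper and tables held as lists of dicts.

```python
def holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[b] for b in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


unit_company = [
    {"Unit": "Galaga99", "Company": "UW"},
    {"Unit": "Bingo", "Company": "UW"},
]
unit_product = [
    {"Unit": "Galaga99", "Product": "databases"},
    {"Unit": "Bingo", "Product": "databases"},
]

# Natural join on the shared attribute Unit.
joined = [
    {**uc, **up}
    for uc in unit_company
    for up in unit_product
    if uc["Unit"] == up["Unit"]
]

print(holds(unit_company, ["Unit"], ["Company"]))      # True: local FD is fine
print(holds(joined, ["Company", "Product"], ["Unit"]))  # False: lost FD is violated
```

Since neither decomposed table contains Company, Product and Unit together, the violated FD cannot be checked on either table alone; it only becomes visible after the join.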

78

Solution: 3rd Normal Form

(3NF)

A relation R is in Third Normal Form if :

Whenever there is a nontrivial dependency A1, A2, ..., An → B

for R , then {A1, A2, ..., An } is a key for R,

or B is part of a key.

Tradeoff:

BCNF = no anomalies, but may lose some FDs

3NF = keeps all FDs, but may have some anomalies
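The tradeoff can be seen on the Unit/Company/Product schema: it fails the BCNF condition but passes the 3NF condition. The sketch below uses hypothetical helpers (`closure`, `minimal_keys`, `is_bcnf`, `is_3nf`) and, as a simplifying assumption, tests only the FDs that are listed explicitly.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of a set of attributes under FDs given as (lhs, rhs) set pairs."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def minimal_keys(attributes, fds):
    """All minimal keys, found by trying subsets from smallest to largest."""
    all_attrs = set(attributes)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue
            if closure(candidate, fds) == all_attrs:
                keys.append(candidate)
    return keys


def is_bcnf(attributes, fds):
    """Every listed non-trivial FD must have a key on its left side."""
    all_attrs = set(attributes)
    return all(rhs <= lhs or closure(lhs, fds) == all_attrs for lhs, rhs in fds)


def is_3nf(attributes, fds):
    """As BCNF, except the right side may instead consist of key attributes."""
    all_attrs = set(attributes)
    prime = set().union(*minimal_keys(attributes, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) == all_attrs:
            continue  # left side is a key
        if not (rhs - lhs) <= prime:
            return False
    return True


schema = {"Unit", "Company", "Product"}
fds = [({"Unit"}, {"Company"}), ({"Company", "Product"}, {"Unit"})]

print(is_bcnf(schema, fds))  # False: Unit is not a key
print(is_3nf(schema, fds))   # True: Company is part of the key {Company, Product}
```

Keeping the relation in 3NF preserves Company, Product → Unit as a checkable constraint, at the cost of repeating the Unit/Company pairing for every product.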

79

Summary

• Dependencies on attributes are

important when designing database

schema

– Functional dependencies

– Attributes should be dependent only on the (primary) key

• Use normalised database schemas to

avoid certain anomalies with data
