Lecture #3 Functional Dependencies Normalization Relational Algebra

Post on 21-Mar-2016


Lecture #3: Functional Dependencies, Normalization, Relational Algebra

Thursday, October 12, 2000

Administration

• Homework #1 due today.
• Project descriptions & groups due today.
• Homework #2 available today.
• Exam date is looking like December 7th
  – Complaints?
• Projects: tell us if you need to use the lab.

Functional Dependencies

Definition:

If two tuples agree on the attributes

A1, A2, …, An

then they must also agree on the attributes

B1, B2, …, Bm

Formally:

A1, A2, …, An -> B1, B2, …, Bm

Motivating example for the study of functional dependencies:

Name Social Security Number Phone Number

Examples

• EmpID -> Name, Phone, Position
• Position -> Phone
• but not Phone -> Position

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   lawyer

In General

• To check A -> B, erase all other columns
• check if the remaining relation is many-one (called functional in mathematics)

…  A   …  B
   X1     Y1
   X2     Y2
   …      …

Example

EmpID  Name   Phone  Position
E0045  Smith  1234   Clerk
E1847  John   9876   Salesrep
E1111  Smith  9876   Salesrep
E9999  Mary   1234   lawyer
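The "erase the other columns and check that what remains is many-one" test can be sketched in a few lines of Python. This is a minimal sketch, not part of the lecture: the function name `holds_fd` and the list-of-dicts representation are assumptions made for illustration.

```python
def holds_fd(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this particular set of rows.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. (Hypothetical helper, not from the slides.)
    """
    seen = {}  # lhs values -> the rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # two tuples agree on lhs but differ on rhs
    return True


employees = [
    {"EmpID": "E0045", "Name": "Smith", "Phone": "1234", "Position": "Clerk"},
    {"EmpID": "E1847", "Name": "John", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E1111", "Name": "Smith", "Phone": "9876", "Position": "Salesrep"},
    {"EmpID": "E9999", "Name": "Mary", "Phone": "1234", "Position": "lawyer"},
]

print(holds_fd(employees, ["EmpID"], ["Name", "Phone", "Position"]))  # True
print(holds_fd(employees, ["Position"], ["Phone"]))                   # True
print(holds_fd(employees, ["Phone"], ["Position"]))                   # False
```

Note that a single instance can only refute a dependency; a `True` result says the data is consistent with the FD, not that the FD is a constraint of the schema.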

More Examples

Product: name -> price, manufacturer
Person: ssn -> name, age
Company: name -> stock price, president

Key of a relation is a set of attributes that:

- functionally determines all the attributes of the relation
- none of its subsets determines all the attributes.

Superkey: a set of attributes that contains a key.

Finding the Keys of a Relation

Given a relation constructed from an E/R diagram, what is its key?

Rules:

1. If the relation comes from an entity set, the key of the relation is the set of attributes which is the key of the entity set.

[E/R diagram: entity set Person with attributes address, name, ssn]

Rules for Binary Relationships

Several cases are possible for a binary relationship E1 - E2:

1. Many-many: the key includes the key of E1 together with the key of E2.

What happens for:

2. Many-one:

3. One-one:

[E/R diagram: Person (name, ssn) buys Product (name, price)]

Keys in Multiway Relationships

If there is an arrow from the relationship to E, then we don’t need the key of E as part of the relation key.

[E/R diagram: relationship Purchase connecting Product, Person, Store, and Payment Method]

Rules in FD’s

Splitting/Combining Rule:

A1, A2, …, An -> B1, B2, …, Bm

is equivalent to

A1, A2, …, An -> B1
A1, A2, …, An -> B2
…
A1, A2, …, An -> Bm

Rules in FD’s (continued)

Trivial Dependency:

A1, A2, …, An -> Ai

always holds.

Why?

Rules in FD’s (continued)

Transitive Closure Rule:

If

A1, A2, …, An -> B1, B2, …, Bm

and

B1, B2, …, Bm -> C1, C2, …, Cp

then

A1, A2, …, An -> C1, C2, …, Cp

Why?

Closure of a set of Attributes

Given a set of attributes {A1, …, An} and a set of dependencies S.

Problem: find all attributes B such that:

any relation which satisfies S also satisfies: A1, …, An -> B

The closure of {A1, …, An}, denoted {A1, …, An}+, is the set of all such attributes B

Closure Algorithm

Start with X = {A1, …, An}.

Repeat until X doesn’t change:

    if B1, B2, …, Bm -> C is in S, and
       B1, B2, …, Bm are all in X, and
       C is not in X
    then
       add C to X.

Example

A, B -> C
A, D -> E
B -> D
A, F -> B

Closure of {A, B}: X = {A, B, }

Closure of {A, F}: X = {A, F, }
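The closure algorithm can be sketched directly in Python and run on the example dependencies. This is a minimal sketch: the name `closure` and the representation of each FD as a pair of sets are assumptions made for illustration, not notation from the lecture.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs, each a set of attribute names.
    (Hypothetical representation, chosen for this sketch.)
    """
    x = set(attrs)
    changed = True
    while changed:  # repeat until X doesn't change
        changed = False
        for lhs, rhs in fds:
            # the whole left side is already in X and the right side adds something
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


fds = [
    ({"A", "B"}, {"C"}),
    ({"A", "D"}, {"E"}),
    ({"B"}, {"D"}),
    ({"A", "F"}, {"B"}),
]

print(sorted(closure({"A", "B"}, fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure({"A", "F"}, fds)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

The loop rescans all dependencies after every change, which is quadratic in the worst case but keeps the code close to the slide's wording.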

Why Is the Algorithm Correct ?

• Show the following by induction:
  – For every B in X:
    • A1, …, An -> B
• Initially X = {A1, …, An} -- holds
• Induction step: B1, …, Bm in X
  – Implies A1, …, An -> B1, …, Bm
  – We also have B1, …, Bm -> C
  – By transitivity we have A1, …, An -> C
• This shows that the algorithm is sound; need to show it is complete

Relational Schema Design

Main idea:
• Start with some relational schema
• Find out its FD’s
• Use them to design a better relational schema

Relational Schema Design

Recall set attributes (persons with several phones):

Name  SSN         Phone Number
Fred  123-321-99  (201) 555-1234
Fred  123-321-99  (206) 572-4312
Joe   909-438-44  (908) 464-0028
Joe   909-438-44  (212) 555-4000

Note: SSN is NOT a key here

Problems:

- redundancy
- update anomalies
- deletion anomalies

Relation Decomposition

Break the relation into two:

SSN         Name
123-321-99  Fred
909-438-44  Joe

SSN         Phone Number
123-321-99  (201) 555-1234
123-321-99  (206) 572-4312
909-438-44  (908) 464-0028
909-438-44  (212) 555-4000

Decompositions in General

Let R be a relation with attributes A1, A2, …, An

Create two relations R1 and R2 with attributes

B1, B2, …, Bm and C1, C2, …, Cl

Such that:

{B1, B2, …, Bm} ∪ {C1, C2, …, Cl} = {A1, A2, …, An}

And
-- R1 is the projection of R on B1, B2, …, Bm
-- R2 is the projection of R on C1, C2, …, Cl

Incorrect Decomposition

• Sometimes it is incorrect:

Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
DoubleClick  29.99  Camera

Decompose on: Name, Category and Price, Category

Incorrect Decomposition

Name         Category
Gizmo        Gadget
OneClick     Camera
DoubleClick  Camera

Price  Category
19.99  Gadget
24.99  Camera
29.99  Camera

When we put it back:

Name         Price  Category
Gizmo        19.99  Gadget
OneClick     24.99  Camera
OneClick     29.99  Camera
DoubleClick  24.99  Camera
DoubleClick  29.99  Camera

Cannot recover information
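The loss of information can be reproduced mechanically: project the product table onto the two attribute sets, join the projections on Category, and compare with the original. The sketch below is illustrative only; `project` and `natural_join` are hypothetical helpers that represent each tuple as a frozenset of (attribute, value) pairs.

```python
def project(rows, attrs):
    """Project a list of dicts onto attrs; the set removes duplicate tuples."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(r1, r2):
    """Join two relations on whatever attribute names they share."""
    result = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result


products = [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "DoubleClick", "Price": 29.99, "Category": "Camera"},
]

original = project(products, ["Name", "Price", "Category"])
r1 = project(products, ["Name", "Category"])
r2 = project(products, ["Price", "Category"])
joined = natural_join(r1, r2)

print(len(original), len(joined))  # 3 5
print(original < joined)           # True: the join adds bogus tuples
```

The join always contains the original tuples; the decomposition is incorrect exactly when it also contains extra ones, as it does here.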

Boyce-Codd Normal Form

A simple condition for removing anomalies from relations:

A relation R is in BCNF if and only if:

Whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R.

In English (though a bit vague):

Whenever a set of attributes of R is determining another attribute, it should determine all the attributes of R.

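BCNF Decomposition

Find a dependency that violates the BCNF condition:

A1, A2, …, An -> B1, B2, …, Bm

Heuristic: choose B1, B2, …, Bm “as large as possible”

Decompose:

[Diagram: the attributes of R in three groups: Others, A’s, B’s. The A’s together with the B’s form one relation and the A’s together with the Others form the other (R1 and R2).]

Continue until there are no BCNF violations left.

Find a 2-attribute relation that is not in BCNF.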

Example Decomposition

Person:

Name  SSN  Age  EyeColor  PhoneNumber

Functional dependencies: SSN -> Name, Age, EyeColor

BCNF: Person1(SSN, Name, Age, EyeColor), Person2(SSN, PhoneNumber)

What if we also had an attribute Draft-worthy, and the FD: Age -> Draft-worthy
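The BCNF condition can be tested with attribute closures: a dependency violates BCNF when it is nontrivial and the closure of its left side misses some attribute of the relation. The following is a minimal sketch under that reading; `closure` and `bcnf_violations` are hypothetical names, and the check only looks at the dependencies it is given.

```python
def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) pairs of sets."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def bcnf_violations(all_attrs, fds):
    """Return the listed FDs that are nontrivial and whose left side is not a superkey."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        superkey = closure(lhs, fds) >= all_attrs
        if nontrivial and not superkey:
            bad.append((lhs, rhs))
    return bad


person = {"Name", "SSN", "Age", "EyeColor", "PhoneNumber"}
person_fds = [({"SSN"}, {"Name", "Age", "EyeColor"})]
print(len(bcnf_violations(person, person_fds)))   # 1: SSN does not determine PhoneNumber

person1 = {"SSN", "Name", "Age", "EyeColor"}
print(len(bcnf_violations(person1, person_fds)))  # 0: SSN is a key of Person1

person2 = {"SSN", "PhoneNumber"}
print(len(bcnf_violations(person2, [])))          # 0: no FDs apply to Person2
```

For a decomposed relation such as Person2, only dependencies among its own attributes should be passed in, which is why the last call uses an empty list.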

Other Example

• R(A,B,C,D)    A -> B, B -> C
• Key: A, D
• Violations of BCNF: A -> B, A -> C, A -> BC
• Pick A -> BC: split into R1(A,B,C) R2(A,D)
• What happens if we pick A -> B first ?
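The decomposition procedure can be sketched recursively: look for a set X whose closure, restricted to the current relation, is neither X itself nor the whole relation, split on it, and repeat. This is an illustrative sketch with hypothetical helper names, and it searches all attribute subsets, so it only suits small schemas. On this example it goes one step further than the split shown above, because B -> C still holds inside R1(A,B,C) and B is not a superkey there.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) pairs of sets."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def find_violation(rel, fds):
    """Return (X, attributes X determines in rel) for a BCNF violation, else None."""
    rel = set(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            determined = closure(x, fds) & rel
            # nontrivial (determines something new) but not a superkey of rel
            if determined != x and determined != rel:
                return x, determined - x
    return None


def bcnf_decompose(rel, fds):
    """Split rel until no violation is left; returns a list of attribute sets."""
    violation = find_violation(rel, fds)
    if violation is None:
        return [set(rel)]
    lhs, rhs = violation
    with_bs = lhs | rhs          # the A's together with the B's
    with_others = set(rel) - rhs  # the A's together with the others
    return bcnf_decompose(with_bs, fds) + bcnf_decompose(with_others, fds)


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]
result = bcnf_decompose({"A", "B", "C", "D"}, fds)
print([sorted(r) for r in result])  # [['B', 'C'], ['A', 'B'], ['A', 'D']]
```

Taking the full closure as the right-hand side follows the "as large as possible" heuristic from the BCNF Decomposition slide.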

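Correct Decompositions

A decomposition is lossless if we can recover R(A,B,C) from its projections:

R(A,B,C) is decomposed into R1(A,B) and R2(A,C); joining them back gives R’(A,B,C), and we need

R’(A,B,C) = R(A,B,C)

R’ is in general larger than R. Must ensure R’ = R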

Decomposition Based on BCNF is Necessarily Lossless

Attributes A, B, C. FD: A -> C

Relations R1(A,B) R2(A,C)

Tuple in R: (a,b,c)
Tuples in R1: (a,b), (a,b’)
Tuples in R2: (a,c), (a,c’)

Tuples in the join of R1 and R2: (a,b,c), (a,b,c’), (a,b’,c), (a,b’,c’)

Can (a,b,c’) be a bogus tuple? What about (a,b’,c’) ?

Example

Name  SSN         Phone Number
Fred  123-321-99  (201) 555-1234
Fred  123-321-99  (206) 572-4312
Joe   909-438-44  (908) 464-0028
Joe   909-438-44  (212) 555-4000

What are the dependencies?

What are the keys?

Is it in BCNF?

And Now?

SSN         Name
123-321-99  Fred
909-438-44  Joe

SSN         Phone Number
123-321-99  (201) 555-1234
123-321-99  (206) 572-4312
909-438-44  (908) 464-0028
909-438-44  (212) 555-4000

3NF: A Problem with BCNF

Unit  Company  Product

FD’s: Unit -> Company; Company, Product -> Unit
So, there is a BCNF violation, and we decompose.

Unit  Company    (FD: Unit -> Company)

Unit  Product    (No FDs)

So What’s the Problem?

Unit      Company        Unit      Product
Galaga99  UW             Galaga99  databases
Bingo     UW             Bingo     databases

No problem so far. All local FD’s are satisfied.
Let’s put all the data back into a single table again:

Unit      Company  Product
Galaga99  UW       databases
Bingo     UW       databases

Violates the dependency: company, product -> unit!
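The same point can be checked mechanically: each small table satisfies its local dependency, but the joined table breaks Company, Product -> Unit. The sketch below is illustrative only; `holds_fd` is a hypothetical helper that tests a dependency against one instance.

```python
def holds_fd(rows, lhs, rhs):
    """True if no two rows agree on lhs while differing on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


unit_company = [
    {"Unit": "Galaga99", "Company": "UW"},
    {"Unit": "Bingo", "Company": "UW"},
]
unit_product = [
    {"Unit": "Galaga99", "Product": "databases"},
    {"Unit": "Bingo", "Product": "databases"},
]

# Put the data back into a single table by joining on Unit.
joined = [
    {**uc, **up}
    for uc in unit_company
    for up in unit_product
    if uc["Unit"] == up["Unit"]
]

print(holds_fd(unit_company, ["Unit"], ["Company"]))       # True
print(holds_fd(joined, ["Unit"], ["Company"]))             # True
print(holds_fd(joined, ["Company", "Product"], ["Unit"]))  # False
```

Because Company and Product live in different tables after the decomposition, neither table alone can be used to enforce the second dependency.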

Solution: 3rd Normal Form (3NF)

A simple condition for removing anomalies from relations:

A relation R is in 3rd normal form if and only if:

Whenever there is a nontrivial dependency A1, A2, …, An -> B for R, it is the case that {A1, A2, …, An} is a super-key for R, or B is part of a key.
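The difference between the two conditions can be seen on the Unit, Company, Product schema: Unit -> Company breaks BCNF, but Company is part of the key {Company, Product}, so 3NF is satisfied. The sketch below is a small illustration with hypothetical helper names; it enumerates keys by brute force and only examines the listed dependencies.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure; fds is a list of (lhs, rhs) pairs of sets."""
    x = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= x and not rhs <= x:
                x |= rhs
                changed = True
    return x


def candidate_keys(all_attrs, fds):
    """Minimal attribute sets whose closure is every attribute (smallest first)."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            s = set(combo)
            if closure(s, fds) >= all_attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def violations(all_attrs, fds, third_normal_form):
    """List (lhs, B) pairs that break BCNF, or 3NF when third_normal_form is True."""
    prime = set().union(*candidate_keys(all_attrs, fds))  # attributes in some key
    bad = []
    for lhs, rhs in fds:
        for b in sorted(rhs - lhs):  # nontrivial right-hand attributes only
            if closure(lhs, fds) >= all_attrs:
                continue  # left side is a superkey
            if third_normal_form and b in prime:
                continue  # B is part of a key: allowed by 3NF
            bad.append((lhs, b))
    return bad


attrs = {"Unit", "Company", "Product"}
fds = [({"Unit"}, {"Company"}), ({"Company", "Product"}, {"Unit"})]

print([sorted(k) for k in candidate_keys(attrs, fds)])
# [['Company', 'Product'], ['Product', 'Unit']]
print(violations(attrs, fds, third_normal_form=False))  # [({'Unit'}, 'Company')]
print(violations(attrs, fds, third_normal_form=True))   # []
```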

What happened to first and second normal forms?

Will we have more normal forms?

Multi-valued Dependencies

SSN         Phone Number    Course
123-321-99  (206) 572-4312  CSE-444
123-321-99  (206) 572-4312  CSE-341
123-321-99  (206) 432-8954  CSE-444
123-321-99  (206) 432-8954  CSE-341

The multi-valued dependencies are:

SSN ->> Phone Number
SSN ->> Course

Definition of Multi-valued Dependency

Given R(A1,…,An,B1,…,Bm,C1,…,Cp)

the MVD A1,…,An ->> B1,…,Bm holds if:

for any values of A1,…,An the “set of values” of B1,…,Bm is “independent” of those of C1,…,Cp

Definition of MVDs Continued

Equivalently: the decomposition into

R1(A1,…,An,B1,…,Bm), R2(A1,…,An,C1,…,Cp)

is lossless

Note: an MVD A1,…,An ->> B1,…,Bm implicitly talks about “the other” attributes C1,…,Cp
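One way to make "independent" concrete is the tuple-swapping test: whenever two tuples agree on the A's, the tuple that takes its B values from the first and everything else from the second must also be present. The sketch below applies that test to the SSN, Phone Number, Course table; `holds_mvd` is a hypothetical helper, and it checks one instance only.

```python
def holds_mvd(rows, lhs, mid):
    """Check lhs ->> mid on a list of dicts; the remaining attributes are 'the others'."""
    if not rows:
        return True
    attrs = list(rows[0].keys())
    present = {tuple(r[a] for a in attrs) for r in rows}
    for t in rows:
        for u in rows:
            if all(t[a] == u[a] for a in lhs):
                # u's tuple with the mid attributes replaced by t's values
                swapped = {**u, **{a: t[a] for a in mid}}
                if tuple(swapped[a] for a in attrs) not in present:
                    return False
    return True


rows = [
    {"SSN": "123-321-99", "Phone Number": "(206) 572-4312", "Course": "CSE-444"},
    {"SSN": "123-321-99", "Phone Number": "(206) 572-4312", "Course": "CSE-341"},
    {"SSN": "123-321-99", "Phone Number": "(206) 432-8954", "Course": "CSE-444"},
    {"SSN": "123-321-99", "Phone Number": "(206) 432-8954", "Course": "CSE-341"},
]

print(holds_mvd(rows, ["SSN"], ["Phone Number"]))      # True
print(holds_mvd(rows, ["SSN"], ["Course"]))            # True
print(holds_mvd(rows[:3], ["SSN"], ["Phone Number"]))  # False: one combination is missing
```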

Rules for MVDs

If A1,…,An -> B1,…,Bm

then A1,…,An ->> B1,…,Bm

Other rules in the book

4th Normal Form (4NF)

R is in 4NF if whenever: A1,…,An ->> B1,…,Bm is a nontrivial MVD, then A1,…,An is a superkey

Same as BCNF with FDs replaced by MVDs

Confused by Normal Forms ?

[Diagram: nested regions, with 4NF inside BCNF inside 3NF]

In practice: (1) 3NF is enough, (2) don’t overdo it !

Querying the Database

• How do we specify what we want from our database? Find all the employees who earn more than $50,000 and pay taxes in New Jersey.
• We design high-level query languages:
  – SQL (used everywhere)
  – Datalog (used by database theoreticians, their students, friends and family)
• Relational algebra: a basic set of operations on relations that provide the basic principles.

Relational Algebra at a Glance

• Operators: relations as input, new relation as output
• Five basic RA operators:
  – Basic Set Operators
    • union, difference (no intersection, no complement)
  – Selection: σ
  – Projection: π
  – Cartesian Product: X
• Derived operators:
  – Intersection, complement
  – Joins (natural, equi-join, theta join, semi-join)
• When our relations have attribute names:
  – Renaming: ρ

Set Operations

• Binary operations
• Union: all tuples in R1 or R2
  – R1 U R2
  – Example:
    • ActiveEmployees U RetiredEmployees
• Difference: all tuples in R1 and not in R2
  – R1 – R2
  – Example:
    • AllEmployees - RetiredEmployees
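With relations modeled as Python sets of tuples, union and difference map onto the built-in set operators, and intersection can be derived from difference alone. This is a small sketch; the employee tuples are made-up sample data, not from the lecture.

```python
# Each relation is a set of (SSN, Name) tuples; the values are hypothetical.
active_employees = {("111", "Ann"), ("222", "Bob")}
retired_employees = {("333", "Cy")}

all_employees = active_employees | retired_employees  # union
still_working = all_employees - retired_employees     # difference

print(sorted(all_employees))              # [('111', 'Ann'), ('222', 'Bob'), ('333', 'Cy')]
print(still_working == active_employees)  # True

# Intersection is a derived operator: R1 ∩ R2 = R1 - (R1 - R2).
r1 = {("111", "Ann"), ("222", "Bob")}
r2 = {("222", "Bob"), ("333", "Cy")}
derived = r1 - (r1 - r2)
print(derived == (r1 & r2))  # True
```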

Selection

• Unary operation: returns a subset of the tuples which satisfy some condition
• Notation: σ_c(R)
• c is a condition:
  – =, <, >, and, or, not
• Find all employees with salary more than $40,000:
  – σ_{Salary > 40000}(Employee)

Selection Example

Find all employees with salary more than $40,000.

Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

Result:
SSN        Name   DepartmentID  Salary
888888888  Alice  2             45,000
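Selection can be sketched as a filter over the rows of a relation. In this minimal sketch the helper name `select` and the use of a Python function as the condition c are assumptions made for illustration.

```python
def select(rows, condition):
    """Keep the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]


employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

high_paid = select(employee, lambda row: row["Salary"] > 40000)
print(high_paid)
# [{'SSN': '888888888', 'Name': 'Alice', 'DepartmentID': 2, 'Salary': 45000}]
```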

Projection

• Unary operation: returns certain columns
• Eliminates duplicate tuples !
• Notation: π_{A1,…,An}(R)
• Example: project social-security number and names:
  – π_{SSN, Name}(Employee)

Projection Example

Employee
SSN        Name   DepartmentID  Salary
999999999  John   1             30,000
777777777  Tony   1             32,000
888888888  Alice  2             45,000

Result:
SSN        Name
999999999  John
777777777  Tony
888888888  Alice
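Projection keeps the requested columns and then drops duplicate tuples. The sketch below uses a hypothetical helper `project` that preserves the order in which tuples first appear; projecting on DepartmentID shows the duplicate elimination, which the SSN, Name example does not exercise.

```python
def project(rows, attrs):
    """Keep only attrs from each row and eliminate duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

print(project(employee, ["SSN", "Name"]))
# [{'SSN': '999999999', 'Name': 'John'}, {'SSN': '777777777', 'Name': 'Tony'},
#  {'SSN': '888888888', 'Name': 'Alice'}]
print(project(employee, ["DepartmentID"]))
# [{'DepartmentID': 1}, {'DepartmentID': 2}]
```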

Cartesian Product

• Binary Operation
• Result is tuples combining any element of R1 with any element of R2, for R1 X R2
• Schema is union of Schema(R1) & Schema(R2)
• Notation: R1 x R2
• Example: Employee x Dependents
• Very rare in practice; but joins are very common.

Cartesian Product Example

Employee
Name  SSN
John  999999999
Tony  777777777

Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe

Employee_Dependents
Name  SSN        EmployeeSSN  Dname
John  999999999  999999999    Emily
John  999999999  777777777    Joe
Tony  777777777  999999999    Emily
Tony  777777777  777777777    Joe
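The product pairs every tuple of one relation with every tuple of the other. A minimal sketch, assuming the two relations have no attribute names in common (true for Employee and Dependents here); the helper name `product` is hypothetical.

```python
def product(r1, r2):
    """Cartesian product of two lists of dicts with disjoint attribute names."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2]


employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"EmployeeSSN": "999999999", "Dname": "Emily"},
    {"EmployeeSSN": "777777777", "Dname": "Joe"},
]

employee_dependents = product(employee, dependents)
print(len(employee_dependents))  # 4
for row in employee_dependents:
    print(row["Name"], row["SSN"], row["EmployeeSSN"], row["Dname"])
```

The loop prints the four rows in the same order as the table above.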

Join (Natural)

• Most important, expensive and exciting.
• Combines two relations, selecting only related tuples
• Equivalent to a cross product followed by selection
• Resulting schema has all attributes of the two relations, but one copy of join condition attributes

Join Example

Employee
Name  SSN
John  999999999
Tony  777777777

Dependents
EmployeeSSN  Dname
999999999    Emily
777777777    Joe

Employee_Dependents
Name  SSN        Dname
John  999999999  Emily
Tony  777777777  Joe
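The join in this example can be sketched as a product followed by a selection on SSN = EmployeeSSN, keeping one copy of the matched attribute. Since the two columns have different names, the sketch takes the attribute pair as explicit parameters; the helper name `equi_join` is an assumption made for illustration.

```python
def equi_join(r1, r2, left_attr, right_attr):
    """Pair rows where r1[left_attr] == r2[right_attr]; keep one copy of the join value."""
    result = []
    for t1 in r1:
        for t2 in r2:
            if t1[left_attr] == t2[right_attr]:
                merged = {**t1, **t2}
                if right_attr != left_attr:
                    del merged[right_attr]  # drop the duplicate join column
                result.append(merged)
    return result


employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"EmployeeSSN": "999999999", "Dname": "Emily"},
    {"EmployeeSSN": "777777777", "Dname": "Joe"},
]

print(equi_join(employee, dependents, "SSN", "EmployeeSSN"))
# [{'Name': 'John', 'SSN': '999999999', 'Dname': 'Emily'},
#  {'Name': 'Tony', 'SSN': '777777777', 'Dname': 'Joe'}]
```

The nested loop is the cross product; real systems avoid materializing it, which is part of why joins are described as expensive.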

Other Joins and Renaming

• Theta join: the join involves a predicate
  – R ⋈_θ S
• Semi-join: the attributes of one relation are included in the other.
• Renaming: ρ

Complex Queries

Product (pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person (per-name, phone number, city)

Find phone numbers of people who bought gizmos from Fred.

Find telephony products that somebody bought
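The first query can be read as a selection on Purchase, a join with Person on buyer = per-name, and a projection on phone number. The sketch below walks through those three steps on made-up rows; the sample data and the assumption that gizmos appear in Purchase under the product name "gizmo" are illustrative only.

```python
# Hypothetical sample data following the Purchase and Person schemas.
purchase = [
    {"buyer": "Ann", "seller": "Fred", "store": "S1", "product": "gizmo"},
    {"buyer": "Bob", "seller": "Joe", "store": "S1", "product": "gizmo"},
    {"buyer": "Cy", "seller": "Fred", "store": "S2", "product": "phone"},
]
person = [
    {"per-name": "Ann", "phone number": "555-0001", "city": "Seattle"},
    {"per-name": "Bob", "phone number": "555-0002", "city": "Seattle"},
    {"per-name": "Cy", "phone number": "555-0003", "city": "Tacoma"},
]

# Selection: gizmo purchases whose seller is Fred.
from_fred = [p for p in purchase if p["product"] == "gizmo" and p["seller"] == "Fred"]

# Join: match each selected purchase with the buyer's Person row.
joined = [{**p, **q} for p in from_fred for q in person if p["buyer"] == q["per-name"]]

# Projection: keep the phone numbers, without duplicates.
phones = sorted({row["phone number"] for row in joined})
print(phones)  # ['555-0001']
```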

Exercises

Product (pname, price, category, maker)
Purchase (buyer, seller, store, product)
Company (cname, stock price, country)
Person (per-name, phone number, city)

Ex #1: Find people who bought telephony products.
Ex #2: Find names of people who bought American products.
Ex #3: Find names of people who bought American products and did not buy French products.
Ex #4: Find names of people who bought American products and live in Seattle.
Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock price is more than $50.

Operations on Bags (and why we care)

Basic operations:

• Projection
• Selection
• Union
• Intersection
• Set difference
• Cartesian product
• Join (natural join, theta join)
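A bag differs from a set in that a tuple may occur several times. As a rough illustration, and assuming the usual bag semantics (union adds the counts, intersection takes the minimum, difference subtracts and drops non-positive counts), Python's `collections.Counter` behaves the same way. The tuples below are made-up data.

```python
from collections import Counter

# Each bag maps a tuple to the number of times it occurs.
r = Counter({("a",): 2, ("b",): 1})
s = Counter({("a",): 1, ("c",): 1})

bag_union = r + s         # counts are added
bag_intersection = r & s  # minimum of the two counts
bag_difference = r - s    # counts are subtracted; non-positive results vanish

print(dict(bag_union))         # {('a',): 3, ('b',): 1, ('c',): 1}
print(dict(bag_intersection))  # {('a',): 1}
print(dict(bag_difference))    # {('a',): 1, ('b',): 1}
```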
