Databases 1: Seventh lecture

Posted on 28-Dec-2015

Topics of the lecture

• Extended relational algebra
• Normalization
• Normal forms

Relational Algebra on Bags

• A bag is like a set, but an element may appear more than once.
  ▫ Multiset is another name for “bag.”

• Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set.

• Bags also resemble lists, but order in a bag is unimportant.
  ▫ Example: {1,2,1} = {1,1,2} as bags, but [1,2,1] != [1,1,2] as lists.
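A minimal sketch of the bag/list distinction, assuming Python's `collections.Counter` as a stand-in for a bag (it maps each element to its number of occurrences); the variable names are illustrative only:

```python
from collections import Counter

# A bag records how many times each element occurs, not where it occurs.
bag_a = Counter([1, 2, 1])
bag_b = Counter([1, 1, 2])

# As bags, {1,2,1} and {1,1,2} are equal: order is ignored, multiplicities are not.
print(bag_a == bag_b)          # True
print(bag_a[1])                # 2, element 1 occurs twice

# As lists, order matters.
print([1, 2, 1] == [1, 1, 2])  # False
```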

Why Bags?

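• SQL, the most important query language for relational databases, is actually a bag language.
  ▫ SQL will eliminate duplicates, but usually only if you ask it to do so explicitly (for example, with SELECT DISTINCT).

• Some operations, like projection, are much more efficient on bags than on sets, because a bag result can be produced tuple by tuple, with no extra pass to find and remove duplicates.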
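To see SQL's bag behavior in practice, here is a small sketch using Python's standard `sqlite3` module with an in-memory database. The table `R` and its rows are hypothetical; they simply reuse the relation R from the examples that follow:

```python
import sqlite3

# In-memory database with a hypothetical table R(A, B) containing a duplicate row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE R (A INTEGER, B INTEGER)")
conn.executemany("INSERT INTO R VALUES (?, ?)", [(1, 2), (5, 6), (1, 2)])

# A plain SELECT is a bag projection: the duplicate value of A survives.
bag_result = conn.execute("SELECT A FROM R ORDER BY A").fetchall()
# DISTINCT explicitly requests duplicate elimination (set semantics).
set_result = conn.execute("SELECT DISTINCT A FROM R ORDER BY A").fetchall()

print(bag_result)  # [(1,), (1,), (5,)]
print(set_result)  # [(1,), (5,)]
conn.close()
```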

Operations on Bags

• Selection applies to each tuple, so its effect on bags is like its effect on sets.

• Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates.

• Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate: each copy of a tuple is simply paired like any other tuple.

Example: Bag Selection

R(A, B) = {(1,2), (5,6), (1,2)}
S(B, C) = {(3,4), (7,8)}

SELECT_{A+B<5}(R) = {(1,2), (1,2)}
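A minimal sketch of bag selection, assuming bags are represented as Python lists of tuples (an illustration choice, not the lecture's notation); `select` is a hypothetical helper name:

```python
R = [(1, 2), (5, 6), (1, 2)]  # R(A, B); the duplicate (1, 2) is intentional

def select(rel, predicate):
    """Bag selection: keep every copy of every tuple that satisfies the predicate."""
    return [t for t in rel if predicate(t)]

# SELECT_{A+B<5}(R): A is position 0, B is position 1.
result = select(R, lambda t: t[0] + t[1] < 5)
print(result)  # [(1, 2), (1, 2)]
```

Because the predicate is tested tuple by tuple, both copies of (1,2) pass, exactly as in the example.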

Example: Bag Projection

R(A, B) = {(1,2), (5,6), (1,2)}
S(B, C) = {(3,4), (7,8)}

PROJECT_{A}(R) = {(1), (5), (1)}
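The same list-of-tuples representation (an assumption of these sketches) makes bag projection a per-tuple mapping with no duplicate check; `project` is a hypothetical helper:

```python
R = [(1, 2), (5, 6), (1, 2)]  # R(A, B)

def project(rel, positions):
    """Bag projection: keep the chosen columns of each tuple; duplicates are not removed."""
    return [tuple(t[i] for i in positions) for t in rel]

result = project(R, [0])  # PROJECT_{A}(R)
print(result)  # [(1,), (5,), (1,)]
```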

Example: Bag Product

R(A, B) = {(1,2), (5,6), (1,2)}
S(B, C) = {(3,4), (7,8)}

R * S has attributes (A, R.B, S.B, C):
{(1,2,3,4), (1,2,7,8), (5,6,3,4), (5,6,7,8), (1,2,3,4), (1,2,7,8)}
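A sketch of the bag product under the same list-of-tuples assumption, using the standard `itertools.product` to enumerate every pair:

```python
from itertools import product

R = [(1, 2), (5, 6), (1, 2)]  # R(A, B)
S = [(3, 4), (7, 8)]          # S(B, C)

# Every copy of every R-tuple is paired with every S-tuple.
# Concatenated tuples have attributes (A, R.B, S.B, C).
result = [r + s for r, s in product(R, S)]
print(len(result))  # 6 = 3 * 2
```

The duplicate (1,2) in R simply produces its own two result tuples, which is why (1,2,3,4) and (1,2,7,8) each appear twice.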

Example: Bag Theta-Join

R(A, B) = {(1,2), (5,6), (1,2)}
S(B, C) = {(3,4), (7,8)}

R JOIN_{R.B<S.B} S has attributes (A, R.B, S.B, C):
{(1,2,3,4), (1,2,7,8), (5,6,7,8), (1,2,3,4), (1,2,7,8)}
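A theta-join is a product followed by a selection on each pair. A minimal sketch, again assuming lists of tuples and a hypothetical `theta_join` helper:

```python
R = [(1, 2), (5, 6), (1, 2)]  # R(A, B)
S = [(3, 4), (7, 8)]          # S(B, C)

def theta_join(r_rel, s_rel, condition):
    """Bag theta-join: pair every r with every s, keep the pairs satisfying the condition."""
    return [r + s for r in r_rel for s in s_rel if condition(r, s)]

# R JOIN_{R.B<S.B} S, where R.B is r[1] and S.B is s[0].
result = theta_join(R, S, lambda r, s: r[1] < s[0])
print(result)
```

Only the pair ((5,6), (3,4)) fails the test 6 < 3, so five of the six product tuples remain.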

Bag Union

•Union, intersection, and difference need new definitions for bags.

• An element appears in the union of two bags as many times as the sum of the number of times it appears in each bag.

•Example: {1,2,1} UNION {1,1,2,3,1} = {1,1,1,1,1,2,2,3}
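Python's `collections.Counter` happens to implement this rule: its `+` operator adds multiplicities (it drops counts that are zero or negative, which cannot arise when both inputs are ordinary bags). A short check of the example:

```python
from collections import Counter

a = Counter([1, 2, 1])
b = Counter([1, 1, 2, 3, 1])

union = a + b  # bag union: counts are summed
print(sorted(union.elements()))  # [1, 1, 1, 1, 1, 2, 2, 3]
```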

Bag Intersection

• An element appears in the intersection of two bags as many times as the minimum of the number of times it appears in each bag.

•Example: {1,1,2,1} INTER {1,1,2,3} = {1,1,2}.
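With `collections.Counter`, the `&` operator keeps the minimum of the two counts, which matches the bag intersection above:

```python
from collections import Counter

a = Counter([1, 1, 2, 1])
b = Counter([1, 1, 2, 3])

inter = a & b  # bag intersection: minimum of the counts
print(sorted(inter.elements()))  # [1, 1, 2]
```

Element 3 occurs 0 times in the first bag, so min(0, 1) = 0 and it disappears.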

Bag Difference

• An element appears in the difference A – B of bags as many times as it appears in A, minus the number of times it appears in B.
  ▫ But never less than 0 times.

•Example: {1,2,1} – {1,2,3} = {1}.
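`Counter`'s `-` operator subtracts counts and keeps only positive results, which is the "never less than 0 times" rule. (By contrast, the `Counter.subtract` method keeps zero and negative counts, so it is not bag difference.)

```python
from collections import Counter

a = Counter([1, 2, 1])
b = Counter([1, 2, 3])

diff = a - b  # counts below 1 are dropped
print(sorted(diff.elements()))  # [1]
```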

Beware: Bag Laws <> Set Laws

• Some, but not all, algebraic laws that hold for sets also hold for bags.

• For example, the commutative law for union (R UNION S = S UNION R) does hold for bags.
  ▫ Since addition is commutative, adding the number of times x appears in R and S doesn’t depend on the order of R and S.

An Example of Inequivalence

•Set union is idempotent, meaning that S UNION S = S.

•However, for bags, if x appears n times in S, then it appears 2n times in S UNION S.

•Thus S UNION S <> S in general.
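A quick check of the inequivalence, assuming `Counter` addition as bag union and Python `set` union as set union. Note that `Counter` also has a `|` operator, but it takes the maximum of the counts; that operation is idempotent and is not the bag union defined in this lecture.

```python
from collections import Counter

S = Counter(["x", "y", "x"])  # x appears n = 2 times
doubled = S + S               # bag union of S with itself

print(doubled["x"])                     # 4, that is 2n
print(doubled == S)                     # False: bag union is not idempotent
print((set(S) | set(S)) == set(S))      # True: set union is idempotent
```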

The Extended Algebra

1. DELTA = eliminate duplicates from bags.
2. TAU = sort tuples.
3. Extended projection: arithmetic, duplication of columns.
4. GAMMA = grouping and aggregation.
5. OUTERJOIN: avoids “dangling tuples” = tuples that do not join with anything.

Duplicate Elimination

• R1 := DELTA(R2).

• R1 consists of one copy of each tuple that appears in R2 one or more times.

Example: Duplicate Elimination

R(A, B) = {(1,2), (3,4), (1,2)}

DELTA(R) = {(1,2), (3,4)}
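A minimal sketch of DELTA on a list-of-tuples bag; `delta` is a hypothetical helper. It relies on `dict.fromkeys`, which keeps one key per distinct tuple and, since Python 3.7, preserves first-occurrence order:

```python
R = [(1, 2), (3, 4), (1, 2)]  # R(A, B)

def delta(rel):
    """Duplicate elimination: one copy of each distinct tuple."""
    return list(dict.fromkeys(rel))

print(delta(R))  # [(1, 2), (3, 4)]
```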

Sorting

• R1 := TAU_L(R2).
  ▫ L is a list of some of the attributes of R2.

• R1 is the list of tuples of R2 sorted first on the value of the first attribute on L, then on the second attribute of L, and so on.
  ▫ Break ties arbitrarily.

• TAU is the only operator whose result is neither a set nor a bag.

• TAU corresponds to ORDER BY in SQL.

Example: Sorting

R(A, B) = {(1,2), (3,4), (5,2)}

TAU_B(R) = [(5,2), (1,2), (3,4)]

(5,2) and (1,2) tie on B, so either order between them is a correct result.
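A sketch of TAU with Python's built-in `sorted`, assuming list-of-tuples relations and a hypothetical `tau` helper. Python's sort is stable, so tied tuples keep their input order: this sketch returns (1,2) before (5,2), while the slide shows the other order. Both are valid, since ties may be broken arbitrarily.

```python
R = [(1, 2), (3, 4), (5, 2)]  # R(A, B)

def tau(rel, positions):
    """Sort on the listed attribute positions: first positions[0], then positions[1], ..."""
    return sorted(rel, key=lambda t: tuple(t[i] for i in positions))

result = tau(R, [1])  # TAU_B(R), B is position 1
print(result)  # [(1, 2), (5, 2), (3, 4)]
```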

Extended Projection

• Using the same PROJECT_L operator, we allow the list L to contain arbitrary expressions involving attributes, for example:
  1. Arithmetic on attributes, e.g., A+B.
  2. Duplicate occurrences of the same attribute.

Example: Extended Projection

R(A, B) = {(1,2), (3,4)}

PROJECT_{A+B, A, A}(R) has attributes (A+B, A1, A2):
{(3,1,1), (7,3,3)}
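A sketch of extended projection, assuming each element of L is modeled as a Python function of the whole tuple; `extended_project` is a hypothetical helper:

```python
R = [(1, 2), (3, 4)]  # R(A, B)

def extended_project(rel, exprs):
    """Each output column is computed by a function of the input tuple."""
    return [tuple(expr(t) for expr in exprs) for t in rel]

# PROJECT_{A+B, A, A}(R): output attributes (A+B, A1, A2)
result = extended_project(R, [
    lambda t: t[0] + t[1],  # A+B
    lambda t: t[0],         # A1
    lambda t: t[0],         # A2, the same attribute again
])
print(result)  # [(3, 1, 1), (7, 3, 3)]
```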

Aggregation Operators

•Aggregation operators are not operators of relational algebra.

•Rather, they apply to entire columns of a table and produce a single result.

•The most important examples: SUM, AVG, COUNT, MIN, and MAX.

Example: Aggregation

R(A, B) = {(1,3), (3,4), (3,2)}

SUM(A) = 7
COUNT(A) = 3
MAX(B) = 4
AVG(B) = 3
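Each aggregation reduces a whole column to one value. With the list-of-tuples assumption used above, Python's built-ins reproduce the example:

```python
R = [(1, 3), (3, 4), (3, 2)]  # R(A, B)

a_column = [t[0] for t in R]
b_column = [t[1] for t in R]

total_a = sum(a_column)                # SUM(A)
count_a = len(a_column)                # COUNT(A)
max_b = max(b_column)                  # MAX(B)
avg_b = sum(b_column) / len(b_column)  # AVG(B)
print(total_a, count_a, max_b, avg_b)  # 7 3 4 3.0
```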

Grouping Operator

• R1 := GAMMA_L(R2). L is a list of elements that are either:
  1. Individual (grouping) attributes.
  2. AGG(A), where AGG is one of the aggregation operators and A is an attribute.

Applying GAMMA_L(R)

• Group R according to all the grouping attributes on list L.
  ▫ That is, form one group for each distinct list of values for those attributes in R.

• Within each group, compute AGG(A) for each aggregation on list L.

• The result has the grouping attributes and the aggregations as attributes: one tuple for each list of values for the grouping attributes, together with that group’s aggregations.

Example: Grouping/Aggregation

R(A, B, C) = {(1,2,3), (4,5,6), (1,2,5)}

GAMMA_{A,B,AVG(C)}(R) = ??

First, group R on (A, B):
group (1,2): {(1,2,3), (1,2,5)}
group (4,5): {(4,5,6)}

Then, average C within groups. The result has attributes (A, B, AVG(C)):
{(1,2,4), (4,5,6)}
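The two steps above (group, then aggregate per group) can be sketched as follows. The helpers `gamma` and `avg` are hypothetical, and aggregations are passed as (column position, function) pairs, which is a choice of this sketch rather than the lecture's notation:

```python
from collections import defaultdict

R = [(1, 2, 3), (4, 5, 6), (1, 2, 5)]  # R(A, B, C)

def avg(values):
    return sum(values) / len(values)

def gamma(rel, group_positions, aggregations):
    """Group on group_positions, then apply each (position, function) aggregation per group."""
    groups = defaultdict(list)
    for t in rel:
        key = tuple(t[i] for i in group_positions)
        groups[key].append(t)
    result = []
    for key, members in groups.items():
        aggs = tuple(fn([t[pos] for t in members]) for pos, fn in aggregations)
        result.append(key + aggs)
    return result

# GAMMA_{A, B, AVG(C)}(R)
result = gamma(R, [0, 1], [(2, avg)])
print(result)  # [(1, 2, 4.0), (4, 5, 6.0)]
```

Other aggregations plug in the same way, for example `(2, max)` for MAX(C).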

Outerjoin

• Suppose we join R JOIN_C S.

• A tuple of R that has no tuple of S with which it joins is said to be dangling.
  ▫ Similarly for a tuple of S.

• Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.

Example: Outerjoin

R(A, B) = {(1,2), (4,5)}
S(B, C) = {(2,3), (6,7)}

(1,2) joins with (2,3), but the other two tuples are dangling.

R OUTERJOIN S has attributes (A, B, C):
{(1,2,3), (4,5,NULL), (NULL,6,7)}
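A sketch of this natural full outer join on B, assuming Python's `None` plays the role of NULL and a hypothetical `full_outerjoin` helper specialized to R(A, B) and S(B, C). S-tuples are tracked by index so that duplicate tuples in a bag are handled one copy at a time:

```python
R = [(1, 2), (4, 5)]  # R(A, B)
S = [(2, 3), (6, 7)]  # S(B, C)
NULL = None           # assumption: None stands in for the NULL symbol

def full_outerjoin(r_rel, s_rel):
    """Natural join on B, plus dangling tuples of either side padded with NULL."""
    result = []
    matched = set()  # indexes of S-tuples that joined with something
    for a, b in r_rel:
        partners = [j for j, (sb, _) in enumerate(s_rel) if sb == b]
        for j in partners:
            result.append((a, b, s_rel[j][1]))
            matched.add(j)
        if not partners:
            result.append((a, b, NULL))  # dangling R-tuple
    for j, (b, c) in enumerate(s_rel):
        if j not in matched:
            result.append((NULL, b, c))  # dangling S-tuple
    return result

result = full_outerjoin(R, S)
print(result)  # [(1, 2, 3), (4, 5, None), (None, 6, 7)]
```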

Normalization: Anomalies

• The goal of relational schema design is to avoid anomalies and redundancy.
  ▫ Update anomaly: one occurrence of a fact is changed, but not all occurrences.
  ▫ Deletion anomaly: a valid fact is lost when a tuple is deleted.

Example of Bad Design

The data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf.

Drinkers(name, addr, beersLiked, manf, favBeer)

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  ???         WickedAle   Pete’s  ???
Spock    Enterprise  Bud         ???     Bud

This Bad Design Also Exhibits Anomalies

• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?

• Deletion anomaly: if nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud

Boyce-Codd Normal Form

• We say a relation R is in BCNF if whenever X -> A is a nontrivial FD that holds in R, X is a superkey.
  ▫ Remember: nontrivial means A is not a member of set X.
  ▫ Remember: a superkey is any superset of a key (not necessarily a proper superset).

Example

• Drinkers(name, addr, beersLiked, manf, favBeer)

• FD’s: name -> addr favBeer, beersLiked -> manf

• Only key is {name, beersLiked}.

• In each FD, the left side is not a superkey.

• Any one of these FD’s shows Drinkers is not in BCNF.

Another Example

• Beers(name, manf, manfAddr)

• FD’s: name -> manf, manf -> manfAddr

• Only key is {name}.

• name -> manf does not violate BCNF, but manf -> manfAddr does.
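Both examples can be checked mechanically: X is a superkey exactly when its closure X+ contains every attribute. A minimal sketch, assuming FDs are written as (left side, right side) pairs of Python sets; `closure` and `bcnf_violations` are hypothetical helper names:

```python
def closure(attrs, fds):
    """X+: keep adding right sides of FDs whose left side is already in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs X -> Y whose left side X is not a superkey of the schema."""
    return [(lhs, rhs) for lhs, rhs in fds
            if rhs - lhs and not closure(lhs, fds) >= schema]

drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
drinker_fds = [({"name"}, {"addr", "favBeer"}), ({"beersLiked"}, {"manf"})]

beers = {"name", "manf", "manfAddr"}
beer_fds = [({"name"}, {"manf"}), ({"manf"}, {"manfAddr"})]

print(closure({"name", "beersLiked"}, drinker_fds) == drinkers)  # True: a key
print(len(bcnf_violations(drinkers, drinker_fds)))  # 2: both FDs violate BCNF
print(bcnf_violations(beers, beer_fds))             # only manf -> manfAddr
```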

Decomposition into BCNF

• Given: relation R with FD’s F.

• Look among the given FD’s for a BCNF violation X -> B.
  ▫ If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.

• Compute X+.
  ▫ Not all attributes, or else X is a superkey.

Decompose R Using X -> B

• Replace R by relations with schemas:
  1. R1 = X+.
  2. R2 = (R – X+) ∪ X.

• Project given FD’s F onto the two new relations.
  1. Compute the closure of F = all nontrivial FD’s that follow from F.
  2. Use only those FD’s whose attributes are all in R1 or all in R2.
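The whole procedure can be sketched in a few functions. This is an illustration under stated assumptions, not a definitive implementation: FDs are (set, set) pairs, the helper names are hypothetical, and FD projection uses the brute-force method of computing X+ for every subset X of the new schema (exponential, but it yields the same projected FDs as steps 1 and 2 above).

```python
from itertools import combinations

def closure(attrs, fds):
    """X+: keep adding right sides of FDs whose left side is already in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def project_fds(schema, fds):
    """For each subset X of the schema, emit X -> (X+ ∩ schema) - X when nonempty."""
    projected = []
    attrs = sorted(schema)
    for k in range(1, len(attrs) + 1):
        for xs in combinations(attrs, k):
            x = set(xs)
            rhs = (closure(x, fds) & schema) - x
            if rhs:
                projected.append((x, rhs))
    return projected

def bcnf_decompose(schema, fds):
    """Split on a violation X -> B into R1 = X+ and R2 = (R - X+) ∪ X, then recurse."""
    for lhs, rhs in fds:
        x_plus = closure(lhs, fds) & schema
        if rhs - lhs and x_plus != schema:  # nontrivial, and X is not a superkey
            r1 = x_plus
            r2 = (schema - x_plus) | lhs
            return (bcnf_decompose(r1, project_fds(r1, fds))
                    + bcnf_decompose(r2, project_fds(r2, fds)))
    return [schema]  # no violation: already in BCNF

drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
F = [({"name"}, {"addr"}), ({"name"}, {"favBeer"}), ({"beersLiked"}, {"manf"})]

for relation in bcnf_decompose(drinkers, F):
    print(sorted(relation))
```

Run on Drinkers, it picks name -> addr first and ends with the same three schemas as the worked example: {name, addr, favBeer}, {beersLiked, manf}, and {name, beersLiked}.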

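Decomposition Picture

[Figure: the attributes of R are split into three groups, R – X+, X, and X+ – X. R1 consists of X and X+ – X; R2 consists of R – X+ and X.]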

Example

• Drinkers(name, addr, beersLiked, manf, favBeer)

• F = name -> addr, name -> favBeer, beersLiked -> manf

• Pick BCNF violation name -> addr.

• Close the left side: {name}+ = {name, addr, favBeer}.

• Decomposed relations:
  1. Drinkers1(name, addr, favBeer)
  2. Drinkers2(name, beersLiked, manf)

Example, Continued

• We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.

• Projecting FD’s is complex in general, but easy here.

• For Drinkers1(name, addr, favBeer), the relevant FD’s are name -> addr and name -> favBeer.
  ▫ Thus, name is the only key and Drinkers1 is in BCNF.

Example, Continued

• For Drinkers2(name, beersLiked, manf), the only FD is beersLiked -> manf, and the only key is {name, beersLiked}.
  ▫ Violation of BCNF.

• beersLiked+ = {beersLiked, manf}, so we decompose Drinkers2 into:
  1. Drinkers3(beersLiked, manf)
  2. Drinkers4(name, beersLiked)

Example, Concluded

• The resulting decomposition of Drinkers:
  1. Drinkers1(name, addr, favBeer)
  2. Drinkers3(beersLiked, manf)
  3. Drinkers4(name, beersLiked)

Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.

Third Normal Form - Motivation

• There is one structure of FD’s that causes trouble when we decompose.

• AB -> C and C -> B.
  ▫ Example: A = street address, B = city, C = zip code.

• There are two keys, {A,B} and {A,C}.

• C -> B is a BCNF violation, so we must decompose into AC, BC.

We Cannot Enforce FD’s

• The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB -> C by checking FD’s in these decomposed relations.

•Example with A = street, B = city, and C = zip on the next slide.

An Unenforceable FD

street        zip
545 Tech Sq.  02138
545 Tech Sq.  02139

city       zip
Cambridge  02138
Cambridge  02139

Join tuples with equal zip codes.

street        city       zip
545 Tech Sq.  Cambridge  02138
545 Tech Sq.  Cambridge  02139

Although no FD’s were violated in the decomposed relations, the FD street city -> zip is violated by the database as a whole.
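The same situation can be replayed in code. A sketch, assuming rows are Python dicts and a hypothetical `fd_holds` checker: the (city, zip) relation satisfies zip -> city, the (street, zip) relation has no nontrivial FD to check, yet the natural join on zip violates street city -> zip.

```python
# Rows as dicts: attribute name -> value.
street_zip = [{"street": "545 Tech Sq.", "zip": "02138"},
              {"street": "545 Tech Sq.", "zip": "02139"}]
city_zip = [{"city": "Cambridge", "zip": "02138"},
            {"city": "Cambridge", "zip": "02139"}]

def fd_holds(rel, lhs, rhs):
    """True if tuples that agree on lhs also agree on rhs."""
    seen = {}
    for t in rel:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

# The decomposed relation satisfies its own FD.
print(fd_holds(city_zip, ["zip"], ["city"]))  # True

# Natural join on zip, then check the FD on the whole database.
joined = [{**s, **c} for s in street_zip for c in city_zip if s["zip"] == c["zip"]]
print(fd_holds(joined, ["street", "city"], ["zip"]))  # False
```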

3NF Lets Us Avoid This Problem

• 3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.

• An attribute is prime if it is a member of any key.

• A nontrivial FD X -> A violates 3NF if and only if X is not a superkey, and also A is not prime.

Example

• In our problem situation with FD’s AB -> C and C -> B, we have keys AB and AC.

• Thus A, B, and C are each prime.

• Although C -> B violates BCNF, it does not violate 3NF.
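A sketch of the 3NF test on this example, assuming FDs as (set, set) pairs and hypothetical helpers. Candidate keys are found by brute force (test subsets from smallest to largest and skip supersets of keys already found), which is only practical for small schemas:

```python
from itertools import combinations

def closure(attrs, fds):
    """X+: keep adding right sides of FDs whose left side is already in the result."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal superkeys, searched from the smallest subsets upward."""
    keys = []
    for k in range(1, len(schema) + 1):
        for xs in combinations(sorted(schema), k):
            x = set(xs)
            if closure(x, fds) >= schema and not any(key <= x for key in keys):
                keys.append(x)
    return keys

def violates_bcnf(lhs, rhs, schema, fds):
    """Nontrivial FD whose left side is not a superkey."""
    return bool(rhs - lhs) and not closure(lhs, fds) >= schema

def violates_3nf(lhs, rhs, schema, fds):
    """A BCNF violation is also a 3NF violation unless every right-side attribute is prime."""
    prime = set().union(*candidate_keys(schema, fds))
    return violates_bcnf(lhs, rhs, schema, fds) and not (rhs - lhs) <= prime

# A = street, B = city, C = zip
schema = {"A", "B", "C"}
fds = [({"A", "B"}, {"C"}), ({"C"}, {"B"})]

print(candidate_keys(schema, fds))               # keys AB and AC
print(violates_bcnf({"C"}, {"B"}, schema, fds))  # True
print(violates_3nf({"C"}, {"B"}, schema, fds))   # False
```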

What 3NF and BCNF Give You

• There are two important properties of a decomposition:
  1. Recovery: it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original.
  2. Dependency preservation: it should be possible to check in the projected relations whether all the given FD’s are satisfied.
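Recovery can be tried on a concrete instance. The sketch below takes the Drinkers table from the anomalies example, projects it onto the BCNF schemas Drinkers1, Drinkers3, and Drinkers4, and natural-joins the pieces back together. Tuples are modeled as frozensets of (attribute, value) pairs, and `project` and `natural_join` are hypothetical helpers; one successful instance illustrates the property but does not prove it in general.

```python
# The Drinkers instance from the anomalies example, one dict per tuple.
drinkers = [
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "Bud", "manf": "A.B.", "favBeer": "WickedAle"},
    {"name": "Janeway", "addr": "Voyager", "beersLiked": "WickedAle", "manf": "Pete's", "favBeer": "WickedAle"},
    {"name": "Spock", "addr": "Enterprise", "beersLiked": "Bud", "manf": "A.B.", "favBeer": "Bud"},
]

def project(rel, attrs):
    """Set projection: each tuple becomes a frozenset of (attribute, value) pairs."""
    return {frozenset((a, t[a]) for a in attrs) for t in rel}

def natural_join(r, s):
    """Combine every pair of tuples that agree on all shared attributes."""
    out = set()
    for x in r:
        for y in s:
            dx, dy = dict(x), dict(y)
            if all(dx[a] == dy[a] for a in dx.keys() & dy.keys()):
                out.add(frozenset({**dx, **dy}.items()))
    return out

original = {frozenset(t.items()) for t in drinkers}
d1 = project(drinkers, ["name", "addr", "favBeer"])  # Drinkers1
d3 = project(drinkers, ["beersLiked", "manf"])       # Drinkers3
d4 = project(drinkers, ["name", "beersLiked"])       # Drinkers4

rebuilt = natural_join(natural_join(d1, d4), d3)
print(rebuilt == original)  # True
```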

3NF and BCNF, Continued

• We can get (1) with a BCNF decomposition.
  ▫ Explanation needs to wait for relational algebra.

• We can get both (1) and (2) with a 3NF decomposition.

• But we can’t always get (1) and (2) with a BCNF decomposition.
  ▫ street-city-zip is an example.
