Databases 1, seventh lecture. Topics of the lecture: extended relational algebra, normalization, normal forms.
Post on 28-Dec-2015
Databases 1
Seventh lecture
Topics of the lecture
•Extended relational algebra
•Normalization
•Normal forms
2
3
Relational Algebra on Bags
•A bag is like a set, but an element may appear more than once.
▫Multiset is another name for “bag.”
•Example: {1,2,1,3} is a bag. {1,2,3} is also a bag that happens to be a set.
•Bags also resemble lists, but order in a bag is unimportant.
▫Example: {1,2,1} = {1,1,2} as bags, but [1,2,1] != [1,1,2] as lists.
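The difference between bags, sets, and lists can be checked with a small sketch. It assumes that Python's `collections.Counter` is an acceptable stand-in for a bag; that modelling choice is not part of the lecture.

```python
from collections import Counter

# Assumption: a bag is modelled as a Counter (element -> number of occurrences).
bag_a = Counter([1, 2, 1])
bag_b = Counter([1, 1, 2])

bags_equal = bag_a == bag_b                 # order is irrelevant in a bag
lists_equal = [1, 2, 1] == [1, 1, 2]        # order matters in a list
set_from_bag = set([1, 2, 1, 3])            # a set keeps one copy of each element

print(bags_equal, lists_equal, set_from_bag == {1, 2, 3})
```

The Counter keeps the multiplicity of each element (here `bag_a[1]` is 2), which is exactly the information a set discards.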
4
Why Bags?
•SQL, the most important query language for relational databases, is actually a bag language.
▫SQL will eliminate duplicates, but usually only if you ask it to do so explicitly.
•Some operations, like projection, are much more efficient on bags than on sets.
5
Operations on Bags
•Selection applies to each tuple, so its effect on bags is like its effect on sets.
•Projection also applies to each tuple, but as a bag operator, we do not eliminate duplicates.
•Products and joins are done on each pair of tuples, so duplicates in bags have no effect on how we operate.
6
Example: Bag Selection
R = A  B        S = B  C
    1  2            3  4
    5  6            7  8
    1  2

SELECT_{A+B<5}(R) = A  B
                    1  2
                    1  2
7
Example: Bag Projection
R = A  B        S = B  C
    1  2            3  4
    5  6            7  8
    1  2

PROJECT_{A}(R) = A
                 1
                 5
                 1
8
Example: Bag Product
R = A  B        S = B  C
    1  2            3  4
    5  6            7  8
    1  2

R * S = A  R.B  S.B  C
        1  2    3    4
        1  2    7    8
        5  6    3    4
        5  6    7    8
        1  2    3    4
        1  2    7    8
9
Example: Bag Theta-Join
R = A  B        S = B  C
    1  2            3  4
    5  6            7  8
    1  2

R JOIN_{R.B<S.B} S = A  R.B  S.B  C
                     1  2    3    4
                     1  2    7    8
                     5  6    7    8
                     1  2    3    4
                     1  2    7    8
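The four bag examples above (selection, projection, product, theta-join) can be reproduced with a minimal sketch. It assumes a bag of tuples is represented as a plain Python list, so duplicates are kept automatically; the variable names are illustrative only.

```python
# Assumption: bags of tuples are Python lists, so duplicate tuples are preserved.
R = [(1, 2), (5, 6), (1, 2)]   # schema R(A, B)
S = [(3, 4), (7, 8)]           # schema S(B, C)

# Selection A+B<5: test each tuple independently.
selected = [(a, b) for (a, b) in R if a + b < 5]

# Projection onto A: no duplicate elimination.
projected = [(a,) for (a, b) in R]

# Product: every pair of tuples, giving columns (A, R.B, S.B, C).
product = [r + s for r in R for s in S]

# Theta-join on R.B < S.B: keep the product tuples that satisfy the condition.
theta_join = [t for t in product if t[1] < t[2]]

print(selected)
print(projected)
print(len(product), len(theta_join))
```

Because each operator works tuple by tuple (or pair by pair), the duplicate (1, 2) in R simply produces duplicate output tuples; no extra bookkeeping is needed.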
10
Bag Union
•Union, intersection, and difference need new definitions for bags.
•An element appears in the union of two bags as many times as the sum of the number of times it appears in each bag.
•Example: {1,2,1} UNION {1,1,2,3,1} = {1,1,1,1,1,2,2,3}
11
Bag Intersection
•An element appears in the intersection of two bags as many times as the minimum of the number of times it appears in either.
•Example: {1,1,2,1} INTER {1,1,2,3} = {1,1,2}.
12
Bag Difference
•An element appears in the difference A – B of bags as many times as it appears in A, minus the number of times it appears in B.
▫But never less than 0 times.
•Example: {1,2,1} – {1,2,3} = {1}.
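The three definitions (sum, minimum, and difference floored at zero) happen to match the `+`, `&`, and `-` operators of Python's `collections.Counter`. The sketch below uses that as an assumed model of bags to re-check the three slide examples; the helper names `bag` and `as_sorted_list` are hypothetical.

```python
from collections import Counter

def bag(*items):
    """Build a bag (assumed model: a Counter of occurrence counts)."""
    return Counter(items)

def as_sorted_list(b):
    """Expand a bag back into a sorted list of its elements."""
    return sorted(b.elements())

union = bag(1, 2, 1) + bag(1, 1, 2, 3, 1)          # counts are added
intersection = bag(1, 1, 2, 1) & bag(1, 1, 2, 3)   # minimum of the counts
difference = bag(1, 2, 1) - bag(1, 2, 3)           # subtraction, never below 0

print(as_sorted_list(union))
print(as_sorted_list(intersection))
print(as_sorted_list(difference))
```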
13
Beware: Bag Laws <> Set Laws
•Not all algebraic laws that hold for sets also hold for bags.
•For one example, the commutative law for union (R UNION S = S UNION R) does hold for bags.
▫Since addition is commutative, adding the number of times x appears in R and S doesn’t depend on the order of R and S.
14
An Example of Inequivalence
•Set union is idempotent, meaning that S UNION S = S.
•However, for bags, if x appears n times in S, then it appears 2n times in S UNION S.
•Thus S UNION S <> S in general.
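A quick check of the idempotence claim, again assuming `collections.Counter` as the bag model and an arbitrary example bag chosen only for illustration:

```python
from collections import Counter

S_set = {1, 2}
S_bag = Counter([1, 2, 2])     # illustrative bag: 1 appears once, 2 appears twice

set_idempotent = (S_set | S_set) == S_set   # set union: S UNION S = S
doubled = S_bag + S_bag                     # bag union adds the counts
bag_idempotent = doubled == S_bag

print(set_idempotent, bag_idempotent, doubled[2])
```

Every count n in the bag becomes 2n in the union, so the bag union equals the original only when the bag is empty.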
15
The Extended Algebra
1. DELTA = eliminate duplicates from bags.
2. TAU = sort tuples.
3. Extended projection: arithmetic, duplication of columns.
4. GAMMA = grouping and aggregation.
5. OUTERJOIN: avoids “dangling tuples” = tuples that do not join with anything.
16
Duplicate Elimination
•R1 := DELTA(R2).
•R1 consists of one copy of each tuple that appears in R2 one or more times.
17
Example: Duplicate Elimination
R = A  B
    1  2
    3  4
    1  2

DELTA(R) = A  B
           1  2
           3  4
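Duplicate elimination can be sketched in one line. The function name `delta` and the list-of-tuples representation are assumptions for illustration; keeping the order of first appearance is a convenience, since the operator itself does not define an order.

```python
def delta(bag_of_tuples):
    """Return one copy of each tuple, in order of first appearance."""
    # dict.fromkeys keeps the first occurrence of each key and preserves insertion order.
    return list(dict.fromkeys(bag_of_tuples))

R = [(1, 2), (3, 4), (1, 2)]
result = delta(R)
print(result)
```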
18
Sorting
•R1 := TAU_{L}(R2).
▫L is a list of some of the attributes of R2.
•R1 is the list of tuples of R2 sorted first on the value of the first attribute on L, then on the second attribute of L, and so on.
▫Break ties arbitrarily.
•TAU is the only operator whose result is neither a set nor a bag.
•TAU corresponds to ORDER BY in SQL.
19
Example: Sorting
R = A  B
    1  2
    3  4
    5  2

TAU_{B}(R) = [(5,2), (1,2), (3,4)]
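A minimal sketch of the sorting operator, assuming tuples are stored in a list and the schema is given as a tuple of attribute names (the function name `tau` and its parameters are hypothetical). Python's `sorted` is stable, so tied tuples keep their input order; the slide's result, with (5,2) before (1,2), is equally valid because ties may be broken arbitrarily.

```python
def tau(rows, schema, sort_attrs):
    """Sort rows on the attributes in sort_attrs, first attribute first."""
    positions = [schema.index(a) for a in sort_attrs]
    return sorted(rows, key=lambda row: tuple(row[p] for p in positions))

R = [(1, 2), (3, 4), (5, 2)]
result = tau(R, ("A", "B"), ["B"])
print(result)
```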
20
Extended Projection
• Using the same PROJ_{L} operator, we allow the list L to contain arbitrary expressions involving attributes, for example:
1. Arithmetic on attributes, e.g., A+B.
2. Duplicate occurrences of the same attribute.
21
Example: Extended Projection
R = A  B
    1  2
    3  4

PROJ_{A+B,A,A}(R) = A+B  A1  A2
                    3    1   1
                    7    3   3
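Extended projection can be sketched by letting each entry of the list L be a function of the tuple. The name `extended_project` and the use of lambdas over an attribute dictionary are assumptions made for this illustration.

```python
def extended_project(rows, schema, expressions):
    """Apply each expression in the projection list to every row (bag semantics)."""
    out = []
    for row in rows:
        env = dict(zip(schema, row))          # attribute name -> value for this tuple
        out.append(tuple(expr(env) for expr in expressions))
    return out

R = [(1, 2), (3, 4)]
result = extended_project(
    R,
    ("A", "B"),
    [lambda t: t["A"] + t["B"], lambda t: t["A"], lambda t: t["A"]],  # A+B, A, A
)
print(result)
```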
22
Aggregation Operators
•Aggregation operators are not operators of relational algebra.
•Rather, they apply to entire columns of a table and produce a single result.
•The most important examples: SUM, AVG, COUNT, MIN, and MAX.
23
Example: Aggregation
R = A  B
    1  3
    3  4
    3  2

SUM(A) = 7
COUNT(A) = 3
MAX(B) = 4
AVG(B) = 3
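The four aggregate values can be re-derived directly with Python built-ins; the column variables below are illustrative names, not part of the lecture.

```python
R = [(1, 3), (3, 4), (3, 2)]        # schema R(A, B)

col_a = [a for (a, b) in R]         # the entire column A, duplicates included
col_b = [b for (a, b) in R]

sum_a = sum(col_a)                  # 1 + 3 + 3
count_a = len(col_a)                # duplicates are counted
max_b = max(col_b)
avg_b = sum(col_b) / len(col_b)     # (3 + 4 + 2) / 3

print(sum_a, count_a, max_b, avg_b)
```

Note that COUNT(A) is 3 even though A has only two distinct values, because aggregation works on the bag of column values.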
24
Grouping Operator
• R1 := GAMMA_{L}(R2). L is a list of elements that are either:
1. Individual (grouping) attributes.
2. AGG(A), where AGG is one of the aggregation operators and A is an attribute.
25
Applying GAMMA_{L}(R)
•Group R according to all the grouping attributes on list L.
▫That is, form one group for each distinct list of values for those attributes in R.
•Within each group, compute AGG(A) for each aggregation on list L.
•The result has the grouping attributes and the aggregations as its attributes. It contains one tuple for each list of values for the grouping attributes, together with that group’s aggregations.
26
Example: Grouping/Aggregation
R = A  B  C
    1  2  3
    4  5  6
    1  2  5

GAMMA_{A,B,AVG(C)}(R) = ??

First, group R:

A  B  C
1  2  3
1  2  5
4  5  6

Then, average C within groups:

A  B  AVG(C)
1  2  4
4  5  6
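The two steps of the example (group, then aggregate within each group) can be sketched as follows. The function `gamma_avg` is a hypothetical helper that handles only AVG and addresses attributes by position; a general GAMMA would accept any list of aggregations.

```python
from collections import defaultdict

def gamma_avg(rows, group_positions, agg_position):
    """Group rows on the given positions and average one column per group."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[p] for p in group_positions)   # values of the grouping attributes
        groups[key].append(row[agg_position])
    # One output tuple per group: grouping values followed by the average.
    return [key + (sum(vals) / len(vals),) for key, vals in groups.items()]

R = [(1, 2, 3), (4, 5, 6), (1, 2, 5)]   # schema R(A, B, C)
result = gamma_avg(R, [0, 1], 2)        # GAMMA_{A,B,AVG(C)}(R)
print(result)
```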
27
Outerjoin
•Suppose we join R JOIN_{C} S.
•A tuple of R that has no tuple of S with which it joins is said to be dangling.
▫Similarly for a tuple of S.
•Outerjoin preserves dangling tuples by padding them with a special NULL symbol in the result.
28
Example: Outerjoin
R = A  B        S = B  C
    1  2            2  3
    4  5            6  7

(1,2) joins with (2,3), but the other two tuples are dangling.

R OUTERJOIN S = A     B  C
                1     2  3
                4     5  NULL
                NULL  6  7
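A minimal sketch of this outerjoin, specialised to the schemas R(A, B) and S(B, C) joined on equal B values. The function name is hypothetical, and Python's `None` is assumed to play the role of NULL.

```python
def full_outerjoin_on_b(r_rows, s_rows):
    """Natural full outerjoin of R(A, B) and S(B, C); None stands for NULL."""
    result = []
    matched_s = set()                        # indexes of S tuples that joined with something
    for a, b in r_rows:
        partners = [i for i, (sb, c) in enumerate(s_rows) if sb == b]
        if partners:
            for i in partners:
                result.append((a, b, s_rows[i][1]))
                matched_s.add(i)
        else:
            result.append((a, b, None))      # dangling tuple of R, padded on C
    for i, (sb, c) in enumerate(s_rows):
        if i not in matched_s:
            result.append((None, sb, c))     # dangling tuple of S, padded on A
    return result

R = [(1, 2), (4, 5)]
S = [(2, 3), (6, 7)]
result = full_outerjoin_on_b(R, S)
print(result)
```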
29
Normalization: Anomalies
•Goal of relational schema design is to avoid anomalies and redundancy.
▫Update anomaly: one occurrence of a fact is changed, but not all occurrences.
▫Deletion anomaly: valid fact is lost when a tuple is deleted.
30
Example of Bad Design
Data is redundant, because each of the ???’s can be figured out by using the FD’s name -> addr favBeer and beersLiked -> manf.
name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  ???         WickedAle   Pete’s  ???
Spock    Enterprise  Bud         ???     Bud
Drinkers(name, addr, beersLiked, manf, favBeer)
31
This Bad Design Also Exhibits Anomalies
• Update anomaly: if Janeway is transferred to Intrepid, will we remember to change each of her tuples?
• Deletion anomaly: if nobody likes Bud, we lose track of the fact that Anheuser-Busch manufactures Bud.

name     addr        beersLiked  manf    favBeer
Janeway  Voyager     Bud         A.B.    WickedAle
Janeway  Voyager     WickedAle   Pete’s  WickedAle
Spock    Enterprise  Bud         A.B.    Bud
32
Boyce-Codd Normal Form
•We say a relation R is in BCNF if whenever X -> A is a nontrivial FD that holds in R, X is a superkey.
▫Remember: nontrivial means A is not a member of set X.
▫Remember: a superkey is any superset of a key (not necessarily a proper superset).
33
Example
• Drinkers(name, addr, beersLiked, manf, favBeer)
• FD’s: name->addr favBeer, beersLiked->manf
•Only key is {name, beersLiked}.
•In each FD, the left side is not a superkey.
•Any one of these FD’s shows Drinkers is not in BCNF.
34
Another Example
•Beers(name, manf, manfAddr)
•FD’s: name->manf, manf->manfAddr
•Only key is {name}.
•name->manf does not violate BCNF, but manf->manfAddr does.
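The BCNF test in the two examples above comes down to computing the closure of each left side and checking whether it covers the whole schema. The sketch below assumes FD's are stored as pairs of frozensets; the helpers `fd`, `closure`, and `bcnf_violations` are hypothetical names introduced for illustration.

```python
def fd(lhs, rhs):
    """Build an FD from two space-separated attribute strings."""
    return (frozenset(lhs.split()), frozenset(rhs.split()))

def closure(attrs, fds):
    """Attribute closure X+: everything functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Given FD's that are nontrivial and whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= set(schema)]

beers = {"name", "manf", "manfAddr"}
beers_fds = [fd("name", "manf"), fd("manf", "manfAddr")]
beers_bad = bcnf_violations(beers, beers_fds)

drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
drinkers_fds = [fd("name", "addr favBeer"), fd("beersLiked", "manf")]
drinkers_bad = bcnf_violations(drinkers, drinkers_fds)

print(beers_bad)
print(len(drinkers_bad))
```

For Beers only manf->manfAddr is reported, because {name}+ is the whole schema; for Drinkers both FD's are reported.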
35
Decomposition into BCNF
•Given: relation R with FD’s F.
•Look among the given FD’s for a BCNF violation X -> B.
▫If any FD following from F violates BCNF, then there will surely be an FD in F itself that violates BCNF.
•Compute X+.
▫X+ cannot be all the attributes of R, or else X would be a superkey.
36
Decompose R Using X -> B
• Replace R by relations with schemas:
1. R1 = X+.
2. R2 = (R – X+) U X.
• Project given FD’s F onto the two new relations.
1. Compute the closure of F = all nontrivial FD’s that follow from F.
2. Use only those FD’s whose attributes are all in R1 or all in R2.
37
Decomposition Picture
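[Figure: the attributes of R drawn as three adjacent regions, R – X+, X, and X+ – X. R1 = X+ spans X and X+ – X; R2 spans R – X+ and X.]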
38
Example
• Drinkers(name, addr, beersLiked, manf, favBeer)
• F = name->addr, name->favBeer, beersLiked->manf
• Pick BCNF violation name->addr.
• Close the left side: {name}+ = {name, addr, favBeer}.
• Decomposed relations:
1. Drinkers1(name, addr, favBeer)
2. Drinkers2(name, beersLiked, manf)
39
Example, Continued
•We are not done; we need to check Drinkers1 and Drinkers2 for BCNF.
•Projecting FD’s is complex in general, easy here.
•For Drinkers1(name, addr, favBeer), relevant FD’s are name->addr and name->favBeer.
▫Thus, {name} is the only key and Drinkers1 is in BCNF.
40
Example, Continued
• For Drinkers2(name, beersLiked, manf), the only FD is beersLiked->manf, and the only key is {name, beersLiked}.
▫ Violation of BCNF.
• {beersLiked}+ = {beersLiked, manf}, so we decompose Drinkers2 into:
1. Drinkers3(beersLiked, manf)
2. Drinkers4(name, beersLiked)
41
Example, Concluded
• The resulting decomposition of Drinkers :
1. Drinkers1(name, addr, favBeer)
2. Drinkers3(beersLiked, manf)
3. Drinkers4(name, beersLiked)
Notice: Drinkers1 tells us about drinkers, Drinkers3 tells us about beers, and Drinkers4 tells us the relationship between drinkers and the beers they like.
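The whole decomposition of Drinkers can be sketched as a short recursive procedure. This is an illustrative sketch under stated assumptions, not the lecture's own algorithm text: instead of projecting FD's explicitly, it finds a violation in a subrelation by brute force over attribute subsets (exponential, fine for tiny schemas), using closures computed from the original F. The names `closure`, `find_violation`, and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FD's given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_violation(schema, fds):
    """Return (X, X+ restricted to schema) for some BCNF violation, or None."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            x = set(combo)
            x_plus = closure(x, fds) & schema
            # Violation: X determines something new, yet X is not a superkey of schema.
            if x_plus != x and x_plus != schema:
                return x, x_plus
    return None

def bcnf_decompose(schema, fds):
    """Split schema into R1 = X+ and R2 = (R - X+) U X until no violation remains."""
    schema = set(schema)
    violation = find_violation(schema, fds)
    if violation is None:
        return [frozenset(schema)]
    x, x_plus = violation
    return bcnf_decompose(x_plus, fds) + bcnf_decompose((schema - x_plus) | x, fds)

F = [(frozenset({"name"}), frozenset({"addr", "favBeer"})),
     (frozenset({"beersLiked"}), frozenset({"manf"}))]
drinkers = {"name", "addr", "beersLiked", "manf", "favBeer"}
parts = bcnf_decompose(drinkers, F)
print(parts)
```

The sketch may pick violations in a different order than the slides (it tries beersLiked->manf first), but for this schema it ends with the same three relations.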
42
Third Normal Form - Motivation
•There is one structure of FD’s that causes trouble when we decompose.
•AB -> C and C -> B.
▫Example: A = street address, B = city, C = zip code.
•There are two keys, {A,B} and {A,C}.
•C -> B is a BCNF violation, so we must decompose into AC, BC.
43
We Cannot Enforce FD’s
•The problem is that if we use AC and BC as our database schema, we cannot enforce the FD AB ->C by checking FD’s in these decomposed relations.
•Example with A = street, B = city, and C = zip on the next slide.
44
An Unenforceable FD

street        zip
545 Tech Sq.  02138
545 Tech Sq.  02139

city       zip
Cambridge  02138
Cambridge  02139

Join tuples with equal zip codes.

street        city       zip
545 Tech Sq.  Cambridge  02138
545 Tech Sq.  Cambridge  02139

Although no FD’s were violated in the decomposed relations, FD street city -> zip is violated by the database as a whole.
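The situation on this slide can be verified mechanically. The sketch assumes relations are lists of tuples and checks an FD by position; `satisfies_fd` is a hypothetical helper.

```python
def satisfies_fd(rows, lhs_positions, rhs_positions):
    """True if rows that agree on the left side also agree on the right side."""
    seen = {}
    for row in rows:
        key = tuple(row[p] for p in lhs_positions)
        val = tuple(row[p] for p in rhs_positions)
        if seen.setdefault(key, val) != val:
            return False
    return True

street_zip = [("545 Tech Sq.", "02138"), ("545 Tech Sq.", "02139")]
city_zip = [("Cambridge", "02138"), ("Cambridge", "02139")]

# The only FD that can be checked locally is zip -> city, in the (city, zip) relation.
zip_city_ok = satisfies_fd(city_zip, [1], [0])

# Join tuples with equal zip codes, giving (street, city, zip).
joined = [(street, city, z)
          for (street, z) in street_zip
          for (city, z2) in city_zip
          if z == z2]

# street city -> zip can only be checked on the join, and it fails there.
street_city_zip_ok = satisfies_fd(joined, [0, 1], [2])

print(zip_city_ok, street_city_zip_ok)
```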
45
3NF Lets Us Avoid This Problem
•3rd Normal Form (3NF) modifies the BCNF condition so we do not have to decompose in this problem situation.
•An attribute is prime if it is a member of any key.
•X ->A violates 3NF if and only if X is not a superkey, and also A is not prime.
46
Example
•In our problem situation with FD’s AB ->C and C ->B, we have keys AB and AC.
•Thus A, B, and C are each prime.
•Although C -> B violates BCNF, it does not violate 3NF.
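The 3NF test for this example can be sketched by computing the keys by brute force, deriving the prime attributes, and then applying the definition. The helper names are hypothetical, and the exhaustive key search is an assumption that only suits very small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ under FD's given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):       # smaller sets first, so keys are minimal
        for combo in combinations(attrs, size):
            x = set(combo)
            if closure(x, fds) >= schema and not any(k <= x for k in keys):
                keys.append(x)
    return keys

def violates_3nf(lhs, attr, schema, fds, keys):
    """X -> A violates 3NF iff it is nontrivial, X is not a superkey, and A is not prime."""
    is_superkey = closure(lhs, fds) >= schema
    is_prime = any(attr in k for k in keys)
    return attr not in lhs and not is_superkey and not is_prime

schema = {"A", "B", "C"}
fds = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
keys = candidate_keys(schema, fds)

c_is_superkey = closure({"C"}, fds) >= schema            # False: C -> B violates BCNF
c_b_violates_3nf = violates_3nf({"C"}, "B", schema, fds, keys)

print(keys, c_is_superkey, c_b_violates_3nf)
```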
47
What 3NF and BCNF Give You
• There are two important properties of a decomposition:
1. Recovery: it should be possible to project the original relations onto the decomposed schema, and then reconstruct the original.
2. Dependency preservation: it should be possible to check in the projected relations whether all the given FD’s are satisfied.
48
3NF and BCNF, Continued
•We can get (1) with a BCNF decomposition.
▫Explanation needs to wait for relational algebra.
•We can get both (1) and (2) with a 3NF decomposition.
•But we can’t always get (1) and (2) with a BCNF decomposition.
▫street-city-zip is an example.