
9/10/17

CompSci 516
Data Intensive Computing Systems

Lecture 5
Design Theory and Normalization

Instructor: Sudeepa Roy

Duke CS, Fall 2017 CompSci 516: Database Systems

Announcements

• HW1 deadline:
  – Due on 09/21 (Thurs), 11:55 pm, no late days

• Project proposal deadline:
  – Preliminary idea and team members due by 09/18 (Mon) by email to the instructor
  – Proposal due on Sakai by 09/25 (Mon), 11:55 pm

Today

• Finish RC from Lecture 4
  – DRC
  – More examples

• Normalization

DRC: example

Sailors(sid, sname, rating, age)
Boats(bid, bname, color)
Reserves(sid, bid, day)

• Find the name and age of all sailors with a rating above 7

TRC: {P | ∃ S ∈ Sailors (S.rating > 7 ∧ P.name = S.name ∧ P.age = S.age)}

DRC: {<N, A> | ∃ <I, N, T, A> ∈ Sailors ∧ T > 7}

• Variables are now domain variables
• We will use TRC
  – both are equivalent

More Examples: RC

• The famous “Drinker-Beer-Bar” example!

UNDERSTAND THE DIFFERENCE IN ANSWERS FOR ALL FOUR DRINKERS

Acknowledgement: examples and slides by Profs. Balazinska and Suciu, and the [GUW] book

Drinker Category 1

Likes(drinker, beer)
Frequents(drinker, bar)
Serves(bar, beer)

Find drinkers that frequent some bar that serves some beer they like.

Q(x) = ∃y. ∃z. Frequents(x, y) ∧ Serves(y, z) ∧ Likes(x, z)

a shortcut for
{x | ∃ Y ∈ Frequents ∃ Z ∈ Serves ∃ W ∈ Likes (Y.drinker = x.drinker ∧ Y.bar = Z.bar ∧ W.beer = ……}

The difference is that in the first one, one variable = one attribute; in the second one, one variable = one tuple (Tuple RC). Both are equivalent, so feel free to use whichever is convenient for you.

Drinker Category 2/3/4

Likes(drinker, beer)
Frequents(drinker, bar)
Serves(bar, beer)

Find drinkers that frequent some bar that serves some beer they like.

Q(x) = ∃y. ∃z. Frequents(x, y) ∧ Serves(y, z) ∧ Likes(x, z)

Find drinkers that frequent only bars that serve some beer they like.

Q(x) =

Find drinkers that frequent only bars that serve only beer they like.

Q(x) =

Find drinkers that frequent some bar that serves only beers they like.

Q(x) =

Why should we care about RC

• RC is declarative, like SQL, and unlike RA (which is operational)
• Gives foundation of database queries in first-order logic
  – you cannot express all aggregates in RC, e.g. cardinality of a relation or sum (possible in extended RA and SQL)
  – still can express conditions like “at least two tuples” (or any constant)
• RC expression may be much simpler than SQL queries
  – and easier to check for correctness than SQL
  – power to use ∀ and ⇒
  – then you can systematically go to a “correct” SQL query

From RC to SQL

Likes(drinker, beer)
Frequents(drinker, bar)
Serves(bar, beer)

Query: Find drinkers that like some beer (so much) that they frequent all bars that serve it

Q(x) = ∃y. Likes(x, y) ∧ ∀z. (Serves(z, y) ⇒ Frequents(x, z))

≡ Q(x) = ∃y. Likes(x, y) ∧ ∀z. (¬Serves(z, y) ∨ Frequents(x, z))

Step 1: Replace ∀ with ∃ using de Morgan’s Laws

  ∀x P(x) is the same as ¬∃x ¬P(x)
  ¬(¬P ∨ Q) is the same as P ∧ ¬Q

Q(x) = ∃y. Likes(x, y) ∧ ¬∃z. (Serves(z, y) ∧ ¬Frequents(x, z))

From RC to SQL

Likes(drinker, beer)
Frequents(drinker, bar)
Serves(bar, beer)

Query: Find drinkers that like some beer so much that they frequent all bars that serve it

Q(x) = ∃y. Likes(x, y) ∧ ¬∃z. (Serves(z, y) ∧ ¬Frequents(x, z))

Step 2: Translate into SQL

    SELECT DISTINCT L.drinker
    FROM Likes L
    WHERE NOT EXISTS
        (SELECT S.bar
         FROM Serves S
         WHERE L.beer = S.beer
           AND NOT EXISTS (SELECT *
                           FROM Frequents F
                           WHERE F.drinker = L.drinker
                             AND F.bar = S.bar))

We will see a “methodical and correct” translation through “safe queries” in Datalog
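
The translation can be sanity-checked on a small database. The sketch below is only an illustration: the sample rows and the helper names `run_sql` and `run_rc` are made up here and are not part of the lecture. It runs the SQL query from the slide in an in-memory SQLite database and compares the answer with a direct reading of the RC formula.

```python
import sqlite3

# Hypothetical sample data, invented purely for illustration.
LIKES = [("ann", "ipa"), ("bob", "ipa"), ("bob", "stout"), ("cat", "lager")]
SERVES = [("joes", "ipa"), ("moes", "ipa"), ("moes", "stout")]
FREQUENTS = [("ann", "joes"), ("ann", "moes"), ("bob", "joes"), ("cat", "joes")]

# The SQL query from the slide (Step 2).
QUERY = """
SELECT DISTINCT L.drinker
FROM Likes L
WHERE NOT EXISTS
    (SELECT S.bar FROM Serves S
     WHERE L.beer = S.beer
       AND NOT EXISTS (SELECT * FROM Frequents F
                       WHERE F.drinker = L.drinker AND F.bar = S.bar))
"""

def run_sql():
    """Load the sample rows into SQLite and evaluate the slide's query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Likes(drinker TEXT, beer TEXT)")
    conn.execute("CREATE TABLE Serves(bar TEXT, beer TEXT)")
    conn.execute("CREATE TABLE Frequents(drinker TEXT, bar TEXT)")
    conn.executemany("INSERT INTO Likes VALUES (?, ?)", LIKES)
    conn.executemany("INSERT INTO Serves VALUES (?, ?)", SERVES)
    conn.executemany("INSERT INTO Frequents VALUES (?, ?)", FREQUENTS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return {row[0] for row in rows}

def run_rc():
    """Direct reading of Q(x) = ∃y. Likes(x, y) ∧ ∀z. (Serves(z, y) ⇒ Frequents(x, z))."""
    return {
        x for (x, y) in LIKES
        if all((x, z) in FREQUENTS for (z, beer) in SERVES if beer == y)
    }

print(run_sql(), run_rc())
```

On this data both functions return ann and cat. Note that cat qualifies only because no bar serves lager, so the ∀ condition holds vacuously; this is a property of the formula itself, not of the translation.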


Summary

• You learnt three query languages for the Relational DB model
  – SQL
  – RA
  – RC

• All have their own purposes

• You should be able to write a query in all three languages and convert from one to another
  – However, you have to be careful: not all “valid” expressions in one may be expressed in another
  – {S | ¬(S ∈ Sailors)}: infinitely many tuples, an “unsafe” query
  – More when we do “Datalog”; also see Ch. 4.4 in [RG]

Where are we now?

We learnt
✓ Relational Model and Query Languages
✓ SQL, RA, RC
✓ Postgres (DBMS)
✓ XML (overview)
▪ HW1

Next

• Database Normalization
  – (for good schema design)

Design Theory and Normalization

Reading Material

• Database normalization
  – [RG] Chapter 19.1 to 19.5, 19.6.1, 19.8 (overview)
  – [GUW] Chapter 3

Acknowledgement:
• The following slides have been created adapting the instructor material of the [RG] book provided by the authors Dr. Ramakrishnan and Dr. Gehrke.
• Some slides have been adapted from slides by Prof. Jun Yang

What will we learn?

• What goes wrong if we have redundant info in a database?
• Why and how should you refine a schema?
• Functional Dependencies
  – a new kind of integrity constraint (IC)
• Normal Forms
• How to obtain those normal forms

Example

The list of hourly employees in an organization

ssn (S)      name (N)   lot (L)  rating (R)  hourly-wage (W)  hours-worked (H)
111-11-1111  Attishoo   48       8           10               40
222-22-2222  Smiley     22       8           10               30
333-33-3333  Smethurst  35       5           7                30
444-44-4444  Guldu      35       5           7                32
555-55-5555  Madayan    35       8           10               40

• key = SSN
• Suppose for a given rating, there is only one hourly_wage value
• Redundancy in the table
• Why is redundancy bad?

Nulls may or may not help

• Do not help with redundant storage or update anomalies
• May help with insertion and deletion anomalies
  – can insert a tuple with a null value in the hourly_wage field
  – but cannot record hourly_wage for a rating unless there is such an employee (SSN cannot be null)
  – same for deletion

Summary: Redundancy

Therefore,
• Redundancy arises when the schema forces an association between attributes that is “not natural”
• We want schemas that do not permit redundancy
  – at least identify schemas that allow redundancy to make an informed decision (e.g. for performance reasons)
• Null values may or may not help
• Solution?
  – decomposition of schema

Decomposition

ssn (S)      name (N)   lot (L)  rating (R)  hours-worked (H)
111-11-1111  Attishoo   48       8           40
222-22-2222  Smiley     22       8           30
333-33-3333  Smethurst  35       5           30
444-44-4444  Guldu      35       5           32
555-55-5555  Madayan    35       8           40

rating  hourly_wage
8       10
5       7

Decompositions should be used judiciously

1. Do we need to decompose a relation?
   – Several normal forms
   – If a relation is not in one of them, may need to decompose further

2. What are the problems with decomposition?

Functional Dependencies (FDs)

• A functional dependency (FD) X → Y holds over relation R if, for every allowable instance r of R:
  – i.e., given two tuples in r, if the X values agree, then the Y values must also agree
  – X and Y are sets of attributes
  – t1 ∈ r, t2 ∈ r, ΠX(t1) = ΠX(t2) implies ΠY(t1) = ΠY(t2)

A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a1  b2  c2  d1
a2  b1  c3  d1

What is a possible FD here?
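
One way to explore the question is to test candidate FDs against the four rows mechanically. The following is a minimal sketch, with the hypothetical helper names `project` and `violates` chosen here for illustration. As the next slide stresses, such a check can only show that an instance violates an FD; it cannot establish that the FD holds over R.

```python
from itertools import combinations

ATTRS = ("A", "B", "C", "D")
ROWS = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b1", "c1", "d2"),
    ("a1", "b2", "c2", "d1"),
    ("a2", "b1", "c3", "d1"),
]

def project(row, attrs):
    """Values of one row restricted to the given attributes (e.g. "AB")."""
    return tuple(row[ATTRS.index(a)] for a in attrs)

def violates(rows, lhs, rhs):
    """True if some pair of rows agrees on lhs but differs on rhs."""
    return any(
        project(t1, lhs) == project(t2, lhs) and project(t1, rhs) != project(t2, rhs)
        for t1, t2 in combinations(rows, 2)
    )

print(violates(ROWS, "AB", "C"))  # this instance does not violate AB -> C
print(violates(ROWS, "A", "B"))   # rows 1 and 3 agree on A but differ on B
```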


Functional Dependencies (FDs)

• An FD is a statement about all allowable relations
  – Must be identified based on semantics of application
  – Given some allowable instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R

• K is a candidate key for R means that K → R (and K is minimal)
  – denoting R = all attributes of R too
  – However, S → R alone does not require S to be minimal
  – e.g. S can be a superkey

Example

• Consider relation obtained from Hourly_Emps:
  – Hourly_Emps(ssn, name, lot, rating, hourly_wage, hours_worked)

• Notation: We will denote a relation schema by listing the attributes: SNLRWH
  – Basically the set of attributes {S, N, L, R, W, H}
  – here first letter of each attribute

• FDs on Hourly_Emps:
  – ssn is the key: S → SNLRWH
  – rating determines hourly_wage: R → W

Armstrong’s Axioms

• X, Y, Z are sets of attributes

• Reflexivity: If X ⊇ Y, then X → Y
• Augmentation: If X → Y, then XZ → YZ for any Z
• Transitivity: If X → Y and Y → Z, then X → Z

A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a1  b2  c2  d1
a2  b1  c3  d1

Apply these rules on AB → C and check

Armstrong’s Axioms

• These are sound and complete inference rules for FDs
  – sound: they only generate FDs in F+ (the closure of F) for F
  – complete: by repeated application of these rules, all FDs in F+ will be generated

Additional Rules

• Follow from Armstrong’s Axioms

• Union: If X → Y and X → Z, then X → YZ
• Decomposition: If X → YZ, then X → Y and X → Z

A   B   C   D
a1  b1  c1  d1
a1  b1  c1  d2
a2  b2  c2  d1
a2  b2  c2  d2

Union: A → B, A → C gives A → BC
Decomposition: A → BC gives A → B, A → C
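
The slide states that both rules follow from Armstrong's Axioms without showing the steps. One possible derivation, written out as a sketch (the layout is ours, not the lecture's), is:

```latex
\begin{align*}
\textbf{Union.}\quad
& X \to Y && \text{(given)} \\
& X \to XY && \text{(augmentation with } X\text{, since } XX = X\text{)} \\
& X \to Z && \text{(given)} \\
& XY \to YZ && \text{(augmentation with } Y\text{)} \\
& X \to YZ && \text{(transitivity)} \\[6pt]
\textbf{Decomposition.}\quad
& X \to YZ && \text{(given)} \\
& YZ \to Y,\ YZ \to Z && \text{(reflexivity)} \\
& X \to Y,\ X \to Z && \text{(transitivity)}
\end{align*}
```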

Closure of a set of FDs

• Given some FDs, we can usually infer additional FDs:
  – SSN → DEPT, and DEPT → LOT implies SSN → LOT

• An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.

• F+ = closure of F is the set of all FDs that are implied by F


To check if an FD belongs to a closure

• Computing the closure of a set of FDs can be expensive
  – Size of closure can be exponential in #attributes

• Typically, we just want to check if a given FD X → Y is in the closure of a set of FDs F

• No need to compute F+

1. Compute attribute closure of X (denoted X+) wrt F:
   – Set of all attributes A such that X → A is in F+
2. Check if Y is contained in X+

Computing Attribute Closure

Algorithm:
• closure = X
• Repeat until no change
  – if there is an FD U → V in F such that U ⊆ closure, then closure = closure ∪ V

• Does F = {A → B, B → C, CD → E} imply A → E?
  – i.e., is A → E in the closure F+? Equivalently, is E in A+?
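
The algorithm above can be sketched in a few lines of Python. The function names `attribute_closure` and `implies` and the encoding of an FD as a pair of attribute strings are assumptions made for this illustration only.

```python
def attribute_closure(attrs, fds):
    """Compute X+ for X = attrs under fds, a list of (lhs, rhs) attribute strings."""
    closure = set(attrs)
    changed = True
    while changed:                      # "repeat until no change"
        changed = False
        for lhs, rhs in fds:
            # U -> V applies if U is inside the closure and V adds something new
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def implies(fds, lhs, rhs):
    """X -> Y is in F+ exactly when Y is contained in X+."""
    return set(rhs) <= attribute_closure(lhs, fds)

F = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(attribute_closure("A", F)))  # A+ under F
print(implies(F, "A", "E"))               # is A -> E implied?
print(implies(F, "AD", "E"))              # adding D to the left side changes the answer
```

Here A+ = {A, B, C}, so E is not in A+ and F does not imply A → E; CD → E never fires because D is not derivable from A.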

Normal Forms

• Question: given a schema, how to decide whether any schema refinement is needed at all?

• If a relation is in a certain normal form, it is known that certain kinds of problems are avoided/minimized

• Helps us decide whether decomposing the relation is something we want to do

FDs play a role in detecting redundancy

Example
• Consider a relation R with 3 attributes, ABC
  – No FDs hold: There is no redundancy here, so no decomposition is needed
  – Given A → B: Several tuples could have the same A value, and if so, they’ll all have the same B value. This is redundancy; decomposition may be needed if A is not a key

• Intuitive idea:
  – if there is any non-key dependency, e.g. A → B, decompose!

Normal Forms

R is in BCNF
⇒ R is in 3NF
⇒ R is in 2NF (a historical one, not covered)
⇒ R is in 1NF (every field has atomic values)

[Figure: nested regions, BCNF inside 3NF inside 2NF inside 1NF]

Definitions next

Boyce-Codd Normal Form (BCNF)

• Relation R with FDs F is in BCNF if, for all X → A in F
  – A ∈ X (called a trivial FD), or
  – X contains a key for R
    • i.e. X is a superkey


Third Normal Form (3NF)

• Relation R with FDs F is in 3NF if, for all X → A in F+
  – A ∈ X (called a trivial FD), or
  – X contains a key for R, or
  – A is part of some key for R.
  (the first two are the two conditions for BCNF)

• Minimality of a key is crucial in the third condition in 3NF
  – every attribute is part of some superkey (= set of all attributes)

• If R is in BCNF, obviously in 3NF
• If R is in 3NF, some redundancy is possible
  – when X → A and A is part of a key (not allowed in BCNF)

Decomposition of a Relation Schema

• Consider a relation R that contains attributes A1 ... An

• A decomposition of R consists of replacing R by two or more relations such that “no attribute is lost” and “no new attribute appears”, i.e.
  – Each new relation schema contains a subset of the attributes of R
  – Every attribute of R appears as an attribute of one of the new relations
  – E.g., can decompose SNLRWH into SNLRH and RW

• What are the potential problems with an arbitrary decomposition?

Lossless Join Decompositions

• Decomposition of R into X and Y is lossless-join w.r.t. a set of FDs F if, for every instance r that satisfies F:
  πX(r) ⨝ πY(r) = r

S   P   D
s1  p1  d1
s2  p2  d2
s3  p1  d3

• Decompose into SP and PD: is the decomposition lossless?
• How about SP and SD?

Lossless decomposition of R into R1, R2 happens if
• either R1 ∩ R2 → R1
• or R1 ∩ R2 → R2
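
The two questions on the slide can be answered for the three-row instance shown by projecting and re-joining. The sketch below uses hypothetical helper names (`project`, `natural_join`, `is_lossless_on_instance`). Note the limits of such a check: an instance where the join differs from r refutes losslessness, but an instance where they agree only shows the property for that instance, not for every instance satisfying F.

```python
ATTRS = ("S", "P", "D")
R = {("s1", "p1", "d1"), ("s2", "p2", "d2"), ("s3", "p1", "d3")}

def project(rows, attrs):
    """Projection onto attrs (a string such as "SP"), with duplicates removed."""
    idx = [ATTRS.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two projections of R; result tuples follow ATTRS order.
    Assumes left_attrs and right_attrs together cover all of ATTRS."""
    shared = [a for a in left_attrs if a in right_attrs]
    out = set()
    for l in left:
        lrow = dict(zip(left_attrs, l))
        for r in right:
            rrow = dict(zip(right_attrs, r))
            if all(lrow[a] == rrow[a] for a in shared):
                merged = {**lrow, **rrow}
                out.add(tuple(merged[a] for a in ATTRS))
    return out

def is_lossless_on_instance(rows, x, y):
    """True if projecting rows onto x and y and joining gives back exactly rows."""
    return natural_join(project(rows, x), x, project(rows, y), y) == rows

print(is_lossless_on_instance(R, "SP", "PD"))  # joining on P creates spurious tuples
print(is_lossless_on_instance(R, "SP", "SD"))  # joining on S recovers the instance
```

For SP and PD, the shared attribute P determines neither S nor D in this instance, so the join pairs s1 and s3 with both d1 and d3 and returns five tuples instead of three.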

Algorithm: Decomposition into BCNF

• Input: relation R with FDs F
  – If X → Y violates BCNF, decompose R into R − Y and XY.
  – Repeat until all new relations are in BCNF w.r.t. the given F

• NOTE: Need to consider all possible FDs that can be inferred from the current set of FDs (closure), not only the given ones!

• Gives a collection of relations that are
  – in BCNF
  – lossless join decomposition
  – and guaranteed to terminate

Decomposition into BCNF (example)

• CSJDPQV, key C, F = {JP → C, SD → P, J → S}
  – To deal with SD → P, decompose into SDP, CSJDQV.
  – To deal with J → S, decompose CSJDQV into JS and CJDQV

• Note:
  – several dependencies may cause violation of BCNF
  – The order in which we pick them may lead to very different sets of relations
  – there may be multiple correct decompositions
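
The decomposition algorithm can be sketched as follows. This is an illustrative brute-force version, not the lecture's implementation: the names `closure`, `find_violation`, and `bcnf_decompose` are chosen here, the key C is encoded as the explicit FD C → SJDPQV, and violations are found by trying every attribute subset (exponential, but fine for seven attributes). Using attribute closures over the original F is one way to honor the NOTE about inferred FDs.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(rel, fds):
    """Return (X, Y) where X -> Y is non-trivial on rel and X is not a superkey
    of rel, or None if rel is in BCNF. Inferred FDs are covered via closures."""
    rel = frozenset(rel)
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = frozenset(combo)
            implied = closure(x, fds) & rel     # attributes of rel determined by X
            if implied != x and implied != rel:
                return x, implied - x
    return None

def bcnf_decompose(rel, fds):
    """Split on violations until every relation is in BCNF."""
    todo, done = [frozenset(rel)], []
    while todo:
        r = todo.pop()
        violation = find_violation(r, fds)
        if violation is None:
            done.append(r)
        else:
            x, y = violation
            todo.append(r - y)   # R - Y
            todo.append(x | y)   # XY
    return done

FDS = [("C", "SJDPQV"), ("JP", "C"), ("SD", "P"), ("J", "S")]
RESULT = bcnf_decompose("CSJDPQV", FDS)
print(["".join(sorted(r)) for r in RESULT])
```

With this enumeration order the sketch splits on J → S first and stops at JS and CDJPQV, which differs from the slide's SDP, JS, CJDQV. This matches the note above that the order in which violations are picked can lead to different, equally valid decompositions.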

BCNF decomposition example

UserJoinsGroup(uid, uname, twitterid, gid, fromDate)

uid → uname, twitterid
twitterid → uid
uid, gid → fromDate

Ack: Slide from Prof. Jun Yang


Another example

UserJoinsGroup(uid, uname, twitterid, gid, fromDate)

uid → uname, twitterid
twitterid → uid
uid, gid → fromDate

Ack: Slide from Prof. Jun Yang

Recap

• Functional dependencies: a generalization of the key concept

• Non-key functional dependencies: a source of redundancy

• BCNF decomposition: a method for removing redundancies
  – BCNF decomposition is a lossless join decomposition

• BCNF: schema in this normal form has no redundancy due to FD’s

BCNF = no redundancy?

• User(uid, gid, place)
  – A user can belong to multiple groups
  – A user can register places she’s visited
  – Groups and places have nothing to do with each other
  – FD’s?
  – BCNF?
  – Redundancies?

uid  gid  place
142  dps  Springfield
142  dps  Australia
456  abc  Springfield
456  abc  Morocco
456  gov  Springfield
456  gov  Morocco
…    …    …

Multivalued dependencies

• A multivalued dependency (MVD) has the form X ↠ Y, where X and Y are sets of attributes in a relation R

• X ↠ Y means that whenever two rows in R agree on all the attributes of X, then we can swap their Y components and get two rows that are also in R

If R contains:

X  Y   Z
a  b1  c1
a  b2  c2
…  …   …

then R must also contain:

X  Y   Z
a  b1  c1
a  b2  c2
a  b2  c1
a  b1  c2
…  …   …
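
The swap condition can be checked mechanically on a concrete instance. The sketch below applies it to the six visible rows of the User(uid, gid, place) table from the earlier slide (the elided "…" rows are ignored); `satisfies_mvd` is a hypothetical helper name. As with FDs, a check on one instance can refute an MVD but cannot prove it holds over R.

```python
ATTRS = ("uid", "gid", "place")
USER = {
    ("142", "dps", "Springfield"),
    ("142", "dps", "Australia"),
    ("456", "abc", "Springfield"),
    ("456", "abc", "Morocco"),
    ("456", "gov", "Springfield"),
    ("456", "gov", "Morocco"),
}

def satisfies_mvd(rows, x, y):
    """Check X ->> Y on one instance: for every ordered pair of rows that agree
    on X, the row built from t1 with t2's Y component must also be present."""
    xi = [ATTRS.index(a) for a in x]
    yi = [ATTRS.index(a) for a in y]
    for t1 in rows:
        for t2 in rows:
            if all(t1[i] == t2[i] for i in xi):
                swapped = tuple(t2[i] if i in yi else t1[i] for i in range(len(ATTRS)))
                if swapped not in rows:
                    return False
    return True

print(satisfies_mvd(USER, ["uid"], ["gid"]))    # groups are independent of places
print(satisfies_mvd(USER, ["place"], ["gid"]))  # swapping gid across users fails
```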

MVD examples

User(uid, gid, place)
• uid ↠ gid
• uid ↠ place
  – Intuition: given uid, attributes gid and place are “independent”

• uid, gid ↠ place
  – Trivial: LHS ∪ RHS = all attributes of R

• uid, gid ↠ uid
  – Trivial: LHS ⊇ RHS

Complete MVD + FD rules

• FD reflexivity, augmentation, and transitivity
• MVD complementation: If X ↠ Y, then X ↠ attrs(R) − X − Y
• MVD augmentation: If X ↠ Y and V ⊆ W, then XW ↠ YV
• MVD transitivity: If X ↠ Y and Y ↠ Z, then X ↠ Z − Y
• Replication (FD is MVD): If X → Y, then X ↠ Y
• Coalescence: If X ↠ Y and Z ⊆ Y and there is some W disjoint from Y such that W → Z, then X → Z

Try proving things using these!?

Verify these yourself!


An elegant solution: “chase”

• Given a set of FD’s and MVD’s D, does another dependency d (FD or MVD) follow from D?

• Procedure
  – Start with the premise of d, and treat them as “seed” tuples in a relation
  – Apply the given dependencies in D repeatedly
    • If we apply an FD, we infer equality of two symbols
    • If we apply an MVD, we infer more tuples
  – If we infer the conclusion of d, we have a proof
  – Otherwise, if nothing more can be inferred, we have a counterexample

Proof by chase

• In R(A, B, C, D), does A ↠ B and B ↠ C imply that A ↠ C?

Have:

A  B   C   D
a  b1  c1  d1
a  b2  c2  d2

Need:

A  B   C   D
a  b1  c2  d1
a  b2  c1  d2

Apply A ↠ B:
a  b2  c1  d1
a  b1  c2  d2

Apply B ↠ C (rows with b2):
a  b2  c1  d2
a  b2  c2  d1

Apply B ↠ C (rows with b1):
a  b1  c2  d1
a  b1  c1  d2

Both needed rows have been inferred.

Another proof by chase

• In R(A, B, C, D), does A → B and B → C imply that A → C?

Have:

A  B   C   D
a  b1  c1  d1
a  b2  c2  d2

Need: c1 = c2

Apply A → B: b1 = b2
Apply B → C: c1 = c2

In general, with both MVD’s and FD’s, chase can generate both new tuples and new equalities

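Counterexample by chase

• In R(A, B, C, D), does A ↠ BC and CD → B imply that A → B?

Have:

A  B   C   D
a  b1  c1  d1
a  b2  c2  d2

Need: b1 = b2

Apply A ↠ BC:
a  b2  c2  d1
a  b1  c1  d2

CD → B yields no new equalities, since no two of the four rows agree on both C and D. Nothing more can be inferred and b1 = b2 was not derived.

Counterexample!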

4NF

• A relation R is in Fourth Normal Form (4NF) if
  – For every non-trivial MVD X ↠ Y in R, X is a superkey
  – That is, all FD’s and MVD’s follow from “key → other attributes” (i.e., no MVD’s and no FD’s besides key functional dependencies)

• 4NF is stronger than BCNF
  – Because every FD is also a MVD

4NF decomposition algorithm

• Find a 4NF violation
  – A non-trivial MVD X ↠ Y in R where X is not a superkey

• Decompose R into R1 and R2, where
  – R1 has attributes X ∪ Y
  – R2 has attributes X ∪ Z (where Z contains R attributes not in X or Y)

• Repeat until all relations are in 4NF

• Almost identical to BCNF decomposition algorithm
• Any decomposition on a 4NF violation is lossless


4NF decomposition example

User(uid, gid, place)

uid  gid  place
142  dps  Springfield
142  dps  Australia
456  abc  Springfield
456  abc  Morocco
456  gov  Springfield
456  gov  Morocco
…    …    …

4NF violation: uid ↠ gid

Member(uid, gid): 4NF

uid  gid
142  dps
456  abc
456  gov
…    …

Visited(uid, place): 4NF

uid  place
142  Springfield
142  Australia
456  Springfield
456  Morocco
…    …
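
That this split loses nothing can be checked on the six visible rows. The sketch below is illustrative only; `decompose` and `rejoin` are names invented here, and the elided "…" rows are ignored. It also shows what happens when uid ↠ gid does not hold: after one row is removed, the join brings it back, so the decomposition no longer reproduces the instance.

```python
USER = {
    ("142", "dps", "Springfield"),
    ("142", "dps", "Australia"),
    ("456", "abc", "Springfield"),
    ("456", "abc", "Morocco"),
    ("456", "gov", "Springfield"),
    ("456", "gov", "Morocco"),
}

def decompose(rows):
    """Split User(uid, gid, place) into Member(uid, gid) and Visited(uid, place)."""
    member = {(uid, gid) for uid, gid, _ in rows}
    visited = {(uid, place) for uid, _, place in rows}
    return member, visited

def rejoin(member, visited):
    """Natural join of Member and Visited on uid."""
    return {(u, g, p) for (u, g) in member for (v, p) in visited if u == v}

member, visited = decompose(USER)
print(len(member), len(visited))        # fewer, narrower rows than the original
print(rejoin(member, visited) == USER)  # the join recovers the original instance

# Without the MVD the split is lossy: drop one row and the join re-creates it.
broken = USER - {("456", "gov", "Morocco")}
print(rejoin(*decompose(broken)) == broken)
```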

Other kinds of dependencies and normal forms

• Dependency preserving decompositions
• Join dependencies
• Inclusion dependencies
• 5NF
• See book if interested (not covered in class)

Summary

• Philosophy behind BCNF, 4NF: Data should depend on the key, the whole key, and nothing but the key!
  – You could have multiple keys though

• Redundancy is not desired typically
  – not always, mainly due to performance reasons

• Functional/multivalued dependencies: capture redundancy
• Decompositions: eliminate dependencies
• Normal forms
  – Guarantee certain non-redundancy
  – 3NF, BCNF, and 4NF

• Lossless join
• How to decompose into BCNF, 4NF
• Chase