
Dbms Unit5 Part 1

Apr 06, 2018


vidyasuri
Page 1: Dbms Unit5 Part 1

8/3/2019 Dbms Unit5 Part 1

http://slidepdf.com/reader/full/dbms-unit5-part-1 1/15

Unit 2 (2010-2011) © Aditya Engineering College

Redundancy-Overview

• Redundancy is at the root of several problems associated with relational schemas, namely:

 –  Redundant storage: Some information is stored repeatedly.

 –  Update anomalies: If one copy of such repeated data is updated, an inconsistency is created unless all copies are similarly updated.

 –  Insertion anomalies: It may not be possible to store some information unless some other information is stored as well.

 –  Deletion anomalies: It may not be possible to delete some information without losing some other information as well.

• Integrity constraints, in particular functional dependencies, can be used to identify schemas with such problems and to suggest refinements.

• Main refinement technique: decomposition (replacing ABCD with, say, AB and BCD, or ACD and ABD).

Page 2: Dbms Unit5 Part 1

Unit 1 (2010-2011)

Functional Dependencies (FDs)

• A functional dependency X → Y holds over relation R if, for every legal instance r of R:

 –  $\forall t_1 \in r,\ t_2 \in r:\ \pi_X(t_1) = \pi_X(t_2) \implies \pi_Y(t_1) = \pi_Y(t_2)$

 –  i.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y are sets of attributes.)

 –  Put another way, if the values of X are given, we can unambiguously find the values of Y.

• An FD is a statement about all legal instances of the relation.

 –  Must be identified based on semantics of application.

 –  Given a legal instance r1 of R, we can check if it violates some FD f, but we cannot tell if f holds over R!

• K is a candidate key for R means that K → R.

 –  However, K → R does not require K to be minimal! (K → R alone only makes K a superkey; a candidate key is a minimal such K.)
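The definition above can be tried out on a concrete instance. The following is a minimal sketch, not part of the original slides: the function name `violates_fd` and the sample rows are made up for illustration. It can only show that an instance violates an FD; a result of False says nothing about whether the FD holds over R in general.

```python
def violates_fd(rows, lhs, rhs):
    """Return True if two rows agree on `lhs` but differ on `rhs`."""
    seen = {}  # X-values -> the Y-values first seen with them
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y for a new x, otherwise returns the stored value
        if seen.setdefault(x, y) != y:
            return True
    return False


# Hypothetical instances over attributes S, R, W (illustration only).
r_ok = [
    {"S": 1, "R": 8, "W": 10},
    {"S": 2, "R": 8, "W": 10},
    {"S": 3, "R": 5, "W": 7},
]
r_bad = r_ok + [{"S": 4, "R": 5, "W": 9}]  # same R as row 3, different W

print(violates_fd(r_ok, ["R"], ["W"]))   # False: no violation in this instance
print(violates_fd(r_bad, ["R"], ["W"]))  # True: R -> W is violated
```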

Page 3: Dbms Unit5 Part 1

Example: Functional dependencies in a relation

Consider a relation Hourly_Emps (ssn, name, lot, rating, hourly_wages, hours_worked). The key for Hourly_Emps is ssn. In addition, suppose that the hourly_wages attribute is determined by the rating attribute. That is, for a given rating value, there is only one permissible hourly_wages value. This constraint is an example of a functional dependency. It leads to possible redundancy in the relation Hourly_Emps.

Notation: We will denote this relation schema by listing the attributes: SNLRWH

 –  This is really the set of attributes {S,N,L,R,W,H}.

 –  Sometimes, we will refer to all attributes of a relation by using the relation name (e.g., Hourly_Emps for SNLRWH).

• Some FDs on Hourly_Emps:

 –  ssn is the key: S → SNLRWH

 –  rating determines hourly_wages: R → W

Page 4: Dbms Unit5 Part 1

Example (Contd.)

• Problems due to R → W:

 –  Update anomaly: Can we change W in just the 1st tuple of SNLRWH?

 –  Insertion anomaly: What if we want to insert an employee and don't know the hourly wage for his rating?

 –  Deletion anomaly: If we delete all employees with rating 5, we lose the information about the wage for rating 5!

Hourly_Emps

S            N          L   R  W   H
123-22-3666  Attishoo   48  8  10  40
231-31-5368  Smiley     22  8  10  30
131-24-3650  Smethurst  35  5  7   30
434-26-3751  Guldu      35  5  7   32
612-67-4134  Madayan    35  8  10  40

Hourly_Emps2

S            N          L   R  H
123-22-3666  Attishoo   48  8  40
231-31-5368  Smiley     22  8  30
131-24-3650  Smethurst  35  5  30
434-26-3751  Guldu      35  5  32
612-67-4134  Madayan    35  8  40

Wages

R  W
8  10
5  7
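The effect of splitting Hourly_Emps into Hourly_Emps2 and Wages can be sketched in a few lines. This is an illustrative sketch only: the tuples are the ones from the tables above, while the variable names and the use of Python sets and dicts (rather than a database) are assumptions made for the example.

```python
# Hourly_Emps tuples in S, N, L, R, W, H order (data from the tables above).
hourly_emps = [
    ("123-22-3666", "Attishoo", 48, 8, 10, 40),
    ("231-31-5368", "Smiley", 22, 8, 10, 30),
    ("131-24-3650", "Smethurst", 35, 5, 7, 30),
    ("434-26-3751", "Guldu", 35, 5, 7, 32),
    ("612-67-4134", "Madayan", 35, 8, 10, 40),
]

# Project onto SNLRH and RW; sets drop the repeated (R, W) pairs.
hourly_emps2 = {(s, n, l, r, h) for (s, n, l, r, w, h) in hourly_emps}
wages = {(r, w) for (_, _, _, r, w, _) in hourly_emps}
print(len(hourly_emps), "wage entries before,", len(wages), "after")

# Update: the wage for rating 8 now changes in exactly one place.
wage_by_rating = dict(wages)
wage_by_rating[8] = 11
rejoined = sorted(
    (s, n, l, r, wage_by_rating[r], h) for (s, n, l, r, h) in hourly_emps2
)

# Deletion: removing every rating-5 employee no longer loses the wage for 5.
remaining = {row for row in hourly_emps2 if row[3] != 5}
print((5, 7) in wages)  # True
```

Insertion behaves the same way: an employee can be added to Hourly_Emps2 without knowing the wage, and a wage can be recorded in Wages before any employee has that rating.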

Page 5: Dbms Unit5 Part 1

Another example for FD in a relation

• Suppose that we have entity sets Parts, Suppliers, and Departments, as well as a relationship set Contracts that involves all of them. We refer to the schema for Contracts as CSDPQ. A contract with contract id C specifies that a supplier S will supply some quantity Q of a part P to a department D.

• We might have a policy that a department purchases at most one part from any given supplier. Thus, if there are several contracts between the same supplier and department, we know that the same part must be involved in all of them. This constraint is an FD, DS → P.
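One way to picture the policy DS → P is as a rule enforced when contracts are inserted. The sketch below is hypothetical: the class `Contracts`, its method names, and the sample values are invented for illustration and do not come from the slides.

```python
class Contracts:
    """Toy in-memory Contracts(C, S, D, P, Q) that enforces DS -> P on insert."""

    def __init__(self):
        self.rows = []
        self._part_for = {}  # (D, S) -> the one part this pair may trade

    def insert(self, c, s, d, p, q):
        # setdefault records p for a new (D, S) pair, else returns the known part
        known = self._part_for.setdefault((d, s), p)
        if known != p:
            raise ValueError(
                f"DS -> P violated: department {d} already buys {known} from {s}"
            )
        self.rows.append((c, s, d, p, q))


contracts = Contracts()
contracts.insert(1, "s1", "d1", "bolt", 100)
contracts.insert(2, "s1", "d1", "bolt", 50)    # same part: allowed
try:
    contracts.insert(3, "s1", "d1", "nut", 10)  # different part: rejected
except ValueError as err:
    print(err)
```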

Page 6: Dbms Unit5 Part 1

Use of Decompositions

• Intuitively, redundancy arises when a relational schema forces an association between attributes that is not natural.

• Functional dependencies (a form of integrity constraint, IC) can be used to identify such situations and to suggest modifications to the schema.

• The essential idea is that many problems arising from redundancy can be addressed by replacing a relation with a collection of smaller relations.

• Each of the smaller relations contains a subset of the attributes of the original relation. We refer to this process as decomposition of the larger relation into the smaller relations.

• Unless we are careful, decomposing a relation schema can create more problems than it solves.

Page 7: Dbms Unit5 Part 1

Problems Related to Decomposition

• Two important questions must be asked repeatedly:

 –  Do we need to decompose a relation?

 –  What problems (if any) does a given decomposition cause?

• To help with the first question, several normal forms have been proposed for relations.

• If a relation schema is in one of these normal forms, we know that certain kinds of problems cannot arise. The normal forms based on FDs are first normal form (1NF), second normal form (2NF), third normal form (3NF), and Boyce-Codd normal form (BCNF).

• These forms have increasingly restrictive requirements: every relation in BCNF is also in 3NF, every relation in 3NF is also in 2NF, and every relation in 2NF is in 1NF.

• 3NF and BCNF are important from a database design standpoint.

Page 8: Dbms Unit5 Part 1

First Normal Form

A relation is in first normal form if every field contains only atomic values, that is, not lists or sets. This requirement is implicit in our definition of the relational model, even though some of the newer database systems are relaxing this requirement.

1NF (First Normal Form)

• A relation R is in 1NF if and only if it has only single-valued attributes (atomic values).

• EMPLOYEE (SSN, emp_id, HOURS, ENAME, ADDRESS)

If the ADDRESS column has multiple values like street address, city, state, pincode all in one column, then EMPLOYEE is not in 1NF.

• One needs to write a complex query to find all employees living in a particular city.
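The difference in query effort can be seen in a small sketch. Everything here is hypothetical: the employee records, the comma-separated address layout, and the helper `lives_in_packed` are assumptions chosen only to illustrate the point.

```python
# Not in 1NF: street, city, state and pincode are packed into one ADDRESS value.
employees_packed = [
    {"ssn": "111", "ename": "Asha", "address": "12 MG Road, Kakinada, AP, 533001"},
    {"ssn": "222", "ename": "Ravi", "address": "4 Beach Road, Vizag, AP, 530001"},
]


def lives_in_packed(emp, city):
    """City lookup by string parsing; assumes a fixed 4-part comma layout."""
    parts = [p.strip() for p in emp["address"].split(",")]
    return len(parts) == 4 and parts[1] == city


packed_result = [e["ename"] for e in employees_packed if lives_in_packed(e, "Kakinada")]

# In 1NF: each address component is its own atomic attribute.
employees_1nf = [
    {"ssn": "111", "ename": "Asha", "street": "12 MG Road",
     "city": "Kakinada", "state": "AP", "pincode": "533001"},
    {"ssn": "222", "ename": "Ravi", "street": "4 Beach Road",
     "city": "Vizag", "state": "AP", "pincode": "530001"},
]
atomic_result = [e["ename"] for e in employees_1nf if e["city"] == "Kakinada"]

print(packed_result, atomic_result)
```

The packed version only works while every address follows the assumed layout; the 1NF version is a plain equality test on one attribute.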

Page 9: Dbms Unit5 Part 1

First Normal Form

1NF (First Normal Form) contd...

• Similarly, if emp_id consists of a department code and another number that is unique within the department, the relation is NOT in 1NF.

• It requires extra programming to extract the department information of an employee, and that information ends up encoded in the application rather than in the database.

• If the code is used as a primary key, it can cause more problems. If an employee changes department, the primary key value has to be changed, which is especially troublesome if a foreign key references this table. If it is not changed, the data is misrepresented and may even lead to errors.

• Similarly, if all the account numbers of a customer are stored in one column and/or all the customers of an account number are stored in one column, it leads to redundant storage and complex queries as well.
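The account-number case can be sketched as follows. The customer ids, account numbers, and the comma-separated encoding are all hypothetical; the point is only that one row per (customer, account) pair makes both directions of lookup simple.

```python
# Not in 1NF: every account number of a customer is packed into one value.
customers_packed = {"C1": "A10,A11", "C2": "A11"}

# 1NF alternative: one row per (customer, account) pair.
holds = sorted(
    (cust, acct)
    for cust, accts in customers_packed.items()
    for acct in accts.split(",")
)
print(holds)

# Both directions are now simple filters on atomic values.
owners_of_a11 = [cust for (cust, acct) in holds if acct == "A11"]
accounts_of_c1 = [acct for (cust, acct) in holds if cust == "C1"]
print(owners_of_a11, accounts_of_c1)
```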

Page 10: Dbms Unit5 Part 1

Second Normal Form

2NF (Second Normal Form; mainly of historical interest)

• A relation R is in 2NF if and only if it is in 1NF and every nonkey column depends on the whole key and not on a subset of a key.

• All non-prime attributes of R must be fully functionally dependent on every key of the relation, not on a part of a key.

• No violation can arise if every key is a single attribute or if there are no non-prime attributes.

2NF (Second Normal Form) violation: part of a key → non-key

EMP_PROJ2 (SSN, PNO, HOURS, ENAME, PNAME)

SSN → ENAME

PNO → PNAME

SOLUTION: decompose the relation

EMP_PROJ3 (SSN, PNO, HOURS)

EMP (SSN, ENAME)

PROJ (PNO, PNAME)
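Spotting the partial dependencies in EMP_PROJ2 can be sketched mechanically. This is a simplified illustration, not a full 2NF test: the function name `partial_dependencies` is made up, the key (SSN, PNO) and the FD (SSN, PNO) → HOURS are assumed from the decomposition shown above, and only the listed FDs are examined, not everything they imply.

```python
def partial_dependencies(key, fds, nonprime):
    """FDs whose left side is a proper subset of the key and whose
    right side contains at least one non-prime attribute."""
    key = set(key)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if set(lhs) < key and set(rhs) & set(nonprime)
    ]


key = ("SSN", "PNO")  # assumed to be the only key of EMP_PROJ2
fds = [
    (("SSN", "PNO"), ("HOURS",)),  # assumed: hours depend on the whole key
    (("SSN",), ("ENAME",)),
    (("PNO",), ("PNAME",)),
]
nonprime = {"HOURS", "ENAME", "PNAME"}

violations = partial_dependencies(key, fds, nonprime)
for lhs, rhs in violations:
    print(lhs, "->", rhs, "depends on only part of the key")
```

Each reported FD corresponds to one of the relations split off in the solution above: SSN → ENAME gives EMP, and PNO → PNAME gives PROJ.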

Page 11: Dbms Unit5 Part 1

Third Normal Form (3NF)

3NF (Third Normal Form)

• A relation R is in 3NF if and only if it is in 2NF and no non-key column depends on any other non-key column.

• All non-prime attributes of R must be non-transitively functionally dependent on a key of the relation.

• Violation: non-key → non-key

3NF (Third Normal Form)

SUPPLIER (SNAME, STREET, CITY, STATE, TAX)

SNAME → STREET, CITY, STATE

STATE → TAX (non-key → non-key)

SNAME → STATE → TAX (transitive FD)

Solution: decompose the relation

SUPPLIER2 (SNAME, STREET, CITY, STATE)

TAXINFO (STATE, TAX)
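The transitive dependency in SUPPLIER can be made visible with the standard attribute-closure computation, a technique these slides do not introduce. The sketch below assumes the two FDs listed above are the only ones; the function name `closure` is chosen for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under `fds` (attribute closure)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # apply an FD once its whole left side is already determined
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    ({"SNAME"}, {"STREET", "CITY", "STATE"}),
    ({"STATE"}, {"TAX"}),
]

print(sorted(closure({"SNAME"}, fds)))
# ['CITY', 'SNAME', 'STATE', 'STREET', 'TAX']  TAX is reached only via STATE
print(sorted(closure({"STATE"}, fds)))
# ['STATE', 'TAX']  STATE determines TAX but is not a key of SUPPLIER
```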

Page 12: Dbms Unit5 Part 1

Third Normal Form (3NF)

• Relation R with FDs F is in 3NF if, for all X → A in F+ (the closure of F):

 –  A ∈ X (called a trivial FD), or

 –  X contains a key for R, i.e., X is a superkey, or

 –  A is part of some key for R.

• Minimality of a key is crucial in the third condition above!

• If R is in BCNF, it is obviously in 3NF.

• If R is in 3NF but not in BCNF, intuitively, an attribute that is part of a key can depend on attributes that do not form a superkey, so some redundancy is possible. 3NF is a compromise, used when BCNF is not achievable (e.g., no "good" decomposition, or performance considerations).

 –  A lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible, whereas dependency preservation may not be possible with BCNF.
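The three conditions can be turned into a small checker. This is a sketch under stated assumptions rather than a definitive implementation: it inspects only the FDs it is given instead of enumerating all of F+, it finds candidate keys by brute force (fine for a handful of attributes), and the names `closure`, `candidate_keys`, and `violations_3nf` as well as the schema ABC are invented for illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            attrs = set(combo)
            if any(k <= attrs for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(attrs, fds) >= set(schema):
                keys.append(attrs)
    return keys


def violations_3nf(schema, fds):
    """Pairs (X, A) from the given FDs that fail all three 3NF conditions."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= set(schema)
        for a in sorted(set(rhs) - set(lhs)):  # skip trivial parts
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad


# Hourly_Emps: S -> SNLRWH and R -> W. W is not part of any key.
hourly = [(set("S"), set("SNLRWH")), (set("R"), set("W"))]
print(violations_3nf(set("SNLRWH"), hourly))  # [(['R'], 'W')]

# Hypothetical schema ABC with AB -> C and C -> B. Keys are AB and AC,
# so B is part of a key and C -> B is allowed by the third condition.
abc = [(set("AB"), set("C")), (set("C"), set("B"))]
print(violations_3nf(set("ABC"), abc))  # []
```

The ABC case also shows why minimality matters: if any superkey counted as a "key" in the third condition, every attribute would be part of some key and no relation could ever violate 3NF.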

Page 13: Dbms Unit5 Part 1

Boyce-Codd Normal Form (BCNF)

• Relation R with FDs F is in BCNF if, for all X → A in F+ (the closure of F):

 –  A ∈ X (called a trivial FD), or

 –  X contains a key for R (if A is not a part of X).

• In other words, R is in BCNF if the only non-trivial FDs that hold over R are key constraints. This is more restrictive than 3NF: 3NF also allows the FD when A is a part of some key.

• In the example below, suppose the FD X → A holds and the relation is in BCNF. Then X contains a key, so the two tuples that agree on the X value must also agree on the Y value; hence the two tuples must be identical (so y2 = y1 and the missing A value is a).

X  Y   A
x  y1  a
x  y2  ?
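The BCNF condition is simpler to test than 3NF because it only asks whether each left side is a superkey. The sketch below checks only the FDs it is given (not all of F+); the function names and the street/city/zip schema are assumptions added for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violations_bcnf(schema, fds):
    """Given FDs that are non-trivial and whose left side is not a superkey."""
    return [
        (sorted(lhs), sorted(rhs))
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= set(schema)
    ]


# Hourly_Emps (SNLRWH): R -> W is not a key constraint.
hourly = [(set("S"), set("SNLRWH")), (set("R"), set("W"))]
print(violations_bcnf(set("SNLRWH"), hourly))  # [(['R'], ['W'])]

# After decomposing into SNLRH and RW, each relation has only key constraints.
print(violations_bcnf(set("SNLRH"), [(set("S"), set("SNLRH"))]))  # []
print(violations_bcnf(set("RW"), [(set("R"), set("W"))]))         # []

# Hypothetical schema: {street, city} -> zip and zip -> city.
# zip is not a superkey, so this is not in BCNF, although city is
# part of the key {street, city}, which is what 3NF tolerates.
addr = [({"street", "city"}, {"zip"}), ({"zip"}, {"city"})]
print(violations_bcnf({"street", "city", "zip"}, addr))  # [(['zip'], ['city'])]
```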

Page 14: Dbms Unit5 Part 1

Decomposition of a Relation Scheme

Suppose that relation R contains attributes A1 ... An. A decomposition of R consists of replacing R by two or more relations such that:

 –  Each new relation scheme contains a subset of the attributes of R (and no attributes that do not appear in R), and

 –  Every attribute of R appears as an attribute of one of the new relations.

• Intuitively, decomposing R means we will store instances of the relation schemes produced by the decomposition, instead of instances of R.

• E.g., we can decompose SNLRWH into SNLRH and RW.
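The two conditions translate directly into a check on attribute sets. This is a minimal sketch; the function name `is_decomposition` is made up, and schemas are written as strings of single-letter attributes as in the slides.

```python
def is_decomposition(schema, parts):
    """True if `parts` (two or more schemes) use only attributes of `schema`
    and together cover every attribute of `schema`."""
    schema = set(schema)
    parts = [set(p) for p in parts]
    return (
        len(parts) >= 2
        and all(p <= schema for p in parts)      # no foreign attributes
        and set().union(*parts) == schema        # nothing is left out
    )


print(is_decomposition("SNLRWH", ["SNLRH", "RW"]))  # True
print(is_decomposition("SNLRWH", ["SNLR", "RW"]))   # False: H is lost
print(is_decomposition("SNLRWH", ["SNLRH", "RX"]))  # False: X is not in R
```

Passing this check only means the attribute sets form a decomposition; it says nothing about whether the decomposition is a good one.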

Page 15: Dbms Unit5 Part 1

Example Decomposition

• Decompositions should be used only when needed.

 –  SNLRWH has FDs S → SNLRWH and R → W.

 –  The second FD causes a violation of 3NF; W values are repeatedly associated with R values. The easiest way to fix this is to create a relation RW to store these associations, and to remove W from the main schema:

• i.e., we decompose SNLRWH into SNLRH and RW.

• The information to be stored consists of SNLRWH tuples. If we just store the projections of these tuples onto SNLRH and RW, are there any potential problems that we should be aware of?
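One way to probe this question on a concrete instance is to project the tuples, join the projections back together, and compare with the original. The sketch below does this for the Hourly_Emps tuples shown earlier. The helper names are invented, and the second decomposition (SNLH and RWH) is a hypothetical counterexample added for contrast; it is not proposed in the slides. The check covers only this one instance and only one kind of problem (tuples that were never stored reappearing after the join).

```python
ATTRS = "SNLRWH"
rows = [
    dict(zip(ATTRS, t))
    for t in [
        ("123-22-3666", "Attishoo", 48, 8, 10, 40),
        ("231-31-5368", "Smiley", 22, 8, 10, 30),
        ("131-24-3650", "Smethurst", 35, 5, 7, 30),
        ("434-26-3751", "Guldu", 35, 5, 7, 32),
        ("612-67-4134", "Madayan", 35, 8, 10, 40),
    ]
]


def project(rows, attrs):
    """Projection with duplicates removed; a tuple is a frozenset of pairs."""
    return {frozenset((a, row[a]) for a in attrs) for row in rows}


def natural_join(left, right):
    """Combine tuples that agree on every attribute they share."""
    out = set()
    for l in left:
        for r in right:
            merged = dict(l)
            if all(merged.get(a, v) == v for a, v in r):
                merged.update(r)
                out.add(frozenset(merged.items()))
    return out


original = project(rows, ATTRS)

# SNLRH and RW: joining on R gives back exactly the original tuples.
good = natural_join(project(rows, "SNLRH"), project(rows, "RW"))
print(good == original)  # True

# Hypothetical alternative SNLH and RWH: joining on H invents tuples.
bad = natural_join(project(rows, "SNLH"), project(rows, "RWH"))
print(len(original), len(bad))  # 5 7
```

In the second case the original tuples are all still present, but two extra ones appear (for example, Smiley paired with rating 5), so the stored projections no longer say which tuples were really in the relation.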