
db mod 4

Apr 07, 2018


Vijay.N
  • 8/6/2019 db mod 4


    RT503 Database Management Systems Module 4

    ______________________________________________________________________________________________________________

    Integrity constraints (ref: DBMS by Silberschatz and Galvin)

    Unauthorized users who gain access to the database can damage its data or leave it inconsistent. An authorized user can also leave the database in an inconsistent state by accident. Restrictions are therefore placed on the database so that users do not change data accidentally. These restrictions are called constraints.

    Integrity constraints are intended for the normal user. They ensure that changes made to the database by authorized users do not result in a loss of data consistency, so they guard against accidental damage to the database. There are a number of ways to specify integrity constraints. They are

    Key constraints ( primary keys, foreign keys and candidate key specification)

    Using not null

    Using check clause

    Using assertions

    Using triggers

    Using functional dependencies

    Domain constraints

    An attribute has a set of possible values associated with it, called its domain.

    For example, in the student table

    Student (stdid, name, marks)

    the possible values for the attribute stdid are integers, the possible values for name are character strings, and the possible values for marks are integers.

    Types such as integer, character and date are called standard domain types.

    ______________________________________________________________________________________________________________ Department of IT Mangalam College of Engineering, Ettumanoor

    Module 4

    Database Design: Design guidelines, Relational database design, Integrity Constraints, Domain Constraints, Referential integrity, Functional Dependency, Normalization using Functional Dependencies, Normal forms based on primary keys, general definitions of Second and Third Normal Forms, Boyce Codd Normal Form, Multivalued Dependencies and Fourth Normal Form, Join Dependencies and Fifth Normal Form, Pitfalls in Relational Database Design.


    Declaring an attribute to be of a particular domain acts as a constraint on the values that it can take. Several attributes can have the same domain. For example, in our student table the domain of stdid is the same as the domain of marks: both are integers. But a query such as "find the names of students who have the same stdid as a mark" is not meaningful.

    We can define new domains by using the create domain clause. That is,

    create domain Dollars int;
    create domain Pounds int;

    define the domains Dollars and Pounds to be integers. An attempt to assign a value of type Dollars to a variable of type Pounds would result in a syntax error: although both are of the same type, they are of different domains.

    The check clause in SQL permits domains to be restricted in powerful ways. For example, suppose we are creating a domain Studmarks whose value should not be more than 100. We can specify this by

    create domain Studmarks int
        constraint marktest check (value <= 100);


    create table books (
        bid int,
        bname char(10),
        author char(10),
        primary key (bid)
    );

    Suppose the students are allowed to access and reserve books. The details of all students are in the student table and the details of all books are in the books table.

    Suppose the rule in the college is that only students of the college are allowed to access and reserve books. In other words, only students who have an entry in the student table may access the books, so the stdid values in the reserved and accessed tables must also be present in the student table.

    Suppose another rule is that students may access and reserve only those books that are present in the college library, that is, books that are present in the books table. In other words, the bid values in the reserved and accessed tables must also be present in the books table.

    We can specify these restrictions by using a foreign key clause. The tables reserved and accessed are created by

    create table reserved (
        stdid int,
        bid int,
        foreign key (stdid) references student (stdid),
        foreign key (bid) references books (bid)
    );

    create table accessed (
        stdid int,
        bid int,
        foreign key (stdid) references student (stdid),
        foreign key (bid) references books (bid)
    );

    This means that for any tuple inserted into the reserved table, the values of stdid and bid must be present in the student and books tables respectively. The same holds for any tuple inserted into the accessed table.
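The effect of these foreign key clauses can be tried out with a small sketch. It assumes Python's standard sqlite3 module as a stand-in DBMS (SQLite enforces foreign keys only after the pragma shown), and the sample rows and the helper name try_reserve are illustrative, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (stdid INTEGER PRIMARY KEY, name TEXT, marks INTEGER);
    CREATE TABLE books (bid INTEGER PRIMARY KEY, bname TEXT, author TEXT);
    CREATE TABLE reserved (
        stdid INTEGER,
        bid INTEGER,
        FOREIGN KEY (stdid) REFERENCES student (stdid),
        FOREIGN KEY (bid) REFERENCES books (bid)
    );
    -- Illustrative sample rows (assumed values).
    INSERT INTO student VALUES (100, 'Anil', 50);
    INSERT INTO books VALUES (1, 'DBMS', 'Navathe');
""")


def try_reserve(stdid, bid):
    """Return True if the row is accepted, False if a foreign key rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO reserved VALUES (?, ?)", (stdid, bid))
        return True
    except sqlite3.IntegrityError:
        return False


accepted = try_reserve(100, 1)           # both values exist in the parent tables
rejected_student = try_reserve(999, 1)   # there is no student 999
rejected_book = try_reserve(100, 42)     # there is no book 42
row_count = conn.execute("SELECT COUNT(*) FROM reserved").fetchone()[0]
```

Only the first insert is stored; the other two are refused because the referenced stdid or bid does not exist in the parent table.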

    We can also create the tables reserved and accessed by specifying a constraint name for each foreign key. That is, another way of creating the tables is

    create table reserved (
        stdid int,
        bid int,
        constraint st foreign key (stdid) references student (stdid),
        constraint bks foreign key (bid) references books (bid)
    );

    create table accessed (
        stdid int,
        bid int,
        constraint stud foreign key (stdid) references student (stdid),
        constraint bk1 foreign key (bid) references books (bid)
    );

    Here we have given names to these constraints. The reserved table has 2 foreign key constraints, st and bks. The accessed table has 2 foreign key constraints, stud and bk1.

    These facts can be represented by

    Student   (Stdid, Name, Marks)
    Books     (Bid, Bname, Author)
    Reserved  (Stdid, Bid, Rdate)
    Accessed  (Stdid, Bid, Adate)

    The other types of constraints are primary key, unique, not null and check constraints.

    For example, consider the table student.

    Student (stdid, branch, sem, rn, name, marks)

    There are 2 candidate keys: stdid and (branch, sem, rn). We declare one as the primary key and the other as unique.

    Suppose we also require that the name and marks of a student must not be null, and that the value of marks must not be more than 100. We can enforce the first with not null and the second with a check clause. We can create the table by


    create table student (
        stdid int,
        branch char(2),
        sem int,
        rn int,
        name char(10) not null,
        marks int not null,
        primary key (stdid),
        unique (branch, sem, rn),
        check (marks <= 100)
    );
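A minimal sketch of how the not null, unique and check constraints on this table behave, assuming Python's standard sqlite3 module as the DBMS and the limit of 100 marks stated in the text. The sample rows and the helper name try_insert are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        stdid  INTEGER,
        branch CHAR(2),
        sem    INTEGER,
        rn     INTEGER,
        name   CHAR(10) NOT NULL,
        marks  INTEGER NOT NULL,
        PRIMARY KEY (stdid),
        UNIQUE (branch, sem, rn),
        CHECK (marks <= 100)
    )
""")


def try_insert(row):
    """Insert one row; return 'ok' or the text of the constraint error."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return str(exc)


r_ok = try_insert((100, "cs", 3, 1, "Anil", 50))        # satisfies every constraint
r_unique = try_insert((101, "cs", 3, 1, "Binil", 80))   # repeats (branch, sem, rn)
r_null = try_insert((102, "cs", 3, 2, None, 70))        # name is null
r_check = try_insert((103, "cs", 3, 3, "Dinil", 120))   # marks above 100
```

Each rejected row reports the kind of constraint that failed (UNIQUE, NOT NULL or CHECK); the exact wording of the message depends on the SQLite version.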


    Problems in updating values

    Lossy join decomposition

    We can see an example. Suppose the information related with a college is stored as

    College (dname, dhod, dphone, stdid, stdname, stdmarks)

    College

    Dname  Dhod  Dphone  stdid  stdname  smarks
    CS     Abc   23456   100    Ss1      70
    CS     Abc   23456   101    Ss2      20
    CS     Abc   23456   102    Ss3      45
    EC     Bgh   78905   100    Ss7      67
    EC     Bgh   78905   101    Ss8      55
    AE     Mkl   34443   100    Ss2      68
    CS     Abc   23456   103    Ss4      34
    AE     Mkl   34443   101    Ss3      70

    Suppose we want to add the details of a new student to the college table: student 800, hjk, with marks 50, in the AE department.

    In our design we need a tuple with values on all attributes of the college schema. Thus we must repeat the dhod and dphone values and add the tuple

    AE, Mkl, 34443, 800, hjk, 50

    In general, the Dhod and Dphone for a department must appear once for each student admitted to that department.

    This repetition of information is undesirable. Repeating information wastes space, and it also complicates updates. Suppose the phone number of department CS changes from 23456 to 56789. Under this design many tuples of the college relation need to be changed, so updates are costly. When we perform the update, we must ensure that every tuple corresponding to the CS department is updated. Otherwise our table will show 2 different phone numbers for CS.

    From this we can say that this design of our table or database is bad.

    A department has a unique phone number, so given a department name we can uniquely identify the phone number.

    A department has many students, so given a department name we cannot uniquely determine the stdid. In other words, the functional dependency dname → dphone holds on the college schema, but the functional dependency dname → stdid does not.

    The fact that a department has a particular phone number and the fact that a department has a student are independent, so these facts are best represented in separate tables. We will see that we can use functional dependencies to specify formally when a database design is good.

    Another problem with the college relation is that we cannot directly represent the information about a department (dname, dhod, dphone) if there are no students in that department. This is because tuples in the college relation require values for stdid, stdname and smarks.

    One solution is to use null values, but null values are difficult to handle. If we do not want to deal with null values, we can create the department information only when the first student is admitted to that department. And if all students leave that department, we have to delete all information on that department. This situation is undesirable.


    Some other problems that can occur are update anomalies (problems in updates) and lossy join decompositions. For example, consider the student table

    Student (stdid, branch, name, marks, hod, deptphoneno)

    Student

    Stdid  Branch  Name  Marks  Hod  Deptphoneno
    100    Cs      Abc   60     Def  567890
    101    Cs      Bcd   70     Def  567890
    102    Ec      Sad   80     Ghj  123456
    105    Ec      Abc   10     Ghj  123456

    In this table there is repetition of information. There is a particular person as hod for each branch. If the details of all students are stored in this table and there are 100 students in each branch, the hod's name will be repeated 100 times, and so will the department phone number. Suppose the hod of a particular branch changes. Then we have to update the hod field of every tuple of that branch: if there are 100 tuples for the branch, all 100 have to be updated. The same is true of deptphoneno. If we want to change the phone number of a particular department, it has to be changed in all these tuples. These problems are called update anomalies.
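The update anomaly can be seen in a few lines of Python. This is only a sketch over the four student rows shown above; the new phone number 999999 and the helper name phones_per_branch are assumptions made for illustration.

```python
# The student table above, reduced to the columns that matter here.
rows = [
    {"stdid": 100, "branch": "Cs", "hod": "Def", "deptphoneno": "567890"},
    {"stdid": 101, "branch": "Cs", "hod": "Def", "deptphoneno": "567890"},
    {"stdid": 102, "branch": "Ec", "hod": "Ghj", "deptphoneno": "123456"},
    {"stdid": 105, "branch": "Ec", "hod": "Ghj", "deptphoneno": "123456"},
]


def phones_per_branch(table):
    """Map each branch to the set of phone numbers recorded for it."""
    result = {}
    for row in table:
        result.setdefault(row["branch"], set()).add(row["deptphoneno"])
    return result


# An incomplete update: only the first Cs tuple receives the new number.
rows[0]["deptphoneno"] = "999999"

# Branches that now show more than one phone number are inconsistent.
inconsistent = {
    branch for branch, phones in phones_per_branch(rows).items() if len(phones) > 1
}
```

After the partial update the Cs branch carries two different phone numbers, which is exactly the inconsistency the redundant design allows.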

    Lossy join decomposition is another pitfall in relational database design. It is explained with fourth normal form.

    Functional dependency (ref: Navathe)

    This is a very important concept in relational database design. A functional dependency is a constraint between 2 sets of attributes from the database. First we look at an example.

    Consider the student table.

    Student

    Stdid  Sname  Marks  Rn  Branch  Sem  Hod  Grade
    100    Anil   50     1   Cs      3    Abc  D
    101    Binil  80     2   Cs      3    Abc  A
    102    Cinil  70     3   Cs      3    Abc  B
    103    Dinil  80     4   Cs      3    Abc  A

    We consider the student table, and our assumptions are based on a real world view of the student.

    The candidate keys of the table are stdid and (branch, sem, rn). A key means that the value of the key attribute must be distinct for each tuple. For example, the stdid value is different for each row or tuple in the student table. The same is true of the key (branch, sem, rn): the 3 values of these attributes, taken together, are distinct for each tuple or row.

    Stdid
    100
    101
    102
    103
    108

    Branch  Sem  Rn
    cs      3    1
    cs      3    2
    cs      3    3
    cs      5    1
    cs      5    2
    ec      3    1
    ec      3    2
    ec      3    3
    ec      5    1
    ec      5    2

    We can see that the key values are distinct for each row.

    If we write

    Stdid → marks

    this is called a functional dependency: stdid functionally determines marks.

    Suppose in the above table the values for the attributes are

    Stdid  Marks
    100    80
    101    85
    102    70
    103    70
    104    85
    108    70
    109    80

    The stdid values are different for each row, since stdid is a candidate key. For each stdid value there is a unique marks value. For example, if the stdid is 102, its corresponding marks value is always 70 in this student table. This means that the value of the marks attribute of a tuple in student depends on, or is determined by, the value of the stdid component. Equivalently, the value of the stdid component of a tuple uniquely (functionally) determines the value of the marks attribute. We say that there is a functional dependency from stdid to marks, or that marks is functionally dependent on stdid. The attribute stdid is called the left hand side of the FD and marks is called the right hand side.

    We can write other functional dependencies as

    Stdid → sname
    Stdid → rn
    Stdid → sem
    Stdid → branch
    Stdid → hod
    Stdid → grade

    We can also write them together as

    Stdid → sname, marks, rn, branch, sem, hod, grade

    These are correct because stdid is a key.

    We can also write

    Branch, sem, rn → marks

    because the left hand side is a key.

    Branch  Sem  Rn  Marks
    Cs      3    1   50
    Cs      3    2   60
    Cs      3    3   70
    Cs      3    4   50
    Ec      3    1   50
    Ec      3    2   20
    Ec      3    3   30

    Looking at this, we can say that (branch, sem, rn) functionally determines marks.

    Also we can write

    Branch, sem, rn → stdid
    Branch, sem, rn → sname
    Branch, sem, rn → marks
    Branch, sem, rn → hod
    Branch, sem, rn → grade

    or together

    Branch, sem, rn → stdid, sname, marks, hod, grade

    Since stdid and (branch, sem, rn) are both keys for student, we have written these 2 functional dependencies:

    Stdid → branch, sem, rn, sname, marks, hod, grade
    Branch, sem, rn → stdid, sname, marks, hod, grade

    If we look at the table again, we can find other functional dependencies. For example

    Stdid  Branch  Hod
    100    cs      abc
    101    cs      abc
    103    cs      abc
    104    cs      abc
    101    ec      bcd
    103    ec      bcd
    105    cs      abc
    104    ec      bcd

    For each branch there is only one hod; that is, for each value of branch there is a unique hod. We can write this as

    Branch → hod

    Now take marks and grade. Suppose the grade is A for marks of 80 and above. Then whenever the mark is 80 the grade will be A, so for each value of marks there is a unique grade.

    Stdid  Marks  Grade
    100    50     D
    101    80     A
    102    85     A
    103    50     D
    104    60     C
    105    75     B
    106    60     C


    So we can write

    Marks → grade

    So the following functional dependencies hold in the student relation.

    Stdid → branch, sem, rn, sname, marks, hod, grade
    Branch, sem, rn → stdid, sname, marks, hod, grade
    Branch → hod
    Marks → grade

    In the student schema we represent these functional dependencies as

    Student

    Stdid  Sname  Marks  Rn  Branch  Sem  Hod  Grade

    A functional dependency (FD), denoted by X → Y, between 2 sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is that for any two tuples t1 and t2 in r that have t1[X] = t2[X], we must also have t1[Y] = t2[Y].

    This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component. In other words, the values of the X component of a tuple uniquely (functionally) determine the values of the Y component.

    We say that there is a functional dependency from X to Y, or that Y is functionally dependent on X. The abbreviation for functional dependency is FD. The set of attributes X is called the left hand side of the FD, and Y is called the right hand side.
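The condition on t1 and t2 can be checked mechanically against one relation state. The sketch below is an illustration in Python, with the four student rows from the earlier table; the function name fd_holds is an assumption. Note that such a check can only show that an FD is violated by a state. A state that satisfies it does not prove that the FD holds on the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)   # t[X]
        y = tuple(row[a] for a in rhs)   # t[Y]
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


columns = ("stdid", "sname", "marks", "rn", "branch", "sem", "hod", "grade")
student = [dict(zip(columns, values)) for values in [
    (100, "Anil", 50, 1, "Cs", 3, "Abc", "D"),
    (101, "Binil", 80, 2, "Cs", 3, "Abc", "A"),
    (102, "Cinil", 70, 3, "Cs", 3, "Abc", "B"),
    (103, "Dinil", 80, 4, "Cs", 3, "Abc", "A"),
]]

stdid_marks = fd_holds(student, ["stdid"], ["marks"])     # satisfied by this state
branch_hod = fd_holds(student, ["branch"], ["hod"])       # satisfied by this state
marks_grade = fd_holds(student, ["marks"], ["grade"])     # satisfied by this state
marks_stdid = fd_holds(student, ["marks"], ["stdid"])     # violated: 80 maps to 101 and 103
```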


    A functional dependency is a property of the relation schema R, not of a particular relation state r

    of R. So an FD cannot be automatically determined from a given relation but it must be explicitly defined

    by someone who knows the meaning or semantics of the columns of relation R.

    Inference rules for functional dependencies

    Normally when designing a table, the database designer specifies a set F of functional dependencies that apply to the table. From this set we can deduce, or infer, additional functional dependencies, and there are certain rules for inferring them. The set of all dependencies that can be inferred from F, including those in F itself, is called the closure of F and is denoted by F+.

    For example, suppose that we specify the following set F of obvious FDs on a relation schema

    R (empid, empname, dob, address, deptnumber, deptname, deptmngrid)

    The set of FDs is

    Empid → empname, dob, address, deptnumber
    Deptnumber → deptname, deptmngrid

    We can deduce additional FDs such as

    Empid → deptname, deptmngrid
    Empid → empid
    Deptnumber → deptname

    To infer dependencies from the given set of FDs in a systematic way, we can use a set of inference rules.

    Armstrong's inference rules

    The following are well-known inference rules for FDs. Rules 1 to 3 are Armstrong's axioms; rules 4 to 6 can be derived from them.

    1. Reflexive rule
       If Y ⊆ X, then X → Y
    2. Augmentation rule
       From X → Y we can infer XZ → YZ
    3. Transitive rule
       From X → Y and Y → Z we can infer X → Z
    4. Decomposition rule
       From X → YZ we can infer X → Y and X → Z
    5. Union rule
       From X → Y and X → Z we can infer X → YZ
    6. Pseudotransitive rule
       From X → Y and WY → Z we can infer WX → Z


    Trivial and non trivial functional dependencies

    In a functional dependency X → Y, if Y ⊆ X then it is a trivial FD. Otherwise it is non trivial.

    For example

    A  B  C
    Q  L  M
    E  J  N
    R  B  Y
    Q  L  J
    T  B  D
    U  G  P
    R  B  Y

    The FDs are

    A → B
    C → A

    These two are non trivial functional dependencies. We can also write

    A, B → B
    A, B, C → A, C

    These are trivial functional dependencies because the RHS is a subset of the LHS.

    Normally database designers first specify a set of functional dependencies for a table. Then Armstrong's inference rules can be used to deduce additional FDs. For this purpose we can use the following algorithm.

    We are given a set F of FDs for a relation R. We find the additional FDs by finding the closure of X, where X is the left hand side of each FD.

    Determining X+, the closure of X under F

    X+ = X;
    Do
        OldX+ = X+;
        For each functional dependency Y → Z in F do
            If Y ⊆ X+ then X+ = X+ ∪ Z;
    Until (X+ = OldX+);

    For example

    For relation R (eid, ename, projno, projname, projlocation, hours)

    We are given F

    Eid → ename
    Projno → projname, projlocation
    Eid, projno → hours

    Using the above algorithm we can find the closure sets for each LHS of the FDs.

    eid+ = {eid, ename}
    projno+ = {projno, projname, projlocation}
    (eid, projno)+ = {eid, projno, ename, projname, projlocation, hours}
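The closure algorithm translates almost line for line into Python. The following is a sketch rather than a reference implementation: FDs are represented as (lhs, rhs) pairs of sets, which is a choice made here, and the function name closure is an assumption.

```python
def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # If the left hand side is already inside the closure, add the right hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The set F from the example above.
F = [
    ({"eid"}, {"ename"}),
    ({"projno"}, {"projname", "projlocation"}),
    ({"eid", "projno"}, {"hours"}),
]

eid_plus = closure({"eid"}, F)
projno_plus = closure({"projno"}, F)
eid_projno_plus = closure({"eid", "projno"}, F)
```

The three results match the closure sets listed above; since (eid, projno)+ contains every attribute of R, (eid, projno) is a key of R.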

    Normal forms (ref: DBMS by Navathe)

    The normalization process was first proposed by Codd. It takes a relation schema or a set of tables through a series of tests to check whether the database satisfies a certain normal form. Codd proposed 3 normal forms:

    First normal form
    Second normal form
    Third normal form

    Then a modification to the third normal form was proposed, called Boyce Codd normal form.

    All these normal forms are based on functional dependencies.

    Later, fourth normal form and fifth normal form were proposed. They are based on multivalued dependencies and join dependencies respectively.

    We have already studied some drawbacks or pitfalls in relational database design. The main drawbacks are repetition of information and inability to represent certain information. The purpose of normalization is to analyze the given relation schemas or tables based on their functional dependencies and candidate keys, and to remove these drawbacks from the database. If a relation schema or table does not satisfy the normal form tests, it is decomposed into new relations that do satisfy them.


    We know the concept of candidate keys and primary key of a table.

    Prime attribute

    An attribute of relation schema R is called a prime attribute if it is a member of some candidate key of R. An attribute is called non-prime if it is not a prime attribute, that is, if it is not a member of any candidate key.

    For example

    Student (stdid, branch, sem, rn, sname, marks)

    Branch is a prime attribute because it is a member of the candidate key (branch, sem, rn).

    Likewise sem and rn are prime attributes.

    Stdid is a prime attribute because it is itself a candidate key.

    Marks is not a prime attribute.

    Also sname is not a prime attribute.
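Prime and non-prime attributes can be found mechanically once the candidate keys are known. The sketch below searches for candidate keys by brute force using attribute closure. The two FDs are the key dependencies stated earlier in these notes, restricted to this schema; the function names closure and candidate_keys are assumptions, and the exhaustive search is only practical for small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Closure of the attribute set attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure is the whole schema, smallest first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attributes):
                keys.append(candidate)
    return keys


schema = {"stdid", "branch", "sem", "rn", "sname", "marks"}
fds = [
    ({"stdid"}, {"branch", "sem", "rn", "sname", "marks"}),
    ({"branch", "sem", "rn"}, {"stdid", "sname", "marks"}),
]

keys = candidate_keys(schema, fds)
prime = set().union(*keys)       # members of some candidate key
non_prime = schema - prime
```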

    First normal form (1NF)

    It is defined to disallow multivalued attributes, composite attributes and their combinations.

    It states that the domain of an attribute must include only atomic (indivisible) values, and that the value of any attribute in a tuple must be a single value from the domain of that attribute.

    So first normal form disallows having a set of values, a tuple of values or a combination of both as an attribute value for a single tuple.

    We can explain this using an example.

    Consider the student relation.

    Student (stdid, sname, saddress, phoneno)

    Student

    Stdid  Sname  Saddress             Phoneno
    100    Abc    No. 20, KTM, Kerala  567890
    102    Bcd    No. 35, EKM, Kerala  564476, 234789
    105    Def    No. 41, KTM, Kerala  123245, 367840, 300898

    In this relation there are 3 tuples. There is a composite attribute saddress with three fields: house no, city and state.

    There is also a multivalued attribute phoneno. Student 102 has 2 phones and student 105 has 3 phones.

    According to 1NF, multivalued and composite attributes are not allowed, so we have to find a way to normalize this schema to first normal form.

    First we solve the problem caused by the multivalued attribute, here phoneno. We remove the attribute phoneno and place it in a separate table or relation along with the primary key of student, that is stdid.

    Then we get

    Student1 (stdid, sname, saddress)

    Std_phone (stdid, phoneno)

    Here the primary key of student1 is stdid and the primary key of std_phone is (stdid, phoneno).

    Student1

    Stdid  Sname  Saddress
    100    Abc    No. 20, KTM, Kerala
    102    Bcd    No. 35, EKM, Kerala
    105    Def    No. 41, KTM, Kerala

    Std_phone

    Stdid  Phoneno
    100    567890
    102    564476
    102    234789
    105    123245
    105    367840
    105    300898


    Next we have to deal with composite attributes. We can expand saddress into 3 attributes: add_house, add_city and add_state. Then the relations will be

    Student1A

    Stdid  Sname  Add_house  Add_city  Add_state
    100    Abc    No. 20     KTM       Kerala
    102    Bcd    No. 35     EKM       Kerala
    105    Def    No. 41     KTM       Kerala

    Std_phone

    Stdid  Phoneno
    100    567890
    102    564476
    102    234789
    105    123245
    105    367840
    105    300898

    We can see that student1A and std_phone are in first normal form (1NF).
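The two steps above (moving the multivalued phoneno into its own relation, then splitting the composite saddress) can be sketched in Python. The data is the student relation from the example; the function name to_1nf and the assumption that every address has exactly three comma-separated parts are choices made for this illustration.

```python
# The unnormalized student relation: phoneno is multivalued, saddress is composite.
student = [
    {"stdid": 100, "sname": "Abc", "saddress": "No. 20, KTM, Kerala",
     "phoneno": ["567890"]},
    {"stdid": 102, "sname": "Bcd", "saddress": "No. 35, EKM, Kerala",
     "phoneno": ["564476", "234789"]},
    {"stdid": 105, "sname": "Def", "saddress": "No. 41, KTM, Kerala",
     "phoneno": ["123245", "367840", "300898"]},
]


def to_1nf(rows):
    """Return (student1a, std_phone) as lists of tuples with atomic values only."""
    student1a, std_phone = [], []
    for row in rows:
        # Composite attribute: split into add_house, add_city, add_state.
        house, city, state = [part.strip() for part in row["saddress"].split(",")]
        student1a.append((row["stdid"], row["sname"], house, city, state))
        # Multivalued attribute: one (stdid, phoneno) tuple per phone number.
        std_phone.extend((row["stdid"], phone) for phone in row["phoneno"])
    return student1a, std_phone


student1a, std_phone = to_1nf(student)
```

Every value in student1a and std_phone is now a single atomic value, and (stdid, phoneno) pairs are unique, matching the primary key chosen for std_phone.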

    Second normal form (2NF)

    Before looking at second normal form, we need some definitions.

    Partial and full functional dependencies

    A functional dependency X → Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold any more.

    A functional dependency X → Y is a partial functional dependency if some attribute A can be removed from X and the dependency still holds.

    For example

    Student (stdid, branch, sem, rn, name, marks, hod)

    We know that the following FDs are correct for this table.

    FD1 -- stdid → branch, sem, rn, name, marks, hod

    FD2 -- branch, sem, rn → stdid, name, marks

    Also

    FD3 -- branch, sem, rn → hod


    In FD2, if we remove the attribute sem from the LHS (the X part), we can see that
    (branch, rn) does not functionally determine stdid, name, marks. The same is true if we remove
    branch or rn. So FD2 is called a full functional dependency.

    In FD3, if we remove the attributes sem and rn, we can see that the FD still holds.
    That is, branch → hod is also a functional dependency. So FD3 is a partial functional dependency.
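The full/partial distinction can be checked mechanically with an attribute closure. The following is only a minimal sketch: the helper names `closure` and `is_full_fd` are hypothetical, and the FD list is the one stated for Student above, with branch → hod taken as the underlying dependency behind FD3.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_full_fd(lhs, rhs, fds):
    """X -> Y is full if dropping any single attribute from X breaks it."""
    for attr in lhs:
        if rhs <= closure(lhs - {attr}, fds):
            return False  # a smaller left-hand side still determines rhs
    return True


# FDs of Student as given in the text (branch -> hod is the reduced form of FD3).
STUDENT_FDS = [
    (frozenset({"stdid"}),
     frozenset({"branch", "sem", "rn", "name", "marks", "hod"})),
    (frozenset({"branch", "sem", "rn"}), frozenset({"stdid", "name", "marks"})),
    (frozenset({"branch"}), frozenset({"hod"})),
]

key = frozenset({"branch", "sem", "rn"})
fd2_is_full = is_full_fd(key, frozenset({"stdid", "name", "marks"}), STUDENT_FDS)
fd3_is_full = is_full_fd(key, frozenset({"hod"}), STUDENT_FDS)
print(fd2_is_full, fd3_is_full)
```

Dropping one attribute at a time is enough here, because if any smaller left-hand side works, some left-hand side with exactly one attribute removed works too.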

    A relation schema or a table, R is in second normal form, if every non prime attribute A in R is fully

    functionally dependent on the primary key of R.

    For example

    Student1 ( stdid, branch, sem, rn, name, hod, marks, grade )

    (Dependency diagram: FD1, FD2, FD3 and FD4 drawn over the attributes of Student1.)

    We can see that the student1 relation is not in second normal form because of FD3, that is

    branch → hod

    It violates 2NF because the non-prime attribute hod is partially dependent on the candidate key
    (branch, sem, rn). The dependency

    branch, sem, rn → hod

    is a partial functional dependency, because it still holds if we remove the attributes sem and rn.

    The other non-prime attributes are name, marks and grade. They are fully functionally dependent on the keys.

    stdid → name
    branch, sem, rn → name
    stdid → marks
    branch, sem, rn → marks
    stdid → grade
    branch, sem, rn → grade

    marks → grade does not violate 2NF, because marks is not a prime attribute, so this is not a
    partial dependency on a key.

    As a next step we have to normalize student1 to 2NF.

    We decompose it by removing the attribute hod, which forms a partial dependency, from student1
    and putting it in another relation. That is, we decompose student1 into student1A and student1B.


    So we have decomposed student1 into

    student1A (stdid, branch, sem, rn, name, marks, grade) and
    student1B (branch, hod)

    Both relations are in 2NF.
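As a rough check of this decomposition, the sketch below lists every non-prime attribute that depends on a proper part of a candidate key. The function names are hypothetical, and the FD lists are a reading of the text (in particular marks → grade is assumed to be the fourth dependency).

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(candidate_keys, fds):
    """List (key_part, attribute) pairs that break 2NF."""
    prime = set().union(*candidate_keys)
    found = []
    for key in candidate_keys:
        for size in range(1, len(key)):  # proper, non-empty parts only
            for part in combinations(sorted(key), size):
                for attr in sorted(closure(part, fds) - prime):
                    found.append((part, attr))
    return found


KEYS = [{"stdid"}, {"branch", "sem", "rn"}]

# Student1, before decomposition (assumed FD set).
STUDENT1_FDS = [
    ({"stdid"}, {"branch", "sem", "rn", "name", "hod", "marks", "grade"}),
    ({"branch", "sem", "rn"}, {"stdid", "name", "marks", "grade"}),
    ({"branch"}, {"hod"}),
    ({"marks"}, {"grade"}),
]

# Student1A, after hod has been moved to Student1B.
STUDENT1A_FDS = [
    ({"stdid"}, {"branch", "sem", "rn", "name", "marks", "grade"}),
    ({"branch", "sem", "rn"}, {"stdid", "name", "marks", "grade"}),
    ({"marks"}, {"grade"}),
]

before = partial_dependencies(KEYS, STUDENT1_FDS)
after = partial_dependencies(KEYS, STUDENT1A_FDS)
print(before)
print(after)
```

Every entry reported for Student1 involves hod, and nothing is reported for Student1A, which agrees with the argument above.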

    Third normal form (3NF)

    3NF is based on the concept of transitive dependency. Transitive dependencies are not allowed in 3NF.

    A transitive dependency means that if X → Y and Y → Z hold in a relation or table R, then X → Z is also
    a functional dependency that holds on R. Here X, Y, Z are attributes of the table, and Y should not be
    a candidate key or a subset of any key of R (that is, Y is not a prime attribute).

    We can see this by an example.

    Student3

    Stdid Branch Sem Rn Name Marks Grade


    We have shown 3 FDs here. That is

    Fd1 -- stdid → grade

    Fd2 -- stdid → marks

    Fd3 -- marks → grade

    We can see that marks is not a prime attribute of student3.

    stdid → grade is a transitive dependency because of Fd2 and Fd3.

    This is not allowed in 3NF.

    A relation R is said to be in 3NF if R is in 2NF and no non-prime attribute of R is transitively
    dependent on the key of R.

    The above relation schema student3 is in 2NF, since no partial dependencies on a key exist. But
    it is not in 3NF because of the transitive dependency stdid → grade via marks.

    We can normalize student3 by decomposing it into two 3NF relation schemas,
    student3A and student3B, as follows.

    Student3A (stdid, branch, sem, rn, name, marks)

    Student3B (marks, grade)

    Student3A

    Stdid Branch Sem Rn Name Marks

    Student3B

    Marks Grade

    We can see that this is in 3NF.

    Example 2:

    Emp_dept

    Ename Ssn Bdate Address Dnumber Dname Dmgrssn

    We can see that the above schema is in 2NF, but it is not in 3NF because of transitive dependencies.

    ename → dmgrssn holds. Also
    ename → dname holds. Both hold transitively (through dnumber).

    We can decompose this into

    ED1

    Ename Ssn Bdate Address Dnumber

    ED2


    Dnumber Dname Dmgrssn

    See that this table is in 3NF.

    General definitions of second and third normal forms

    General definition of second normal form

    A relation schema R is in 2NF, if every non prime attribute A in R is not partially dependent on

    any key of R. we can see an example.

    LOTS

    Propertyid Countyname Lot Area Price Taxrate

    (Dependency diagram: Fd1, Fd2, Fd3 and Fd4 drawn over the attributes of LOTS.)

    We can see that the LOTS schema violates the general definition of 2NF because taxrate is
    partially dependent on the candidate key (countyname, lot) due to Fd3.

    To normalize LOTS into 2NF, we decompose it into 2 relations, LOTS1 and LOTS2. We construct
    LOTS1 by removing the attribute taxrate that violates 2NF and placing it with countyname (the LHS of
    Fd3, which causes the partial dependency) into another relation LOTS2. Both LOTS1 and LOTS2 are in 2NF.
    We can see that Fd4 does not violate 2NF.

    LOTS1

    Propertyid Countyname Lot Area Price

    (Fd1, Fd2 and Fd4 hold in LOTS1.)

    LOTS2

    Countyname Taxrate

    (Fd3 holds in LOTS2.)

    The relations LOTS1 and LOTS2 are in second normal form.

    General definition of third normal form (3NF)

    A relation schema R is in 3NF if, whenever a non-trivial functional dependency
    X → A holds in R, either

    a) X is a super key of R

    OR

    b) A is a prime attribute of R.

    If every non-trivial dependency satisfies at least one of these conditions, the relation schema is in 3NF.

    Using this we can directly check whether a relation schema is in 3NF.
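The two conditions translate almost directly into a test. The sketch below applies it to LOTS. Fd3 (countyname → taxrate) and Fd4 (area → price) are as stated in these notes, while the exact right-hand sides of Fd1 and Fd2 are assumptions derived from the two candidate keys, propertyid and (countyname, lot). The function names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, candidate_keys, fds):
    """Return (lhs, attribute) pairs that fail both 3NF conditions."""
    attributes = set(attributes)
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= attributes
        for attr in sorted(rhs - lhs):  # rhs - lhs skips the trivial part
            if not lhs_is_superkey and attr not in prime:
                violations.append((tuple(sorted(lhs)), attr))
    return violations


LOTS = {"propertyid", "countyname", "lot", "area", "price", "taxrate"}
LOTS_KEYS = [{"propertyid"}, {"countyname", "lot"}]
LOTS_FDS = [
    ({"propertyid"}, {"countyname", "lot", "area", "price", "taxrate"}),  # Fd1 (assumed)
    ({"countyname", "lot"}, {"propertyid", "area", "price", "taxrate"}),  # Fd2 (assumed)
    ({"countyname"}, {"taxrate"}),                                        # Fd3
    ({"area"}, {"price"}),                                                # Fd4
]

lots_violations = third_nf_violations(LOTS, LOTS_KEYS, LOTS_FDS)
print(lots_violations)
```

Only the listed dependencies are tested, which is sufficient for deciding whether the whole relation is in 3NF; the candidate keys are supplied by hand rather than computed.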

    Consider the LOTS relation.

    LOTS

    Propertyid Countyname Lot Area Price Taxrate

    (Dependency diagram: Fd1, Fd2, Fd3 and Fd4 drawn over the attributes of LOTS.)

    According to this definition LOTS is not in 3NF, because Fd3 and Fd4 violate the conditions.

    We can see that Fd1 and Fd2 satisfy 3NF.

    But in Fd3

    countyname → taxrate

    countyname by itself is not a super key and taxrate is not a prime attribute.

    Also in Fd4

    area → price

    area is not a super key and price is not a prime attribute.

    So LOTS is not in 3NF.

    To normalize LOTS we decompose it into LOTS2, LOTS1A and LOTS1B.

    We construct LOTS1B by removing the attribute price, which violates 3NF, and placing it with area,
    and LOTS2 by removing the attribute taxrate, which also violates 3NF, and placing it with countyname.
    The remaining attributes form LOTS1A.

    LOTS2

    Countyname Taxrate

    (Fd3 holds in LOTS2.)

    LOTS1A

    Propertyid Countyname Lot Area

    (Fd1 and Fd2 hold in LOTS1A.)

    LOTS1B

    Area Price

    (Fd4 holds in LOTS1B.)

    We can see that all the above relations LOTS2, LOTS1A, LOTS1B are in 3NF.

    A relation schema R is in 3NF if every non-prime attribute of R meets both of the following conditions.

    It is fully functionally dependent on every key of R.

    It is non-transitively dependent on every key of R.

    Boyce Codd Normal form (BCNF)

    It was first proposed as a simpler form of 3NF, but it was found to be stricter than 3NF. This is
    because every relation in BCNF is also in 3NF. However, a relation in 3NF may not be in BCNF.

    A relation schema R is in BCNF if, whenever a non-trivial functional dependency X → A holds in
    R, then X is a superkey of R. The only difference between BCNF and 3NF is that condition (b) of 3NF
    (which allows A to be prime) is absent from BCNF.

    Suppose we have a table Lots1A

    Lots1A

    Propertyid Countyname Lot Area

    (Fd1 and Fd2 hold as before, together with Fd5: area → countyname.)

    Here we can see that the relation Lots1A is not in BCNF, but it is in 3NF.

    Fd5 violates BCNF because area is not a superkey. Fd1 and Fd2 satisfy BCNF because their LHS are
    super keys. So we remove the attribute countyname and place it, together with area, in another relation.

    Lots1AX

    Propertyid Area Lot

    Lots1AY

    Area Countyname

    These relations are in BCNF.
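A BCNF test only needs the superkey condition. The sketch below is an illustration under assumptions: the right-hand sides of Fd1 and Fd2 are inferred from the keys, the FDs listed for Lots1AX and Lots1AY are the obvious projections, and `bcnf_violations` is a hypothetical helper name.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the non-trivial FDs whose left-hand side is not a superkey."""
    attributes = set(attributes)
    return [
        (tuple(sorted(lhs)), tuple(sorted(rhs - lhs)))
        for lhs, rhs in fds
        if rhs - lhs and not closure(lhs, fds) >= attributes
    ]


LOTS1A = {"propertyid", "countyname", "lot", "area"}
LOTS1A_FDS = [
    ({"propertyid"}, {"countyname", "lot", "area"}),  # Fd1 (assumed form)
    ({"countyname", "lot"}, {"propertyid", "area"}),  # Fd2 (assumed form)
    ({"area"}, {"countyname"}),                       # Fd5
]

print(bcnf_violations(LOTS1A, LOTS1A_FDS))

# The two relations after decomposition, with their assumed FDs.
print(bcnf_violations({"propertyid", "area", "lot"},
                      [({"propertyid"}, {"area", "lot"})]))
print(bcnf_violations({"area", "countyname"},
                      [({"area"}, {"countyname"})]))
```

Only Fd5 is reported for Lots1A, and neither decomposed relation reports a violation.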

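    Every relation in BCNF is also in 3NF, but a relation in 3NF is not necessarily in BCNF.

    For example

    R

    A B C

    (Dependency diagram: Fd1 and Fd2 drawn over the attributes of R; one of them has C as its left-hand side.)

    Here the relation R is in 3NF. But it is not in BCNF, because C, which appears as the left-hand side
    of a dependency, is not a super key of R.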


    Exercise:

    1. Consider the relation R = { A, B, C, D, E, F, G, H, I, J } and the set of functional dependencies

    A, B → C

    A → D, E

    B → F

    F → G, H

    D → I, J

    What is the key of R?

    Decompose R into 2NF, then 3NF relations.

    Answer

    R

    A B C D E F G H I J

    Fd1 -- A, B → C
    Fd2 -- A → D, E
    Fd3 -- B → F
    Fd4 -- F → G, H
    Fd5 -- D → I, J

    From the figure, the key of R is (A, B).
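The key can also be found by brute force: try attribute sets in increasing size and keep those whose closure covers R. This is only a sketch suitable for small exercises like this one; `candidate_keys` is a hypothetical helper name.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    attributes = sorted(attributes)
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) >= set(attributes):
                keys.append(combo)
    return keys


EXERCISE1_FDS = [
    ({"A", "B"}, {"C"}),
    ({"A"}, {"D", "E"}),
    ({"B"}, {"F"}),
    ({"F"}, {"G", "H"}),
    ({"D"}, {"I", "J"}),
]

print(candidate_keys("ABCDEFGHIJ", EXERCISE1_FDS))
```

The search is exponential in the number of attributes, which is fine for ten attributes but not for large schemas.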

    R is not in 2NF because Fd2 and Fd3 are partial functional dependencies on the key (A, B). So we
    remove the attributes D, E and F from R. But we can also see that

    A → D

    D → I

    D → J

    so I and J have to be removed together with D. Similarly

    B → F

    F → G

    F → H

    so G and H have to be removed together with F.


    So we get the following relations in 2NF.

    R1

    A B C

    Fd1

    R2

    A D E I J

    Fd2

    Fd5

    R3

    B F G H

    Fd3

    Fd4

    The above relations R1, R2, R3 are in 2NF because they are in 1NF and there are no partial
    functional dependencies.

    DECOMPOSITION TO 3NF

    We can take each of R1, R2 and R3 and analyse them

    R1

    A B C

    Fd1

    R1 is in 3NF because in Fd1 (A, B → C), (A, B) is a super key.

    R2

    A D E I J

    Fd2

    Fd5

    R2 is not in 3NF.

    Fd2 (A → D, E) satisfies 3NF because A is a super key of R2.

    Fd5 (D → I, J) violates 3NF because D is not a super key and I, J are not prime attributes.

    So we remove I and J from R2.

    We decompose R2 as

    R2A

    A D E

    Fd2

    R2B

    D I J

    fd5

    R2A and R2B are in 3NF.

    Consider R3

    R3

    B F G H

    Fd3

    Fd4

    We can see that Fd3 (B → F) satisfies 3NF because B is a super key of R3.

    Fd4 (F → G, H) violates 3NF because F is not a super key and G, H are not prime attributes.

    We decompose it into 2 relations. R3A, and R3B.

    R3A

    B F

    Fd3

    R3B

    F G H

    Fd4

    R3A and R3B are in 3NF.

    So we get the final set of relations as

    R1

    A B C

    Fd1

    R2A

    A D E

    Fd2

    R2B

    D I J

    fd5


    R3A

    B F

    Fd3

    R3B

    F G H

    Fd4

    Exercise 2:

    Given R= { A, B ,C, D, E, F, G, H, I, J}

    Functional dependencies are

    A, B → C

    B, D → E, F

    A, D → G, H

    A → I

    H → J

    Find the key and normalise to 2NF and then to 3NF.

    R

    A B C D E F G H I J

    Fd1 -- A, B → C
    Fd2 -- B, D → E, F
    Fd3 -- A, D → G, H
    Fd4 -- A → I
    Fd5 -- H → J

    The key of R is (A, B, D)

    Normalizing to 2NF

    Fd1, Fd2, Fd3 and Fd4 are partial dependencies on the key (A, B, D), so R is not in 2NF. J follows H
    because of Fd5. We decompose R into

    R1

    A B C

    fd1

    R2

    B D E F

    Fd2

    R3

    A D G H J

    Fd3

    Fd5

    R4

    A I

    Fd4

    R1, R2, R3, R4 are in 2NF.

    Normalization to 3NF

    Note that R3 is not in 3NF because of Fd5 (H → J). So we decompose it into R3A and R3B.

    R3A

    A D G H

    Fd3

    R3B

    H J

    Fd5

    So we get the relations in 3NF as

    R1

    A B C

    Fd1

    R2

    B D E F

    Fd2

    R3A

    A D G H

    Fd3

    R3B

    H J

    Fd5

    R4

    A I

    Fd4

    Higher level normal forms

    We have already studied 4 different normal forms, that is 1NF, 2NF, 3NF and BCNF. They are
    based on the concept of functional dependencies.

    We are now going to see other kinds of dependencies: multivalued dependencies and join
    dependencies.

    Fourth normal form is based on the concept of multivalued dependencies. Fifth normal form is based
    on join dependencies.

    Multivalued dependencies and 4NF

    Suppose we are given the table named UCFX

    UCFX

    Course   Faculty     Textbook
    Maths    JYT, RVT    Grewal, Kreyzig
    DBMS     SZ          Navathe, Silbz, Desai

    We can see that this table is not normalized (it is not even in 1NF).

    The meaning of the above table is that the specified course is taught by any of the specified teachers, and
    for learning this course any of the specified textbooks can be used.

    So for a given course, there can exist any number of corresponding teachers and any number of
    corresponding textbooks. Also, teachers and textbooks are independent of each other: no matter who
    actually teaches a course, the same textbooks can be used. Let us convert this to 1NF.

    CFX

    Course   Faculty   Textbook
    Maths    JYT       Grewal
    Maths    JYT       Kreyzig
    Maths    RVT       Grewal
    Maths    RVT       Kreyzig
    DBMS     SZ        Navathe
    DBMS     SZ        Silbz
    DBMS     SZ        Desai

    The key of the table is (course, faculty, text book)

    There is a lot of repetition (redundancy) in this table, and it leads to update anomalies.
    Suppose a new faculty member comes to teach Maths: it is necessary to create 2 new tuples, one for each of
    the 2 textbooks.

    Note that it should not be necessary to include all faculty and textbook combinations for a given course.
    2 tuples are sufficient to show that the Maths course has 2 faculties and 2 textbooks. The problem is
    deciding which 2 of the 4 tuples to keep. We cannot make that decision.


    The difficulty here is caused by the fact that faculties and textbooks are independent of one

    another. We can see that this will be improved if we decompose CFX into 2 tables.

    CF

    Course   Faculty
    Maths    RVT
    Maths    JYT
    DBMS     SZ

    CX

    Course   Textbook
    Maths    Grewal
    Maths    Kreyzig
    DBMS     Navathe
    DBMS     Silbz
    DBMS     Desai

    The above relations are correct. But the decomposition cannot be made on the basis of functional
    dependencies, because there are no functional dependencies in the relation. So we introduce
    multivalued dependencies (MVDs). MVDs are a generalization of FDs, in the sense that every FD is
    an MVD, but not every MVD is an FD.

    There are 2 MVDs in the relation CFX.

    CFX

    Course   Faculty   Textbook
    Maths    JYT       Grewal
    Maths    JYT       Kreyzig
    Maths    RVT       Grewal
    Maths    RVT       Kreyzig
    DBMS     SZ        Navathe
    DBMS     SZ        Silbz
    DBMS     SZ        Desai

    course →→ faculty

    course →→ textbook

    (Double arrows are used here. Read course →→ faculty as "course multidetermines faculty"
    or "faculty is multidependent on course".)

    We know that a course does not have a single corresponding faculty, i.e. the
    functional dependency course → faculty does not hold. But each course has a well defined set of
    corresponding faculties. Well defined here means that for a given course (Maths) and a given textbook
    (Grewal), the set of faculties (RVT, JYT) matching the pair (Maths, Grewal) in CFX depends on the value
    Maths alone. It makes no difference which particular textbook value we choose.

    The second MVD can also be interpreted like this.

    Definition of multi valued dependency

    Let R be a table, and let A, B, C be arbitrary subsets of the set of attributes of R.

    Then we say that B is multidependent on A, written A →→ B,
    if and only if the set of B values matching a given (A value, C value) pair in R depends only on the A
    value and is independent of the C value.

    MVDs always go together in pairs. That is, given the table R (A, B, C), the MVD
    A →→ B holds if and only if A →→ C also holds. So we can write

    A →→ B | C

    That is

    course →→ faculty | textbook

    We said that every FD is an MVD, but not every MVD is an FD. In our CFX table, the
    problem is that if we want to insert one more faculty for Maths, we have to insert 2 tuples (because of the 2
    textbooks). These 2 tuples are necessary to maintain our MVDs.
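The definition can be tested on a table instance: A →→ B holds when, for each A value, the (B, C) pairs present are exactly all combinations of the B values and C values seen with that A value. The sketch below checks this on the CFX rows from these notes; `mvd_holds` is a hypothetical helper that assumes A and B do not overlap.

```python
COLUMNS = ("course", "faculty", "textbook")
CFX = [
    ("Maths", "JYT", "Grewal"),
    ("Maths", "JYT", "Kreyzig"),
    ("Maths", "RVT", "Grewal"),
    ("Maths", "RVT", "Kreyzig"),
    ("DBMS", "SZ", "Navathe"),
    ("DBMS", "SZ", "Silbz"),
    ("DBMS", "SZ", "Desai"),
]


def mvd_holds(rows, columns, a_cols, b_cols):
    """Check A ->> B on one table instance (A and B assumed disjoint)."""
    c_cols = [c for c in columns if c not in a_cols and c not in b_cols]
    position = {name: i for i, name in enumerate(columns)}

    def pick(row, names):
        return tuple(row[position[name]] for name in names)

    groups = {}  # A value -> (B values, C values, (B, C) pairs actually present)
    for row in rows:
        b_vals, c_vals, pairs = groups.setdefault(
            pick(row, a_cols), (set(), set(), set()))
        b_vals.add(pick(row, b_cols))
        c_vals.add(pick(row, c_cols))
        pairs.add((pick(row, b_cols), pick(row, c_cols)))

    # Every B value must appear with every C value for the same A value.
    return all(len(pairs) == len(b_vals) * len(c_vals)
               for b_vals, c_vals, pairs in groups.values())


print(mvd_holds(CFX, COLUMNS, ["course"], ["faculty"]))
print(mvd_holds(CFX, COLUMNS, ["course"], ["textbook"]))

# Dropping one Maths tuple breaks the "all combinations" property.
incomplete = [row for row in CFX if row != ("Maths", "RVT", "Kreyzig")]
print(mvd_holds(incomplete, COLUMNS, ["course"], ["faculty"]))
```

A check like this only says whether the MVD holds in the given rows; whether it is a real constraint of the schema is a design decision.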

    Trivial multivalued dependency

    An MVD X →→ Y in R is said to be a trivial multivalued dependency if

    a) Y is a subset of X, or

    b) X ∪ Y = R.

    If either of these conditions holds then it is a trivial MVD.
    Otherwise it is a non-trivial multivalued dependency.

    For example

    CF

    Course   Faculty
    Maths    RVT
    Maths    JYT
    DBMS     SZ

    Here course →→ faculty is a multivalued dependency. It is a trivial MVD because the union of
    the attributes of this MVD gives the whole relation CF (course ∪ faculty = CF).


    Example 2:

    CFX

    Course   Faculty   Textbook
    Maths    JYT       Grewal
    Maths    JYT       Kreyzig
    Maths    RVT       Grewal
    Maths    RVT       Kreyzig
    DBMS     SZ        Navathe
    DBMS     SZ        Silbz
    DBMS     SZ        Desai

    Here course →→ faculty and course →→ textbook are non-trivial MVDs because neither of the
    two conditions holds for these MVDs in the CFX table.

    Inference rules for multivalued dependencies

    We have seen the inference rules for functional dependencies. Like that we have some rules for

    multivalued dependencies.

    Suppose R is a table. Suppose W, X, Y, Z are sets of columns in that table.

    Complementation rule for MVDs

    If X →→ Y, then X →→ (R − (X ∪ Y))

    Augmentation rule for MVDs

    If X →→ Y and Z ⊆ W, then WX →→ YZ

    Transitive rule for MVDs

    If X →→ Y and Y →→ Z, then X →→ (Z − Y)

    Replication rule (FD to MVD)

    If X → Y, then X →→ Y

    Coalescence rule for FDs and MVDs

    If X →→ Y and there exists W with the properties that a) W ∩ Y is empty, b) W → Z and c) Z ⊆ Y, then X → Z.


    Fourth normal form

    This is based on multivalued dependencies.

    A relation schema R is in 4NF with respect to a set of dependencies F (that includes FDs and
    MVDs) if, for every non-trivial multivalued dependency
    X →→ Y, X is a super key of R.

    Consider the table CFX.

    CFX

    Course   Faculty   Textbook
    Maths    JYT       Grewal
    Maths    JYT       Kreyzig
    Maths    RVT       Grewal
    Maths    RVT       Kreyzig
    DBMS     SZ        Navathe
    DBMS     SZ        Silbz
    DBMS     SZ        Desai

    The relation CFX is not in fourth normal form, because the MVDs course →→ textbook and
    course →→ faculty do not satisfy the condition of fourth normal form: they are non-trivial MVDs
    and course is not a superkey of the table. They are non-trivial because in the first MVD

    course ∪ textbook ≠ CFX

    and in the second MVD

    course ∪ faculty ≠ CFX

    We decompose it into CF and CX.

    CF

    Course   Faculty
    Maths    RVT
    Maths    JYT
    DBMS     SZ

    CX

    Course   Textbook
    Maths    Grewal
    Maths    Kreyzig
    DBMS     Navathe
    DBMS     Silbz
    DBMS     Desai

    We can see that CF and CX are in 4NF because

    course →→ faculty is trivial in CF (because course ∪ faculty = CF), and
    course →→ textbook is trivial in CX (because course ∪ textbook = CX).

    Example 2:

    Consider the table emp

    EMP

    Ename   Pname   Dname
    Smith   X       John
    Smith   Y       Anna
    Smith   X       Anna
    Smith   Y       John
    Brown   W       Jim
    Brown   X       Jim
    Brown   Y       Jim
    Brown   Z       Jim
    Brown   W       Joan
    Brown   X       Joan
    Brown   Y       Joan
    Brown   Z       Joan
    Brown   W       Bob
    Brown   X       Bob
    Brown   Y       Bob
    Brown   Z       Bob

    In this table Brown has 3 dependents and works on 4 different projects. Smith works
    on 2 projects and has 2 dependents. We can see there are 16 tuples in this table. If we decompose the emp
    table into two tables, emp_projects and emp_dependents, we need to store only 11 tuples in total.

    The emp relation is not in 4NF because the MVDs ename →→ pname and ename →→ dname are
    non-trivial and ename is not a superkey.

    We decompose it into 2 tables.

    Emp_projects

    Ename   Pname
    Smith   X
    Smith   Y
    Brown   W
    Brown   X
    Brown   Y
    Brown   Z

    Emp_dependents

    Ename   Dname
    Smith   Anna
    Smith   John
    Brown   Jim
    Brown   Joan
    Brown   Bob

    These 2 tables are in 4NF. This is because

    in the first table emp_projects,
    ename →→ pname is a trivial MVD (ename ∪ pname = emp_projects);

    in the second table emp_dependents,
    ename →→ dname is a trivial MVD (ename ∪ dname = emp_dependents).

    Lossless join decomposition

    Consider the example database

    EMP

    Ename   Pname   Dname
    Smith   X       John
    Smith   Y       Anna
    Smith   X       Anna
    Smith   Y       John

    Suppose we decompose the EMP table into Emp_projects and Emp_dependents.

    Emp_projects

    Ename   Pname
    Smith   X
    Smith   Y

    Emp_dependents

    Ename   Dname
    Smith   John
    Smith   Anna

    If we join these tables again, we get the original EMP table. So this decomposition
    of the EMP table into Emp_projects and Emp_dependents is a lossless join decomposition, because nothing is
    lost by the decomposition.

    Consider another table SUPPLY.

    SUPPLY

    Sname     Partname   Projname
    Smith     Bolt       Projx
    Smith     Nut        Projy
    Adamsky   Bolt       Projy
    Walton    Nut        Projz
    Adamsky   Nail       Projx
    Adamsky   Bolt       Projx
    Smith     Bolt       Projy

    Suppose we decompose the supply table into two tables, R1 and R2.
    We get

    R1

    Sname     Partname
    Smith     Bolt
    Smith     Nut
    Adamsky   Bolt
    Walton    Nut
    Adamsky   Nail

    R2

    Sname     Projname
    Smith     Projx
    Smith     Projy
    Adamsky   Projy
    Walton    Projz
    Adamsky   Projx

    If we join these two tables R1 and R2 again we get

    Sname     Partname   Projname
    Smith     Bolt       Projx
    Smith     Bolt       Projy
    Smith     Nut        Projx
    Smith     Nut        Projy
    Adamsky   Bolt       Projy
    Adamsky   Bolt       Projx
    Adamsky   Nail       Projy
    Adamsky   Nail       Projx
    Walton    Nut        Projz

    We can see that the join of these tables does not give back our original table supply. It contains the
    extra tuples (Smith, Nut, Projx) and (Adamsky, Nail, Projy), which were never in supply. So this is a
    lossy join decomposition: after decomposing the supply table we can no longer tell which tuples
    belonged to the original table. We can see this by joining the decomposed tables.
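The difference between the two decompositions can be reproduced by projecting each table, joining the projections and looking for tuples that were not there originally. This is a minimal sketch using the EMP and SUPPLY rows from these notes; `project` and `spurious_tuples` are hypothetical helper names.

```python
def project(rows, columns, keep):
    """Projection onto the columns in keep, without duplicates."""
    positions = [columns.index(c) for c in keep]
    return {tuple(row[p] for p in positions) for row in rows}


def spurious_tuples(rows, columns, first, second):
    """Tuples produced by joining the two projections that were not in rows.

    Assumes first and second together cover every column.
    """
    common = [c for c in first if c in second]
    joined = set()
    for left in project(rows, columns, first):
        for right in project(rows, columns, second):
            l = dict(zip(first, left))
            r = dict(zip(second, right))
            if all(l[c] == r[c] for c in common):
                merged = {**l, **r}
                joined.add(tuple(merged[c] for c in columns))
    return joined - set(rows)


EMP = [
    ("Smith", "X", "John"), ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"), ("Smith", "Y", "John"),
]
SUPPLY = [
    ("Smith", "Bolt", "Projx"), ("Smith", "Nut", "Projy"),
    ("Adamsky", "Bolt", "Projy"), ("Walton", "Nut", "Projz"),
    ("Adamsky", "Nail", "Projx"), ("Adamsky", "Bolt", "Projx"),
    ("Smith", "Bolt", "Projy"),
]

emp_spurious = spurious_tuples(
    EMP, ["ename", "pname", "dname"], ["ename", "pname"], ["ename", "dname"])
supply_spurious = spurious_tuples(
    SUPPLY, ["sname", "partname", "projname"],
    ["sname", "partname"], ["sname", "projname"])

print(emp_spurious)     # empty set: the EMP decomposition is lossless
print(supply_spurious)  # the extra tuples that make the SUPPLY split lossy
```

An empty result means the decomposition is lossless for that table instance; any tuple returned is spurious.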

    Join dependencies and fifth normal form

    In some cases there may be no lossless join decomposition of a table R into 2 tables but there may be a

    lossless join decomposition into more than 2 tables.

    For example in the supply table

    SUPPLY

    Sname Partname Projname

    Smith

    Smith

    Adamsky

    Walton

    Adamsky

    AdamskySmith

    Bolt

    Nut

    Bolt

    Nut

    Nail

    BoltBolt

    Projx

    Projy

    Projy

    Projz

    Projx

    ProjxProjy

    ______________________________________________________________________________________________________________ Department of IT Mangalam College of Engineering, Ettumanoor

    43

  • 8/6/2019 db mod 4

    44/50

    RT503 Database Management Systems Module 4

    ______________________________________________________________________________________________________________

    If we decompose the Supply table into 3 tables as

    R1

    Sname      Partname
    Smith      Bolt
    Smith      Nut
    Adamsky    Bolt
    Walton     Nut
    Adamsky    Nail

    R2

    Sname      Projname
    Smith      Projx
    Smith      Projy
    Adamsky    Projy
    Walton     Projz
    Adamsky    Projx

    R3

    Partname   Projname
    Bolt       Projx
    Nut        Projy
    Bolt       Projy
    Nut        Projz
    Nail       Projx

    Here we can see that if we join all three tables R1, R2 and R3 again, we get the original table. Joining just R1 and R2 does not give the Supply table, but joining all 3 tables does.

    SUPPLY

    Sname      Partname   Projname
    Smith      Bolt       Projx
    Smith      Nut        Projy
    Adamsky    Bolt       Projy
    Walton     Nut        Projz
    Adamsky    Nail       Projx
    Adamsky    Bolt       Projx
    Smith      Bolt       Projy
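    The same check can be done mechanically. The following is a minimal Python sketch and is not part of the original notes: the helper names relation, project and natural_join are our own, and each row is stored as a frozenset of (attribute, value) pairs. It projects the 7-row Supply table onto R1, R2 and R3 and compares the joins with Supply.

```python
def relation(attrs, tuples):
    """Build a relation: a set of rows, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(attrs, t)) for t in tuples}

def project(rel, attrs):
    """Projection onto attrs; duplicates disappear because the result is a set."""
    return {frozenset((k, v) for k, v in row if k in attrs) for row in rel}

def natural_join(left, right):
    """Natural join: combine rows that agree on every shared attribute."""
    out = set()
    for a in left:
        for b in right:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out

SUPPLY = relation(
    ("sname", "partname", "projname"),
    [
        ("Smith", "Bolt", "Projx"),
        ("Smith", "Nut", "Projy"),
        ("Adamsky", "Bolt", "Projy"),
        ("Walton", "Nut", "Projz"),
        ("Adamsky", "Nail", "Projx"),
        ("Adamsky", "Bolt", "Projx"),
        ("Smith", "Bolt", "Projy"),
    ],
)

r1 = project(SUPPLY, {"sname", "partname"})
r2 = project(SUPPLY, {"sname", "projname"})
r3 = project(SUPPLY, {"partname", "projname"})

r12 = natural_join(r1, r2)
spurious = r12 - SUPPLY                          # rows the 2-way join adds
lossless_3way = natural_join(r12, r3) == SUPPLY  # does the 3-way join give Supply back?

print(len(r12), len(spurious), lossless_3way)
```

    On this data the sketch prints 9 2 True: the join of R1 and R2 has 9 rows, 2 of which are not in Supply, and joining the result with R3 removes exactly those 2 rows.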

    Note that there are no non-trivial functional dependencies in this Supply table. Also, there are no non-trivial MVDs in this table, so it does not violate 4NF.

    So we move to another type of dependency, called a join dependency.

    If a join dependency is present in a table, we decompose the table into fifth normal form (5NF).

    Here for the supply table the join dependency is specified by

    JD (R1, R2, R3)

    This is because by joining R1 and R2 and R3 tables we will get the original table Supply.

    JD (R1, R2, R3) can also be written as

    JD( (sname, partname), (sname, projname), (partname,projname) )

    We can see that JD(R1, R2) is not valid for the Supply table, because joining R1 and R2 alone does not give the Supply table.
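    In relational algebra notation the same statement can be written as below. This is the usual textbook formulation and is not taken from these notes; r stands for the current contents of Supply, and the projection and join symbols are assumptions of this sketch.

```latex
\mathrm{JD}(R_1, R_2, R_3) \text{ holds on } r
\iff
r = \pi_{R_1}(r) \bowtie \pi_{R_2}(r) \bowtie \pi_{R_3}(r),
\qquad
\begin{aligned}
R_1 &= (\text{sname}, \text{partname}) \\
R_2 &= (\text{sname}, \text{projname}) \\
R_3 &= (\text{partname}, \text{projname})
\end{aligned}
```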

    Example 2.

    Consider the table

    EMP

    Ename    Pname    Dname
    Smith    X        John
    Smith    Y        Anna
    Smith    X        Anna
    Smith    Y        John

    As we studied earlier, we decomposed the Emp table into

    Emp_proj(ename, projname) and Emp_dept(ename, dname).

    We have seen that joining Emp_proj and Emp_dept gives back the original Emp table. So we can specify a join dependency

    JD (Emp_proj, Emp_dept)
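    A short Python sketch of this check, using the four Emp rows shown above. It is only an illustration; the variable names emp_proj, emp_dept and rejoined are our own.

```python
# Rows of EMP as (ename, pname, dname) tuples, taken from the table above.
EMP = {
    ("Smith", "X", "John"),
    ("Smith", "Y", "Anna"),
    ("Smith", "X", "Anna"),
    ("Smith", "Y", "John"),
}

# The two projections; set comprehensions remove duplicate rows.
emp_proj = {(e, p) for e, p, d in EMP}
emp_dept = {(e, d) for e, p, d in EMP}

# Natural join of the projections on the shared attribute ename.
rejoined = {(e, p, d) for e, p in emp_proj for e2, d in emp_dept if e == e2}

print(rejoined == EMP)
```

    Here the join of the two projections returns exactly the rows of Emp, so a decomposition into 2 tables is enough for this table.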

    Trivial join dependency

    For a table R, a join dependency JD(R1, R2, ..., Rn) is trivial if one of the Ri is the table R itself.
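    As a small illustration, the definition can be written as a Python function. The name is_trivial_jd and the way the schema and the components are passed (as tuples of attribute names) are assumptions made for this sketch.

```python
def is_trivial_jd(schema, components):
    """A JD is trivial when one of its components contains every attribute of R."""
    return any(set(c) == set(schema) for c in components)

supply_schema = ("sname", "partname", "projname")

# The JD of the Supply example: no component is the whole schema.
supply_jd = [
    ("sname", "partname"),
    ("sname", "projname"),
    ("partname", "projname"),
]

# A JD that lists the whole schema as one component is trivial.
trivial_jd = [supply_schema, ("sname", "partname")]

print(is_trivial_jd(supply_schema, supply_jd))
print(is_trivial_jd(supply_schema, trivial_jd))
```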


    Fifth normal form

    It is also called project-join normal form (PJNF).

    A relation schema R is in fifth normal form (5NF) if, for every nontrivial join dependency JD(R1, R2, ..., Rn) that holds on R, every Ri is a superkey of R.

    For example, consider the table Supply

    SUPPLY

    Sname      Partname   Projname
    Smith      Bolt       Projx
    Smith      Nut        Projy
    Adamsky    Bolt       Projy
    Walton     Nut        Projz
    Adamsky    Nail       Projx
    Adamsky    Bolt       Projx
    Smith      Bolt       Projy

    The key of this table is (sname, partname, projname)

    We have seen that it has a join dependency

    JD { (sname,partname), (sname,projname), (partname,projname) }

    Here the projections are (sname,partname), (sname,projname) and (partname,projname).

    We can say that this Supply table is not in 5NF because of this join dependency: none of these projections is a superkey of Supply.

    The superkey of Supply is (sname, partname, projname).

    (sname, partname) is not a superkey.

    (sname, projname) is not a superkey.

    (partname, projname) is not a superkey.

    So we have to normalise the Supply table into tables that satisfy 5NF.
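    A superkey is a property of the schema, so one table instance can only disprove it, never prove it. With that caution, the Python sketch below checks on the 7 Supply rows whether two rows ever agree on a given attribute set. The helper name is_unique_on is hypothetical and not part of the notes.

```python
ATTRS = ("sname", "partname", "projname")

SUPPLY = [
    ("Smith", "Bolt", "Projx"),
    ("Smith", "Nut", "Projy"),
    ("Adamsky", "Bolt", "Projy"),
    ("Walton", "Nut", "Projz"),
    ("Adamsky", "Nail", "Projx"),
    ("Adamsky", "Bolt", "Projx"),
    ("Smith", "Bolt", "Projy"),
]

def is_unique_on(rows, attrs, subset):
    """True if no two rows agree on all attributes in subset (instance-level check only)."""
    positions = [attrs.index(a) for a in subset]
    seen = set()
    for row in rows:
        key = tuple(row[i] for i in positions)
        if key in seen:
            return False  # two rows share this value combination
        seen.add(key)
    return True

for subset in [("sname", "partname"), ("sname", "projname"),
               ("partname", "projname"), ATTRS]:
    print(subset, is_unique_on(SUPPLY, ATTRS, subset))
```

    On this data every 2-attribute projection has a repeated value combination (for example Smith, Bolt appears twice), so none of them can be a superkey; only the full attribute set is unique.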

    We decompose the Supply table using the JD. Take each of the projections in the JD and form tables as


    R1

    Sname      Partname
    Smith      Bolt
    Smith      Nut
    Adamsky    Bolt
    Walton     Nut
    Adamsky    Nail

    R2

    Sname      Projname
    Smith      Projx
    Smith      Projy
    Adamsky    Projy
    Walton     Projz
    Adamsky    Projx

    R3

    Partname   Projname
    Bolt       Projx
    Nut        Projy
    Bolt       Projy
    Nut        Projz
    Nail       Projx

    Each of R1, R2 and R3 is in fifth normal form because none of these tables has a nontrivial join dependency.

    Join dependencies are very difficult to detect in practice, so normalisation to 5NF is not normally carried out when designing a database.

    Example 2: (ref: DBMS by Bipin C. Desai)

    Consider the table New_Project_assignment

    New_Project_assignment

    Emp      Proj            Exp
    Brent    Workstation     User interface
    Brent    Workstation     Artificial intelligence
    Mann     Workstation     VLSI technology
    Smith    Workstation     Operating systems
    King     Sql2            Relational calculus
    Ito      Sql2            Relational algebra
    Ito      Qbe++           Relational calculus
    Smith    Query systems   Database systems
    Smith    File systems    Operating systems

    According to the reference, this table has the join dependency

    JD{ (proj, exp), (emp, exp), (emp, proj) }

    That is, joining the 3 projections (proj, exp), (emp, exp) and (emp, proj) is taken to give back the original table New_Project_assignment.

    We can see that this JD is not a trivial join dependency. Also, the table New_Project_assignment is not in 5NF because none of the projections in the JD is a superkey of the table.

    The superkey of the table is (emp, proj, exp).

    (proj, exp) is not a superkey.

    (emp, exp) is not a superkey.

    (emp, proj) is not a superkey.

    So this table is not in 5NF.

    We normalize this table to fifth normal form by decomposing it using the projections in the JD.

    That is

    S1

    Project         Expertise
    Workstation     User interface
    Workstation     Artificial intelligence
    Workstation     VLSI technology
    Workstation     Operating systems
    Sql2            Relational calculus
    Sql2            Relational algebra
    Qbe++           Relational calculus
    Query systems   Database systems
    File systems    Operating systems

    S2

    Employee   Expertise
    Brent      User interface
    Brent      Artificial intelligence
    Mann       VLSI technology
    King       Relational calculus
    Ito        Relational algebra
    Ito        Relational calculus
    Smith      Database systems
    Smith      Operating systems

    S3

    Employee   Project
    Brent      Workstation
    Mann       Workstation
    King       Sql2
    Ito        Sql2
    Ito        Qbe++
    Smith      File systems
    Smith      Query systems
    Smith      Workstation

    We can see that each of S1, S2 and S3 is in 5NF because there are no nontrivial join dependencies in these tables.

    Example 3: (ref: DBMS by C. J. Date)

    SPJ

    S     P     J
    S1    P1    J2
    S1    P2    J1
    S2    P1    J1
    S1    P1    J1

    If we decompose the table into SP (S, P), PJ(P, J) and JS(J, S) and we again perform join on these 3

    tables we will get the original table SPJ( s, p, j).

    So there is a join dependency

    JD (SP, PJ, JS)

    We can say that the table SPJ is not in 5NF because of this join dependency: none of the projections SP, PJ and JS is a superkey of SPJ.

    So decompose SPJ into 3 tables as

    SP

    S     P
    S1    P1
    S1    P2
    S2    P1

    PJ

    P     J
    P1    J2
    P2    J1
    P1    J1

    JS

    J     S
    J2    S1
    J1    S1
    J1    S2

    Each of the tables SP, PJ and JS is in 5NF because there are no nontrivial join dependencies in these tables.
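    The join dependency of this example can be checked with a few lines of Python. This is a sketch under our own naming (sp_pj and three_way are not from the reference); it builds the three projections from the SPJ rows, joins SP with PJ first, and then joins the result with JS.

```python
# Rows of SPJ as (s, p, j) tuples, taken from the table above.
SPJ = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections.
SP = {(s, p) for s, p, j in SPJ}
PJ = {(p, j) for s, p, j in SPJ}
JS = {(j, s) for s, p, j in SPJ}

# Join SP and PJ on p: this adds a row that was not in SPJ.
sp_pj = {(s, p, j) for s, p in SP for p2, j in PJ if p == p2}

# Joining with JS on (j, s) keeps only rows whose (j, s) pair exists in JS.
three_way = {(s, p, j) for s, p, j in sp_pj if (j, s) in JS}

print(sp_pj - SPJ)
print(three_way == SPJ)
```

    On this data the join of SP and PJ contains one extra row, (S2, P1, J2), and the join with JS removes it again, which is why all 3 projections are needed.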
