
UNIT-2 DBMS

Mar 11, 2015


ER-CSEB 1

The Relational Data Model

Origins of the Relational Model

• Developed by British computer scientist E.F. (Ted) Codd of IBM in a seminal paper in 1970 (A Relational Model for Large Shared Data Banks, Communications of the ACM, June 1970)

• Considered ingenious but impractical in 1970
o Conceptually simple
o Computers lacked the power to implement the relational model
• Today, microcomputers can run sophisticated relational database software

Relational Model Concepts

• Domain: A (usually named) set/universe of atomic values, where by "atomic" we mean simply that, from the point of view of the database, each value in the domain is indivisible (i.e., cannot be broken down into component parts).

Examples of domains

o SSN: string of digits of length nine
o Name: string of characters beginning with an upper case letter
o GPA: a real number between 0.0 and 4.0
o Sex: a member of the set { female, male }
o Dept_Code: a member of the set { CSE, IT, ECE, EEE, MECH, ... }

These are all logical descriptions of domains. For implementation purposes, it is necessary to provide descriptions of domains in terms of concrete data types (or formats) that are provided by the DBMS (such as String, int, boolean), in a manner analogous to how programming languages have intrinsic data types.

• Attribute: the name of the role played by some value (coming from some domain) in the context of a relational schema. The domain of attribute A is denoted dom(A).

• Tuple: A tuple is a mapping from attributes to values drawn from the respective domains of those attributes. A tuple is intended to describe some entity (or relationship between entities) in the miniworld.

As an example, a tuple for a PERSON entity might be

{ Name --> "Rama Krishna", Sex --> Male, IQ --> 786 }

• Relation: A (named) set of tuples all of the same form (i.e., having the same set of attributes). The term table is a loose synonym.
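To make these definitions concrete, here is a minimal Python sketch (not part of the original notes). It assumes each domain can be modeled as a Python predicate and represents a tuple as a frozenset of (attribute, value) pairs, so attribute order does not matter and a relation is simply a Python set of such tuples. The predicates and sample values are illustrative only.

```python
# Illustrative domains, modeled as predicates over Python values (an assumption).
DOMAINS = {
    "Name": lambda v: isinstance(v, str) and v[:1].isupper(),
    "Sex": lambda v: v in {"female", "male"},
    "GPA": lambda v: isinstance(v, float) and 0.0 <= v <= 4.0,
}


def make_tuple(**values):
    """Build a tuple as a mapping attribute -> value, checking dom(A) for each A."""
    for attr, value in values.items():
        if not DOMAINS[attr](value):
            raise ValueError(f"{value!r} is not in dom({attr})")
    # A frozenset of pairs is hashable and ignores the order of attributes.
    return frozenset(values.items())


# A relation is a set of tuples that all have the same attributes.
STUDENT = {
    make_tuple(Name="Rama Krishna", Sex="male", GPA=3.5),
    make_tuple(Name="Sita", Sex="female", GPA=3.9),
}

# Adding an identical tuple (attributes listed in another order) changes nothing.
STUDENT.add(make_tuple(GPA=3.5, Name="Rama Krishna", Sex="male"))
print(len(STUDENT))
```

Because the relation is a Python set, the final insertion has no effect, mirroring the rule that a relation cannot contain duplicate tuples.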


• Relational Schema: used for describing (the structure of) a relation. E.g., R(A1, A2, ..., An) says that R is a relation with attributes A1, ..., An. The degree of a relation is the number of attributes it has, here n.

Example: STUDENT(Name, SSN, Address)

One would think that a "complete" relational schema would also specify the domain of each attribute.

• Relational Database: A collection of relations, each one consistent with its specified relational schema.

Characteristics of Relations

• Ordering of Tuples: A relation is a set of tuples; hence, there is no order associated with them.

• Ordering of Attributes: A tuple is best viewed as a mapping from its attributes (i.e., the names we give to the roles played by the values comprising the tuple) to the corresponding values. Hence, the order in which the attributes are listed in a table is irrelevant. (Note that, unfortunately, the set theoretic operations in relational algebra (at least as Elmasri & Navathe define them) make implicit use of the order of the attributes. Hence, Elmasri & Navathe view attributes as being arranged as a sequence rather than a set.)

• Values of Attributes: For a relation to be in First Normal Form, each of its attribute domains must consist of atomic (neither composite nor multi-valued) values. Much of the theory underlying the relational model was based upon this assumption.

• Interpretation of a Relation: Each relation can be viewed as a predicate and each tuple as an assertion that the predicate is satisfied (i.e., has value true) for the combination of values in it. In other words, each tuple represents a fact.

Relational Model Constraints and Relational Database Schemas

Constraints on databases can be categorized as follows:

• Inherent model-based: Example: no two tuples in a relation can be duplicates (because a relation is a set of tuples)

• Schema-based: can be expressed using DDL; this kind is the focus of this section.

• Application-based: are specific to the "business rules" of the miniworld and typically difficult or impossible to express and enforce within the data model. Hence, they are left to application programs to enforce.


Elaborating upon schema-based constraints:

Domain Constraints: Each attribute value must be either null (which is really a non-value) or drawn from the domain of that attribute.

Key Constraints: A relation is a set of tuples, and each tuple's "identity" is given by the values of its attributes. Hence, it makes no sense for two tuples in a relation to be identical (because then the two tuples are actually one and the same tuple). That is, no two tuples may have the same combination of values in their attributes.

A superkey of a relation is a subset of its attributes for which no two tuples can have the same combination of values. Since no two tuples can be identical, the set of all attributes of a relation constitutes a superkey of that relation.

A key is a minimal superkey, i.e., a superkey such that, if we were to remove any of its attributes, the resulting set of attributes fails to be a superkey.

Example: Suppose that we stipulate that a faculty member is uniquely identified by Name and Address and also by Name and Department, but by no single one of the three attributes mentioned. Then { Name, Address, Department } is a (non-minimal) superkey and each of { Name, Address } and { Name, Department } is a key (i.e., minimal superkey).

Candidate key: any key (i.e., any minimal superkey)

Primary key: a key chosen to act as the means by which to identify tuples in a relation. Typically, one prefers a primary key to be one having as few attributes as possible.
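The superkey and key definitions can be checked mechanically against a relation state. The Python sketch below is only an illustration under a stated assumption: keys are really schema-level constraints that must hold in every valid state, so testing one state can refute a candidate key but never prove it. Tuples are modeled as dicts, and the FACULTY data is made up to match the example above.

```python
from itertools import combinations


def is_superkey(relation, attrs):
    """True if no two tuples in this relation state agree on every attribute in attrs."""
    seen = set()
    for t in relation:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(relation, all_attrs):
    """Minimal superkeys consistent with this state, smallest first."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(all_attrs, size):
            # A superset of a key found earlier cannot be minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_superkey(relation, combo):
                keys.append(combo)
    return keys


# Made-up FACULTY state: no single attribute is unique on its own.
FACULTY = [
    {"Name": "Rao", "Address": "A1", "Department": "CSE"},
    {"Name": "Rao", "Address": "A2", "Department": "IT"},
    {"Name": "Devi", "Address": "A1", "Department": "CSE"},
]

print(candidate_keys(FACULTY, ("Name", "Address", "Department")))
# [('Name', 'Address'), ('Name', 'Department')]
```

Enumerating subsets by increasing size and skipping supersets of keys already found is what guarantees minimality here.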

Entity Integrity, Referential Integrity, and Foreign Keys

Entity Integrity Constraint: the entity integrity constraint states that no primary key value can be null.

Referential Integrity Constraint: it is specified between two relations and is used to maintain the consistency among tuples of the two relations. It states that a tuple in one relation that refers to another relation must refer to an existing tuple in that relation. For example, the attribute DNO of EMPLOYEE gives the number of the department for which each employee works; hence, its value in every EMPLOYEE tuple must match the DNUMBER value of some tuple in the DEPARTMENT relation.


Foreign Key: A set of attributes FK in relation schema R1 is a foreign key of R1 that references relation R2 if it satisfies the following two rules:

1. The attributes in FK have the same domain(s) as the primary key attributes PK of R2; the attributes FK are said to reference or refer to the relation R2.

2. A value of FK in a tuple t1 of the current state r1(R1) either occurs as a value of PK for some tuple t2 in the current state r2(R2) or is null. In the former case, we have t1[FK] = t2[PK], and we say that the tuple t1 references or refers to the tuple t2. R1 is called the referencing relation and R2 is called the referenced relation.

The conditions for a foreign key, given above, specify a referential integrity constraint between the two relation schemas R1 and R2.
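Rule 2 can be checked with a few lines of Python. This is a sketch under simplifying assumptions: tuples are dicts, null is Python's None, a composite foreign key counts as null only when all of its attributes are None, and rule 1 (matching domains) is assumed rather than checked. The EMPLOYEE and DEPARTMENT values are made up.

```python
def fk_violations(r1, fk, r2, pk):
    """Tuples t1 in r1 that violate the referential integrity constraint R1.FK -> R2.PK."""
    pk_values = {tuple(t2[a] for a in pk) for t2 in r2}
    bad = []
    for t1 in r1:
        value = tuple(t1[a] for a in fk)
        if all(v is None for v in value):
            continue  # rule 2 allows a null foreign key
        if value not in pk_values:
            bad.append(t1)  # t1[FK] matches no t2[PK]
    return bad


DEPARTMENT = [{"DNUMBER": 1}, {"DNUMBER": 4}, {"DNUMBER": 5}]
EMPLOYEE = [
    {"SSN": "111", "DNO": 5},
    {"SSN": "222", "DNO": 4},
    {"SSN": "333", "DNO": 7},     # no department 7 exists
    {"SSN": "444", "DNO": None},  # null foreign key is permitted
]

print([t["SSN"] for t in fk_violations(EMPLOYEE, ("DNO",), DEPARTMENT, ("DNUMBER",))])
# ['333']
```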

Semantic Integrity Constraints: application-specific restrictions that are unlikely to be expressible in DDL. Examples:

• salary of a supervisee cannot be greater than that of her/his supervisor
• salary of an employee cannot be lowered

Relational Databases and Relational Database Schemas

A relational database schema is a set of schemas for its relations together with a set of integrity constraints.

A relational database state/instance/snapshot is a set of states of its relations such that no integrity constraint is violated.

Update Operations and Dealing with Constraint Violations

For each of the update operations (Insert, Delete, and Update), we consider what kinds of constraint violations may result from applying it and how we might choose to react.

Insert:

• domain constraint violation: some attribute value is not of the correct domain
• entity integrity violation: the primary key of the new tuple is null
• key constraint violation: the key of the new tuple is the same as that of an existing tuple
• referential integrity violation: a foreign key of the new tuple refers to a non-existent tuple

Ways of dealing with it: reject the insertion, or give the user the opportunity to try again with different attribute values.
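The four insert checks listed above can be sketched as a single Python function that reports every violation before the tuple is accepted, leaving the caller free to reject it or ask the user to retry. This is an illustrative sketch, not how any particular DBMS implements it: the schema, domain predicates, and data are hypothetical, tuples are dicts, and None stands for null.

```python
def insert_violations(relation, new, domains, pk, fks):
    """List the constraint violations that inserting tuple `new` would cause.

    domains: attribute -> predicate standing in for dom(A)
    pk:      tuple of primary-key attribute names
    fks:     list of (fk_attrs, referenced_relation, referenced_pk_attrs)
    """
    problems = []
    for attr, value in new.items():
        # null is a non-value, so it is not checked against dom(A)
        if value is not None and not domains[attr](value):
            problems.append(f"domain: {attr}")
    key = tuple(new[a] for a in pk)
    if any(v is None for v in key):
        problems.append("entity integrity")
    elif any(tuple(t[a] for a in pk) == key for t in relation):
        problems.append("key")
    for fk_attrs, referenced, ref_pk in fks:
        value = tuple(new[a] for a in fk_attrs)
        if all(v is None for v in value):
            continue  # a null foreign key is allowed
        if value not in {tuple(t[a] for a in ref_pk) for t in referenced}:
            problems.append("referential integrity: " + ",".join(fk_attrs))
    return problems


# Hypothetical schema: EMPLOYEE(SSN, SALARY, DNO), DNO -> DEPARTMENT(DNUMBER)
DOMAINS = {
    "SSN": lambda v: isinstance(v, str) and len(v) == 9 and v.isdigit(),
    "SALARY": lambda v: isinstance(v, int) and v >= 0,
    "DNO": lambda v: isinstance(v, int),
}
DEPARTMENT = [{"DNUMBER": 4}, {"DNUMBER": 5}]
EMPLOYEE = [{"SSN": "123456789", "SALARY": 30000, "DNO": 5}]
FKS = [(("DNO",), DEPARTMENT, ("DNUMBER",))]

print(insert_violations(EMPLOYEE, {"SSN": "123456789", "SALARY": -5, "DNO": 9},
                        DOMAINS, ("SSN",), FKS))
# ['domain: SALARY', 'key', 'referential integrity: DNO']
```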


Delete:

• Referential integrity violation: a tuple referring to the deleted one exists.

Three options for dealing with it:

• Reject the deletion
• Attempt to cascade (or propagate) by deleting any referencing tuples (plus those that reference them, etc.)
• Modify the foreign key attribute values in referencing tuples to null or to some valid value referencing a different tuple
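The three options can be compared side by side in a small Python sketch. It is a simplification under stated assumptions: relations are lists of dicts, only one level of references (EMPLOYEE.DNO to DEPARTMENT.DNUMBER) is handled, and the data are made up. In SQL, these choices correspond roughly to the ON DELETE RESTRICT / NO ACTION, CASCADE, and SET NULL (or SET DEFAULT) options of a FOREIGN KEY clause.

```python
def delete_department(dnumber, departments, employees, policy="reject"):
    """Delete the DEPARTMENT tuple with DNUMBER = dnumber under one of three
    policies for EMPLOYEE tuples whose foreign key DNO references it."""
    if policy not in {"reject", "cascade", "set_null"}:
        raise ValueError(f"unknown policy {policy!r}")
    referencing = [e for e in employees if e["DNO"] == dnumber]
    if referencing and policy == "reject":
        raise ValueError(f"{len(referencing)} employee(s) still reference department {dnumber}")
    departments = [d for d in departments if d["DNUMBER"] != dnumber]
    if policy == "cascade":
        # Propagate the deletion to referencing tuples (one level only in this sketch).
        employees = [e for e in employees if e["DNO"] != dnumber]
    elif policy == "set_null":
        # Keep referencing tuples, but set their foreign key to null.
        employees = [{**e, "DNO": None} if e["DNO"] == dnumber else e for e in employees]
    return departments, employees


DEPARTMENT = [{"DNUMBER": 4}, {"DNUMBER": 5}]
EMPLOYEE = [{"SSN": "111", "DNO": 5}, {"SSN": "222", "DNO": 4}]

print(delete_department(5, DEPARTMENT, EMPLOYEE, "cascade"))
print(delete_department(5, DEPARTMENT, EMPLOYEE, "set_null"))
```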

Update:

• Key constraint violation: the primary key is changed so as to become the same as another tuple's

• Referential integrity violation:
o a foreign key is changed and the new value refers to a nonexistent tuple
o a primary key is changed, and other tuples that referred to the old value now violate the constraint


Relational Algebra

A brief introduction

• Relational algebra and relational calculus are formal languages associated with the relational model.

• Informally, relational algebra is a (high-level) procedural language and relational calculus a non-procedural language.

• However, formally both are equivalent to one another.

• A language that can produce every relation that can be derived using relational calculus is relationally complete.

• Relational algebra operations work on one or more relations to define another relation without changing the original relations.

• Both operands and results are relations, so output from one operation can become input to another operation.

• Allows expressions to be nested, just as in arithmetic. This property is called closure.

• Relational algebra is the basic set of operations for the relational model
• These operations enable a user to specify basic retrieval requests (or queries)
• The result of an operation is a new relation, which may have been formed from one or more input relations
o This property makes the algebra “closed” (all objects in relational algebra are relations)
• The algebra operations thus produce new relations
o These can be further manipulated using operations of the same algebra
• A sequence of relational algebra operations forms a relational algebra expression
o The result of a relational algebra expression is also a relation that represents the result of a database query (or retrieval request)


Relational Algebra consists of several groups of operations

Unary Relational Operations
• SELECT (symbol: σ (sigma))
• PROJECT (symbol: π (pi))
• RENAME (symbol: ρ (rho))

Relational Algebra Operations from Set Theory
• UNION ( ∪ )
• INTERSECTION ( ∩ )
• DIFFERENCE (or MINUS, – )
• CARTESIAN PRODUCT ( x )

Binary Relational Operations
• JOIN (several variations of JOIN exist)
• DIVISION

Additional Relational Operations
• OUTER JOINS, OUTER UNION
• AGGREGATE FUNCTIONS

Relational Algebra Operations:

Unary Relational Operations: SELECT

• The SELECT operation (denoted by σ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition.
o The selection condition acts as a filter
o Keeps only those tuples that satisfy the qualifying condition
o Tuples satisfying the condition are selected whereas the other tuples are discarded (filtered out)
• Examples:

Select the EMPLOYEE tuples whose department number is 4:
σ DNO = 4 (EMPLOYEE)

Select the employee tuples whose salary is greater than $30,000:
σ SALARY > 30000 (EMPLOYEE)

In general, the select operation is denoted by
σ <selection condition>(R)
where
§ the symbol σ (sigma) is used to denote the select operator
§ the selection condition is a Boolean (conditional) expression specified on the attributes of relation R
§ tuples that make the condition true are selected (appear in the result of the operation)
§ tuples that make the condition false are filtered out (discarded from the result of the operation)


SELECT Operation Properties

o The SELECT operation σ <selection condition>(R) produces a relation S that has the same schema (same attributes) as R

o SELECT is commutative:
σ<condition1>(σ<condition2>(R)) = σ<condition2>(σ<condition1>(R))

o Because of the commutative property, a cascade (sequence) of SELECT operations may be applied in any order:

σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond2>(σ<cond3>(σ<cond1>(R)))

o A cascade of SELECT operations may be replaced by a single selection with a conjunction of all the conditions:

σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond1> AND <cond2> AND <cond3>(R)

o The number of tuples in the result of a SELECT is less than (or equal to) the number of tuples in the input relation R
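The SELECT examples and properties above can be tried out with a minimal Python sketch, assuming a relation is a list of dicts and a selection condition is a Python function returning a Boolean. The EMPLOYEE values are made up for illustration.

```python
def select(relation, condition):
    """σ<condition>(R): keep only tuples for which condition(t) is true.
    The result has the same attributes (schema) as R."""
    return [t for t in relation if condition(t)]


# Made-up EMPLOYEE state.
EMPLOYEE = [
    {"SSN": "111", "LNAME": "Smith", "SALARY": 30000, "DNO": 5},
    {"SSN": "222", "LNAME": "Wong", "SALARY": 40000, "DNO": 5},
    {"SSN": "333", "LNAME": "Zelaya", "SALARY": 25000, "DNO": 4},
    {"SSN": "444", "LNAME": "Wallace", "SALARY": 43000, "DNO": 4},
]

dno4 = select(EMPLOYEE, lambda t: t["DNO"] == 4)             # σ DNO = 4 (EMPLOYEE)
high_paid = select(EMPLOYEE, lambda t: t["SALARY"] > 30000)  # σ SALARY > 30000 (EMPLOYEE)

c1 = lambda t: t["DNO"] == 5
c2 = lambda t: t["SALARY"] > 25000
c3 = lambda t: t["LNAME"] != "Smith"

# A cascade applied in two different orders, and one conjunctive selection.
order_a = select(select(select(EMPLOYEE, c3), c2), c1)
order_b = select(select(select(EMPLOYEE, c1), c3), c2)
conjunction = select(EMPLOYEE, lambda t: c1(t) and c2(t) and c3(t))
print(order_a == order_b == conjunction)  # True
```

All three forms keep the same single tuple (Wong), and each result is never larger than EMPLOYEE.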

Unary Relational Operations: PROJECT

o PROJECT Operation is denoted by π (pi)
o This operation keeps certain columns (attributes) from a relation and discards the other columns.
o PROJECT creates a vertical partitioning
§ The list of specified columns (attributes) is kept in each tuple
§ The other attributes in each tuple are discarded

o Example: To list each employee’s first and last name and salary, the following is used:

π LNAME, FNAME, SALARY (EMPLOYEE)

o The general form of the project operation is:
π <attribute list>(R)
o π (pi) is the symbol used to represent the project operation
o <attribute list> is the desired list of attributes from relation R.

o The project operation removes any duplicate tuples


PROJECT Operation Properties

o The number of tuples in the result of projection π<list>(R) is always less than or equal to the number of tuples in R

§ If the list of attributes includes a key of R, then the number of tuples in the result of PROJECT is equal to the number of tuples in R

o PROJECT is not commutative, but a cascade of projections collapses to the outermost one:
π<list1>(π<list2>(R)) = π<list1>(R) as long as <list2> contains the attributes in <list1>
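A minimal Python sketch of PROJECT, assuming relations are lists of dicts with made-up values, shows the duplicate elimination and the two properties above: projecting on a list that contains a key keeps every tuple, and cascaded projections collapse to the outer one.

```python
def project(relation, attrs):
    """π<attrs>(R): keep only the listed attributes, then drop duplicate tuples."""
    result, seen = [], set()
    for t in relation:
        values = tuple(t[a] for a in attrs)
        if values not in seen:  # duplicate elimination keeps the result a set
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


# Made-up EMPLOYEE state; SSN is its key.
EMPLOYEE = [
    {"SSN": "111", "FNAME": "John", "LNAME": "Smith", "SALARY": 30000, "DNO": 5},
    {"SSN": "222", "FNAME": "Ravi", "LNAME": "Wong", "SALARY": 40000, "DNO": 5},
    {"SSN": "333", "FNAME": "Asha", "LNAME": "Zelaya", "SALARY": 25000, "DNO": 4},
]

print(project(EMPLOYEE, ["DNO"]))                # [{'DNO': 5}, {'DNO': 4}]
print(len(project(EMPLOYEE, ["SSN", "LNAME"])))  # 3, because the list includes the key SSN

# Cascaded projections collapse when the inner list contains the outer one.
inner = project(EMPLOYEE, ["LNAME", "FNAME", "SALARY"])
print(project(inner, ["LNAME", "FNAME"]) == project(EMPLOYEE, ["LNAME", "FNAME"]))  # True
```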

Unary Relational Operations: RENAME

o The RENAME operator is denoted by ρ (rho)
o In some cases, we may want to rename the attributes of a relation or the relation name or both
o Useful when a query requires multiple operations
o Necessary in some cases (see JOIN operation later)

o The general RENAME operation ρ can be expressed by any of the following forms:

o ρ S(B1, B2, …, Bn)(R) changes both:
§ the relation name to S, and
§ the column (attribute) names to B1, B2, …, Bn

o ρ S(R) changes:
§ the relation name only to S

o ρ (B1, B2, …, Bn)(R) changes:
§ the column (attribute) names only to B1, B2, …, Bn

o For convenience, we also use a shorthand for renaming attributes in an intermediate relation:

o If we write:
§ RESULT ← π FNAME, LNAME, SALARY (DEP5_EMPS)
§ RESULT will have the same attribute names (FNAME, LNAME, SALARY) as DEP5_EMPS, which has the same attributes as EMPLOYEE
o If we write:

RESULT(F, M, L, S, B, A, SX, SAL, SU, DNO) ← ρ RESULT(F, M, L, S, B, A, SX, SAL, SU, DNO)(DEP5_EMPS)

The 10 attributes of DEP5_EMPS are renamed to F, M, L, S, B, A, SX, SAL, SU, DNO, respectively
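RENAME can be sketched in Python as follows. Because tuples are modeled here as dicts, which have no meaningful attribute positions, the sketch assumes the original attribute list A1, ..., An is passed explicitly; renaming the relation itself just means binding the result to a new variable name. The DEP5_EMPS values are made up.

```python
def rename(relation, old_attrs, new_attrs):
    """ρ(B1, ..., Bn)(R): rename attribute old_attrs[i] to new_attrs[i].

    Tuples are dicts here, so the positional list A1, ..., An is given
    explicitly as old_attrs rather than read from the tuples.
    """
    if len(old_attrs) != len(new_attrs):
        raise ValueError("exactly one new name is needed per attribute")
    mapping = dict(zip(old_attrs, new_attrs))
    return [{mapping[a]: t[a] for a in old_attrs} for t in relation]


DEP5_EMPS = [
    {"FNAME": "John", "LNAME": "Smith", "SALARY": 30000},
    {"FNAME": "Ravi", "LNAME": "Wong", "SALARY": 40000},
]

# RESULT(F, L, S) ← ρ RESULT(F, L, S)(DEP5_EMPS); the relation name is just the variable.
RESULT = rename(DEP5_EMPS, ["FNAME", "LNAME", "SALARY"], ["F", "L", "S"])
print(RESULT[0])  # {'F': 'John', 'L': 'Smith', 'S': 30000}
```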


Relational Algebra Operations from Set Theory

UNION Operation

o Binary operation, denoted by ∪
o The result of R ∪ S is a relation that includes all tuples that are either in R or in S or in both R and S
o Duplicate tuples are eliminated
o The two operand relations R and S must be “type compatible” (or UNION compatible)
§ R and S must have the same number of attributes
§ Each pair of corresponding attributes must be type compatible (have the same or compatible domains)

Example: To retrieve the social security numbers of all employees who either work in department 5 (RESULT1 below) or directly supervise an employee who works in department 5 (RESULT2 below), we can use the UNION operation as follows:

DEP5_EMPS ← σ DNO=5 (EMPLOYEE)
RESULT1 ← π SSN (DEP5_EMPS)
RESULT2(SSN) ← π SUPERSSN (DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2

The union operation produces the tuples that are in either RESULT1 or RESULT2 or both

o Type compatibility of operands is required for the binary set operation UNION ∪ (also for INTERSECTION ∩ and SET DIFFERENCE –)

o R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are type compatible if:
o they have the same number of attributes, and
o the domains of corresponding attributes are type compatible (i.e., dom(Ai) = dom(Bi) for i = 1, 2, ..., n).
o The resulting relation for R1 ∪ R2 (also for R1 ∩ R2, or R1 – R2) has the same attribute names as the first operand relation R1 (by convention)


INTERSECTION Operation

o INTERSECTION is denoted by ∩
o The result of the operation R ∩ S is a relation that includes all tuples that are in both R and S
o The attribute names in the result will be the same as the attribute names in R
o The two operand relations R and S must be “type compatible”

DIFFERENCE Operation

o SET DIFFERENCE (also called MINUS or EXCEPT) is denoted by –
o The result of R – S is a relation that includes all tuples that are in R but not in S
o The attribute names in the result will be the same as the attribute names in R
o The two operand relations R and S must be “type compatible”

Some properties of UNION, INTERSECT, and DIFFERENCE

o Notice that both union and intersection are commutative operations; that is
o R ∪ S = S ∪ R, and R ∩ S = S ∩ R

o Both union and intersection can be treated as n-ary operations applicable to any number of relations, as both are associative operations; that is
o R ∪ (S ∪ T) = (R ∪ S) ∪ T
o (R ∩ S) ∩ T = R ∩ (S ∩ T)

o The minus operation is not commutative; that is, in general
o R – S ≠ S – R
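The three set operations map directly onto Python's set operators. In this sketch a relation is assumed to be a pair (attribute names, set of value tuples); only the degree is checked for type compatibility, while compatible domains are assumed rather than verified. The SSN values for the department-5 example are made up.

```python
def _require_compatible(r, s):
    # Only the degree is checked; compatible domains are assumed, not verified.
    if len(r[0]) != len(s[0]):
        raise ValueError("operands are not type (union) compatible")


def union(r, s):
    _require_compatible(r, s)
    return r[0], r[1] | s[1]  # by convention, the result uses R's attribute names


def intersection(r, s):
    _require_compatible(r, s)
    return r[0], r[1] & s[1]


def difference(r, s):
    _require_compatible(r, s)
    return r[0], r[1] - s[1]


# Made-up values for the department-5 example: (attribute names, set of tuples).
RESULT1 = (("SSN",), {("111",), ("222",)})  # SSNs of department-5 employees
RESULT2 = (("SSN",), {("222",), ("888",)})  # SSNs of their direct supervisors

print(sorted(union(RESULT1, RESULT2)[1]))  # [('111',), ('222',), ('888',)]
print(intersection(RESULT1, RESULT2)[1])   # {('222',)}
print(difference(RESULT1, RESULT2)[1], difference(RESULT2, RESULT1)[1])  # {('111',)} {('888',)}
```

The last line illustrates that R – S and S – R generally differ, while union and intersection give the same result in either order.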

CARTESIAN PRODUCT Operation

o This operation is used to combine tuples from two relations in a combinatorial fashion.

o Denoted by R(A1, A2, . . ., An) X S(B1, B2, . . ., Bm)
o Result is a relation Q with degree n + m attributes:
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
o The resulting relation state has one tuple for each combination of tuples: one from R and one from S.
o Hence, if R has nR tuples (denoted as |R| = nR), and S has nS tuples, then R x S will have nR * nS tuples.
o The two operands do NOT have to be "type compatible"
o On its own, CARTESIAN PRODUCT (also called CROSS PRODUCT) is generally not a meaningful operation
o It can become meaningful when followed by other operations


Example (not meaningful):

FEMALE_EMPS ← σ SEX='F'(EMPLOYEE)
EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES X DEPENDENT

EMP_DEPENDENTS will contain every combination of EMPNAMES and DEPENDENT tuples, whether or not they are actually related

o To keep only combinations where the DEPENDENT is related to the EMPLOYEE, we add a SELECT operation as follows

Example (meaningful):

FEMALE_EMPS ← σ SEX='F'(EMPLOYEE)
EMPNAMES ← π FNAME, LNAME, SSN (FEMALE_EMPS)
EMP_DEPENDENTS ← EMPNAMES X DEPENDENT
ACTUAL_DEPS ← σ SSN=ESSN (EMP_DEPENDENTS)
RESULT ← π FNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)

RESULT will now contain the names of female employees and their dependents
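The product-then-select sequence above can be sketched in Python. The sketch assumes relations are lists of dicts whose attribute names do not overlap (otherwise a RENAME would be needed first); EMPNAMES stands in for the projected FEMALE_EMPS, and all values are made up. The final projection is written as a set comprehension so duplicates disappear.

```python
def product(r, s):
    """R x S: one result tuple for every pair (tr, ts), so |R x S| = |R| * |S|.
    Assumes R and S share no attribute names (rename first otherwise)."""
    return [{**tr, **ts} for tr in r for ts in s]


# Made-up states: EMPNAMES stands for π FNAME, LNAME, SSN (FEMALE_EMPS).
EMPNAMES = [
    {"FNAME": "Asha", "LNAME": "Zelaya", "SSN": "333"},
    {"FNAME": "Jennifer", "LNAME": "Wallace", "SSN": "444"},
]
DEPENDENT = [
    {"ESSN": "333", "DEPENDENT_NAME": "Meena"},
    {"ESSN": "444", "DEPENDENT_NAME": "Ravi"},
    {"ESSN": "111", "DEPENDENT_NAME": "Joy"},
]

EMP_DEPENDENTS = product(EMPNAMES, DEPENDENT)                        # 2 * 3 = 6 tuples
ACTUAL_DEPS = [t for t in EMP_DEPENDENTS if t["SSN"] == t["ESSN"]]   # σ SSN=ESSN
RESULT = {(t["FNAME"], t["LNAME"], t["DEPENDENT_NAME"]) for t in ACTUAL_DEPS}  # π as a set
print(len(EMP_DEPENDENTS), sorted(RESULT))
```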

JOIN Operation

o The sequence of CARTESIAN PRODUCT followed by SELECT is used quite commonly to identify and select related tuples from two relations

o A special operation, called JOIN, combines this sequence into a single operation
o This operation is very important for any relational database with more than a single relation, because it allows us to combine related tuples from various relations
o The general form of a join operation on two relations R(A1, A2, . . ., An) and S(B1, B2, . . ., Bm) is:

R ⋈<join condition> S

where R and S can be any relations that result from general relational algebra expressions.

Example: Suppose that we want to retrieve the name of the manager of each department.
o To get the manager’s name, we need to combine each DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value matches the MGRSSN value in the department tuple.

o We do this by using the join operation.

DEPT_MGR ← DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE


o MGRSSN=SSN is the join condition
o Combines each department record with the employee who manages the department
o The join condition can also be specified as DEPARTMENT.MGRSSN = EMPLOYEE.SSN

Consider the following JOIN operation:

R(A1, A2, . . ., An) ⋈ R.Ai=S.Bj S(B1, B2, . . ., Bm)

o Result is a relation Q with degree n + m attributes:
o Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
o The resulting relation state has one tuple for each combination of tuples, r from R and s from S, but only if they satisfy the join condition r[Ai] = s[Bj]

o Hence, if R has nR tuples, and S has nS tuples, then the join result will generally have fewer than nR * nS tuples (and never more).

o Only related tuples (based on the join condition) will appear

Some properties of JOIN

o The general case of JOIN operation is called a Theta-join:
o The join condition is called theta
o Theta can be any general Boolean expression on the attributes of R and S; for example:
R.Ai<S.Bj AND (R.Ak=S.Bl OR R.Ap<S.Bq)

o Most join conditions involve one or more equality conditions “AND”ed together; for example:

R.Ai=S.Bj AND R.Ak=S.Bl AND R.Ap=S.Bq

EQUIJOIN Operation

o The most common use of join involves join conditions with equality comparisons only

o Such a join, where the only comparison operator used is =, is called an EQUIJOIN.

o In the result of an EQUIJOIN we always have one or more pairs of attributes (whose names need not be identical) that have identical values in every tuple.

o The JOIN seen in the previous example was an EQUIJOIN.
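A theta join can be sketched in Python as a filtered nested loop, which is exactly σ applied to the Cartesian product. The condition is any Python function of one R tuple and one S tuple; here it is an equality, so the result is the EQUIJOIN for the DEPT_MGR example. Relations are lists of dicts with non-overlapping attribute names, and the data are made up.

```python
def theta_join(r, s, condition):
    """R ⋈<condition> S: the same result as σ<condition>(R x S), in one step.
    Assumes R and S share no attribute names (rename first otherwise)."""
    return [{**tr, **ts} for tr in r for ts in s if condition(tr, ts)]


# Made-up states.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "222"},
    {"DNAME": "Administration", "DNUMBER": 4, "MGRSSN": "444"},
]
EMPLOYEE = [
    {"SSN": "111", "LNAME": "Smith"},
    {"SSN": "222", "LNAME": "Wong"},
    {"SSN": "444", "LNAME": "Wallace"},
]

# EQUIJOIN: DEPT_MGR ← DEPARTMENT ⋈ MGRSSN=SSN EMPLOYEE
DEPT_MGR = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
print([(t["DNAME"], t["LNAME"]) for t in DEPT_MGR])
# [('Research', 'Wong'), ('Administration', 'Wallace')]
```

Each DEPT_MGR tuple has degree 3 + 2 = 5, and MGRSSN and SSN carry identical values in every tuple, which is the redundancy NATURAL JOIN removes.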


NATURAL JOIN Operation

o Another variation of JOIN, called NATURAL JOIN and denoted by *, was created to get rid of the second (superfluous) attribute in an EQUIJOIN condition, because one of each pair of attributes with identical values is superfluous
o The standard definition of natural join requires that the two join attributes, or each pair of corresponding join attributes, have the same name in both relations
o If this is not the case, a renaming operation is applied first

Example: To apply a natural join on the DNUMBER attributes of DEPARTMENT and DEPT_LOCATIONS, it is sufficient to write:

DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS

The only attribute with the same name is DNUMBER. An implicit join condition is created based on this attribute:

DEPARTMENT.DNUMBER=DEPT_LOCATIONS.DNUMBER

Another example:

Q ← R(A,B,C,D) * S(C,D,E)

o The implicit join condition includes each pair of attributes with the same name, “AND”ed together:
o R.C=S.C AND R.D=S.D
o Result keeps only one attribute of each such pair:
o Q(A,B,C,D,E)
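The Q ← R(A,B,C,D) * S(C,D,E) example can be reproduced with a short Python sketch. It assumes relations are non-empty lists of dicts with a uniform schema, so the shared attribute names can be read from the first tuples. Merging two dicts keeps a single copy of each shared attribute, which is exactly the extra step NATURAL JOIN adds over EQUIJOIN. The values are made up.

```python
def natural_join(r, s):
    """R * S: equijoin on every attribute name R and S share, keeping one copy
    of each shared attribute. Schemas are read from the first tuples, so both
    relations are assumed non-empty and uniform."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    return [
        {**tr, **ts}  # shared attributes are equal, so one copy survives
        for tr in r
        for ts in s
        if all(tr[a] == ts[a] for a in common)
    ]


R = [{"A": 1, "B": 2, "C": 3, "D": 4}, {"A": 5, "B": 6, "C": 7, "D": 8}]
S = [{"C": 3, "D": 4, "E": 9}, {"C": 7, "D": 0, "E": 10}]

Q = natural_join(R, S)  # implicit condition: R.C=S.C AND R.D=S.D
print(Q)  # [{'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 9}]
```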

DIVISION Operation

o The division operation is applied to two relations

o R(Z) ÷ S(X), where X is a subset of Z. Let Y = Z - X, that is, let Y be the set of attributes of R that are not attributes of S.

o The result of DIVISION is a relation T(Y) that includes a tuple t if tuples tR appear in R with tR[Y] = t, and with tR[X] = ts for every tuple ts in S.

o For a tuple t to appear in the result T of the DIVISION, the values in t must appear in R in combination with every tuple in S.
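The definition of DIVISION translates almost literally into Python: collect the (Y, X) value pairs present in R, then keep each Y value that is paired with every X value of S. In this sketch relations are lists of dicts, results are sets of value tuples, and the WORKS_ON and PROJECTS data are made up (the query finds employees who work on all listed projects).

```python
def divide(r, s, y_attrs, x_attrs):
    """R(Z) ÷ S(X) with Y = Z - X: Y-values paired in R with every X-value of S."""
    pairs = {(tuple(t[a] for a in y_attrs), tuple(t[a] for a in x_attrs)) for t in r}
    required = {tuple(t[a] for a in x_attrs) for t in s}
    candidates = {y for y, _ in pairs}
    return {y for y in candidates if all((y, x) in pairs for x in required)}


WORKS_ON = [
    {"ESSN": "111", "PNO": 1}, {"ESSN": "111", "PNO": 2},
    {"ESSN": "222", "PNO": 1},
    {"ESSN": "333", "PNO": 1}, {"ESSN": "333", "PNO": 2}, {"ESSN": "333", "PNO": 3},
]
PROJECTS = [{"PNO": 1}, {"PNO": 2}]  # the projects every result employee must work on

print(sorted(divide(WORKS_ON, PROJECTS, ["ESSN"], ["PNO"])))  # [('111',), ('333',)]
```

Employee 222 is excluded because the pair (222, 2) does not appear in WORKS_ON; employee 333 qualifies even though it also works on project 3.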


Additional Relational Operations: Aggregate Functions and Grouping

o A type of request that cannot be expressed in the basic relational algebra is to specify mathematical aggregate functions on collections of values from the database.

o Examples of such functions include retrieving the average or total salary of all employees or the total number of employee tuples.

o These functions are used in simple statistical queries that summarize information from the database tuples.

o Common functions applied to collections of numeric values include SUM, AVERAGE, MAXIMUM, and MINIMUM.

o The COUNT function is used for counting tuples or values.
o Use of the aggregate function operation ℱ (script F):

o ℱ MAX Salary (EMPLOYEE) retrieves the maximum salary value from the EMPLOYEE relation

o ℱ MIN Salary (EMPLOYEE) retrieves the minimum Salary value from the EMPLOYEE relation

o ℱ SUM Salary (EMPLOYEE) retrieves the sum of the Salary from the EMPLOYEE relation


o ℱ COUNT SSN, AVERAGE Salary (EMPLOYEE) computes the count (number) of employees and their average salary
§ Note: count just counts the number of rows, without removing duplicates

o The previous examples all summarized one or more attributes for a set of tuples
o Grouping can be combined with Aggregate Functions

Example: For each department, retrieve the DNO, COUNT SSN, and AVERAGE SALARY
o A variation of the aggregate operation ℱ allows this:
o Grouping attribute placed to the left of the ℱ symbol
o Aggregate functions to the right of the ℱ symbol

DNO ℱ COUNT SSN, AVERAGE Salary (EMPLOYEE)

The above operation groups employees by DNO (department number) and computes the count of employees and average salary per department
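A minimal Python sketch of the aggregate and grouping operations, assuming a relation is a list of dicts with made-up values. Python's built-in max, min, and sum stand in for the ungrouped functions; the grouped version collects salaries per DNO and reports one result tuple per group. Because every tuple here has a non-null SSN, counting tuples gives the same answer as COUNT SSN.

```python
from collections import defaultdict

# Made-up EMPLOYEE state.
EMPLOYEE = [
    {"SSN": "111", "SALARY": 30000, "DNO": 5},
    {"SSN": "222", "SALARY": 40000, "DNO": 5},
    {"SSN": "333", "SALARY": 25000, "DNO": 4},
    {"SSN": "444", "SALARY": 43000, "DNO": 4},
]

# ℱ MAX Salary (EMPLOYEE), ℱ MIN Salary (EMPLOYEE), ℱ SUM Salary (EMPLOYEE)
salaries = [t["SALARY"] for t in EMPLOYEE]
print(max(salaries), min(salaries), sum(salaries))  # 43000 25000 138000


def count_and_average(relation, group_attr, avg_attr):
    """<group_attr> ℱ COUNT, AVERAGE <avg_attr> (R): one result tuple per group."""
    groups = defaultdict(list)
    for t in relation:
        groups[t[group_attr]].append(t[avg_attr])
    return [
        # len() counts tuples in the group; duplicates are not removed
        {group_attr: g, "COUNT": len(vals), "AVERAGE": sum(vals) / len(vals)}
        for g, vals in groups.items()
    ]


print(count_and_average(EMPLOYEE, "DNO", "SALARY"))
# [{'DNO': 5, 'COUNT': 2, 'AVERAGE': 35000.0}, {'DNO': 4, 'COUNT': 2, 'AVERAGE': 34000.0}]
```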

The OUTER JOIN Operation

o In NATURAL JOIN and EQUIJOIN, tuples without a matching (or related) tuple are eliminated from the join result

o Tuples with null in the join attributes are also eliminated
o This amounts to loss of information.

o A set of operations, called OUTER joins, can be used when we want to keep all the tuples in R, or all those in S, or all those in both relations in the result of the join, regardless of whether or not they have matching tuples in the other relation.

o The left outer join operation keeps every tuple in the first or left relation R in R ⟕ S; if no matching tuple is found in S, then the attributes of S in the join result are filled or “padded” with null values.
o A similar operation, right outer join, keeps every tuple in the second or right relation S in the result of R ⟖ S.

o A third operation, full outer join, denoted by R ⟗ S, keeps all tuples in both the left and the right relations; when no matching tuples are found, it pads them with null values as needed.
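The outer joins can be sketched in Python by padding unmatched tuples with None (standing in for null). The sketch assumes lists of dicts with non-overlapping attribute names and explicit attribute lists for padding; a right outer join is the mirror image of the left one, so it is not written out. The data are made up, and the Headquarters department has a null MGRSSN so that it has no match.

```python
def left_outer_join(r, s, condition, s_attrs):
    """R ⟕ S: every tuple of R appears; unmatched ones get None for S's attributes."""
    result = []
    for tr in r:
        matches = [{**tr, **ts} for ts in s if condition(tr, ts)]
        result.extend(matches or [{**tr, **dict.fromkeys(s_attrs)}])
    return result


def full_outer_join(r, s, condition, r_attrs, s_attrs):
    """R ⟗ S: the left outer join plus unmatched S tuples padded with None."""
    result = left_outer_join(r, s, condition, s_attrs)
    for ts in s:
        if not any(condition(tr, ts) for tr in r):
            result.append({**dict.fromkeys(r_attrs), **ts})
    return result


EMPLOYEE = [{"SSN": "111", "LNAME": "Smith"}, {"SSN": "222", "LNAME": "Wong"}]
DEPARTMENT = [
    {"DNAME": "Research", "MGRSSN": "222"},
    {"DNAME": "Headquarters", "MGRSSN": None},  # no manager assigned yet
]


def manages(e, d):
    return e["SSN"] == d["MGRSSN"]


LEFT = left_outer_join(EMPLOYEE, DEPARTMENT, manages, ["DNAME", "MGRSSN"])
FULL = full_outer_join(EMPLOYEE, DEPARTMENT, manages, ["SSN", "LNAME"], ["DNAME", "MGRSSN"])
for row in FULL:
    print(row)
```

Smith manages no department, so the left outer join keeps him with DNAME and MGRSSN set to None; the full outer join additionally keeps Headquarters with null employee attributes.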


OUTER UNION Operation

o The outer union operation was developed to take the union of tuples from two relations if the relations are not type compatible.

o This operation will take the union of tuples in two relations R(X, Y) and S(X, Z) that are partially compatible, meaning that only some of their attributes, say X, are type compatible.

o The attributes that are type compatible are represented only once in the result, and those attributes that are not type compatible from either relation are also kept in the result relation T(X, Y, Z).

o Example: An outer union can be applied to two relations whose schemas are STUDENT(Name, SSN, Department, Advisor) and INSTRUCTOR(Name, SSN, Department, Rank).

o Tuples from the two relations are matched based on having the same combination of values of the shared attributes: Name, SSN, Department.

o If a student is also an instructor, both Advisor and Rank will have a value; otherwise, one of these two attributes will be null.

o The result relation STUDENT_OR_INSTRUCTOR will have the following attributes:

STUDENT_OR_INSTRUCTOR (Name, SSN, Department, Advisor, Rank)
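A Python sketch of the outer union for the STUDENT/INSTRUCTOR example. It assumes relations are lists of dicts with explicit attribute lists, and that the shared attributes (Name, SSN, Department) identify a tuple within each relation, so tuples agreeing on them can be merged safely; attributes a tuple lacks are padded with None. The names and values are made up.

```python
def outer_union(r, r_attrs, s, s_attrs):
    """Outer union of partially compatible R(X, Y) and S(X, Z), giving T(X, Y, Z).

    Tuples that agree on all shared attributes X are merged into one result
    tuple; attributes missing from a tuple are padded with None.
    """
    shared = [a for a in r_attrs if a in s_attrs]
    all_attrs = list(r_attrs) + [a for a in s_attrs if a not in r_attrs]
    merged = {}
    for t in [*r, *s]:
        x_value = tuple(t[a] for a in shared)
        row = merged.setdefault(x_value, dict.fromkeys(all_attrs))
        row.update(t)
    return list(merged.values())


STUDENT = [
    {"Name": "Anil", "SSN": "101", "Department": "CSE", "Advisor": "Rao"},
    {"Name": "Bina", "SSN": "102", "Department": "IT", "Advisor": "Devi"},
]
INSTRUCTOR = [
    {"Name": "Bina", "SSN": "102", "Department": "IT", "Rank": "TA"},
    {"Name": "Rao", "SSN": "201", "Department": "CSE", "Rank": "Professor"},
]

STUDENT_OR_INSTRUCTOR = outer_union(
    STUDENT, ["Name", "SSN", "Department", "Advisor"],
    INSTRUCTOR, ["Name", "SSN", "Department", "Rank"],
)
for row in STUDENT_OR_INSTRUCTOR:
    print(row)
```

Bina is both a student and an instructor, so her result tuple carries both Advisor and Rank; the other two tuples have one of these attributes set to None.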


Relational calculus

Relational algebra and calculus are equivalent in their expressive power.

Relational algebra provides a collection of explicit operations: join, union, projection, etc. The operations are used to tell the system how to build some desired relation in terms of other relations.

The calculus merely provides a notation for formulating the definition of that desired relation in terms of those given relations.

Relational Algebra vs. Relational Calculus

Relational Algebra is procedural; it is more like a programming language.

Relational Calculus is nonprocedural; it is closer to a natural language.

For example, suppose you want to query:

Get supplier numbers for suppliers who supply part P2.

An algebraic version of this query might follow these steps:

1. Form the natural join of relations S and SP on S#;
2. Next, restrict the result of that join to tuples for part P2;
3. Finally, project the result of that restriction on S#.

A calculus formulation might look like:

Get S# for suppliers such that there exists a shipment SP with the same S# value and with P# value P2.

The calculus formulation is descriptive while the algebraic one is prescriptive.
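The contrast can be illustrated, loosely, in Python. In the sketch below the algebraic version spells out the three steps (join, restrict, project), while the calculus-style version only states the condition the result must satisfy, using any() for "there exists". Python still evaluates both step by step, so this is an analogy for the shape of the two formulations, not a calculus engine. The S and SP values are made up.

```python
# Made-up supplier (S) and shipment (SP) states.
S = [{"S#": "S1"}, {"S#": "S2"}, {"S#": "S3"}, {"S#": "S4"}]
SP = [
    {"S#": "S1", "P#": "P1"},
    {"S#": "S1", "P#": "P2"},
    {"S#": "S2", "P#": "P2"},
    {"S#": "S3", "P#": "P3"},
]

# Algebraic (prescriptive): natural join on S#, restrict to P2, project on S#.
joined = [{**s, **sp} for s in S for sp in SP if s["S#"] == sp["S#"]]
restricted = [t for t in joined if t["P#"] == "P2"]
algebra_result = {t["S#"] for t in restricted}

# Calculus-style (descriptive): S# of suppliers for which there exists
# a shipment with the same S# and P# = P2.
calculus_result = {
    s["S#"] for s in S
    if any(sp["S#"] == s["S#"] and sp["P#"] == "P2" for sp in SP)
}
print(sorted(algebra_result), sorted(calculus_result))  # ['S1', 'S2'] ['S1', 'S2']
```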


Why is it called relational calculus? It is founded on a branch of mathematical logic called the predicate calculus.

Codd proposed the concept of a relational calculus (applied predicate calculus tailored to relational databases).

o A relational calculus expression creates a new relation, which is specified in terms of variables that range over rows of the stored database relations (in tuple calculus) or over columns of the stored relations (in domain calculus).

o In a calculus expression, there is no order of operations to specify how to retrieve the query result; a calculus expression specifies only what information the result should contain.

o This is the main distinguishing feature between relational algebra and relational calculus.

o Relational calculus is considered to be a nonprocedural or declarative language.
o This differs from relational algebra, where we must write a sequence of operations to specify a retrieval request; hence relational algebra can be considered as a procedural way of stating a query.

The Tuple Relational Calculus

1. The tuple relational calculus is a nonprocedural language. (The relational algebra was procedural.)

We must provide a formal description of the information desired.

2. A query in the tuple relational calculus is expressed as

{t | P(t)}

i.e., the set of tuples t for which predicate P is true.

3. We also use the notation
o t[A] to indicate the value of tuple t on attribute A.
o t ∈ r to show that tuple t is in relation r.

Example: To find the first and last names of all employees whose salary is above $50,000, we can write the following tuple calculus expression:

o {t.FNAME, t.LNAME | EMPLOYEE(t) AND t.SALARY>50000}

o The condition EMPLOYEE(t) specifies that the range relation of tuple variable t is EMPLOYEE.

o The first and last names (projection π FNAME, LNAME) of each EMPLOYEE tuple t that satisfies the condition t.SALARY>50000 (selection σ SALARY>50000) will be retrieved.
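A Python set comprehension has almost the same shape as this expression, so it offers a rough way to experiment with it. The sketch below assumes a hypothetical EMPLOYEE relation held as a set of named tuples carrying only the three attributes the query touches; the sample rows are invented for illustration.

```python
from typing import NamedTuple

class Employee(NamedTuple):
    FNAME: str
    LNAME: str
    SALARY: int

# Hypothetical contents of the EMPLOYEE relation (a set of tuples).
EMPLOYEE = {
    Employee("Rama", "Krishna", 55000),
    Employee("Sita", "Devi", 40000),
    Employee("Arun", "Kumar", 62000),
}

# {t.FNAME, t.LNAME | EMPLOYEE(t) AND t.SALARY > 50000}
# "for t in EMPLOYEE" plays the role of the range condition EMPLOYEE(t),
# the "if" clause is the selection, and the output pair is the projection.
result = {(t.FNAME, t.LNAME) for t in EMPLOYEE if t.SALARY > 50000}
print(sorted(result))   # [('Arun', 'Kumar'), ('Rama', 'Krishna')]
```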

The Existential and Universal Quantifiers

o Two special symbols called quantifiers can appear in formulas; these are the universal quantifier (∀) and the existential quantifier (∃).

o Informally, a tuple variable t is bound if it is quantified, meaning that it appears in a (∀t) or (∃t) clause; otherwise, it is free.

o If F is a formula, then so are (∃t)(F) and (∀t)(F), where t is a tuple variable.

o The formula (∃t)(F) is true if the formula F evaluates to true for some (at least one) tuple assigned to free occurrences of t in F; otherwise (∃t)(F) is false.

o The formula (∀t)(F) is true if the formula F evaluates to true for every tuple (in the universe) assigned to free occurrences of t in F; otherwise (∀t)(F) is false.

o ∀ is called the universal or “for all” quantifier because every tuple in “the universe of” tuples must make F true to make the quantified formula true.

o ∃ is called the existential or “there exists” quantifier because any tuple that exists in “the universe of” tuples may make F true to make the quantified formula true.
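Over a finite universe of tuples, the two quantifiers behave like Python's built-in any() and all(). The sketch below assumes a small, hypothetical universe of (project number, controlling department) pairs; the helper names exists and forall are illustrative and not part of any library.

```python
# A small, hypothetical "universe" of tuples: (project_number, controlling_dept).
UNIVERSE = [(1, 5), (2, 5), (3, 4)]

def exists(universe, formula):
    """(∃t)(F): true if F holds for at least one tuple t."""
    return any(formula(t) for t in universe)

def forall(universe, formula):
    """(∀t)(F): true only if F holds for every tuple t."""
    return all(formula(t) for t in universe)

def controlled_by_5(t):
    return t[1] == 5

print(exists(UNIVERSE, controlled_by_5))   # True  (projects 1 and 2)
print(forall(UNIVERSE, controlled_by_5))   # False (project 3 belongs to dept 4)

# Standard transformation rule: (∀t)(F) is equivalent to not (∃t)(not F).
assert forall(UNIVERSE, controlled_by_5) == (
    not exists(UNIVERSE, lambda t: not controlled_by_5(t)))
```

Over an empty universe, all() returns True and any() returns False, which matches the logical convention that a universally quantified formula is vacuously true when there is nothing to check.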

Examples

o Find the names of employees who work on all the projects controlled by department number 5. The query can be:

o {e.LNAME, e.FNAME | EMPLOYEE(e) and (∀x)(not(PROJECT(x)) or not(x.DNUM=5) or (∃w)(WORKS_ON(w) and w.ESSN=e.SSN and x.PNUMBER=w.PNO))}

o Exclude from the universal quantification all tuples that we are not interested in by making the condition true for all such tuples.

o The first tuples to exclude (by making them evaluate automatically to true) are those that are not in the relation R of interest.

o In the query above, the expression not(PROJECT(x)) inside the universally quantified formula evaluates to true for all tuples x that are not in the PROJECT relation. Then we exclude the tuples we are not interested in from R itself: the expression not(x.DNUM=5) evaluates to true for all tuples x that are in the PROJECT relation but are not controlled by department 5.

o Finally, we specify a condition that must hold on all the remaining tuples in R (here, every project controlled by department 5), namely that employee e works on it:

(∃w)(WORKS_ON(w) and w.ESSN=e.SSN and x.PNUMBER=w.PNO)
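The query can be tried out on toy data by nesting any() inside all(), mirroring the (∃w) nested inside (∀x). In this minimal sketch the relations are hypothetical lists of dictionaries containing only the attributes the query needs. Because x is drawn directly from PROJECT, the not(PROJECT(x)) disjunct is always false here and is dropped; the not(x.DNUM=5) disjunct still lets projects of other departments pass automatically.

```python
# Hypothetical sample relations (only the attributes the query uses).
EMPLOYEE = [
    {"SSN": "111", "FNAME": "Rama", "LNAME": "Krishna"},
    {"SSN": "222", "FNAME": "Sita", "LNAME": "Devi"},
]
PROJECT = [
    {"PNUMBER": 1, "DNUM": 5},
    {"PNUMBER": 2, "DNUM": 5},
    {"PNUMBER": 3, "DNUM": 4},
]
WORKS_ON = [
    {"ESSN": "111", "PNO": 1},
    {"ESSN": "111", "PNO": 2},
    {"ESSN": "222", "PNO": 1},
    {"ESSN": "222", "PNO": 3},
]

def works_on_all_dept5_projects(e):
    # (∀x)(not(x.DNUM=5) or (∃w)(WORKS_ON(w) and w.ESSN=e.SSN and x.PNUMBER=w.PNO)),
    # with x ranging over PROJECT only.
    return all(
        x["DNUM"] != 5
        or any(w["ESSN"] == e["SSN"] and w["PNO"] == x["PNUMBER"] for w in WORKS_ON)
        for x in PROJECT
    )

result = {(e["LNAME"], e["FNAME"]) for e in EMPLOYEE if works_on_all_dept5_projects(e)}
print(result)   # {('Krishna', 'Rama')}
```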

The Domain Relational Calculus

The domain-oriented calculus differs from the tuple-oriented relational calculus in that it has domain variables instead of tuple variables, that is, variables that range over domains instead of over relations.

o An expression of the domain calculus is of the form

{x1, x2, ..., xn | COND(x1, x2, ..., xn, xn+1, xn+2, ..., xn+m)}

o where x1, x2, ..., xn, xn+1, xn+2, ..., xn+m are domain variables that range over domains (of attributes)

o and COND is a condition or formula of the domain relational calculus.

Examples

o Retrieve the birthdate and address of the employee whose name is ‘John B. Smith’.

o Query:

{uv | (∃q)(∃r)(∃s)(∃t)(∃w)(∃x)(∃y)(∃z)(EMPLOYEE(qrstuvwxyz) and q=‘John’ and r=‘B’ and s=‘Smith’)}

o The abbreviated notation EMPLOYEE(qrstuvwxyz) uses the variables without the separating commas: EMPLOYEE(q,r,s,t,u,v,w,x,y,z)

o Ten variables for the EMPLOYEE relation are needed, one to range over the domain of each attribute in order. Of the ten variables q, r, s, ..., z, only u and v are free.

o Specify the requested attributes, BDATE and ADDRESS, by the free domain variables u for BDATE and v for ADDRESS.

o Specify the condition for selecting a tuple following the bar ( | ), namely, that the sequence of values assigned to the variables qrstuvwxyz be a tuple of the EMPLOYEE relation and that the values for q (FNAME), r (MINIT), and s (LNAME) be ‘John’, ‘B’, and ‘Smith’, respectively.
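In a domain calculus query, each variable stands for one attribute value. A rough Python analogue is to unpack every EMPLOYEE tuple into ten variables and return only the free ones. The sketch assumes the ten EMPLOYEE attributes appear in the order FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO (the text above fixes only the positions of FNAME, MINIT, LNAME, BDATE and ADDRESS); the rows themselves are invented.

```python
# Hypothetical EMPLOYEE rows; assumed attribute order:
# FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO
EMPLOYEE = [
    ("John", "B", "Smith", "123", "1965-01-09", "Main St", "M", 30000, "333", 5),
    ("Jane", "A", "Doe",   "456", "1970-03-15", "Oak Ave", "F", 40000, "333", 4),
]

# {uv | (∃q)(∃r)(∃s)(∃t)(∃w)(∃x)(∃y)(∃z)
#       (EMPLOYEE(qrstuvwxyz) and q='John' and r='B' and s='Smith')}
# Unpacking binds all ten domain variables; only the free variables u and v
# are returned, while the quantified ones are used in the condition or ignored.
result = {
    (u, v)
    for (q, r, s, t, u, v, w, x, y, z) in EMPLOYEE
    if q == "John" and r == "B" and s == "Smith"
}
print(result)   # {('1965-01-09', 'Main St')}
```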


ER-to-Relational Mapping Algorithm

Step 1: Mapping of Regular Entity Types

Step 2: Mapping of Weak Entity Types

Step 3: Mapping of Binary 1:1 Relationship Types

Step 4: Mapping of Binary 1:N Relationship Types

Step 5: Mapping of Binary M:N Relationship Types

Step 6: Mapping of Multivalued Attributes

Step 7: Mapping of N-ary Relationship Types

Step 1: Mapping of Regular Entity Types.

– For each regular (strong) entity type E in the ER schema, create a relation R that includes all the simple attributes of E.

– Choose one of the key attributes of E as the primary key for R. If the chosen key of E is composite, the set of simple attributes that form it will together form the primary key of R.

Example: We create the relations EMPLOYEE, DEPARTMENT, and PROJECT in the relational schema corresponding to the regular entities in the ER diagram. SSN, DNUMBER, and PNUMBER are the primary keys for the relations EMPLOYEE, DEPARTMENT, and PROJECT, as shown.
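A minimal sketch of Step 1 using Python's built-in sqlite3 module and an in-memory database. Only a few illustrative attributes are included for each entity type (FNAME, LNAME, SALARY, DNAME and PNAME are assumptions, not taken from the diagram), and DNAME and PNAME are declared UNIQUE on the assumption that they are additional keys of their entity types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Step 1: one relation per regular entity type, holding its simple attributes.
conn.executescript("""
CREATE TABLE EMPLOYEE (
    SSN    CHAR(9) PRIMARY KEY,   -- the chosen key attribute becomes the primary key
    FNAME  TEXT,
    LNAME  TEXT,
    SALARY INTEGER
);
CREATE TABLE DEPARTMENT (
    DNUMBER INTEGER PRIMARY KEY,
    DNAME   TEXT UNIQUE           -- assumed second key, kept as a secondary key
);
CREATE TABLE PROJECT (
    PNUMBER INTEGER PRIMARY KEY,
    PNAME   TEXT UNIQUE
);
""")

conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'John', 'Smith', 30000)")
try:
    # Same SSN again: the primary key constraint rejects it.
    conn.execute("INSERT INTO EMPLOYEE VALUES ('123456789', 'Jane', 'Doe', 40000)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```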

Step 2: Mapping of Weak Entity Types

– For each weak entity type W in the ER schema with owner entity type E, create a relation R and include all simple attributes (or simple components of composite attributes) of W as attributes of R.

– In addition, include as foreign key attributes of R the primary key attribute(s) of the relation(s) that correspond to the owner entity type(s).

– The primary key of R is the combination of the primary key(s) of the owner(s) and the partial key of the weak entity type W, if any.

Example: Create the relation DEPENDENT in this step to correspond to the weak entity type DEPENDENT. Include the primary key SSN of the EMPLOYEE relation as a foreign key attribute of DEPENDENT (renamed to ESSN).

The primary key of the DEPENDENT relation is the combination {ESSN, DEPENDENT_NAME} because DEPENDENT_NAME is the partial key of DEPENDENT.
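The same sqlite3 approach can sketch Step 2. SQLite checks foreign keys only after PRAGMA foreign_keys = ON, so the sketch enables that first. The RELATIONSHIP attribute and the sample rows are hypothetical; the key structure follows the DEPENDENT example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (
    SSN   CHAR(9) PRIMARY KEY,
    FNAME TEXT
);
-- Weak entity DEPENDENT: owner's key (renamed ESSN) + partial key DEPENDENT_NAME.
CREATE TABLE DEPENDENT (
    ESSN           CHAR(9) NOT NULL REFERENCES EMPLOYEE(SSN),
    DEPENDENT_NAME TEXT    NOT NULL,
    RELATIONSHIP   TEXT,                  -- hypothetical simple attribute
    PRIMARY KEY (ESSN, DEPENDENT_NAME)
);
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?)",
                 [("111", "John"), ("222", "Jane")])
# The partial key alone need not be unique: two owners may each have an 'Alice'.
conn.executemany("INSERT INTO DEPENDENT VALUES (?, ?, ?)",
                 [("111", "Alice", "Daughter"), ("222", "Alice", "Spouse")])
try:
    # A dependent cannot exist without its owner.
    conn.execute("INSERT INTO DEPENDENT VALUES ('999', 'Bob', 'Son')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```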


Step 3: Mapping of Binary 1:1 Relationship Types

For each binary 1:1 relationship type R in the ER schema, identify the relations S and T that correspond to the entity types participating in R. There are three possible approaches:

(1) Foreign key approach: Choose one of the relations, say S, and include as a foreign key in S the primary key of T. It is better to choose an entity type with total participation in R in the role of S.

Example: The 1:1 relationship MANAGES is mapped by choosing the participating entity type DEPARTMENT to serve in the role of S, because its participation in the MANAGES relationship type is total.

(2) Merged relation option: An alternate mapping of a 1:1 relationship type is possible by merging the two entity types and the relationship into a single relation. This may be appropriate when both participations are total.

(3) Cross-reference or relationship relation option: The third alternative is to set up a third relation R for the purpose of cross-referencing the primary keys of the two relations S and T representing the entity types.
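A sqlite3 sketch of the foreign key approach for MANAGES, assuming the manager's SSN is stored in DEPARTMENT in a column named MGRSSN (a hypothetical name). Declaring that column NOT NULL captures DEPARTMENT's total participation, and UNIQUE keeps the relationship 1:1 by preventing one employee from managing two departments.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (
    SSN CHAR(9) PRIMARY KEY
);
-- DEPARTMENT plays the role of S (total participation in MANAGES).
-- NOT NULL: every department has a manager; UNIQUE: no employee manages two departments.
CREATE TABLE DEPARTMENT (
    DNUMBER INTEGER PRIMARY KEY,
    MGRSSN  CHAR(9) NOT NULL UNIQUE REFERENCES EMPLOYEE(SSN)
);
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?)", [("111",), ("222",)])
conn.execute("INSERT INTO DEPARTMENT VALUES (5, '111')")

# Both rows below violate the mapping: a reused manager, and a missing manager.
for bad_row in [(4, "111"), (6, None)]:
    try:
        conn.execute("INSERT INTO DEPARTMENT VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError as exc:
        print("rejected", bad_row, "->", exc)
```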


Step 4: Mapping of Binary 1:N Relationship Types.

– For each regular binary 1:N relationship type R, identify the relation S that represents the participating entity type at the N-side of the relationship type.

– Include as a foreign key in S the primary key of the relation T that represents the other entity type participating in R.

– Include any simple attributes of the 1:N relationship type as attributes of S.

Example: Consider the 1:N relationship types WORKS_FOR, CONTROLS, and SUPERVISION in the figure. For WORKS_FOR, we include the primary key DNUMBER of the DEPARTMENT relation as a foreign key in the EMPLOYEE relation and call it DNO.
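Sketched in sqlite3, Step 4 places the foreign key on the N side. The sketch covers WORKS_FOR as described above and, as an assumption, maps the recursive SUPERVISION relationship the same way through a hypothetical SUPERSSN column that references EMPLOYEE itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE DEPARTMENT (
    DNUMBER INTEGER PRIMARY KEY
);
-- WORKS_FOR (1:N): EMPLOYEE is on the N side, so it receives DNUMBER, renamed DNO.
-- SUPERVISION (1:N, recursive): the supervisor's SSN is stored as SUPERSSN.
CREATE TABLE EMPLOYEE (
    SSN      CHAR(9) PRIMARY KEY,
    DNO      INTEGER REFERENCES DEPARTMENT(DNUMBER),
    SUPERSSN CHAR(9) REFERENCES EMPLOYEE(SSN)
);
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (5)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [("111", 5, None),     # a supervisor with no supervisor of their own
     ("222", 5, "111"),    # many employees may share one DNO ...
     ("333", 5, "111")],   # ... and one supervisor
)
print(conn.execute("SELECT COUNT(*) FROM EMPLOYEE WHERE DNO = 5").fetchone()[0])   # 3
```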

Step 5: Mapping of Binary M:N Relationship Types.

– For each regular binary M:N relationship type R, create a new relation S to represent R.

– Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types; their combination will form the primary key of S.

– Also include any simple attributes of the M:N relationship type (or simple components of composite attributes) as attributes of S.

Example: The M:N relationship type WORKS_ON from the ER diagram is mapped by creating a relation WORKS_ON in the relational database schema. The primary keys of the PROJECT and EMPLOYEE relations are included as foreign keys in WORKS_ON and renamed PNO and ESSN, respectively.

Attribute HOURS in WORKS_ON represents the HOURS attribute of the relationship type. The primary key of the WORKS_ON relation is the combination of the foreign key attributes {ESSN, PNO}.
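A sqlite3 sketch of the WORKS_ON mapping. The composite primary key (ESSN, PNO) lets an employee appear with many projects and a project with many employees, while still rejecting a repeated pair; the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE EMPLOYEE (SSN CHAR(9) PRIMARY KEY);
CREATE TABLE PROJECT  (PNUMBER INTEGER PRIMARY KEY);
-- Step 5: a separate relation for the M:N relationship; its key is the pair of foreign keys.
CREATE TABLE WORKS_ON (
    ESSN  CHAR(9) NOT NULL REFERENCES EMPLOYEE(SSN),
    PNO   INTEGER NOT NULL REFERENCES PROJECT(PNUMBER),
    HOURS REAL,                   -- attribute of the relationship itself
    PRIMARY KEY (ESSN, PNO)
);
""")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?)", [("111",), ("222",)])
conn.executemany("INSERT INTO PROJECT VALUES (?)", [(1,), (2,)])
# M:N: one employee on several projects, one project with several employees.
conn.executemany("INSERT INTO WORKS_ON VALUES (?, ?, ?)",
                 [("111", 1, 20.0), ("111", 2, 10.0), ("222", 1, 40.0)])
try:
    conn.execute("INSERT INTO WORKS_ON VALUES ('111', 1, 5.0)")   # same (ESSN, PNO) pair again
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```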

Step 6: Mapping of Multivalued attributes.

– For each multivalued attribute A, create a new relation R. This relation R will include an attribute corresponding to A, plus the primary key attribute K (as a foreign key in R) of the relation that represents the entity type or relationship type that has A as an attribute.

– The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components.

Example: The relation DEPT_LOCATIONS is created. The attribute DLOCATION represents the multivalued attribute LOCATIONS of DEPARTMENT, while DNUMBER (as a foreign key) represents the primary key of the DEPARTMENT relation. The primary key of DEPT_LOCATIONS is the combination {DNUMBER, DLOCATION}.
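A sqlite3 sketch of DEPT_LOCATIONS: each value of the multivalued attribute becomes its own row, and the composite key {DNUMBER, DLOCATION} prevents recording the same location twice for one department. The location names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE DEPARTMENT (DNUMBER INTEGER PRIMARY KEY);
-- Step 6: the multivalued attribute LOCATIONS becomes its own relation.
CREATE TABLE DEPT_LOCATIONS (
    DNUMBER   INTEGER NOT NULL REFERENCES DEPARTMENT(DNUMBER),
    DLOCATION TEXT    NOT NULL,
    PRIMARY KEY (DNUMBER, DLOCATION)
);
""")
conn.execute("INSERT INTO DEPARTMENT VALUES (5)")
# One row per value of the multivalued attribute.
conn.executemany("INSERT INTO DEPT_LOCATIONS VALUES (5, ?)",
                 [("Hyderabad",), ("Chennai",), ("Bengaluru",)])
locations = [row[0] for row in conn.execute(
    "SELECT DLOCATION FROM DEPT_LOCATIONS WHERE DNUMBER = 5 ORDER BY DLOCATION")]
print(locations)   # ['Bengaluru', 'Chennai', 'Hyderabad']
```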


Step 7: Mapping of N-ary Relationship Types.

– For each n-ary relationship type R, where n > 2, create a new relation S to represent R.

– Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types; their combination is usually the primary key of S.

– Also include any simple attributes of the n-ary relationship type (or simple components of composite attributes) as attributes of S.

Example: Consider the ternary relationship type SUPPLY in the ER diagram. It can be mapped to the relation SUPPLY shown in the relational schema, whose primary key is the combination of the three foreign keys {SNAME, PARTNO, PROJNAME}.
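A sqlite3 sketch of the SUPPLY mapping. The owner relations SUPPLIER, PART and PROJECT, their single-column keys, and the QUANTITY attribute of the relationship are hypothetical choices made for illustration; the text only fixes the primary key {SNAME, PARTNO, PROJNAME} of SUPPLY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE SUPPLIER (SNAME    TEXT PRIMARY KEY);
CREATE TABLE PART     (PARTNO   INTEGER PRIMARY KEY);
CREATE TABLE PROJECT  (PROJNAME TEXT PRIMARY KEY);
-- Step 7: one relation for the ternary relationship, one foreign key per participant.
CREATE TABLE SUPPLY (
    SNAME    TEXT    NOT NULL REFERENCES SUPPLIER(SNAME),
    PARTNO   INTEGER NOT NULL REFERENCES PART(PARTNO),
    PROJNAME TEXT    NOT NULL REFERENCES PROJECT(PROJNAME),
    QUANTITY INTEGER,                     -- hypothetical attribute of the relationship
    PRIMARY KEY (SNAME, PARTNO, PROJNAME)
);
""")
conn.execute("INSERT INTO SUPPLIER VALUES ('Acme')")
conn.execute("INSERT INTO PART VALUES (7)")
conn.executemany("INSERT INTO PROJECT VALUES (?)", [("Alpha",), ("Beta",)])
# The same supplier may supply the same part to different projects ...
conn.executemany("INSERT INTO SUPPLY VALUES ('Acme', 7, ?, ?)",
                 [("Alpha", 100), ("Beta", 50)])
try:
    # ... but the same (SNAME, PARTNO, PROJNAME) triple cannot appear twice.
    conn.execute("INSERT INTO SUPPLY VALUES ('Acme', 7, 'Alpha', 5)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```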


SUMMARY: Correspondence between ER and Relational Models

ER Model → Relational Model

o Entity type → “Entity” relation

o 1:1 or 1:N relationship type → Foreign key (or “relationship” relation)

o M:N relationship type → “Relationship” relation and two foreign keys

o n-ary relationship type → “Relationship” relation and n foreign keys

o Simple attribute → Attribute

o Composite attribute → Set of simple component attributes

o Multivalued attribute → Relation and foreign key

o Value set → Domain

o Key attribute → Primary (or secondary) key