Cleveland State University
CIS611 – Relational Database Systems
Lecture Notes
Prof. Victor Matos
RELATIONAL ALGEBRA
THE RELATIONAL DATA MODEL (RM) and the Relational Algebra
The relational model of data (RM) was introduced by Dr. E. F. Codd (CACM, June 1970).
The RM is simpler and more uniform than the Network and Hierarchical models that preceded it.
S. Todd (IBM, 1976) presented PRTV, the first implementation of a relational algebra DBMS.
A. Klug added summary functions for statistical computing (ACM SIGMOD 1982).
Roth, Ozsoyoglu et al. (1987) extended the model to allow nested data structures.
Clifford, Tansel, Navathe, and others have added time specifications to the model.
Current research is aimed at extending the model to support complex data objects, multimedia management, hyperdata, and geographical, temporal, and logical processing.
A relational database is a collection of relations.
A relation is a 2-dimensional table in which each row represents a collection of related data values.
The values in a row can be interpreted as a fact describing an entity or a relationship.
Relation name: STUDENT
Attributes: Name, SSN, Address, GPA
Tuples:
  Name                SSN          Address           GPA
  Mary Poppins        111-22-3333  77 Picadilly St   4.00
  Pepe Gonzalez       123-45-6789  123 Bonita Rd.    3.09
  V. Sundarabatharan  999-88-7777  105 Calcara Ave.  3.87
  Shi-Wua Yan         881-99-0101  778 Tienamen Sq.  3.88
Domains, Tuples, Attributes, and Relations
A domain D is a set of atomic values. A domain is given a name, a data type, and a format.
A relation schema R, denoted R(A1, A2, ..., An), is a set of attributes (column names). The degree of a relation is the number of attributes in its relation schema.
Each attribute Ai is the name of a role played by some domain D in R(A1, ..., An). D is called the domain of Ai and is denoted by dom(Ai).
A relation r defined on schema R(A1, ..., An), also denoted by r(R), is a set of n-tuples r = { t1, t2, ..., tm }.
Each n-tuple t is an ordered list of n values t = <v1, v2, ..., vn>, where each value vi is an element of dom(Ai) or a special null value.
A relation r(R) is a subset of the Cartesian product of the domains dom(Ai) that define R: r(R) ⊆ dom(A1) × dom(A2) × ... × dom(An).
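The subset relationship can be illustrated with a minimal Python sketch. The two domains and the relation instance below are hypothetical, chosen only to keep the Cartesian product small.

```python
from itertools import product

# Hypothetical domains for a schema R(A1, A2)
dom_a1 = {1, 2, 3}
dom_a2 = {"x", "y"}

# Every tuple that could possibly appear in a relation over R
all_possible = set(product(dom_a1, dom_a2))

# One relation instance r(R): a set of 2-tuples (no duplicates, no order)
r = {(1, "x"), (3, "y")}

degree = 2
is_valid_relation = r <= all_possible and all(len(t) == degree for t in r)
```

Any set of tuples drawn from all_possible is a legal instance of R; a tuple such as (4, "x") is not, because 4 is outside dom(A1).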
Characteristics of Relations
Relations are defined as (mathematical) sets of tuples.
Duplicate tuples are not allowed.
The order of tuples inside a relation is immaterial.
The ordering of values within a tuple is irrelevant as long as each value stays associated with its attribute; therefore the column ordering is not important.
Each value in a tuple is atomic (not divisible).
Recent research is oriented toward removing the atomicity requirement of First Normal Form databases.
Key Attributes of a Relation
A superkey SK of relation r(R) is a set of attributes whose values uniquely identify each tuple of r(R), and therefore determine the values of all the other attributes.
A key K of a relation schema R is a minimal superkey of R: removing any attribute from K leaves a set that is no longer a superkey.
A relation schema R may have more than one key. Each of those is called a Candidate Key.
It is common to select one of the candidate keys and elevate it to Primary Key.
Convention: The attributes representing the primary key of schema R are underlined.
Example:
EMPLOYEE (SSN, Name, Address, Salary)
STOCK (PartNum, SupNum, Quantity)
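The superkey and key definitions can be tested against a relation instance, as in the sketch below. The STOCK rows are invented for illustration, and the helper names are hypothetical. Keep in mind that a key is a property of the schema: an instance can show that a set of attributes is not a key, but it cannot prove that it is one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows agree on all attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_minimal_key(rows, attrs):
    """True if attrs is a superkey and no proper subset of it is one."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )

# Hypothetical STOCK instance
stock = [
    {"PartNum": 1, "SupNum": 10, "Quantity": 5},
    {"PartNum": 1, "SupNum": 20, "Quantity": 5},
    {"PartNum": 2, "SupNum": 10, "Quantity": 7},
]
```

On this instance, ("PartNum",) alone repeats a value and so is not a superkey, while ("PartNum", "SupNum") is a superkey with no smaller superkey inside it.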
Integrity Constraints
Integrity constraints are rules specified on the database schema and are expected to hold on every instance of that schema.
1. Key constraints specify the candidate keys of each relation schema R.
2. Entity integrity constraints state that no primary key value can be null.
3. Referential integrity constraints are specified between two relations and are used to maintain consistency among the tuples of the two relations. Foreign key(s) of one relation are used to refer to primary key values in the other relation.
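Constraints 2 and 3 can be checked mechanically. The sketch below is only an illustration: the function names and the sample rows are hypothetical, nulls are represented by Python's None, and the foreign key is assumed to be a single column.

```python
def violates_entity_integrity(rows, primary_key):
    """Rows whose primary key contains a null (None) value."""
    return [r for r in rows if any(r[a] is None for a in primary_key)]

def violates_referential_integrity(child, fk, parent, pk):
    """Child rows whose non-null foreign key matches no parent primary key."""
    parent_keys = {row[pk] for row in parent}
    return [r for r in child if r[fk] is not None and r[fk] not in parent_keys]

# Hypothetical data: EMPLOYEE.dno refers to DEPARTMENT.dnumber
department = [{"dnumber": 4}, {"dnumber": 5}]
employee = [
    {"ssn": "111", "dno": 5},
    {"ssn": "222", "dno": 9},   # no department 9: dangling reference
    {"ssn": None, "dno": 4},    # null primary key
]

bad_keys = violates_entity_integrity(employee, ["ssn"])
bad_refs = violates_referential_integrity(employee, "dno", department, "dnumber")
```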
RELATIONAL QUERY LANGUAGES
Classification based on the underlying language 'model'
Model                          Example
1. Pure Algebraic              ISBL (Information System Base Language), IBM
2. Pure Predicate Calculus
     Tuple Oriented Type       QUEL (Ingres)
     Domain Oriented Type      QBE, STBE
3. Mixed Algebra-Calculus      SQL
4. Object Oriented             DB4O
5. Associative                 Sentences (LazySoft)
6. Other                       Cache (Object-Relational)
….
Relational Algebra
A collection of operators used to manipulate entire relations.
The result of each operation is a new relation.
It consists of two groups: operations on sets and operations specifically designed to manipulate relational databases.
Operations on sets: UNION, DIFFERENCE, INTERSECTION, CARTESIAN PRODUCT
Operations on databases: SELECT, PROJECT, JOIN, AGGREGATE, DIVISION, RENAME
RELATIONAL ALGEBRA
UNION
The result of this operation, denoted (r ∪ s) or (r + s), is a relation that includes all tuples that are in r, in s, or in both r and s.
Duplicate tuples are eliminated.
Combined relations must be union-compatible (same number of attributes, with matching domains).
r + s = { t | t ∈ r or t ∈ s }
Example
r  A  B  C        s  A  B  C
   1  1  1           1  2  3
   2  2  2           1  1  1
   3  3  3           3  2  1

r + s  A  B  C
       1  1  1
       2  2  2
       3  3  3
       1  2  3
       3  2  1
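As a minimal sketch, the union example can be reproduced with Python sets of tuples. The union_compatible helper is a hypothetical name and only approximates the compatibility test: it compares degrees and does not inspect domains.

```python
# Relations as Python sets of tuples over the schema (A, B, C)
r = {(1, 1, 1), (2, 2, 2), (3, 3, 3)}
s = {(1, 2, 3), (1, 1, 1), (3, 2, 1)}

def union_compatible(x, y):
    """Approximate check: every tuple in both relations has the same degree."""
    return len({len(t) for t in x | y}) <= 1

assert union_compatible(r, s)
r_union_s = r | s   # the duplicate (1, 1, 1) is kept only once
```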
DIFFERENCE
The result of this operation, denoted (r - s), is a relation that includes all tuples that are in r but not in s.
Participating relations must be union-compatible.
r - s = { t | t ∈ r and t ∉ s }
Example
r  A  B  C        s  A  B  C
   1  1  1           1  2  3
   2  2  2           1  1  1
   3  3  3           3  2  1

r - s  A  B  C
       2  2  2
       3  3  3
NOTE: The difference operator is not commutative; that is, in general, r - s ≠ s - r.
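The lack of commutativity is easy to see with the same two relations written as Python sets (a sketch; the variable names are chosen here for illustration):

```python
r = {(1, 1, 1), (2, 2, 2), (3, 3, 3)}
s = {(1, 2, 3), (1, 1, 1), (3, 2, 1)}

r_minus_s = r - s   # tuples of r that do not appear in s
s_minus_r = s - r   # tuples of s that do not appear in r
```

Here r - s keeps (2, 2, 2) and (3, 3, 3), while s - r keeps (1, 2, 3) and (3, 2, 1).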
INTERSECTION
The result of this operation, denoted (r ∩ s), is a relation that includes all tuples that are present in both r and s.
Participating relations must be union-compatible.
r ∩ s = { t | t ∈ r and t ∈ s }
Example
r  A  B  C        s  A  B  C
   1  1  1           1  2  3
   2  2  2           1  1  1
   3  3  3           3  2  1

r ∩ s  A  B  C
       1  1  1
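A short sketch of the intersection example with Python sets. It also checks the standard identity r ∩ s = r - (r - s), which shows that intersection can be derived from difference.

```python
r = {(1, 1, 1), (2, 2, 2), (3, 3, 3)}
s = {(1, 2, 3), (1, 1, 1), (3, 2, 1)}

r_intersect_s = r & s        # tuples present in both relations
via_difference = r - (r - s) # same result using only the difference operator
```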
CARTESIAN PRODUCT
The operation, denoted (r × s), is also known as the cross product or cross join. The purpose of the operator is to concatenate rows from two relations, making all possible combinations of rows.
Consider relation schemas r(A1, A2, ..., An) and s(B1, B2, ..., Bm).
Relations r and s do not have to be union-compatible.
If r has p tuples and s has q tuples, then (r × s) will have a total of (p * q) tuples.
The resulting relation schema is (A1, A2, ..., An, B1, ..., Bm).
Example
r2  A  B  C        s2  D   E
    1  1  1            10  a
    2  2  2            20  b
    3  3  3

r2 × s2  A  B  C  D   E
         1  1  1  10  a
         1  1  1  20  b
         2  2  2  10  a
         2  2  2  20  b
         3  3  3  10  a
         3  3  3  20  b
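The cross product in the example can be sketched with itertools.product from the Python standard library; each pair of tuples is concatenated into one wider tuple.

```python
from itertools import product

r2 = [(1, 1, 1), (2, 2, 2), (3, 3, 3)]   # schema (A, B, C)
s2 = [(10, "a"), (20, "b")]              # schema (D, E)

# Concatenate every r2 tuple with every s2 tuple
r2_x_s2 = {t_r + t_s for t_r, t_s in product(r2, s2)}
```

The result has 3 * 2 = 6 tuples of degree 5.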
PROJECTION
The project operator extracts certain columns from the table and discards the other columns.
Syntax: Result = π_{Col} (Table)
where Col is the list of columns to be extracted from the Table.
Duplicate tuples in the resulting table are eliminated.
EXAMPLE
Evaluate the expressions Temp1 = π_{A} (r) and Temp2 = π_{B,C} (r)
r  A  B    C
   1  610  3
   1  620  3
   1  600  2
   1  650  2
   2  610  3
   2  634  4

Temp1  A
       1
       2

Temp2  B    C
       610  3
       620  3
       600  2
       650  2
       634  4
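A possible sketch of projection in Python follows. The project function and its schema argument are hypothetical conventions; building a set is what removes the duplicate tuples.

```python
def project(rows, schema, cols):
    """pi_cols(rows): keep only cols, eliminating duplicate tuples."""
    idx = [schema.index(c) for c in cols]
    return {tuple(row[i] for i in idx) for row in rows}

schema = ("A", "B", "C")
r = {(1, 610, 3), (1, 620, 3), (1, 600, 2), (1, 650, 2), (2, 610, 3), (2, 634, 4)}

temp1 = project(r, schema, ["A"])
temp2 = project(r, schema, ["B", "C"])
```

Six input rows collapse to two rows in temp1 and five rows in temp2, because (610, 3) occurs twice in r once column A is dropped.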
SELECTION
The selection operator extracts certain rows from the table and discards the others. Retrieved tuples must satisfy a given filtering condition.
Syntax: Result = σ_{cond} (table)
where cond is a logical expression containing and, or, not operators on clauses of the form (table.column θ value) or (table.col1 θ table.col2), and θ ∈ { =, >, >=, <, <=, <> }.
Entire rows (with all of their columns) are retrieved when the condition is met.
EXAMPLE Evaluate the expression
Temp1 = σ_{(B >= 620) and (C < 4)} (r)
r  A  B    C
   1  610  3
   1  620  3
   1  600  2
   1  650  2
   2  610  3
   2  634  4

Temp1  A  B    C
       1  620  3
       1  650  2
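Selection can be sketched as a filter over rows, with the condition passed in as a predicate. The select function below is a hypothetical helper and rows are modeled as dictionaries keyed by column name.

```python
def select(rows, predicate):
    """sigma_predicate(rows): keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]

r = [
    {"A": 1, "B": 610, "C": 3},
    {"A": 1, "B": 620, "C": 3},
    {"A": 1, "B": 600, "C": 2},
    {"A": 1, "B": 650, "C": 2},
    {"A": 2, "B": 610, "C": 3},
    {"A": 2, "B": 634, "C": 4},
]

temp1 = select(r, lambda t: t["B"] >= 620 and t["C"] < 4)
```

The row (2, 634, 4) passes the first clause but fails C < 4, so it is discarded.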
RENAME
In some cases, we may want to rename the attributes of a relation, the relation name, or both.
The rename operator is useful to avoid situations in which a query produces columns with the same name (perhaps with different meanings).
Syntax: Result = ρ_{oldName ← newName} (table)
where oldName is a column in the table and newName is the new identification for the column.
Only the column name is changed; the data remains intact.
EXAMPLE Evaluate the expression
Temp1 = ρ_{A ← Section, B ← Course, C ← Credits} (r)
r  A  B    C        Temp1  Section  Course  Credits
   1  610  3               1        610     3
   1  620  3               1        620     3
   1  600  2               1        600     2
   1  650  2               1        650     2
   2  610  3               2        610     3
   2  634  4               2        634     4
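A minimal sketch of rename over rows stored as dictionaries. The rename function is a hypothetical helper, and only two rows of r are shown to keep the example short.

```python
def rename(rows, mapping):
    """Relabel columns according to mapping (old name -> new name)."""
    return [{mapping.get(col, col): val for col, val in row.items()} for row in rows]

r = [{"A": 1, "B": 610, "C": 3}, {"A": 2, "B": 634, "C": 4}]
temp1 = rename(r, {"A": "Section", "B": "Course", "C": "Credits"})
```

Columns not mentioned in the mapping keep their original names, and no data value changes.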
JOIN
The join operation, denoted by (Tab1 ⋈_{<join condition>} Tab2), is used to combine related tuples from two relations.
The join-condition format is (Table1.Col1 θ Table2.Col2), where θ could be any of { =, >, >=, <, <=, <> }.
Restrictions of the form (Table.Col θ Value) can be and-ed or or-ed to the joining condition.
EXAMPLE Consider relation schemas r(A,B,C) and s(D,E), and the expression: Temp1 = (r ⋈_{(C = D)} s)
r  A  B  C        s  D  E
   1  1  1           1  a
   2  2  2           2  b
   3  3  3           2  c

Temp1  A  B  C  D  E
       1  1  1  1  a
       2  2  2  2  b
       2  2  2  2  c
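The join can be sketched as a filtered cross product. The theta_join function below is a hypothetical helper; it assumes the two relations have no column names in common, as in the example.

```python
def theta_join(r, s, condition):
    """Keep each concatenated pair (t_r, t_s) that satisfies the condition."""
    return [{**t_r, **t_s} for t_r in r for t_s in s if condition(t_r, t_s)]

r = [{"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 2, "C": 2}, {"A": 3, "B": 3, "C": 3}]
s = [{"D": 1, "E": "a"}, {"D": 2, "E": "b"}, {"D": 2, "E": "c"}]

temp1 = theta_join(r, s, lambda t_r, t_s: t_r["C"] == t_s["D"])
```

The r-tuple (3, 3, 3) has no partner in s and therefore does not appear in the result.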
NATURAL JOIN
The natural join operation, denoted by (Tab1 * Tab2), is used to combine tuples of two relations under an equi-join.
Related columns must have the same name and domain.
The implicit join-conditions are (Table1.Col = Table2.Col) for every column Col that the two tables share; each shared column appears only once in the result.
EXAMPLE Consider relation schemas r(A,B,C) and s(B,C,D), and the expression: Temp = (r * s)
r  A  B  C        s  B  C  D
   1  1  1           1  1  a
   2  1  0           1  2  b
   4  3  2           3  2  c
                     4  3  d

Temp  A  B  C  D
      1  1  1  a
      4  3  2  c
The implicit join-condition is (r.B = s.B) and (r.C = s.C)
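One way to sketch the natural join in Python is shown below. The natural_join function is a hypothetical helper; it finds the shared column names from the first row of each relation, so it assumes every row of a relation has the same columns.

```python
def natural_join(r, s):
    """Equi-join on all shared columns; each shared column appears once."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [
        {**t_r, **t_s}
        for t_r in r
        for t_s in s
        if all(t_r[c] == t_s[c] for c in common)
    ]

r = [{"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 1, "C": 0}, {"A": 4, "B": 3, "C": 2}]
s = [
    {"B": 1, "C": 1, "D": "a"},
    {"B": 1, "C": 2, "D": "b"},
    {"B": 3, "C": 2, "D": "c"},
    {"B": 4, "C": 3, "D": "d"},
]

temp = natural_join(r, s)
```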
LEFT OUTER JOIN
The left outer join operation, denoted by (r ⟕_{<join condition>} s), is a special case of the general join.
LOJ keeps in the resulting table a representation of every tuple that appears in the first (or left) relation.
If no matching value for r is found in s, then the attributes of s appear in the result as null values.
EXAMPLE Consider relation schemas r(A,B,C) and s(D,E), and the expression: Temp1 = (r ⟕_{(A = D)} s)
r  A  B  C        s  D  E
   1  1  1           1  a
   2  2  2           2  b
   3  3  3           2  c

Temp1  A  B  C  D     E
       1  1  1  1     a
       2  2  2  2     b
       2  2  2  2     c
       3  3  3  null  null
NOTE:
Outer-join is not a primitive operator. It could be expressed as follows:
(r ⟕_{(A = D)} s) = (r ⋈_{(A = D)} s) ∪ ( (r − π_{Schema(r)}(r ⋈_{(A = D)} s)) × Null^L )
Where: Y = Schema(r) ∩ Schema(s), L = degree(s) − |Y|, and Null^L denotes a tuple of L null values.
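The behaviour described above can be sketched as follows. The left_outer_join function and its s_columns parameter are hypothetical, and null is represented by Python's None.

```python
def left_outer_join(r, s, condition, s_columns):
    """Join r with s; unmatched r-tuples are padded with nulls (None)."""
    result = []
    for t_r in r:
        matches = [{**t_r, **t_s} for t_s in s if condition(t_r, t_s)]
        if matches:
            result.extend(matches)
        else:
            result.append({**t_r, **{c: None for c in s_columns}})
    return result

r = [{"A": 1, "B": 1, "C": 1}, {"A": 2, "B": 2, "C": 2}, {"A": 3, "B": 3, "C": 3}]
s = [{"D": 1, "E": "a"}, {"D": 2, "E": "b"}, {"D": 2, "E": "c"}]

temp1 = left_outer_join(r, s, lambda t_r, t_s: t_r["A"] == t_s["D"], ["D", "E"])
```

The first three result rows are the ordinary join; the fourth is the unmatched r-tuple (3, 3, 3) extended with two nulls.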
AGGREGATE FUNCTIONS
Originally proposed by A. Klug (1982) to extend the scope of relational algebra, allowing mathematical computation of summary functions.
Syntax: <grouping attributes> ℱ <function list> (<relation name>)
Common functions are: MAX, MIN, AVG, SUM, COUNT
Grouping attributes force a fragmentation of the relation; the function is computed in each independent group.
Output consists of the grouping attributes and the result of the summary functions on each group.
If no grouping field(s) is given, the function(s) applies to the entire table.
EXAMPLE
Compute Temp = A ℱ Sum(B), Max(C) (r)
r  A  B   C
   1  10  1
   1  2   5
   2  3   3
   3  6   10
   3  5   7

Temp  A  Sum_B  Max_C
      1  12     5
      2  3      3
      3  11     10
In Temp, A is the group-by field; Sum_B and Max_C are the summary data.
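Grouping followed by a summary function per group can be sketched as below. The aggregate function and the shape of its functions argument are hypothetical choices, and only a single grouping column is supported.

```python
from collections import defaultdict

def aggregate(rows, group_col, functions):
    """Group rows by group_col, then apply each (function, column) pair per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    result = []
    for key, members in groups.items():
        out = {group_col: key}
        for name, (func, col) in functions.items():
            out[name] = func(m[col] for m in members)
        result.append(out)
    return result

r = [
    {"A": 1, "B": 10, "C": 1},
    {"A": 1, "B": 2, "C": 5},
    {"A": 2, "B": 3, "C": 3},
    {"A": 3, "B": 6, "C": 10},
    {"A": 3, "B": 5, "C": 7},
]

temp = aggregate(r, "A", {"Sum_B": (sum, "B"), "Max_C": (max, "C")})
```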
DIVISION
The division operation, denoted by (r / s), is useful when you need a mechanism to identify the tuples of some table that are related to each and every one of the tuples of a second table.
EXAMPLE Consider relation schemas r(A,B) and s(B), and the expression: Temp1 = (r / s)
r  A  B        s  B
   1  1           1
   1  2           2
   1  3           3
   1  4
   2  1
   2  3
   3  3

Temp1  A
       1
NOTE
Division is not a primitive operation. For table schemes r(A,B) and s(B) it could be expressed as:
r / s = π_{A}(r) − π_{A}( (π_{A}(r) × s) − r )
The algebraic expression r[A,B] / s[B] selects the A-values from the dividend table r[A,B] whose B-values are a superset of the B-values held in the divisor table s[B].
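Both readings of division can be sketched in Python: the direct superset test, and the standard derivation from projection, product, and difference. The function name divide is hypothetical, and s is stored as a plain set of B-values.

```python
r = {(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 3), (3, 3)}   # schema (A, B)
s = {1, 2, 3}                                                  # B-values of s

def divide(r, s):
    """A-values whose set of B-values is a superset of s."""
    a_values = {a for a, _ in r}
    return {a for a in a_values if s <= {b for a2, b in r if a2 == a}}

temp1 = divide(r, s)

# Same result from primitive operators: pi_A(r) - pi_A((pi_A(r) x s) - r)
pi_a = {a for a, _ in r}
missing = {(a, b) for a in pi_a for b in s} - r   # required pairs absent from r
temp1_algebraic = pi_a - {a for a, _ in missing}
```

Only A = 1 is paired with all of 1, 2, and 3; A = 2 lacks B = 2, and A = 3 lacks B = 1 and B = 2.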
PACK
Assume A is an attribute in Schema(r). The Pack_A(r) operator transforms the A-values into a ‘nested’ representation.
A nested field is a set of related atomic values.
EXAMPLE Consider relation schema r(A,B,C) and the expression: temp1 = Pack_C(r)
r  A   B   C
   b1  40  a1
   b1  40  a2
   b2  50  a3
   b3  60  a4
   b3  60  a2
   b3  60  a5
   b4  60  a6

temp1 = Pack_C(r)
temp1  A   B   C
       b1  40  {a1, a2}
       b2  50  {a3}
       b3  60  {a2, a4, a5}
       b4  60  {a6}

Consider relation schema s(A,B,C) and the expression: temp2 = Pack_C(s)
s  A   B  C
   m1  1  {a1, a2}
   m1  1  {a3}
   m2  2  {a4}
   m2  1  {a5, a6}
   m2  2  {a4, a7, a8}

temp2 = Pack_C(s)
temp2  A   B  C
       m1  1  {a1, a2, a3}
       m2  2  {a4, a7, a8}
       m2  1  {a5, a6}
PACK (cont…)
Let A be one of the n attributes in Schema(A1…An). Assume the relation r is defined over Schema(A1…An).
Let C_A = Schema(A1…An) − {A}. Therefore |C_A| = n−1.
For each (n−1)-tuple g ∈ π_{C_A}(r) we define the tuple W_g through its components W_g[C_A] and W_g[A] as follows:
W_g[C_A] = g
W_g[A] = { t[A] | t ∈ r and t[C_A] = g } if A is atomic, and
W_g[A] = { x | (∃t)(t ∈ r and t[C_A] = g and x ∈ t[A]) } otherwise.
Then Pack_A(r) = { W_g | g ∈ π_{C_A}(r) }
Therefore, the Pack operator converts each set of r-tuples that agree on the (n−1) attributes of C_A into a single tuple.
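The definition can be sketched in Python by grouping rows on every column except the packed one. The pack function is a hypothetical helper: rows are dictionaries, a nested value is a Python set, and atomic values are anything else.

```python
from collections import defaultdict

def pack(rows, attr):
    """Merge rows that agree on every column except attr into one row."""
    groups = defaultdict(set)
    for row in rows:
        # g plays the role of the (n-1)-tuple over the remaining columns
        g = tuple(sorted((col, val) for col, val in row.items() if col != attr))
        value = row[attr]
        # Atomic values are collected; set values are unioned
        groups[g] |= set(value) if isinstance(value, (set, frozenset)) else {value}
    return [{**dict(g), attr: members} for g, members in groups.items()]

r = [
    {"A": "b1", "B": 40, "C": "a1"},
    {"A": "b1", "B": 40, "C": "a2"},
    {"A": "b2", "B": 50, "C": "a3"},
]
s = [
    {"A": "m1", "B": 1, "C": {"a1", "a2"}},
    {"A": "m1", "B": 1, "C": {"a3"}},
    {"A": "m2", "B": 2, "C": {"a4"}},
    {"A": "m2", "B": 2, "C": {"a4", "a7", "a8"}},
]

temp1 = pack(r, "C")
temp2 = pack(s, "C")
```

The first call covers the atomic case and the second the already-nested case, where the sets of each group are unioned.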
UNPACK
Unpack is the counterpart of the Pack operator. When applied on the set-valued attribute A of a relation r, this operator transforms the single non-atomic version of the tuple into a group of records in which the attribute A is atomic.
EXAMPLE Consider relation schema r(A,B,C) and the expression: temp1 = Unpack_C(r)
r  A   B  C
   b1  1  {a1, a2}
   b2  2  {a3}
   b3  2  {a2, a4, a5}
   b4  3  {a6}

temp1 = Unpack_C(r)
temp1  A   B  C
       b1  1  a1
       b1  1  a2
       b2  2  a3
       b3  2  a4
       b3  2  a2
       b3  2  a5
       b4  3  a6

Let A be one of the n attributes in Schema(A1…An). Assume the relation r is defined over Schema(A1…An).
Let C_A = Schema(A1…An) − {A}. Therefore |C_A| = n−1.
UP_A({t}) = { t } if A is atomic, and
UP_A({t}) = { t' | (t'[A] ∈ t[A]) and (t'[C_A] = t[C_A]) } otherwise.
Then UP_A(r) = ∪_{t ∈ r} UP_A({t})
If A is atomic then UP_A(r) = r; otherwise UP_A(r) maps each tuple t in r into a set of (decompressed) tuples such that each element in t[A] becomes the atomic A-value of a new decompressed tuple.
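A sketch of Unpack over rows stored as dictionaries, with nested values modeled as Python sets. The unpack function is a hypothetical helper; elements are emitted in sorted order only to make the output deterministic. The A-values follow the result table of the example (b1, b2, b3, b4).

```python
def unpack(rows, attr):
    """Replace each row whose attr is a set by one row per element of that set."""
    result = []
    for row in rows:
        value = row[attr]
        if isinstance(value, (set, frozenset)):
            for element in sorted(value):
                result.append({**row, attr: element})
        else:
            result.append(dict(row))   # attr already atomic: row unchanged
    return result

r = [
    {"A": "b1", "B": 1, "C": {"a1", "a2"}},
    {"A": "b2", "B": 2, "C": {"a3"}},
    {"A": "b3", "B": 2, "C": {"a2", "a4", "a5"}},
    {"A": "b4", "B": 3, "C": {"a6"}},
]

temp1 = unpack(r, "C")
```

Applying unpack to a relation whose attribute is already atomic returns the same rows, matching the atomic case of the definition.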
EXAMPLE QUERIES
QUERY 1. Retrieve the name and address of all employees
who work in the 'Research' department.
QUERY 2. For every project located in 'Cleveland', list the
project number, the controlling department number, and the
department manager's last name, address, and birthdate.
QUERY 3. Find the name of employees who work on all
projects controlled by department number 5.
QUERY 4. Make a list of project numbers for projects that
involve an employee whose last name is 'Smith', either as a
worker or as a manager of the department that controls the
project.
QUERY 5. List the name of all employees with two or
more dependents.
QUERY 6. Retrieve the name of employees who have no
dependents.
QUERY 7. List the name of managers who have at least
one dependent.
Company Database
DEPARTMENT (DNAME, DNUMBER, MGRSSN, MGRSTARTDATE)
DEPENDENT (ESSN, DEPENDENT_NAME, SEX, BDATE, RELATIONSHIP)
DEPT_LOCATIONS (DNUMBER, DLOCATION)
EMPLOYEE (FNAME, MINIT, LNAME, SSN, BDATE, ADDRESS, SEX, SALARY, SUPERSSN, DNO)
PROJECT (PNAME, PNUMBER, PLOCATION, DNUM)
WORKS_ON (ESSN, PNO, Hours)
IST 331 Brief Notes on Relational Algebra, V. Matos.
Consider the relation schema of the COMPANY database given below:
EMPLOYEE (fname, minit, lname, ssn, birthdate, address, sex, salary, superssn, dno)
DEPARTMENT (dname, dnumber, mgrssn, mgrstartdate)
PROJECT (pname, pnumber, plocation, dnum)
WORKS_ON (essn, pno, hours)
DEPENDENT (essn, dependent-name, sex, bdate, relationship)
Operator / Example / Comments

Selection
  Answer = σ_{(sex = 'F') and (salary >= 25000)} (Employee)
  Find the female employees earning at least $25K.

Projection
  Answer = π_{ssn, Fname} (Employee)
  Get the Social Sec. No. and first name of each employee.

Join
  Answer = Employee ⋈_{(L.dno = R.dnumber)} Department
  Merge employee and department records according to matching dept. numbers.

Union
  Answer = σ_{(sex = 'F') and (salary = 25000)} (Employee) ∪ σ_{(sex = 'M') and (salary >= 35000)} (Employee)
  Get the male employees earning at least $35K and the female employees whose salary is exactly $25K.

Minus
  Answer = Employee − σ_{dno = 4} (Employee)
  Get the employees who do not work for department No. 4.

Intersection
  Answer = σ_{sex = 'F'} (Employee) ∩ σ_{dno = 4} (Employee)
  Get all the female employees who work for dept. 4.

Aggregation
  Answer = dno, sex ℱ average(salary) (Employee)
  Find the average salary of employees grouping by sex and dept. no. Put the results in the table defined as: Answer(dno, sex, average_salary)

Division
  Answer = π_{essn, pno} (WorksOn) / π_{Pnumber} ( σ_{plocation = 'Clev'} (Project) )
  Get the SSN of employees working in each one of the projects located in Cleveland.

Rename
  Answer = ρ_{fname, lname ← First, Last} (Employee)
  Change the labels “fname” and “lname” in the Employee table to “First” and “Last”.
Q01. Get the SSN and Last name of each of the female managers
  Answer1 = π_{ssn, lname} ( σ_{sex = 'F'} (Employee) ⋈_{(L.ssn = R.MgrSsn)} Department )   (see Observation 1)

Q02. Get the Social Sec. No. of those employees who are married.
  Answer2 = π_{essn} ( σ_{(relationship = 'Spouse')} (Dependent) )

Observation 1: The notation (e1 ⋈_{(L.a = R.b)} e2) merges the tables produced by the expressions e1 and e2. The match is dictated by the joining condition (L.a = R.b). The fragment L.a identifies the a-column as part of the table produced by the table/expression e1. The L and R qualifications indicate whether the source columns are located to the left or right side of the join-operator ⋈.
Q03. Get the last name of the married female managers. Rename the column to “Mgr Name”
  Answer3 = ρ_{lname ← "Mgr Name"} ( π_{Lname} ( Answer1 ⋈_{(L.ssn = R.essn)} Answer2 ) )

Q04. Get the last name of married employees who have at least one daughter and one son.
  Boys = π_{essn} ( σ_{(relationship = 'Son')} (Dependent) )
  Girls = π_{essn} ( σ_{(relationship = 'Daughter')} (Dependent) )
  Married = π_{essn} ( σ_{(relationship = 'Spouse')} (Dependent) )
  TheSsn = Married ∩ Boys ∩ Girls
  Answer = π_{Lname} ( Employee ⋈_{(L.ssn = R.essn)} TheSsn )

Q05. Get the last name of married employees who have no children.
  TheSsn = Married − (Boys ∪ Girls)
  Answer = π_{Lname} ( Employee ⋈_{(L.ssn = R.essn)} TheSsn )

Q06. Get the last name of married employees who only have daughters.
  TheSsn = (Married ∩ Girls) − Boys
  Answer = π_{Lname} ( Employee ⋈_{(L.ssn = R.essn)} TheSsn )

Q07. Get the last name and salary of each employee as well as that of their corresponding (direct) supervisor.
  Answer = π_{EmpSsn, EmpSalary, BossSsn, BossSalary} ( ρ_{ssn, salary ← EmpSsn, EmpSalary} (Employee) ⋈_{(L.superSsn = R.BossSsn)} ρ_{ssn, salary ← BossSsn, BossSalary} (Employee) )

Q08. Get the last name of employees who work on five or more projects.
  theTally = essn ℱ count(pno) (worksOn)
  Answer = π_{Lname} ( Employee ⋈_{(L.ssn = R.essn)} σ_{(count_pno >= 5)} (theTally) )
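The set expressions used for TheSsn in Q04 through Q06 can be checked on a small instance. The DEPENDENT rows below are invented for illustration, reduced to the two columns that matter here, and essn_with is a hypothetical helper name.

```python
# Hypothetical DEPENDENT rows, reduced to (essn, relationship)
dependent = {
    ("111", "Spouse"), ("111", "Son"), ("111", "Daughter"),
    ("222", "Spouse"),
    ("333", "Spouse"), ("333", "Daughter"),
    ("444", "Son"),
}

def essn_with(relationship):
    """Projection on essn of the rows selected by relationship."""
    return {essn for essn, rel in dependent if rel == relationship}

boys, girls, married = essn_with("Son"), essn_with("Daughter"), essn_with("Spouse")

q04 = married & boys & girls     # married, at least one daughter and one son
q05 = married - (boys | girls)   # married, no children
q06 = (married & girls) - boys   # married, only daughters
```

Employee 444 has a son but no spouse, so it appears in none of the three answers.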
Relational Algebra – Practice Test
Last Name: ___________________________ First Name:__________________
Consider the relation schema of the COMPANY database given below:
EMPLOYEE (fname, minit, lname, ssn, birthdate, address, sex, salary, superssn, dno) KEY: ssn
DEPARTMENT (dname, dnumber, mgrssn, mgrstartdate) KEY: dnumber
PROJECT (pname, pnumber, plocation, dnum) KEY: pnumber
WORKS_ON (essn, pno, hours) KEY: (essn, pno)
DEPENDENT (essn, dependent-name, sex, bdate, relationship) KEY: (essn, dependent-name)
Formulate the following questions in the Relational Algebra query language:
1. Give the last name of those employees who work on any project(s) where there are more female than male employees.
2. Give the last name of those female managers who work on each of the projects located in Cleveland.