Relational Algebra and SQL Prof. Sin-Min Lee Department of Computer Science
The Role of Relational Algebra in a DBMS
Relational Algebra Operations
More Relational Algebra
Select
• Extracts specified tuples (rows) from a specified relation (table).
Select Operator
• Produces a table containing the subset of rows of the argument table that satisfy the condition:
  σ_{condition}(relation)
• Example: σ_{Hobby='stamps'}(Person)

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Id    Name  Address    Hobby
1123  John  123 Main   stamps
9876  Bart  5 Pine St  stamps
Selection Condition
• Operators: <, ≤, ≥, >, =, ≠
• Simple selection condition:
  – <attribute> operator <constant>
  – <attribute> operator <attribute>
• <condition> AND <condition>
• <condition> OR <condition>
• NOT <condition>
Selection Condition - Examples
σ_{Id>3000 OR Hobby='hiking'}(Person)
σ_{Id>3000 AND Id<3999}(Person)
σ_{NOT(Hobby='hiking')}(Person)
σ_{Hobby≠'hiking'}(Person)
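The selection examples above can be sketched in Python. This is only an illustration, assuming a relation is modelled as a list of dicts and a condition as a function from a row to True/False; the helper name `select` is hypothetical and not part of the slides.

```python
# Person relation from the slides, one dict per tuple.
PERSON = [
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "stamps"},
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "coins"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr", "Hobby": "hiking"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St", "Hobby": "stamps"},
]


def select(relation, condition):
    """sigma_condition(relation): keep the rows for which condition(row) holds."""
    return [row for row in relation if condition(row)]


# The four selection conditions from the examples, written as predicates.
stamps = select(PERSON, lambda r: r["Hobby"] == "stamps")
id_or_hiking = select(PERSON, lambda r: r["Id"] > 3000 or r["Hobby"] == "hiking")
id_range = select(PERSON, lambda r: r["Id"] > 3000 and r["Id"] < 3999)
not_hiking = select(PERSON, lambda r: not (r["Hobby"] == "hiking"))
```

With the sample Person table, the range condition matches no row, since no Id lies between 3000 and 3999.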
Project
• Extracts specified attributes (columns) from a specified relation.
Project Operator
• Produces a table containing a subset of the columns of the argument table:
  π_{attribute list}(relation)
• Example: π_{Name,Hobby}(Person)

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Name  Hobby
John  stamps
John  coins
Mary  hiking
Bart  stamps
Project Operator
• Example: π_{Name,Address}(Person)

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Name  Address
John  123 Main
Mary  7 Lake Dr
Bart  5 Pine St

Result is a table (no duplicates); it can have fewer tuples than the original.
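A minimal sketch of projection with duplicate elimination, assuming the same list-of-dicts model of a relation; the helper name `project` is hypothetical. It shows why projecting Person on Name and Address yields three rows rather than four.

```python
PERSON = [
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "stamps"},
    {"Id": 1123, "Name": "John", "Address": "123 Main", "Hobby": "coins"},
    {"Id": 5556, "Name": "Mary", "Address": "7 Lake Dr", "Hobby": "hiking"},
    {"Id": 9876, "Name": "Bart", "Address": "5 Pine St", "Hobby": "stamps"},
]


def project(relation, attributes):
    """pi_attributes(relation): keep the listed columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # a relation is a set, so each tuple appears once
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


name_hobby = project(PERSON, ["Name", "Hobby"])      # all four rows differ
name_address = project(PERSON, ["Name", "Address"])  # John's two rows merge
```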
Expressions
π_{Id,Name}(σ_{Hobby='stamps' OR Hobby='coins'}(Person))

Person
Id    Name  Address    Hobby
1123  John  123 Main   stamps
1123  John  123 Main   coins
5556  Mary  7 Lake Dr  hiking
9876  Bart  5 Pine St  stamps

Result
Id    Name
1123  John
9876  Bart
Set Operators
• A relation is a set of tuples, so set operations should apply: ∪, ∩, − (set difference)
• The result of combining two relations with a set operator is a relation => all its elements must be tuples having the same structure
• Hence, the scope of set operations is limited to union compatible relations
Union Compatible Relations
• Two relations are union compatible if
  – Both have the same number of columns
  – Names of attributes are the same in both
  – Attributes with the same name in both relations have the same domain
• Union compatible relations can be combined using union, intersection, and set difference

Example
Tables:
  Person (SSN, Name, Address, Hobby)
  Professor (Id, Name, Office, Phone)
are not union compatible. But π_{Name}(Person) and π_{Name}(Professor) are union compatible, so
  π_{Name}(Person) − π_{Name}(Professor)
makes sense.
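A small sketch of the example above, assuming relations are lists of dicts and projections are returned as Python sets of tuples so that the built-in set operators apply. The sample rows and the helpers `project` and `union_compatible` are hypothetical; the compatibility check compares attribute names only and does not model domains.

```python
# Hypothetical sample tuples; only the two schemas come from the slides.
PERSON = [
    {"SSN": 1, "Name": "John", "Address": "123 Main", "Hobby": "stamps"},
    {"SSN": 2, "Name": "Mary", "Address": "7 Lake Dr", "Hobby": "hiking"},
]
PROFESSOR = [
    {"Id": 10, "Name": "Mary", "Office": "B12", "Phone": "555-0100"},
    {"Id": 11, "Name": "Ann", "Office": "B14", "Phone": "555-0101"},
]


def project(relation, attributes):
    """Projection returned as a set of tuples, so set operators apply directly."""
    return {tuple(row[a] for a in attributes) for row in relation}


def union_compatible(r, s):
    """Simplified check on non-empty relations: same attribute names."""
    return set(r[0]) == set(s[0])


person_names = project(PERSON, ["Name"])
professor_names = project(PROFESSOR, ["Name"])

difference = person_names - professor_names    # names of persons who are not professors
union = person_names | professor_names
intersection = person_names & professor_names
```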
Cartesian Product
• If R and S are two relations, R × S is the set of all concatenated tuples <x,y>, where x is a tuple in R and y is a tuple in S
  – R and S need not be union compatible
• R × S is expensive to compute:
  – Factor of two in the size of each row
  – Quadratic in the number of rows

R
A   B
x1  x2
x3  x4

S
C   D
y1  y2
y3  y4

R × S
A   B   C   D
x1  x2  y1  y2
x1  x2  y3  y4
x3  x4  y1  y2
x3  x4  y3  y4
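The product table above can be reproduced with a short sketch, assuming the list-of-dicts model; `cartesian_product` is a hypothetical helper built on the standard `itertools.product`.

```python
from itertools import product

R = [{"A": "x1", "B": "x2"}, {"A": "x3", "B": "x4"}]
S = [{"C": "y1", "D": "y2"}, {"C": "y3", "D": "y4"}]


def cartesian_product(r, s):
    """R x S: concatenate every tuple of r with every tuple of s."""
    return [{**x, **y} for x, y in product(r, s)]


RxS = cartesian_product(R, S)  # 2 * 2 = 4 rows, each with 2 + 2 = 4 attributes
```

Merging the two dicts silently overwrites attributes that share a name, which is one way to see why attribute names in a product must be distinct.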
Renaming
• Result of expression evaluation is a relation
• Attributes of a relation must have distinct names. This is not guaranteed with Cartesian product
  – e.g., suppose in the previous example A and C have the same name
• Renaming operator tidies this up. To assign the names A1, A2, …, An to the attributes of the n-column relation produced by expression expr use
  expr [A1, A2, …, An]
Example
Transcript (StudId, CrsCode, Semester, Grade)
Teaching (ProfId, CrsCode, Semester)

π_{StudId,CrsCode}(Transcript)[StudId, CrsCode1] × π_{ProfId,CrsCode}(Teaching)[ProfId, CrsCode2]

This is a relation with 4 attributes: StudId, CrsCode1, ProfId, CrsCode2
Join
• Builds a relation from two specified relations consisting of all possible concatenated pairs, one from each of the two relations, such that in each pair the two tuples satisfy some condition (e.g., equal values in a given column).

First relation
A1  B1
A2  B1
A3  B2

Second relation
B1  C1
B2  C2
B3  C3

(Natural or Inner) Join
A1  B1  C1
A2  B1  C1
A3  B2  C2
Outer Join
• A (left) outer join is similar to a join, but it keeps every row of the first table: a row with no corresponding row in the second table still appears in the result, padded with NULLs.

First relation
A1  B1
A2  B1
A3  B2
A4  B7

Second relation
B1  C1
B2  C2
B3  C3

Outer Join (* marks NULL)
A1  B1  C1
A2  B1  C1
A3  B2  C2
A4  *   *
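A minimal sketch of the inner and left outer join on the example tables, assuming rows are plain tuples and NULL is modelled as Python's None; the function names are hypothetical. Unlike the figure, which marks both remaining columns of the A4 row with *, this sketch keeps the left row's own value B7 and pads only the column that comes from the second table.

```python
LEFT = [("A1", "B1"), ("A2", "B1"), ("A3", "B2"), ("A4", "B7")]
RIGHT = [("B1", "C1"), ("B2", "C2"), ("B3", "C3")]


def inner_join(left, right):
    """Pair rows whose shared column (2nd of left, 1st of right) is equal."""
    return [(a, b, c) for (a, b) in left for (b2, c) in right if b == b2]


def left_outer_join(left, right):
    """Like inner_join, but unmatched left rows are kept and padded with None."""
    result = []
    for (a, b) in left:
        matches = [(a, b, c) for (b2, c) in right if b == b2]
        result.extend(matches or [(a, b, None)])
    return result


inner = inner_join(LEFT, RIGHT)
outer = left_outer_join(LEFT, RIGHT)
```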
Join Items

Part #  Name                Price   Count
1       Big blue widget     3.76    2
2       Small blue Widget   7.35    4
3       Tiny red widget     5.25    7
4       large red widget    157.23  23
5       double widget rack  10.44   12
6       Small green Widget  30.45   58
7       Big yellow widget   7.96    1
8       Tiny orange widget  81.75   42
9       Big purple widget   55.99   9

Invoice #  Part #  Quantity
93774      3       10
84747      23      1
88367      75      2
88647      4       3
776879     22      5
65689      76      12
93774      23      10
88367      34      2

Invoice #  Cust #  Rep #
93774      3       1
84747      4       1
88367      5       2
88647      9       1
776879     2       2
65689      6       2

Cust #  COMPANY                    STREET1                    STREET2                 CITY         STATE  ZIPCODE
1       Integrated Standards Ltd.  35 Broadway                Floor 12                New York     NY     02111
2       MegaInt Inc.               34 Bureaucracy Plaza       Floors 1-172            Phildelphia  PA     03756
3       Cyber Associates           3 Control Elevation Place  Cyber Assicates Center  Cyberoid     NY     08645
4       General Consolidated       35 Libra Plaza                                     Nashua       NH     09242
5       Consolidated MultiCorp     1 Broadway                                         Middletown   IN     32467
6       Internet Behometh Ltd.     88 Oligopoly Place                                 Sagrado      TX     78798
7       Consolidated Brands, Inc.  3 Independence Parkway                             Rivendell    CA     93456
8       Little Mighty Micro        34 Last One Drive                                  Orinda       CA     94563
9       SportLine Ltd.             38 Champion Place          Suite 882               Compton      CA     95328
Derived Operation: Join
A (general or theta) join of R and S is the expression
  R ⋈_{join-condition} S
where join-condition is a conjunction of terms:
  Ai oper Bi
in which Ai is an attribute of R; Bi is an attribute of S; and oper is one of =, <, >, ≤, ≥, ≠.
The meaning is:
  σ_{join-condition´}(R × S)
where join-condition and join-condition´ are the same, except for possible renamings of attributes (next)
Join
• Join is a derivative of Cartesian product.
• Equivalent to performing a Selection, using the join predicate as the selection formula, over the Cartesian product of the two operand relations.
• One of the most difficult operations to implement efficiently in an RDBMS, and one reason why RDBMSs can have performance problems.
• Various forms of join operation
  – Natural join (defined by Codd)
  – Outer join
  – Theta join
  – Equijoin (a particular type of Theta join)
  – Semijoin
Natural Join
• List the names and comments of all clients who have viewed a property for rent.
  (π_{clientNo, fName, lName}(Client)) Join (π_{clientNo, propertyNo, comment}(Viewing))
  Or
  Client [clientNo, fName, lName] Join Viewing [clientNo, propertyNo, comment]
Outer Join
• To display rows in the result that do not have matching values in the join column, use Outer join.
• R Left Outer Join S
  – (Left) outer join is a join in which tuples from R that do not have matching values in the common columns of S are also included in the result relation.
Example:
• Produce a status report on property viewings.
  π_{propertyNo, street, city}(PropertyForRent) Left Outer Join Viewing
Join and Renaming
• Problem: R and S might have attributes with the same name, in which case the Cartesian product is not defined
• Solutions:
  1. Rename attributes prior to forming the product and use the new names in join-condition´.
  2. Qualify common attribute names with relation names (thereby disambiguating the names). For instance: Transcript.CrsCode or Teaching.CrsCode
     – This solution is nice, but doesn't always work: consider
         R ⋈_{join-condition} R
       In R.A, how do we know which R is meant?
Theta Join – Example
Employee (Name, Id, MngrId, Salary)
Manager (Name, Id, Salary)

Output the names of all employees that earn more than their managers.
  π_{Employee.Name}(Employee ⋈_{MngrId=Manager.Id AND Employee.Salary>Manager.Salary} Manager)

The join yields a table with attributes:
  Employee.Name, Employee.Id, Employee.Salary, MngrId, Manager.Name, Manager.Id, Manager.Salary
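The employee query can be sketched as a selection over a Cartesian product, which is how a theta join is defined. The sample rows and the helpers `qualify` and `theta_join` are hypothetical; only the two schemas come from the slides. For simplicity every attribute is qualified with its relation name, including MngrId.

```python
from itertools import product

# Hypothetical sample rows.
EMPLOYEE = [
    {"Name": "Ann", "Id": 1, "MngrId": 10, "Salary": 90},
    {"Name": "Bob", "Id": 2, "MngrId": 10, "Salary": 60},
    {"Name": "Cy", "Id": 3, "MngrId": 11, "Salary": 70},
]
MANAGER = [
    {"Name": "Max", "Id": 10, "Salary": 80},
    {"Name": "Mia", "Id": 11, "Salary": 70},
]


def qualify(relation, name):
    """Rename each attribute to 'name.attribute' so the product has distinct names."""
    return [{f"{name}.{k}": v for k, v in row.items()} for row in relation]


def theta_join(r, s, condition):
    """sigma_condition(r x s): filter the Cartesian product by the join condition."""
    return [{**x, **y} for x, y in product(r, s) if condition({**x, **y})]


joined = theta_join(
    qualify(EMPLOYEE, "Employee"),
    qualify(MANAGER, "Manager"),
    lambda t: t["Employee.MngrId"] == t["Manager.Id"]
    and t["Employee.Salary"] > t["Manager.Salary"],
)
names = [t["Employee.Name"] for t in joined]  # final projection on Employee.Name
```

Building the full product first is quadratic in the number of rows; a real DBMS avoids materializing it, but the result is the same.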
Equijoin - Example
Equijoin: the join condition is a conjunction of equalities.

π_{Name,CrsCode}(Student ⋈_{Id=StudId} σ_{Grade='A'}(Transcript))

Student
Id   Name  Addr   Status
111  John  …..    …..
222  Mary  …..    …..
333  Bill  …..    …..
444  Joe   …..    …..

Transcript
StudId  CrsCode  Sem  Grade
111     CSE305   S00  B
222     CSE306   S99  A
333     CSE304   F99  A

Result
Name  CrsCode
Mary  CSE306
Bill  CSE304

The equijoin is used very frequently since it combines related data in different relations.
Natural Join
• Special case of equijoin:
  – join condition equates all and only those attributes with the same name (condition doesn't have to be explicitly stated)
  – duplicate columns are eliminated from the result

Transcript (StudId, CrsCode, Sem, Grade)
Teaching (ProfId, CrsCode, Sem)

Transcript ⋈ Teaching =
  π_{StudId, Transcript.CrsCode, Transcript.Sem, Grade, ProfId}(
    Transcript ⋈_{CrsCode=CrsCode AND Sem=Sem} Teaching
  ) [StudId, CrsCode, Sem, Grade, ProfId]
Natural Join (cont'd)
• More generally:
  R ⋈ S = π_{attr-list}(σ_{join-cond}(R × S))
  where attr-list = attributes(R) ∪ attributes(S) (duplicates are eliminated) and join-cond has the form:
    A1 = A1 AND … AND An = An
  where {A1 … An} = attributes(R) ∩ attributes(S)
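The general formula can be sketched directly, assuming relations are non-empty lists of dicts with distinct rows; the sample tuples and the helper name `natural_join` are hypothetical. The common attributes are found from the schemas, the join condition equates each of them, and merging the two dicts keeps a single copy of every shared column.

```python
from itertools import product

# Hypothetical sample rows over the Transcript and Teaching schemas.
TRANSCRIPT = [
    {"StudId": 111, "CrsCode": "CSE305", "Sem": "S00", "Grade": "B"},
    {"StudId": 222, "CrsCode": "CSE306", "Sem": "S99", "Grade": "A"},
]
TEACHING = [
    {"ProfId": 900, "CrsCode": "CSE305", "Sem": "S00"},
    {"ProfId": 901, "CrsCode": "CSE306", "Sem": "F99"},
]


def natural_join(r, s):
    """pi_attr-list(sigma_join-cond(r x s)), joining on all common attributes."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]  # attributes(R) intersect attributes(S)
    result = []
    for x, y in product(r, s):
        if all(x[a] == y[a] for a in common):  # A1 = A1 AND ... AND An = An
            result.append({**x, **y})          # one copy of each shared column
    return result


joined = natural_join(TRANSCRIPT, TEACHING)
```

Here only the CSE305/S00 rows agree on both CrsCode and Sem, so the result has a single tuple.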
Natural Join Example
• List all Ids of students who took at least two different courses:
  π_{StudId}(σ_{CrsCode≠CrsCode2}(Transcript ⋈ Transcript[StudId, CrsCode2, Sem2, Grade2]))
  We don't want to join on the CrsCode, Sem, and Grade attributes, hence the renaming!
Aggregates
• Functions that operate on sets:
  – COUNT, SUM, AVG, MAX, MIN
• Produce numbers (not tables)
• Not part of relational algebra

SELECT COUNT(*)
FROM Professor P

SELECT MAX(Salary)
FROM Employee E
Aggregates
Count the number of courses taught in S2000:

SELECT COUNT(T.CrsCode)
FROM Teaching T
WHERE T.Semester = 'S2000'

But if multiple sections of the same course are taught, use:

SELECT COUNT(DISTINCT T.CrsCode)
FROM Teaching T
WHERE T.Semester = 'S2000'
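To see the difference between the two counts, the queries can be run against an in-memory SQLite database through Python's standard `sqlite3` module. This is only a sketch: the column types and sample rows are assumptions, chosen so that one course has two sections in S2000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT)")
# Hypothetical rows: CS305 is taught in two sections in S2000.
conn.executemany(
    "INSERT INTO Teaching VALUES (?, ?, ?)",
    [
        (1, "CS305", "S2000"),
        (2, "CS305", "S2000"),
        (2, "CS306", "S2000"),
        (1, "CS308", "F1999"),
    ],
)

# Counts one per Teaching row, so both CS305 sections are counted.
sections = conn.execute(
    "SELECT COUNT(T.CrsCode) FROM Teaching T WHERE T.Semester = 'S2000'"
).fetchone()[0]

# Counts each course code once.
courses = conn.execute(
    "SELECT COUNT(DISTINCT T.CrsCode) FROM Teaching T WHERE T.Semester = 'S2000'"
).fetchone()[0]
```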
Aggregates: Proper and Improper Usage
SELECT COUNT(T.CrsCode), T.ProfId       -- makes no sense (in the absence of a GROUP BY clause)
SELECT COUNT(*), AVG(T.Grade)           -- but this is OK
WHERE T.Grade > COUNT (SELECT ….)       -- aggregate cannot be applied to the result of a SELECT statement
Grouping
• But how do we compute the number of courses taught in S2000 per professor?
  – Strategy 1: Fire off a separate query for each professor:

      SELECT COUNT(T.CrsCode)
      FROM Teaching T
      WHERE T.Semester = 'S2000' AND T.ProfId = 123456789

    • Cumbersome
    • What if the number of professors changes? Add another query?
  – Strategy 2: define a special grouping operator:

      SELECT T.ProfId, COUNT(T.CrsCode)
      FROM Teaching T
      WHERE T.Semester = 'S2000'
      GROUP BY T.ProfId
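Strategy 2 can be tried on a small in-memory SQLite database using Python's `sqlite3` module. The column types and sample rows are assumptions; an ORDER BY is added here only to make the output order deterministic and is not part of the query on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Semester TEXT)")
# Hypothetical rows for two professors.
conn.executemany(
    "INSERT INTO Teaching VALUES (?, ?, ?)",
    [
        (101, "CS305", "S2000"),
        (101, "CS306", "S2000"),
        (102, "CS305", "S2000"),
        (102, "CS308", "F1999"),
    ],
)

# One result row per professor, without writing one query per professor.
per_professor = conn.execute(
    """
    SELECT T.ProfId, COUNT(T.CrsCode)
    FROM Teaching T
    WHERE T.Semester = 'S2000'
    GROUP BY T.ProfId
    ORDER BY T.ProfId
    """
).fetchall()
```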
GROUP BY
GROUP BY - Example
SELECT T.StudId, AVG(T.Grade), COUNT(*)
FROM Transcript T
GROUP BY T.StudId

Result attributes:
 - student's Id
 - avg grade
 - number of courses
[Figure: the four Transcript rows with StudId 1234 collapse into the single result row 1234, 3.3, 4.]
HAVING Clause
• Eliminates unwanted groups (analogous to WHERE clause)
• HAVING condition is constructed from attributes of the GROUP BY list and aggregates of attributes not in the list

SELECT T.StudId, AVG(T.Grade) AS CumGpa, COUNT(*) AS NumCrs
FROM Transcript T
WHERE T.CrsCode LIKE 'CS%'
GROUP BY T.StudId
HAVING AVG(T.Grade) > 3.5
Evaluation of GroupBy with Having
Example
• Output the Id and name of all seniors on the Dean's List

SELECT S.Id, S.Name
FROM Student S, Transcript T
WHERE S.Id = T.StudId AND S.Status = 'senior'
GROUP BY S.Id, S.Name     -- right (GROUP BY S.Id alone is wrong)
HAVING AVG(T.Grade) > 3.5 AND SUM(T.Credit) > 90

Every attribute that occurs in the SELECT clause must also occur in GROUP BY, or it must be an aggregate. With GROUP BY S.Id alone, S.Name does not.
ORDER BY Clause
• Causes rows to be output in a specified order
SELECT T.StudId, COUNT(*) AS NumCrs, AVG(T.Grade) AS CumGpa
FROM Transcript T
WHERE T.CrsCode LIKE 'CS%'
GROUP BY T.StudId
HAVING AVG(T.Grade) > 3.5
ORDER BY CumGpa DESC, StudId ASC
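A sketch of the query above run on SQLite through Python's `sqlite3` module, with the sort direction written after each sort key (`CumGpa DESC, StudId ASC`), which is the order standard SQL expects. The table contents and numeric grades are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Transcript (StudId INTEGER, CrsCode TEXT, Grade REAL)")
# Hypothetical grades on a 4.0 scale.
conn.executemany(
    "INSERT INTO Transcript VALUES (?, ?, ?)",
    [
        (1, "CS305", 4.0), (1, "CS306", 3.5), (1, "MA101", 2.0),
        (2, "CS305", 3.0), (2, "CS306", 3.5),
        (3, "CS305", 4.0), (3, "CS306", 4.0),
    ],
)

rows = conn.execute(
    """
    SELECT T.StudId, COUNT(*) AS NumCrs, AVG(T.Grade) AS CumGpa
    FROM Transcript T
    WHERE T.CrsCode LIKE 'CS%'       -- drops the MA101 row before grouping
    GROUP BY T.StudId
    HAVING AVG(T.Grade) > 3.5        -- drops student 2 (average 3.25)
    ORDER BY CumGpa DESC, StudId ASC
    """
).fetchall()
```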
Query Evaluation Strategy
1. Evaluate FROM: produces Cartesian product, A, of tables in FROM list
2. Evaluate WHERE: produces table, B, consisting of rows of A that satisfy WHERE condition
3. Evaluate GROUP BY: partitions B into groups that agree on attribute values in GROUP BY list
4. Evaluate HAVING: eliminates groups in B that do not satisfy HAVING condition
5. Evaluate SELECT: produces table C containing a row for each group. Attributes in SELECT list limited to those in GROUP BY list and aggregates over group
6. Evaluate ORDER BY: orders rows of C
Nested Queries
List all courses that were not taught in S2000

SELECT C.CrsName
FROM Course C
WHERE C.CrsCode NOT IN
  (SELECT T.CrsCode     -- subquery
   FROM Teaching T
   WHERE T.Sem = 'S2000')

Evaluation strategy: the subquery is evaluated once to produce the set of courses taught in S2000. Each row of Course (as C) is then tested against this set.
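The nested query can be tried on SQLite through Python's `sqlite3` module. The schemas follow the attribute names used in the query; the column types and sample rows are assumptions. The result is sorted in Python because the query itself specifies no order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Course (CrsCode TEXT, CrsName TEXT);
    CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT, Sem TEXT);
    INSERT INTO Course VALUES ('CS305', 'Databases');
    INSERT INTO Course VALUES ('CS306', 'Operating Systems');
    INSERT INTO Course VALUES ('CS308', 'Compilers');
    INSERT INTO Teaching VALUES (1, 'CS305', 'S2000');
    INSERT INTO Teaching VALUES (2, 'CS306', 'F1999');
    """
)

not_taught = sorted(
    name
    for (name,) in conn.execute(
        """
        SELECT C.CrsName
        FROM Course C
        WHERE C.CrsCode NOT IN
          (SELECT T.CrsCode          -- subquery: does not mention C
           FROM Teaching T
           WHERE T.Sem = 'S2000')
        """
    )
)
```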
Correlated Nested Queries
Output a row <prof, dept> if prof has taught a course in dept.

SELECT P.Name, D.Name      -- outer query
FROM Professor P, Department D
WHERE P.Id IN
  (set of Ids of all profs who have taught a course in D.DeptId)

The set is computed by the subquery:

SELECT T.ProfId            -- subquery
FROM Teaching T, Course C
WHERE T.CrsCode = C.CrsCode AND
      C.DeptId = D.DeptId  -- correlation
Correlated Nested Queries (cont'd)
• Tuple variables T and C are local to the subquery
• Tuple variables P and D are global to the subquery
• Correlation: the subquery uses a global variable, D
• The value of D.DeptId parameterizes an evaluation of the subquery
• The subquery must (at least) be re-evaluated for each distinct value of D.DeptId
• Correlated queries can be expensive to evaluate
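A sketch of the correlated query with the subquery written in place, run on SQLite through Python's `sqlite3` module. The column types and sample rows are hypothetical; the result is sorted in Python because the query specifies no order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Professor (Id INTEGER, Name TEXT);
    CREATE TABLE Department (DeptId TEXT, Name TEXT);
    CREATE TABLE Course (CrsCode TEXT, DeptId TEXT);
    CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT);
    INSERT INTO Professor VALUES (1, 'Smith');
    INSERT INTO Professor VALUES (2, 'Jones');
    INSERT INTO Department VALUES ('CS', 'Computer Science');
    INSERT INTO Department VALUES ('MA', 'Mathematics');
    INSERT INTO Course VALUES ('CS305', 'CS');
    INSERT INTO Course VALUES ('MA101', 'MA');
    INSERT INTO Teaching VALUES (1, 'CS305');
    INSERT INTO Teaching VALUES (1, 'MA101');
    INSERT INTO Teaching VALUES (2, 'CS305');
    """
)

pairs = sorted(
    conn.execute(
        """
        SELECT P.Name, D.Name                 -- outer query
        FROM Professor P, Department D
        WHERE P.Id IN
          (SELECT T.ProfId                    -- subquery
           FROM Teaching T, Course C
           WHERE T.CrsCode = C.CrsCode
             AND C.DeptId = D.DeptId)         -- correlation: D is global
        """
    ).fetchall()
)
```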
Division
• Goal: Produce the tuples in one relation, r, that match all tuples in another relation, s
  – r (A1, …, An, B1, …, Bm)
  – s (B1, …, Bm)
  – r/s, with attributes A1, …, An, is the set of all tuples <a> such that for every tuple <b> in s, <a,b> is in r
• Can be expressed in terms of projection, set difference, and cross-product
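One standard way to express division with projection, set difference, and cross-product is r/s = π_A(r) − π_A((π_A(r) × s) − r). The sketch below follows that identity for the simplest case, assuming r is a set of (a, b) pairs and s is a set of b values; the sample data and the function name `divide` are hypothetical.

```python
def divide(r, s):
    """r / s for r a set of (a, b) pairs and s a set of b values."""
    candidates = {a for (a, b) in r}                     # pi_A(r)
    all_pairs = {(a, b) for a in candidates for b in s}  # pi_A(r) x s
    missing = all_pairs - r                              # pairs that r lacks
    disqualified = {a for (a, b) in missing}             # pi_A of the missing pairs
    return candidates - disqualified


# Hypothetical data: (student, course passed) and the courses of interest.
PASSED = {
    (111, "CS305"), (111, "CS306"),
    (222, "CS305"),
    (333, "CS305"), (333, "CS306"), (333, "CS308"),
}
TAUGHT = {"CS305", "CS306"}

result = divide(PASSED, TAUGHT)  # students paired with every course in TAUGHT
```

Student 222 is disqualified because the pair (222, CS306) is missing from r. With an empty divisor every candidate qualifies.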
Division (cont'd)
Division
• Identify all clients who have viewed all properties with three rooms.
  (π_{clientNo, propertyNo}(Viewing)) / (π_{propertyNo}(σ_{rooms=3}(PropertyForRent)))
Division - Example
• List the Ids of students who have passed all courses that were taught in spring 2000
• Numerator:
  – StudId and CrsCode for every course passed by every student:
    π_{StudId, CrsCode}(σ_{Grade≠'F'}(Transcript))
• Denominator:
  – CrsCode of all courses taught in spring 2000:
    π_{CrsCode}(σ_{Semester='S2000'}(Teaching))
• Result is numerator/denominator
Division
• Query type: Find the subset of items in one set that are related to all items in another set
• Example: Find professors who have taught courses in all departments
  – Why does this involve division?

  π_{ProfId,DeptId}(Professor) / π_{DeptId}(Department)

  – Left operand (ProfId, DeptId): contains row <p,d> if professor p has taught a course in department d
  – Right operand (DeptId): all department Ids
Division
• Strategy for implementing division in SQL:
  – Find set, A, of all departments in which a particular professor, p, has taught a course
  – Find set, B, of all departments
  – Output p if A ⊇ B, or, equivalently, if B − A is empty
Division – SQL Solution
SELECT P.Id
FROM Professor P
WHERE NOT EXISTS
  (SELECT D.DeptId         -- set B of all dept Ids
   FROM Department D
   EXCEPT
   SELECT C.DeptId         -- set A of dept Ids of depts in
                           -- which P has taught a course
   FROM Teaching T, Course C
   WHERE T.ProfId = P.Id   -- global variable
     AND T.CrsCode = C.CrsCode)
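The SQL solution can be tried on SQLite, which supports EXCEPT, through Python's `sqlite3` module. The column types and sample rows are hypothetical: professor 1 has taught in both departments, professor 2 in only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Professor (Id INTEGER, Name TEXT);
    CREATE TABLE Department (DeptId TEXT);
    CREATE TABLE Course (CrsCode TEXT, DeptId TEXT);
    CREATE TABLE Teaching (ProfId INTEGER, CrsCode TEXT);
    INSERT INTO Professor VALUES (1, 'Smith');
    INSERT INTO Professor VALUES (2, 'Jones');
    INSERT INTO Department VALUES ('CS');
    INSERT INTO Department VALUES ('MA');
    INSERT INTO Course VALUES ('CS305', 'CS');
    INSERT INTO Course VALUES ('MA101', 'MA');
    INSERT INTO Teaching VALUES (1, 'CS305');
    INSERT INTO Teaching VALUES (1, 'MA101');
    INSERT INTO Teaching VALUES (2, 'CS305');
    """
)

taught_everywhere = conn.execute(
    """
    SELECT P.Id
    FROM Professor P
    WHERE NOT EXISTS
      (SELECT D.DeptId               -- set B: all dept Ids
       FROM Department D
       EXCEPT
       SELECT C.DeptId               -- set A: depts in which P has taught
       FROM Teaching T, Course C
       WHERE T.ProfId = P.Id         -- global variable
         AND T.CrsCode = C.CrsCode)
    """
).fetchall()
```

For professor 2 the difference B − A contains MA, so NOT EXISTS is false and the row is dropped.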
GROUP BY
Table output by WHERE clause:
 - Divide rows into groups based on a subset of attributes
 - All members of a group agree on those attributes
Each group can be described by a single row in a table with attributes limited to:
 - Attributes all group members share (listed in GROUP BY clause)
 - Aggregates over group
[Figure: rows whose GROUP BY attribute values are a, a, b, b, c, c, c, c, c, d, d, d form four groups, one per value.]
Example
• Output the name and address of all seniors on the Dean's List

SELECT S.Name, S.Address
FROM Student S, Transcript T
WHERE S.StudId = T.StudId AND S.Status = 'senior'
GROUP BY S.Name, S.Address     -- right (GROUP BY S.StudId is wrong)
HAVING AVG(T.Grade) > 3.5 AND SUM(T.Credit) > 90
SQL and Relational Algebra

1) Relational algebra: R = customers where city = 'Dallas'
   SQL:                SELECT * FROM customers WHERE city = 'Dallas';
   Pseudo-code:
     For c1 = first row of customers until c1 = last row of customers
       If c1.city = 'Dallas'
         Display c1.*
       End-If
     End-For

2) Relational algebra: R = customers [cid, cname]
   SQL:                SELECT cid, cname FROM customers;
   Pseudo-code:
     For c1 = first row of customers until c1 = last row of customers
       Display c1.cid, c1.cname
     End-For
Constructing SQL
http://www.cs.ru.nl/~gerp/IS0/sheets/IS0_Relationele_Algebra_SQL2.pdf