CHAPTERS 3-6: RELATIONAL DATA MODELS, RELATIONAL CONSTRAINTS, AND RELATIONAL ALGEBRA
Chapters 5-8

Flat file: A two-dimensional array of attributes or data items, for example:

ProductX          1    Bellaire     5
ProductY          2    Sugarland    5
ProductZ          3    Houston      5
Computerization   10   Stafford     4
Reorganization    20   Houston      1
Newbenefits       30   Stafford     4

Database Management System (DBMS): A general-purpose software system that is used to create, manage, and protect databases.
Attribute: A named characteristic or property of an entity; it corresponds to a column header.
Entity: A “thing” in the real world with an independent existence, for example something with a physical existence such as a person, a student, or a car.
Domain: The set of valid atomic values for an attribute in a relation,
e.g., SSN: the set of 9-digit numbers; GPA: 0 <= GPA <= 4.0
Atomic: Each value in the domain is indivisible.
Name (Fname, Minit, Lname) is not atomic.
Fname, Minit, and Lname are each atomic.
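The domain examples above can be read as membership tests. Below is a minimal Python sketch. It assumes an SSN is stored as a string of exactly 9 digits and a GPA as a number. `in_ssn_domain` and `in_gpa_domain` are hypothetical helper names and are not part of any DBMS.

```python
import re

# Hypothetical domain predicates (assumed representations, not from the slides):
# SSN is a string of exactly 9 digits; GPA is a number with 0 <= GPA <= 4.0.
def in_ssn_domain(value) -> bool:
    return isinstance(value, str) and re.fullmatch(r"\d{9}", value) is not None

def in_gpa_domain(value) -> bool:
    return isinstance(value, (int, float)) and 0 <= value <= 4.0

print(in_ssn_domain("123456789"))  # True
print(in_ssn_domain("12345-678"))  # False: not 9 digits
print(in_gpa_domain(3.7))          # True
print(in_gpa_domain(4.2))          # False: outside 0..4.0
```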
RELATIONAL MODEL CONCEPTS
A relation is a mathematical concept based on the idea of sets.
The model was first proposed by Dr. E. F. Codd of IBM Research in 1970 in the paper "A Relational Model for Large Shared Data Banks," Communications of the ACM, June 1970.
This paper caused a major revolution in the field of database management and earned Dr. Codd the coveted ACM Turing Award.
INFORMAL DEFINITIONS
Informally, a relation looks like a table of values.
A relation typically contains a set of rows.
The data elements in each row represent certain facts that correspond to a real-world entity or relationship. In the formal model, rows are called tuples.
Each column has a column header that indicates the meaning of the data items in that column. In the formal model, the column header is called an attribute name (or just attribute).
FORMAL DEFINITIONS - SCHEMA
The schema (or description) of a relation is denoted by R(A1, A2, ..., An), where R is the name of the relation and A1, A2, ..., An are its attributes.
Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
CUSTOMER is the relation name. It is defined over four attributes: Cust-id, Cust-name, Address, and Phone#.
Each attribute has a domain, or set of valid values. For example, the domain of Cust-id is the set of 6-digit numbers.
FORMAL DEFINITIONS - TUPLE
A tuple is an ordered set of values, enclosed in angle brackets ‘< … >’.
Each value is derived from an appropriate domain.
A row (tuple) in the CUSTOMER relation is called a 4-tuple because it consists of four values, for example:
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
A relation is a set of such tuples (rows).
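The following Python sketch makes the tuple and relation definitions concrete by modeling the CUSTOMER state as a set of 4-tuples. Two choices here are assumptions made for illustration: the relation is a Python `set`, and the schema is the hypothetical `CUSTOMER_SCHEMA` tuple. This is not how a DBMS actually stores relations.

```python
# Hypothetical modeling choice: a relation state is a Python set of tuples,
# and the schema is the ordered tuple of attribute names.
CUSTOMER_SCHEMA = ("Cust-id", "Cust-name", "Address", "Phone#")

customer = {
    (632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"),
}

# Inserting an identical 4-tuple again changes nothing: a relation is a SET of tuples.
customer.add((632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000"))

for row in customer:
    # Every tuple has exactly one value per attribute of the schema.
    assert len(row) == len(CUSTOMER_SCHEMA)
    # Pairing values with attribute names recovers each value's meaning.
    print(dict(zip(CUSTOMER_SCHEMA, row))["Cust-name"])  # John Smith

print(len(customer))  # 1
```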
FORMAL DEFINITIONS - DOMAIN
A domain has a logical definition.
Example: “USA_phone_numbers” is the set of 10-digit phone numbers valid in the U.S.
A domain also has a data type or format defined for it. USA_phone_numbers may have the format (ddd)ddd-dddd, where each d is a decimal digit. Dates have various formats: year, month, and day may be formatted as yyyy-mm-dd, as dd mm, yyyy, and so on.
The attribute name designates the role a domain plays in a relation and is used to interpret the meaning of the data elements corresponding to that attribute. Example: the domain Date may be used to define two attributes, “Invoice-date” and “Payment-date”, with different meanings.
FORMAL DEFINITIONS - STATE
The relation state is a subset of the Cartesian product of the domains of its attributes. Each domain contains the set of all possible values the attribute can take.
Example: attribute Cust-name is defined over the domain of character strings of maximum length 25, so dom(Cust-name) is varchar(25).
The role these strings play in the CUSTOMER relation is that of the name of a customer.
FORMAL DEFINITIONS - SUMMARY
Formally, given R(A1, A2, ..., An):
$r(R) \subseteq \mathrm{dom}(A_1) \times \mathrm{dom}(A_2) \times \dots \times \mathrm{dom}(A_n)$
R(A1, A2, …, An) is the schema of the relation.
R is the name of the relation.
A1, A2, …, An are the attributes of the relation.
r(R) is a specific state (or "value" or “population”) of relation R. It is a set of tuples (rows):
$r(R) = \{t_1, t_2, \dots, t_m\}$, where each $t_i$ is an n-tuple $t_i = \langle v_1, v_2, \dots, v_n \rangle$ and each $v_j \in \mathrm{dom}(A_j)$.
FORMAL DEFINITIONS - EXAMPLE
Let R(A1, A2) be a relation schema.
Let dom(A1) = {0, 1} and dom(A2) = {a, b, c}.
Then dom(A1) X dom(A2) is the set of all possible combinations: {<0,a>, <0,b>, <0,c>, <1,a>, <1,b>, <1,c>}
The relation state satisfies $r(R) \subseteq \mathrm{dom}(A_1) \times \mathrm{dom}(A_2)$. For example, r(R) could be {<0,a>, <0,b>, <1,c>}.
This is one possible state (or “population” or “extension”) r of the relation R, defined over A1 and A2.
It has three 2-tuples: <0,a>, <0,b>, <1,c>.
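This example can be checked mechanically. The short Python sketch below assumes domains and relation states are represented as Python sets. It builds dom(A1) X dom(A2) with `itertools.product` and then tests whether a state is a subset of that product.

```python
from itertools import product

dom_A1 = {0, 1}
dom_A2 = {"a", "b", "c"}

# dom(A1) X dom(A2): every possible 2-tuple over the two domains.
all_tuples = set(product(dom_A1, dom_A2))
print(len(all_tuples))  # 6

# The example state r(R) from above.
r = {(0, "a"), (0, "b"), (1, "c")}

# A valid relation state must be a subset of the Cartesian product of the domains.
print(r <= all_tuples)           # True
print({(2, "a")} <= all_tuples)  # False: 2 is not in dom(A1)
```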
DEFINITION SUMMARY

Informal Terms               Formal Terms
Table                        Relation
Column Header                Attribute
All possible Column Values   Domain
Row                          Tuple
Table Definition             Schema of a Relation
Populated Table              State of the Relation
SUPER KEY: An attribute or a set of attributes that uniquely identifies an entity (not necessarily a minimal set), e.g., {SSN}, {SSN, NAME}, and {SSN, NAME, MAJOR} are all super keys.
CANDIDATE KEY: A super key such that no proper subset of its attributes is itself a super key; a candidate key is therefore a minimal identifier, e.g., STUID and SSN are each candidate keys.
PRIMARY KEY: The candidate key that is chosen to identify tuples in a relation; its value must be unique and must exist (cannot be null).
ALTERNATE KEY: A candidate key in a relation that is not selected as the primary key, e.g., if the primary key is SSN, then STUID is an alternate key.
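The key definitions can be tested against a particular relation state. Below is a minimal Python version under these assumptions: rows are dicts, the STUDENT data is made up, and `is_superkey`/`is_candidate_key` are hypothetical helpers. A state can only show that a set of attributes is NOT a key. Whether a set actually is a key is a constraint declared on the schema.

```python
from itertools import combinations

# Hypothetical STUDENT state (made-up rows for illustration only).
students = [
    {"SSN": "111223333", "STUID": "S1", "NAME": "Jim Smith", "MAJOR": "CS"},
    {"SSN": "444556666", "STUID": "S2", "NAME": "Tim Brown", "MAJOR": "CS"},
    {"SSN": "777889999", "STUID": "S3", "NAME": "Jim Smith", "MAJOR": "Math"},
]

def is_superkey(rows, attrs):
    """True if no two rows agree on all of attrs in this particular state."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def is_candidate_key(rows, attrs):
    """A superkey such that no proper subset of its attributes is a superkey."""
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for k in range(1, len(attrs))
        for subset in combinations(attrs, k)
    )

print(is_superkey(students, ("SSN", "NAME")))       # True
print(is_candidate_key(students, ("SSN", "NAME")))  # False: {SSN} alone is a superkey
print(is_candidate_key(students, ("SSN",)))         # True
print(is_candidate_key(students, ("STUID",)))       # True
print(is_superkey(students, ("NAME",)))             # False: two rows named Jim Smith
```

In this tiny state {NAME, MAJOR} also passes the candidate key test. That is why keys are declared by the designer rather than inferred from data.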
CONCATENATED (COMPOSITE) KEY: A primary key that is composed of two or more attributes or data items, e.g.,
GRADE_REPORT(STUID, COURSE#, GRADE), whose primary key is the combination (STUID, COURSE#)
FOREIGN KEY: A non-key attribute in one relation that appears as the primary key (or part of the key) in another relation, e.g.,
EMPLOYEE(SSN, FNAME, MINIT, DNO)
DEPARTMENT(DNUMBER, DNAME, MANAGER)
DNO in EMPLOYEE is a foreign key that refers to the primary key DNUMBER of DEPARTMENT.
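A referential integrity check for the EMPLOYEE/DEPARTMENT example can be sketched as a set lookup. This minimal Python sketch assumes rows are dicts keyed by the attribute names shown above. The row values and the helper `referential_violations` are hypothetical. A DBMS enforces the same rule on insert, update, and delete rather than by scanning the data afterwards.

```python
# Hypothetical rows; only the attribute names come from the COMPANY example above.
department = [
    {"DNUMBER": 1, "DNAME": "Headquarters"},
    {"DNUMBER": 4, "DNAME": "Administration"},
    {"DNUMBER": 5, "DNAME": "Research"},
]
employee = [
    {"SSN": "111111111", "DNO": 5},
    {"SSN": "222222222", "DNO": 4},
    {"SSN": "333333333", "DNO": 7},     # no DEPARTMENT tuple has DNUMBER = 7
    {"SSN": "444444444", "DNO": None},  # a null foreign key does not violate the constraint
]

def referential_violations(child, fk, parent, pk):
    """Return child rows whose non-null foreign key value matches no parent key value."""
    parent_keys = {row[pk] for row in parent}
    return [row for row in child if row[fk] is not None and row[fk] not in parent_keys]

bad = referential_violations(employee, "DNO", department, "DNUMBER")
print([row["SSN"] for row in bad])  # ['333333333']
```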
SECONDARY KEY: A field that can have duplicate values and that users can use as a search path.

Figure: Referential Integrity Constraints for COMPANY database
RELATIONAL ALGEBRA OVERVIEW
Relational algebra is the basic set of operations for the relational model. These operations enable a user to specify basic retrieval requests (or queries).
The result of an operation is a new relation, which may have been formed from one or more input relations. This property makes the algebra “closed” (all objects in relational algebra are relations).
RELATIONAL ALGEBRA OVERVIEW (CONTINUED)
The algebra operations thus produce new relations, which can be further manipulated using operations of the same algebra.
A sequence of relational algebra operations forms a relational algebra expression. The result of a relational algebra expression is also a relation that represents the result of a database query (or retrieval request).
RELATIONAL ALGEBRA OVERVIEW
Relational algebra consists of several groups of operations:
Unary Relational Operations
SELECT (symbol: $\sigma$ (sigma))
PROJECT (symbol: $\pi$ (pi))
Relational Algebra Operations From Set Theory
UNION ($\cup$), INTERSECTION ($\cap$), DIFFERENCE (or MINUS, $-$)
CARTESIAN PRODUCT ($\times$)
Binary Relational Operations
JOIN (several variations of JOIN exist)
DIVISION
Additional Relational Operations
OUTER JOINS, OUTER UNION
AGGREGATE FUNCTIONS (these compute summary information, for example SUM, COUNT, AVG, MIN, MAX)
Unary Relational Operations: SELECT
The SELECT operation (denoted by $\sigma$ (sigma)) is used to select a subset of the tuples from a relation based on a selection condition.
The selection condition acts as a filter: it keeps only those tuples that satisfy the qualifying condition. Tuples satisfying the condition are selected, whereas the other tuples are discarded (filtered out).
Examples:
Select the EMPLOYEE tuples whose department number is 4:
$\sigma_{DNO = 4}(EMPLOYEE)$
Select the EMPLOYEE tuples whose salary is greater than 30,000 dollars:
$\sigma_{SALARY > 30000}(EMPLOYEE)$
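The two SELECT examples above map directly onto a filter over rows. The minimal Python sketch below assumes EMPLOYEE rows are dicts with made-up values. The hypothetical `select` helper receives the selection condition as a Python function.

```python
# Hypothetical EMPLOYEE rows; only the attribute names follow the examples above.
employee = [
    {"FNAME": "John", "DNO": 5, "SALARY": 30000},
    {"FNAME": "Franklin", "DNO": 5, "SALARY": 40000},
    {"FNAME": "Alicia", "DNO": 4, "SALARY": 25000},
    {"FNAME": "Jennifer", "DNO": 4, "SALARY": 43000},
]

def select(rows, condition):
    """sigma_condition(rows): keep only the rows for which condition(row) is True."""
    return [row for row in rows if condition(row)]

dept4 = select(employee, lambda t: t["DNO"] == 4)        # sigma_{DNO = 4}
rich = select(employee, lambda t: t["SALARY"] > 30000)   # sigma_{SALARY > 30000}
print([t["FNAME"] for t in dept4])  # ['Alicia', 'Jennifer']
print([t["FNAME"] for t in rich])   # ['Franklin', 'Jennifer']
```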
UNARY RELATIONAL OPERATIONS: SELECT
In general, the select operation is denoted by $\sigma_{\langle \text{selection condition} \rangle}(R)$, where:
the symbol $\sigma$ (sigma) denotes the select operator;
the selection condition is a Boolean (conditional) expression specified on the attributes of relation R;
tuples that make the condition true are selected and appear in the result of the operation;
tuples that make the condition false are filtered out.
Because of the commutativity property, a cascade (sequence) of SELECT operations may be applied in any order:
$\sigma_{cond_1}(\sigma_{cond_2}(\sigma_{cond_3}(R))) = \sigma_{cond_2}(\sigma_{cond_3}(\sigma_{cond_1}(R)))$
A cascade of SELECT operations may be replaced by a single selection with a conjunction of all the conditions:
$\sigma_{cond_1}(\sigma_{cond_2}(\sigma_{cond_3}(R))) = \sigma_{cond_1 \text{ AND } cond_2 \text{ AND } cond_3}(R)$
The number of tuples in the result of a SELECT is less than (or equal to) the number of tuples in the input relation R.
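The commutativity and cascade properties can be checked on a small sample. This Python sketch assumes EMPLOYEE tuples of the form (FNAME, DNO, SALARY) with hypothetical values. It applies three conditions in two different orders and once as a single conjunction, then confirms that all three results agree and are no larger than the input.

```python
# Hypothetical EMPLOYEE tuples (FNAME, DNO, SALARY) stored in a set.
employee = {("John", 5, 30000), ("Franklin", 5, 40000),
            ("Alicia", 4, 25000), ("Jennifer", 4, 43000), ("Ramesh", 5, 38000)}

def select(rel, cond):
    """sigma_cond(rel) for a relation modeled as a set of tuples."""
    return {t for t in rel if cond(t)}

conds = [
    lambda t: t[1] == 5,          # cond1: DNO = 5
    lambda t: t[2] > 30000,       # cond2: SALARY > 30000
    lambda t: t[0] != "Ramesh",   # cond3: FNAME <> 'Ramesh'
]

# Cascade in the original order, then in reverse order.
cascade = employee
for c in conds:
    cascade = select(cascade, c)
reordered = employee
for c in reversed(conds):
    reordered = select(reordered, c)

# One SELECT whose condition is the conjunction of all three.
conjunction = select(employee, lambda t: all(c(t) for c in conds))

assert cascade == reordered == conjunction
assert len(conjunction) <= len(employee)
print(sorted(cascade))  # [('Franklin', 5, 40000)]
```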
SELECT works on a single table. It takes the rows that meet a specified condition and copies them into a new table.
Notation: $\sigma_{\text{condition(s)}}(\text{table name})$
SQL (Structured Query Language):
SELECT * FROM (table name) WHERE condition1 AND condition2 AND condition3 …
Example: Find employees who work for department number 5.
Relational algebra: $\sigma_{DNO = 5}(EMPLOYEE)$
SQL: SELECT * FROM employee WHERE dno = 5;
Query tree: the leaf EMPLOYEE feeds the node $\sigma_{DNO = 5}$.
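A selection condition may combine comparisons with AND and OR:
$\sigma_{(DNO = 4 \text{ AND } SALARY > 25000) \text{ OR } (DNO = 5 \text{ AND } SALARY > 30000)}(EMPLOYEE)$
A cascade of n selections equals one selection on the conjunction of their conditions:
$\sigma_{cond_1}(\sigma_{cond_2}(\dots(\sigma_{cond_n}(R))\dots)) = \sigma_{cond_1 \text{ AND } cond_2 \text{ AND } \dots \text{ AND } cond_n}(R)$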
PROJECT operates on a single table and produces a vertical subset of it. It extracts the values of the specified columns, eliminates duplicate rows, and places the values in a new table.
Notation: $\pi_{column1, column2, column3, \dots}(\text{table name})$
SQL: SELECT DISTINCT column1, column2, column3, … FROM (table name)
Without DISTINCT, SQL keeps duplicate rows, whereas PROJECT eliminates them.
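Here is a minimal Python sketch of PROJECT. It assumes rows are tuples, `ATTRS` lists the column names in order, and the sample rows are hypothetical. Collecting the projected values into a set is what removes duplicate rows. This mirrors the PROJECT behavior described above and SELECT DISTINCT in SQL.

```python
# Hypothetical EMPLOYEE rows (FNAME, MINIT, LNAME, DNO).
ATTRS = ("FNAME", "MINIT", "LNAME", "DNO")
employee = [
    ("John", "B", "Smith", 5),
    ("Franklin", "T", "Wong", 5),
    ("Alicia", "J", "Zelaya", 4),
    ("John", "B", "Smith", 4),  # same name, different department
]

def project(rows, attrs, wanted):
    """pi_wanted(rows): keep only the wanted columns; the set drops duplicate rows."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(row[i] for i in idx) for row in rows}

names = project(employee, ATTRS, ("FNAME", "MINIT", "LNAME"))
print(len(names))  # 3: the duplicate name appears only once
```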
Example: Show the names of all employees.
Relational algebra: $\pi_{fname, minit, lname}(employee)$
SQL: SELECT fname, minit, lname FROM employee;
Query tree: the leaf EMPLOYEE feeds the node $\pi_{fname, minit, lname}$.
Select & project
Example: Show the names of all employees who work for department number 5.
Relational algebra: $\pi_{fname, minit, lname}(\sigma_{dno = 5}(employee))$
SQL: SELECT fname, minit, lname FROM employee WHERE dno = 5;
Query tree: the leaf EMPLOYEE feeds $\sigma_{DNO = 5}$, whose result feeds $\pi_{fname, minit, lname}$.
Figure: Examples of applying SELECT and PROJECT operations
PRODUCT (or Cartesian product): R1 X R2
R1 X R2 is a table whose width is the width of R1 plus the width of R2. Its columns are the columns of R1 followed by the columns of R2.
If R1 has X rows and M columns and R2 has Y rows and N columns, then R1 X R2 has X * Y rows and M + N columns.
Student
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Credit_Hours
STUID   Hours
101     60
102     85

Cartesian Product: Student X Credit_Hours
ID    Fname    Lname     Stuid   Hours
101   Jim      Smith     101     60
101   Jim      Smith     102     85
102   Tim      Brown     101     60
102   Tim      Brown     102     85
103   Babara   Houston   101     60
103   Babara   Houston   102     85
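The row and column counts can be confirmed with the Student and Credit_Hours tables above. The Python sketch below assumes rows are plain tuples and concatenates every Student row with every Credit_Hours row.

```python
from itertools import product

student = [(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")]
credit_hours = [(101, 60), (102, 85)]

# Each result row is a Student row followed by a Credit_Hours row.
student_x_credit = [s + c for s, c in product(student, credit_hours)]

print(len(student_x_credit))     # 6 = 3 * 2 rows
print(len(student_x_credit[0]))  # 5 = 3 + 2 columns
print(student_x_credit[1])       # (101, 'Jim', 'Smith', 102, 85)
```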
QUERY TREE FOR CARTESIAN PRODUCT
The leaves Table1 and Table2 feed a single X (product) node.

Figure: Example of a query tree
Theta join: the result of performing a SELECT operation, using a comparison operator theta (=, <, <=, >, >=, <>), on the product.
Credit_Hours
STUID   Hours
101     60
102     85

Student
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Student X Credit_Hours, before the condition ID > STUID is applied:
ID    Fname    Lname     Stuid   Hours
101   Jim      Smith     101     60
101   Jim      Smith     102     85
102   Tim      Brown     101     60
102   Tim      Brown     102     85
103   Babara   Houston   101     60
103   Babara   Houston   102     85

Theta Join (>)
Student
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Credit_Hours
STUID   Hours
101     60
102     85
103     50

Student X Credit_Hours (ID > STUID)
ID    Fname    Lname     Stuid   Hours
102   Tim      Brown     101     60
103   Babara   Houston   101     60
103   Babara   Houston   102     85

Theta Join (ID > STUID)
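A theta join is literally a selection applied to the product. It can therefore be sketched in Python by filtering `itertools.product` with a comparison function from the `operator` module. The sketch assumes rows are tuples and uses the two-row Credit_Hours table. `theta_join` is a hypothetical helper.

```python
import operator
from itertools import product

student = [(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")]
credit_hours = [(101, 60), (102, 85)]

def theta_join(r1, r2, i, op, j):
    """SELECT over the product: keep r1+r2 rows where op(r1_row[i], r2_row[j]) holds."""
    return [a + b for a, b in product(r1, r2) if op(a[i], b[j])]

# Student theta-join Credit_Hours on ID > STUID (column 0 of each table).
result = theta_join(student, credit_hours, 0, operator.gt, 0)
for row in result:
    print(row)
# (102, 'Tim', 'Brown', 101, 60)
# (103, 'Babara', 'Houston', 101, 60)
# (103, 'Babara', 'Houston', 102, 85)
```

This version follows the definition directly. A DBMS usually avoids building the full product first.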
QUERY TREE FOR THETA JOIN
The leaves Student and Credit_Hours feed an X (product) node with the condition ID > STUID.
Equijoin: a theta join whose comparison operator theta is equality (=).
Student
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Credit_Hours
STUID   Hours
101     60
102     85

Student X Credit_Hours, before the condition ID = STUID is applied:
ID    Fname    Lname     Stuid   Hours
101   Jim      Smith     101     60
101   Jim      Smith     102     85
102   Tim      Brown     101     60
102   Tim      Brown     102     85
103   Babara   Houston   101     60
103   Babara   Houston   102     85

Student X Credit_Hours (ID = STUID)
ID    Fname    Lname     Stuid   Hours
101   Jim      Smith     101     60
102   Tim      Brown     102     85

Equijoin
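The following Python sketch computes the equijoin above in two ways. The first filters the product, exactly as the definition says. The second indexes Credit_Hours by STUID first, so the product is never materialized; this is similar in spirit to the hash joins many DBMSs use. Rows are assumed to be tuples.

```python
from itertools import product

student = [(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")]
credit_hours = [(101, 60), (102, 85)]

# Definition: SELECT with ID = STUID applied to the Cartesian product.
by_definition = [s + c for s, c in product(student, credit_hours) if s[0] == c[0]]

# Same result without the product: build an index on STUID, then probe it per Student row.
index = {}
for c in credit_hours:
    index.setdefault(c[0], []).append(c)
by_lookup = [s + c for s in student for c in index.get(s[0], [])]

print(by_definition)  # [(101, 'Jim', 'Smith', 101, 60), (102, 'Tim', 'Brown', 102, 85)]
assert by_definition == by_lookup
```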
QUERY TREE FOR EQUIJOIN
The leaves Student and Credit_Hours feed an X (product) node with the condition ID = STUID.
Natural join (|X|): an equijoin in which the repeated (duplicate) join column is eliminated.
The join is usually performed over columns that have the same name in both tables.
Student
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Credit_Hours
STUID   Hours
101     60
102     85
103     50

Student X Credit_Hours, before the condition ID = STUID is applied:
ID    Fname    Lname     Stuid   Hours
101   Jim      Smith     101     60
101   Jim      Smith     102     85
101   Jim      Smith     103     50
102   Tim      Brown     101     60
102   Tim      Brown     102     85
102   Tim      Brown     103     50
103   Babara   Houston   101     60
103   Babara   Houston   102     85
103   Babara   Houston   103     50

Remove the rows that do not satisfy ID = STUID (equijoin):
ID    Fname    Lname     Stuid   Hours
101   Jim      Smith     101     60
102   Tim      Brown     102     85
103   Babara   Houston   103     50

Remove the duplicate Stuid column (natural join):
Student |X| Credit_Hours
ID    Fname    Lname     Hours
101   Jim      Smith     60
102   Tim      Brown     85
103   Babara   Houston   50
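A natural join matches tuples on the attributes that share a name and keeps one copy of each such attribute. The Python sketch below assumes the Credit_Hours key column is named ID, as in one version of the slide table, so that the common attribute can be found by name. `natural_join` is a hypothetical helper that returns the result attributes and rows.

```python
student_attrs = ("ID", "Fname", "Lname")
student = [(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")]
credit_attrs = ("ID", "Hours")  # assumed column name, shared with Student
credit_hours = [(101, 60), (102, 85), (103, 50)]

def natural_join(attrs1, r1, attrs2, r2):
    """Join on all attributes with the same name; keep one copy of each shared column."""
    common = [a for a in attrs1 if a in attrs2]
    extra = [a for a in attrs2 if a not in attrs1]
    out_attrs = tuple(attrs1) + tuple(extra)
    rows = []
    for t1 in r1:
        for t2 in r2:
            if all(t1[attrs1.index(a)] == t2[attrs2.index(a)] for a in common):
                rows.append(t1 + tuple(t2[attrs2.index(a)] for a in extra))
    return out_attrs, rows

attrs, rows = natural_join(student_attrs, student, credit_attrs, credit_hours)
print(attrs)  # ('ID', 'Fname', 'Lname', 'Hours')
print(rows)   # [(101, 'Jim', 'Smith', 60), (102, 'Tim', 'Brown', 85), (103, 'Babara', 'Houston', 50)]
```

If the two tables share no attribute names, the `all(...)` test is vacuously true and the result degenerates to the Cartesian product. This matches the usual definition of natural join.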
QUERY TREE FOR NATURAL JOIN
The leaves Student and Credit_Hours feed a single |X| (natural join) node.
Semi-join: If R1 and R2 are tables, the semijoin of R1 and R2 (R1 |X R2) is the natural join of R1 and R2 projected onto the attributes of R1.
The semijoin is not commutative: R1 |X R2 is generally different from R2 |X R1.
Create tables:
create table student1 (id char(3) primary key, fname char(10), lname char(10));
-- credit_hours definition assumed here; the original inserts into it without showing it
create table credit_hours (stuid char(3), hours integer);
insert into student1 values ('101', 'Jim', 'Smith');
insert into student1 values ('102', 'Tim', 'Brown');
insert into student1 values ('103', 'Babara', 'Houston');
insert into credit_hours values ('101', 60);
insert into credit_hours values ('102', 85);
Student
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Credit_Hours
STUID   Hours
101     60
102     85

Left Semi-Join: Student |X Credit_Hours
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
Student
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Credit_Hours
STUID   Hours
101     60
102     85
104     100

Right Semi-Join: Student X| Credit_Hours
STUID   Hours
101     60
102     85
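A semijoin only needs to know which join values exist in the other table. The Python sketch below reproduces both semi-join examples and shows that swapping the operands changes the result. It uses a hypothetical `semijoin` helper, represents rows as tuples, and gives the join columns by position.

```python
student = [(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")]
credit_hours = [(101, 60), (102, 85), (104, 100)]

def semijoin(r1, r2, i, j):
    """R1 |X R2: rows of R1 with at least one match r1[i] = r2[j]; only R1's columns are kept."""
    keys = {row[j] for row in r2}
    return [row for row in r1 if row[i] in keys]

print(semijoin(student, credit_hours, 0, 0))  # [(101, 'Jim', 'Smith'), (102, 'Tim', 'Brown')]
print(semijoin(credit_hours, student, 0, 0))  # [(101, 60), (102, 85)]
```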
Outer Join:
An outer join is an extension of a THETA JOIN, an EQUIJOIN, or a NATURAL JOIN.
It consists of all rows that appear in the usual theta join, plus an additional row for each tuple from the original tables that does not participate in the theta join.
Each such unmatched tuple is extended by assigning null values to the attributes of the other table.
Left outer join: unmatched rows from the first (left) table appear in the resulting table.
Right outer join: unmatched rows from the second (right) table appear in the resulting table.
Student
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Credit_Hours
STUID   Hours
101     60
102     85
104     100

Matching rows only: Student |X| Credit_Hours
ID    Fname    Lname     Hours
101   Jim      Smith     60
102   Tim      Brown     85

Left Outer Join: Student ⟕ Credit_Hours
ID    Fname    Lname     Hours
101   Jim      Smith     60
102   Tim      Brown     85
103   Babara   Houston   null

Right Outer Join: Student ⟖ Credit_Hours
ID     Fname   Lname     Stuid   Hours
101    Jim     Smith     101     60
102    Tim     Brown     102     85
null   null    null      104     100

Outer join in Oracle (left outer join):
select * from student, credit_hours where id = stuid(+);
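The following Python sketch implements left and right outer joins on ID = STUID. It assumes rows are tuples and uses `None` to stand for null. `left_outer_join` and `right_outer_join` are hypothetical helpers. Unlike the natural-join form of the left outer join shown above, both helpers keep both join columns (the equijoin form).

```python
student = [(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")]
credit_hours = [(101, 60), (102, 85), (104, 100)]

def left_outer_join(r1, r2, i, j, r2_width):
    """Equijoin rows for r1[i] = r2[j], plus each unmatched r1 row padded with None."""
    out = []
    for a in r1:
        matches = [b for b in r2 if a[i] == b[j]]
        if matches:
            out.extend(a + b for b in matches)
        else:
            out.append(a + (None,) * r2_width)
    return out

def right_outer_join(r1, r2, i, j, r1_width):
    """Equijoin rows for r1[i] = r2[j], plus each unmatched r2 row padded with None."""
    out = []
    for b in r2:
        matches = [a for a in r1 if a[i] == b[j]]
        if matches:
            out.extend(a + b for a in matches)
        else:
            out.append((None,) * r1_width + b)
    return out

left = left_outer_join(student, credit_hours, 0, 0, r2_width=2)
right = right_outer_join(student, credit_hours, 0, 0, r1_width=3)
for row in left:
    print(row)
# (101, 'Jim', 'Smith', 101, 60)
# (102, 'Tim', 'Brown', 102, 85)
# (103, 'Babara', 'Houston', None, None)
for row in right:
    print(row)
# (101, 'Jim', 'Smith', 101, 60)
# (102, 'Tim', 'Brown', 102, 85)
# (None, None, None, 104, 100)
```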
Intersection ($\cap$): The intersection of two relations is the set of tuples that belong to both relations simultaneously.
Student1
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
103   Babara   Houston

Student2
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
105   Kim      Lee
110   Mike     Moore

Intersection: Student1 $\cap$ Student2
ID    Fname    Lname
101   Jim      Smith
102   Tim      Brown
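If relations are modeled as Python sets of tuples (an assumption for illustration), intersection is the built-in `&` operator. The sketch below also checks the identity $R \cap S = R - (R - S)$, which shows that intersection can be expressed using set difference.

```python
student1 = {(101, "Jim", "Smith"), (102, "Tim", "Brown"), (103, "Babara", "Houston")}
student2 = {(101, "Jim", "Smith"), (102, "Tim", "Brown"),
            (105, "Kim", "Lee"), (110, "Mike", "Moore")}

# Tuples that belong to both relations.
both = student1 & student2
print(sorted(both))  # [(101, 'Jim', 'Smith'), (102, 'Tim', 'Brown')]

# Intersection expressed with set difference only.
assert both == student1 - (student1 - student2)
```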
Division ($\div$): A binary operation that can be defined on two relations when the entire structure of one (the divisor) is a portion of the structure of the other (the dividend).
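A common way to read division: the result contains every value x from the dividend that is paired with all tuples of the divisor. The Python sketch below assumes a hypothetical dividend TAKES(STUID, COURSE) and divisor REQUIRED(COURSE) with made-up data. It answers the question "which students took every required course?"; `divide` is a hypothetical helper.

```python
# Hypothetical dividend TAKES(STUID, COURSE) and divisor REQUIRED(COURSE).
takes = {("101", "DB"), ("101", "OS"), ("102", "DB"),
         ("103", "DB"), ("103", "OS"), ("103", "AI")}
required = {"DB", "OS"}

def divide(dividend, divisor):
    """Return every x such that (x, y) is in dividend for ALL y in divisor."""
    candidates = {x for x, _ in dividend}
    return {x for x in candidates if all((x, y) in dividend for y in divisor)}

print(sorted(divide(takes, required)))  # ['101', '103']
```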