Source: avid.cs.umass.edu/courses/645/s2010/lectures/02-RelationalAlg.pdf


Querying Relational Data: Algebra

Gerome Miklau, UMass Amherst

CMPSCI 645 – Database Systems

Jan 21, 2010

Some slide content courtesy of Zack Ives, Ramakrishnan & Gehrke, Dan Suciu, Ullman & Widom

Thursday, January 21, 2010


Next lectures

• Today: relational model, relational algebra

• Next Tuesday: SQL

• Homework 1 will be on these topics


Relational Database: Definitions

• Relational database: a set of relations
• Relation: made up of 2 parts:
  – Instance: a table, with rows and columns.
  – Schema: specifies the name of the relation, plus the name and type/domain of each column.

Restriction: all attributes are of atomic type, no nested tables

Students(sid: string, name: string, login: string, age: integer, gpa: real).


Relational instances: tables

Arity (number of attributes) is 5

(Figure: the Students table, labeled to show a column (attribute, field), a row (tuple), and an attribute value.)

• A relation is a set of tuples: no tuple can occur more than once.
  – Real systems may allow duplicates for efficiency or other reasons; we'll come back to this.
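To make schema, instance, and arity concrete, here is a minimal Python sketch, assuming a relation is modeled as a schema (a tuple of attribute names) plus an instance (a frozenset of tuples). The `Relation` class, the `relation` helper, and the sample Students rows are hypothetical illustrations, not part of the slides. Because the instance is a set, an exact duplicate tuple is stored only once.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    """Hypothetical model: schema = attribute names, rows = a set of tuples."""
    schema: tuple
    rows: frozenset

    @property
    def arity(self) -> int:
        # Arity is the number of attributes in the schema.
        return len(self.schema)

def relation(schema, rows):
    """Build a Relation, rejecting tuples whose width does not match the schema."""
    rows = frozenset(rows)  # set semantics: duplicate tuples collapse to one
    for row in rows:
        if len(row) != len(schema):
            raise ValueError(f"tuple {row!r} does not fit schema {schema!r}")
    return Relation(tuple(schema), rows)

# Hypothetical Students instance; the second row is an exact duplicate.
students = relation(
    ("sid", "name", "login", "age", "gpa"),
    [
        ("s1", "Ann", "ann@cs", 20, 3.5),
        ("s1", "Ann", "ann@cs", 20, 3.5),
        ("s2", "Raj", "raj@cs", 22, 3.1),
    ],
)

print(students.arity)      # 5
print(len(students.rows))  # 2
```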


Relational Query Languages

• Query languages: allow manipulation and retrieval of data from a database.
• Query languages != programming languages!
  – QLs are not expected to be "Turing complete".
  – QLs are not intended to be used for complex calculations.
  – QLs support easy, efficient access to large data sets.


Preliminaries

• A query is applied to one or more relation instances.
• The result of a query is a relation instance.
• Input and output schema:
  – The schemas of the input relations for a query are fixed.
  – The schema for the result of a given query is also fixed: it is determined by the definitions of the query language constructs.

Query Q: R1, …, Rn → R'


What is an “Algebra”

• Mathematical system consisting of:
  – Operands: variables or values from which new values can be constructed.
  – Operators: symbols denoting procedures that construct new values from given values.


What is the Relational Algebra?

• An algebra whose operands are relations or variables that represent relations.

• Operators are designed to do the most common things that we need to do with relations in a database.
  – The result is an algebra that can be used as a query language for relations.


Relational Algebra

• Operates on relations, i.e. sets
  – Later: we discuss how to extend this to bags
• Five operators:
  – Union: ∪
  – Difference: –
  – Selection: σ
  – Projection: Π
  – Cartesian Product: ×
• Derived or auxiliary operators:
  – Intersection, complement
  – Joins (natural, equi-join, theta join)
  – Renaming: ρ
  – Division: /


Example Database

STUDENT
sid  name
1    Jill
2    Bo
3    Maya

PROFESSOR
fid  name
1    Diao
2    Saul
8    Weems

Takes
sid  cid
1    645
1    683
3    635

COURSE
cid  name  sem
645  DB    F05
683  AI    S05
635  Arch  F05

Teaches
fid  cid
1    645
2    683
8    635


1. Union and 2. Difference

R1
sid  name
1    Jill
2    Bo
3    Maya

R2
sid  name
1    Jill
4    Bob

R1 – R2
sid  name
2    Bo
3    Maya

R1 ∪ R2
sid  name
1    Jill
2    Bo
3    Maya
4    Bob
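The two basic set operators map directly onto Python's set operators. A minimal sketch, assuming each relation is a Python set of (sid, name) tuples and that both inputs share the same schema (union compatibility), which the sketch does not check:

```python
# R1 and R2 from the slide, both with schema (sid, name).
R1 = {(1, "Jill"), (2, "Bo"), (3, "Maya")}
R2 = {(1, "Jill"), (4, "Bob")}

def union(r, s):
    """R ∪ S: every tuple that is in R, in S, or in both (no duplicates)."""
    return r | s

def difference(r, s):
    """R - S: the tuples of R that do not appear in S."""
    return r - s

print(sorted(union(R1, R2)))       # [(1, 'Jill'), (2, 'Bo'), (3, 'Maya'), (4, 'Bob')]
print(sorted(difference(R1, R2)))  # [(2, 'Bo'), (3, 'Maya')]
```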


What about Intersection?

• It is a derived operator
• R1 ∩ R2 = R1 – (R1 – R2)
• Also expressed as a join (we'll see later)

(Figure: diagram of R1, R2, and R1 – R2.)
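A quick check of the derivation, as a sketch that reuses R1 and R2 from the union and difference slide as Python sets of tuples. It builds intersection from difference alone and compares the result with Python's built-in `&` operator:

```python
R1 = {(1, "Jill"), (2, "Bo"), (3, "Maya")}
R2 = {(1, "Jill"), (4, "Bob")}

def intersection(r, s):
    """R ∩ S derived from difference alone: R - (R - S)."""
    only_in_r = r - s      # tuples of R that S lacks
    return r - only_in_r   # what remains of R is exactly what S shares

print(intersection(R1, R2))  # {(1, 'Jill')}
```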


3. Selection

• Returns all tuples that satisfy a condition
• Notation: σc(R)
• Examples:
  – σcid > 600(Course)
  – σname = "AI"(Course)
• The condition c can be =, <, ≤, >, ≥, <>

Course
cid  name  sem
645  DB    F05
683  AI    S05
635  Arch  F05
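Selection is a row filter whose output has the same schema as its input. In the sketch below, the condition c is assumed to be a Python function that receives the tuple as a dict keyed by attribute name. This is a modeling choice for illustration; the helper name `select` is hypothetical.

```python
COURSE_SCHEMA = ("cid", "name", "sem")
Course = {(645, "DB", "F05"), (683, "AI", "S05"), (635, "Arch", "F05")}

def select(schema, rows, condition):
    """σ_condition(R): keep each tuple for which condition(...) is true.

    The condition receives the tuple as a dict keyed by attribute name.
    """
    return {row for row in rows if condition(dict(zip(schema, row)))}

over_600 = select(COURSE_SCHEMA, Course, lambda t: t["cid"] > 600)
ai = select(COURSE_SCHEMA, Course, lambda t: t["name"] == "AI")
print(len(over_600))  # 3
print(ai)             # {(683, 'AI', 'S05')}
```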


4. Projection

• Eliminates columns, then removes duplicates
• Notation: ΠA1,…,An(R)
• Example: project cid and name
  – Πcid, name(Course)
  – Output schema: Answer(cid, name)

Course
cid  name  sem
645  DB    F05
683  AI    S05
645  DB    S05

Answer = Πcid, name(Course)
cid  name
645  DB
683  AI
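The sketch below shows projection removing columns first, then letting set semantics eliminate the duplicate (645, DB) row. It assumes the same schema-plus-set-of-tuples representation as the selection sketch; the function name `project` is hypothetical.

```python
COURSE_SCHEMA = ("cid", "name", "sem")
Course = {(645, "DB", "F05"), (683, "AI", "S05"), (645, "DB", "S05")}

def project(schema, rows, attrs):
    """Π_attrs(R): keep only the named columns; the set drops duplicate rows."""
    positions = [schema.index(a) for a in attrs]
    result = {tuple(row[i] for i in positions) for row in rows}
    return tuple(attrs), result

answer_schema, answer = project(COURSE_SCHEMA, Course, ("cid", "name"))
print(answer_schema)   # ('cid', 'name')
print(sorted(answer))  # [(645, 'DB'), (683, 'AI')]
```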


5. Cartesian Product

• Pairs each tuple in R1 with each tuple in R2

• Notation: R1 × R2

• Very rare in practice; mainly used to express joins

Also called “Cross Product”


Cartesian Product


Student
sid  name
1    Jill
2    Bo

Takes
sid  cid
1    645
1    683
3    635

Student × Takes
sid  name  sid  cid
1    Jill  1    645
1    Jill  1    683
1    Jill  3    635
2    Bo    1    645
2    Bo    1    683
2    Bo    3    635
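A minimal sketch of the Cartesian product using `itertools.product`, assuming tuples are concatenated positionally. Both sid columns survive, which is why the result schema above lists sid twice; the function name `cross` is hypothetical.

```python
from itertools import product

Student = {(1, "Jill"), (2, "Bo")}        # (sid, name)
Takes = {(1, 645), (1, 683), (3, 635)}    # (sid, cid)

def cross(r, s):
    """R × S: concatenate every tuple of R with every tuple of S."""
    return {a + b for a, b in product(r, s)}

result = cross(Student, Takes)
print(len(result))  # 6 = 2 * 3
for row in sorted(result):
    print(row)
```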


Renaming

• Changes the schema, not the instance
• Notation: ρB1,…,Bn(R)
• Example: ρcourseID, cname, term(Course)

Course
cid  name  sem
645  DB    F05
683  AI    S05
645  DB    S05

ρcourseID, cname, term(Course)
courseID  cname  term
645       DB     F05
683       AI     S05
645       DB     S05
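Because renaming touches only the schema, a sketch can return the very same instance object with a new tuple of attribute names. This assumes positional renaming (one new name per column, in order), as in the ρ notation above; `rename` is a hypothetical helper.

```python
COURSE_SCHEMA = ("cid", "name", "sem")
Course = {(645, "DB", "F05"), (683, "AI", "S05"), (645, "DB", "S05")}

def rename(schema, rows, new_names):
    """ρ_new_names(R): same tuples, new attribute names (positionally)."""
    if len(new_names) != len(schema):
        raise ValueError("renaming must supply one name per attribute")
    return tuple(new_names), rows  # the instance is returned untouched

new_schema, new_rows = rename(COURSE_SCHEMA, Course, ("courseID", "cname", "term"))
print(new_schema)          # ('courseID', 'cname', 'term')
print(new_rows is Course)  # True
```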


Natural Join

• Notation: R1 ⋈ R2
• Meaning: R1 ⋈ R2 = ΠA(σC(R1 × R2))
• Where:
  – The selection σC checks equality of all common attributes
  – The projection ΠA eliminates the duplicate copies of the common attributes, so A lists each attribute of R1 and R2 exactly once


Natural join example


Student
sid  name
1    Jill
2    Bo
3    Maya

Takes
sid  cid
1    645
1    683
3    635

Student ⋈ Takes
sid  name  cid
1    Jill  645
1    Jill  683
3    Maya  635
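The definition ΠA(σC(R1 × R2)) can be followed literally in code. This sketch assumes relations are given as a schema tuple plus a set of tuples; the name `natural_join` is hypothetical, and a real engine would not materialize the full product.

```python
from itertools import product

def natural_join(r_schema, r_rows, s_schema, s_rows):
    """R ⋈ S computed literally as Π_A(σ_C(R × S))."""
    common = [a for a in r_schema if a in s_schema]
    extra = [a for a in s_schema if a not in r_schema]
    out_schema = tuple(r_schema) + tuple(extra)
    out_rows = set()
    for r, s in product(r_rows, s_rows):                   # R × S
        rd, sd = dict(zip(r_schema, r)), dict(zip(s_schema, s))
        if all(rd[a] == sd[a] for a in common):            # σ_C: equal on common attributes
            out_rows.add(r + tuple(sd[a] for a in extra))  # Π_A: drop repeated columns
    return out_schema, out_rows

schema, rows = natural_join(
    ("sid", "name"), {(1, "Jill"), (2, "Bo"), (3, "Maya")},  # Student
    ("sid", "cid"), {(1, 645), (1, 683), (3, 635)},          # Takes
)
print(schema)        # ('sid', 'name', 'cid')
print(sorted(rows))  # [(1, 'Jill', 645), (1, 'Jill', 683), (3, 'Maya', 635)]
```

Because `all()` over an empty list is `True`, two relations with no common attributes yield their Cartesian product, and two relations with identical schemas yield their intersection.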



Natural join questions

• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
  – (A, B, C, D, E)

• Given R(A, B, C), S(D, E), what is R ⋈ S?
  – The Cartesian product R × S, since there are no common attributes

• Given R(A, B), S(A, B), what is R ⋈ S?
  – The intersection R ∩ S, since all attributes are common


Theta Join

• A join that involves a predicate
• R1 ⋈θ R2 = σθ(R1 × R2)

• Here θ can be any condition: =, <, ≠, ≤, >, ≥

Example: Student ⋈Student.sid < Takes.sid Takes


Equi-join

• A theta join where θ is an equality
• R1 ⋈A=B R2 = σA=B(R1 × R2)
• Very useful join in practice

• Example: Student ⋈Student.sid = Takes.sid Takes
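Theta joins and equi-joins are both σθ applied to a product, so one helper covers both examples. In this sketch, θ is assumed to be a Python function taking one tuple from each side (position 0 is sid in both Student and Takes); `theta_join` is a hypothetical name. Unlike the natural join, the equi-join keeps both sid columns.

```python
from itertools import product

Student = {(1, "Jill"), (2, "Bo"), (3, "Maya")}  # (sid, name)
Takes = {(1, 645), (1, 683), (3, 635)}           # (sid, cid)

def theta_join(r, s, theta):
    """R ⋈_θ S = σ_θ(R × S); theta sees one tuple from each side."""
    return {a + b for a, b in product(r, s) if theta(a, b)}

# Student ⋈ (Student.sid < Takes.sid) Takes
less = theta_join(Student, Takes, lambda st, tk: st[0] < tk[0])
# Equi-join Student ⋈ (Student.sid = Takes.sid) Takes: both sid columns survive
equi = theta_join(Student, Takes, lambda st, tk: st[0] == tk[0])

print(sorted(less))  # [(1, 'Jill', 3, 635), (2, 'Bo', 3, 635)]
print(sorted(equi))  # [(1, 'Jill', 1, 645), (1, 'Jill', 1, 683), (3, 'Maya', 3, 635)]
```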


Semijoin

R ⋉ S = ΠA1,…,An(R ⋈ S), where A1, …, An are the attributes of R

The semijoin of R and S is the set of tuples of R that agree with at least one tuple of S on all attributes common to the schemas of R and S.

Student
sid  name
1    Jill
2    Bo
3    Maya

Takes
sid  cid
1    645
1    683
3    635

Student ⋉ Takes
sid  name
1    Jill
3    Maya
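The sketch below computes the same result as ΠA1,…,An(R ⋈ S) but avoids building the join. It collects the common-attribute values that occur in S into a set and keeps each R tuple whose values appear there. This hash-based strategy is a design choice for illustration, not the slide's definition; the name `semijoin` is hypothetical.

```python
def semijoin(r_schema, r_rows, s_schema, s_rows):
    """R ⋉ S: tuples of R that match some tuple of S on the common attributes."""
    common = [a for a in r_schema if a in s_schema]
    r_pos = [r_schema.index(a) for a in common]
    s_pos = [s_schema.index(a) for a in common]
    # Collect the common-attribute values that occur in S ...
    s_keys = {tuple(t[i] for i in s_pos) for t in s_rows}
    # ... and keep each R tuple whose values appear among them.
    return {t for t in r_rows if tuple(t[i] for i in r_pos) in s_keys}

Student = {(1, "Jill"), (2, "Bo"), (3, "Maya")}  # (sid, name)
Takes = {(1, 645), (1, 683), (3, 635)}           # (sid, cid)
print(sorted(semijoin(("sid", "name"), Student, ("sid", "cid"), Takes)))
# [(1, 'Jill'), (3, 'Maya')]
```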


Division

• A derived operator useful for queries like: Find students who have enrolled in all systems courses.

• Let R have 2 fields, x and y; let S have only field y:
  – R/S = { (x) | (x) ∈ Πx(R) and ∀(y) ∈ S, (x, y) ∈ R }
  – i.e., R/S contains all x tuples (students) such that for every y tuple (course) in S, there is an xy tuple in R.
  – Or: if the set of y values (courses) associated with an x value (student) in R contains all y values in S, the x value is in R/S.

• In general, the attributes of S must be a subset of the attributes of R: R(A1, …, An, B1, …, Bm) and S(B1, …, Bm); the result R/S has schema (A1, …, An).


Division examples

A
sno  pno
s1   p1
s1   p2
s1   p3
s1   p4
s2   p1
s2   p2
s3   p2
s4   p2
s4   p4

B1: pno = p2
B2: pno = p2, p4
B3: pno = p1, p2, p4

A/B1: sno = s1, s2, s3, s4
A/B2: sno = s1, s4
A/B3: sno = s1
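The set-builder definition translates almost word for word into Python. This sketch assumes R is a set of (x, y) pairs and S is a set of 1-tuples (y,). It reproduces the three results above; `divide` is a hypothetical name.

```python
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"), ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}

def divide(r, s):
    """R/S for R(x, y), S(y): the x values of R paired in R with every y in S."""
    candidates = {x for x, _ in r}  # Πx(R)
    return {(x,) for x in candidates if all((x, y) in r for (y,) in s)}

print(sorted(divide(A, B1)))  # [('s1',), ('s2',), ('s3',), ('s4',)]
print(sorted(divide(A, B2)))  # [('s1',), ('s4',)]
print(sorted(divide(A, B3)))  # [('s1',)]
```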


Expressing division using basic operators

Idea: For R/S, compute all x values that are not "disqualified" by some y value in S. An x value is disqualified if, by attaching a y value from S, we obtain an xy tuple that is not in R.

Disqualified x values: Πx((Πx(R) × S) – R)

R / S = Πx(R) – Πx((Πx(R) × S) – R)
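The same division can be built from projection, product, and difference only, following the "disqualified" idea step by step. This is a sketch under the same assumptions as before (R holds (x, y) pairs, S holds 1-tuples). It should agree with the direct definition on the example data; the helper names are hypothetical.

```python
A = {
    ("s1", "p1"), ("s1", "p2"), ("s1", "p3"), ("s1", "p4"),
    ("s2", "p1"), ("s2", "p2"), ("s3", "p2"),
    ("s4", "p2"), ("s4", "p4"),
}
B1 = {("p2",)}
B2 = {("p2",), ("p4",)}
B3 = {("p1",), ("p2",), ("p4",)}

def project_x(r):
    """Πx(R) for a binary relation R(x, y)."""
    return {(x,) for x, _ in r}

def cross(xs, ys):
    """Product of a unary x relation with a unary y relation, giving (x, y) pairs."""
    return {(x, y) for (x,) in xs for (y,) in ys}

def divide_basic(r, s):
    all_x = project_x(r)                           # Πx(R)
    disqualified = project_x(cross(all_x, s) - r)  # Πx((Πx(R) × S) - R)
    return all_x - disqualified                    # Πx(R) - disqualified

print(sorted(divide_basic(A, B2)))  # [('s1',), ('s4',)]
```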


Combining operators: complex expressions

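Πname,sid(σname="DB"(Students ⋈ (Takes ⋈ Course)))

(Figure: the expression drawn as an operator tree, with Students, Takes, and Course at the leaves, the joins above them, then σname="DB", and Πname,sid at the root.)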
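Evaluating the tree bottom-up on the example database makes the plan concrete. One subtlety: Students and Course both have an attribute called name, so a literal natural join of all three would also equate student names with course names. This sketch assumes the intended meaning (a student's name, and a course named "DB") and calls the course attribute cname. That renaming is an assumption and does not appear on the slide.

```python
Students = {(1, "Jill"), (2, "Bo"), (3, "Maya")}                         # (sid, name)
Takes = {(1, 645), (1, 683), (3, 635)}                                   # (sid, cid)
Course = {(645, "DB", "F05"), (683, "AI", "S05"), (635, "Arch", "F05")}  # (cid, cname, sem)

# Takes ⋈ Course on cid  ->  (sid, cid, cname, sem)
takes_course = {
    (sid, cid, cname, sem)
    for sid, cid in Takes
    for c_cid, cname, sem in Course
    if cid == c_cid
}
# Students ⋈ (Takes ⋈ Course) on sid  ->  (sid, name, cid, cname, sem)
joined = {
    (sid, name, cid, cname, sem)
    for sid, name in Students
    for t_sid, cid, cname, sem in takes_course
    if sid == t_sid
}
# σ cname = "DB"
selected = {t for t in joined if t[3] == "DB"}
# Π name, sid
answer = {(name, sid) for sid, name, _cid, _cname, _sem in selected}
print(answer)  # {('Jill', 1)}
```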


Algebraic Equivalences

• Relational algebra has laws of commutativity, associativity, etc. that imply certain expressions are equivalent.

Definition: Query Equivalence

Two queries Q and Q’ are equivalent if:

for all instances D, Q(D) = Q’(D)


Query Optimization Is Based on Algebraic Equivalences

• Equivalent expressions may differ in cost of evaluation!

• Query optimization searches for the equivalent expression that is cheapest to evaluate (or at least one that's not bad).

σc ∧ d(R) ≡ σc(σd(R))        (cascading selection)

σc(R ⋈ S) ≡ σc(R) ⋈ S        (pushing selections; valid when c refers only to attributes of R)

R ⋈ (S ⋈ T) ≡ (R ⋈ S) ⋈ T    (join associativity)
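A spot check of two of these laws on random data. A single instance cannot prove equivalence, because the definition quantifies over all instances. Still, the check shows what each side computes, and the pushed version filters R before joining, which is why it is usually cheaper. The relations R(a, b) and S(b, d) and the conditions are hypothetical.

```python
import random

# Hypothetical random instances R(a, b) and S(b, d) over a small integer domain.
random.seed(645)
R = {(random.randint(0, 9), random.randint(0, 9)) for _ in range(40)}
S = {(random.randint(0, 9), random.randint(0, 9)) for _ in range(40)}

def select(rows, pred):
    """σ_pred over a set of tuples."""
    return {t for t in rows if pred(t)}

def join_on_b(r, s):
    """R(a, b) ⋈ S(b, d), returning tuples (a, b, d)."""
    return {(a, b, d) for a, b in r for b2, d in s if b == b2}

def cond_c(t):
    return t[0] > 4        # refers only to a (position 0 in R and in the join result)

def cond_d(t):
    return t[1] % 2 == 0   # refers only to b (position 1 in R)

# Cascading selection: σ_{c ∧ d}(R) versus σ_c(σ_d(R))
cascade_ok = select(R, lambda t: cond_c(t) and cond_d(t)) == select(select(R, cond_d), cond_c)
# Pushing selections: σ_c(R ⋈ S) versus σ_c(R) ⋈ S
push_ok = select(join_on_b(R, S), cond_c) == join_on_b(select(R, cond_c), S)
print(cascade_ok, push_ok)  # True True
```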


Operations on Bags

• A bag = a set with repeated elements
• Relational engines work on bags, not sets!
• All operations need to be defined carefully on bags:
  – {a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
  – {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b}
  – σC(R): preserves the number of occurrences
  – ΠA(R): no duplicate elimination
  – Cartesian product, join: no duplicate elimination
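Python's `collections.Counter` is a convenient model of a bag: `+` adds multiplicities, and `-` subtracts them while dropping counts that fall to zero or below. The sketch reproduces both examples above and shows a bag projection that accumulates multiplicities instead of eliminating duplicates. The helper names are hypothetical.

```python
from collections import Counter

def bag_union(r, s):
    """Bag union: multiplicities add."""
    return r + s

def bag_difference(r, s):
    """Bag difference: multiplicities subtract; counts that reach 0 or below vanish."""
    return r - s

def bag_project(r, positions):
    """Π over a bag of tuples: no duplicate elimination, multiplicities accumulate."""
    out = Counter()
    for row, n in r.items():
        out[tuple(row[i] for i in positions)] += n
    return out

u = bag_union(Counter("abbc"), Counter("abbbeff"))
diff = bag_difference(Counter("abbbcc"), Counter("bcccd"))
print("".join(sorted(u.elements())))     # aabbbbbceff
print("".join(sorted(diff.elements())))  # abb

course = Counter({(645, "DB", "F05"): 1, (683, "AI", "S05"): 1, (645, "DB", "S05"): 1})
print(bag_project(course, [0, 1]))  # (645, 'DB') now has multiplicity 2
```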


Beware: Bag Laws != Set Laws

• Some, but not all, algebraic laws that hold for sets also hold for bags.

• Example: the commutative law for union (R ∪ S = S ∪ R) does hold for bags.
  – Since addition is commutative, adding the number of times x appears in R and S doesn't depend on the order of R and S.


Example of the Difference

• Set union is idempotent, meaning that S ∪ S = S.

• However, for bags, if x appears n times in S, then it appears 2n times in S ∪ S.

• Thus S ∪ S != S in general.
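Both claims can be checked with `Counter` as a bag model. In the sketch below, union is commutative, but S ∪ S doubles every count, whereas the corresponding set union is idempotent. The sample bags are hypothetical.

```python
from collections import Counter

R = Counter(["x", "x", "y"])
S = Counter(["y", "z", "z"])

commutative = (R + S) == (S + R)  # bag union is commutative
idempotent = (S + S) == S         # ...but not idempotent: counts double
print(commutative, idempotent)    # True False
print((S + S)["z"], S["z"])       # 4 2

set_S = {"y", "z"}
print((set_S | set_S) == set_S)   # True: set union is idempotent
```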


Relational calculus

• What is a "calculus"?
  – The term "calculus" means a system of computation.
  – The relational calculus is a system of computing with relations.


Relational calculus (in 1 slide)

We will study another logic-based formalism for queries called Datalog later.

English: Name and sid of students who are taking the course "DB"

RC: { xname, xsid | ∃xcid ∃xterm Students(xsid, xname) ∧ Takes(xsid, xcid) ∧ Course(xcid, "DB", xterm) }

RA: Πname,sid(Students ⋈ Takes ⋈ σname="DB"(Course))

Where are the joins?
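A Python set comprehension reads much like the calculus expression. The existentially quantified variables (xcid, xterm) become loop variables that are not kept in the output, and duplicates merge because the result is a set. This is an analogy under the assumption of finite relations, not a full RC evaluator. The positional schemas follow the RC formula, so the two "name" attributes never clash.

```python
Students = {(1, "Jill"), (2, "Bo"), (3, "Maya")}                         # Students(sid, name)
Takes = {(1, 645), (1, 683), (3, 635)}                                   # Takes(sid, cid)
Course = {(645, "DB", "F05"), (683, "AI", "S05"), (635, "Arch", "F05")}  # Course(cid, name, term)

answer = {
    (xname, xsid)
    for (xsid, xname) in Students
    for (t_sid, xcid) in Takes
    for (c_cid, c_name, xterm) in Course
    # Reusing xsid and xcid across atoms is what plays the role of the joins.
    if t_sid == xsid and c_cid == xcid and c_name == "DB"
}
print(answer)  # {('Jill', 1)}
```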


Algebra v. Calculus

• Relational Algebra: More operational; very useful for representing execution plans.

• Relational Calculus: More declarative, basis of SQL

• The calculus and algebra have equivalent expressive power (Codd)

A language that can express this core class of queries is called Relationally Complete


What can't you express in RA or RC?

• Can I get from Oakland to Boston in 2 flights?

• Can I get from Oakland to Reno?


depart   arrive
NYC      Reno
NYC      Oakland
Boston   Tampa
Oakland  Boston
Tampa    NYC
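The two questions differ in kind. "Within 2 flights" is a fixed expression (the table unioned with one self-join of itself). "Can I get there at all?" needs transitive closure, which requires repeating the join until nothing new appears, and the number of repetitions depends on the data. The sketch computes both. The loop in `reachable` lies outside the algebra, which is the point: no single fixed RA expression performs it. The helper names are hypothetical.

```python
Flights = {
    ("NYC", "Reno"), ("NYC", "Oakland"), ("Boston", "Tampa"),
    ("Oakland", "Boston"), ("Tampa", "NYC"),
}  # (depart, arrive)

def compose(r, s):
    """One self-join step: (a, d) whenever (a, b) is in r and (b, d) is in s."""
    return {(a, d) for (a, b) in r for (c, d) in s if b == c}

# At most two flights: a fixed expression, Flights ∪ (Flights ⋈ Flights).
within_two = Flights | compose(Flights, Flights)

def reachable(edges):
    """Transitive closure by iterating to a fixed point.

    The number of iterations depends on the data, which a single
    fixed relational algebra expression cannot express.
    """
    closure = set(edges)
    while True:
        new_pairs = compose(closure, edges) - closure
        if not new_pairs:
            return closure
        closure |= new_pairs

print(("Oakland", "Boston") in within_two)        # True
print(("Oakland", "Reno") in within_two)          # False
print(("Oakland", "Reno") in reachable(Flights))  # True
```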
