CS 542 Database Management Systems
Relational Database Programming
J Singh
January 24, 2011
© J Singh, 2011
Simple SQL Queries (p1)
Relation BROWSER_TABLE
SELECT * FROM BROWSER_TABLE WHERE ENGINE = 'Gecko'
Browser                    Engine   Platform          Engine Version
Internet Explorer 6        Trident  Win 98+           6
Internet Explorer 7        Trident  Win XP SP2+       7
AOL browser (AOL desktop)  Trident  Win XP            6
Firefox 3.0                Gecko    Win 2k+ / OSX.3+  1.9
Camino 1.5                 Gecko    OSX.3+            1.8
Netscape Browser 8         Gecko    Win 98SE+         1.7
Netscape Navigator 9       Gecko    Win 98+ / OSX.2+  1.8
Result:
Browser                    Engine   Platform          Engine Version
Firefox 3.0                Gecko    Win 2k+ / OSX.3+  1.9
Camino 1.5                 Gecko    OSX.3+            1.8
Netscape Browser 8         Gecko    Win 98SE+         1.7
Netscape Navigator 9       Gecko    Win 98+ / OSX.2+  1.8
1. Start with the Relation
2. Select (σ) Rows
Simple SQL Queries (p2)
Relation BROWSER_TABLE
SELECT BROWSER, PLATFORM FROM BROWSER_TABLE
WHERE ENGINE = 'Gecko'
Browser                    Engine   Platform          Engine Version
Internet Explorer 6        Trident  Win 98+           6
Internet Explorer 7        Trident  Win XP SP2+       7
AOL browser (AOL desktop)  Trident  Win XP            6
Firefox 3.0                Gecko    Win 2k+ / OSX.3+  1.9
Camino 1.5                 Gecko    OSX.3+            1.8
Netscape Browser 8         Gecko    Win 98SE+         1.7
Netscape Navigator 9       Gecko    Win 98+ / OSX.2+  1.8
Result:
Browser                    Platform
Firefox 3.0                Win 2k+ / OSX.3+
Camino 1.5                 OSX.3+
Netscape Browser 8         Win 98SE+
Netscape Navigator 9       Win 98+ / OSX.2+
1. Start with the Relation
2. Select (σ) Rows
3. Project (π) Columns
Simple SQL Queries (p3)
Relation BROWSER_TABLE
SELECT BROWSER, PLATFORM AS OS FROM BROWSER_TABLE
WHERE ENGINE = 'Gecko'
Browser                    Engine   Platform          Engine Version
Internet Explorer 6        Trident  Win 98+           6
Internet Explorer 7        Trident  Win XP SP2+       7
AOL browser (AOL desktop)  Trident  Win XP            6
Firefox 3.0                Gecko    Win 2k+ / OSX.3+  1.9
Camino 1.5                 Gecko    OSX.3+            1.8
Netscape Browser 8         Gecko    Win 98SE+         1.7
Netscape Navigator 9       Gecko    Win 98+ / OSX.2+  1.8
Result:
Browser                    OS
Firefox 3.0                Win 2k+ / OSX.3+
Camino 1.5                 OSX.3+
Netscape Browser 8         Win 98SE+
Netscape Navigator 9       Win 98+ / OSX.2+
1. Start with the Relation
2. Select (σ) Rows
3. Project (π) Columns
4. Rename (ρ) Columns
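The select, project, and rename steps can be tried out with a minimal sketch like the one below. It assumes Python's built-in sqlite3 module rather than the course database, and the column types (all TEXT) are an assumption; the rows are copied from the BROWSER_TABLE relation above.

```python
import sqlite3

# Rows copied from the BROWSER_TABLE relation on the slide.
ROWS = [
    ("Internet Explorer 6", "Trident", "Win 98+", "6"),
    ("Internet Explorer 7", "Trident", "Win XP SP2+", "7"),
    ("AOL browser (AOL desktop)", "Trident", "Win XP", "6"),
    ("Firefox 3.0", "Gecko", "Win 2k+ / OSX.3+", "1.9"),
    ("Camino 1.5", "Gecko", "OSX.3+", "1.8"),
    ("Netscape Browser 8", "Gecko", "Win 98SE+", "1.7"),
    ("Netscape Navigator 9", "Gecko", "Win 98+ / OSX.2+", "1.8"),
]

conn = sqlite3.connect(":memory:")
# Assumption: every column is stored as TEXT.
conn.execute(
    "CREATE TABLE BROWSER_TABLE "
    "(BROWSER TEXT, ENGINE TEXT, PLATFORM TEXT, ENGINE_VERSION TEXT)"
)
conn.executemany("INSERT INTO BROWSER_TABLE VALUES (?, ?, ?, ?)", ROWS)

# Select rows (WHERE), project columns (SELECT list), rename a column (AS).
cursor = conn.execute(
    "SELECT BROWSER, PLATFORM AS OS FROM BROWSER_TABLE WHERE ENGINE = 'Gecko'"
)
column_names = [description[0] for description in cursor.description]
result = cursor.fetchall()

for browser, os_name in result:
    print(browser, "|", os_name)
```

The second output column is reported under its new name, OS, while the stored relation keeps the name PLATFORM.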
SQL Conditions
• In WHERE clause:
– String1 = String2, String1 > String2, and other comparison operators
• Comparisons are controlled by "collations", e.g.,
– COLLATE Latin1_General_CI_AS (Latin1 collation, case insensitive, accent sensitive)
• For other available collations, check your database's documentation
• Collations can be specified at three levels
– For the entire database
– For an attribute, in CREATE TABLE
– In the WHERE clause
– LIKE String (pattern matching), e.g.,
• 'John Wayne' LIKE 'John%'
• 'John Wayne' LIKE '% W_yne'
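A small sketch of these conditions, assuming Python's built-in sqlite3 module. SQLite does not provide Latin1_General_CI_AS, so its built-in NOCASE collation stands in here as a case-insensitive example; the helper name `truth` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def truth(expression):
    """Evaluate a scalar SQL condition and return it as a Python bool."""
    return bool(conn.execute(f"SELECT {expression}").fetchone()[0])

# % matches any run of characters, _ matches exactly one character.
prefix_match = truth("'John Wayne' LIKE 'John%'")
wildcard_match = truth("'John Wayne' LIKE '% W_yne'")

# SQLite's default collation compares bytes, so case matters ...
default_equal = truth("'gecko' = 'Gecko'")
# ... unless a case-insensitive collation is requested in the condition.
nocase_equal = truth("'gecko' = 'Gecko' COLLATE NOCASE")

print(prefix_match, wildcard_match, default_equal, nocase_equal)
```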
SQL Special Data Types (p1)
• Dates and Times (look them up)
• NULL values (⊥ in Relational Algebra)
– Can mean one of three things:
• Value is unknown
• Value is inapplicable (e.g., spouse name for a single person)
• Value not shown – perhaps because of security concerns
– Regardless of the cause, NULL cannot be treated as a constant
• Operations with NULLs
– NULL + number → NULL
– NULL × number → NULL
– NULL = NULL → UNKNOWN
– X IS NULL → TRUE or FALSE (depending on X)
– NULL × 0 → NULL
– NULL - NULL → NULL
SQL Special Data Types (p2)
• UNKNOWN values
– Result from comparison with NULLs
– Other comparisons yield TRUE or FALSE
• UNKNOWN means neither TRUE nor FALSE
– Operations when combined with other logical values
• UNKNOWN AND TRUE → UNKNOWN
• UNKNOWN AND FALSE → FALSE
• UNKNOWN OR TRUE → TRUE
• UNKNOWN OR FALSE → UNKNOWN
• NOT UNKNOWN → UNKNOWN
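The NULL and UNKNOWN rules can be checked with a short sketch, assuming Python's built-in sqlite3 module. SQLite has no separate UNKNOWN value: it reports UNKNOWN as NULL (None in Python), TRUE as 1, and FALSE as 0. The helper name `evaluate` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def evaluate(expression):
    """Return the value SQLite computes for a scalar SQL expression."""
    return conn.execute(f"SELECT {expression}").fetchone()[0]

EXPRESSIONS = [
    "NULL + 1",       # arithmetic with NULL
    "NULL * 0",       # still NULL, not 0
    "NULL - NULL",    # still NULL, not 0
    "NULL = NULL",    # comparison with NULL: UNKNOWN
    "NULL IS NULL",   # IS NULL always yields TRUE or FALSE
    "NULL AND 1",     # UNKNOWN AND TRUE
    "NULL AND 0",     # UNKNOWN AND FALSE
    "NULL OR 1",      # UNKNOWN OR TRUE
    "NULL OR 0",      # UNKNOWN OR FALSE
    "NOT NULL",       # NOT UNKNOWN
]

results = {expression: evaluate(expression) for expression in EXPRESSIONS}

for expression, value in results.items():
    print(f"{expression:14} -> {value}")
```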
Ordering Results
Relation BROWSER_TABLE
SELECT BROWSER, PLATFORM FROM BROWSER_TABLE
WHERE ENGINE = 'Gecko' ORDER BY ENGINE_VERSION, BROWSER
Browser                    Engine   Platform          Engine Version
Internet Explorer 6        Trident  Win 98+           6
Internet Explorer 7        Trident  Win XP SP2+       7
AOL browser (AOL desktop)  Trident  Win XP            6
Firefox 3.0                Gecko    Win 2k+ / OSX.3+  1.9
Camino 1.5                 Gecko    OSX.3+            1.8
Netscape Browser 8         Gecko    Win 98SE+         1.7
Netscape Navigator 9       Gecko    Win 98+ / OSX.2+  1.8
Result:
Browser                    Platform
Netscape Browser 8         Win 98SE+
Camino 1.5                 OSX.3+
Netscape Navigator 9       Win 98+ / OSX.2+
Firefox 3.0                Win 2k+ / OSX.3+
1. Start with the Relation
2. Select (σ) Rows
3. Order Rows
4. Project (π) Columns
Detour: World Database
• A sample MySQL database downloadable from the web
• Has 3 tables: City, Country, CountryLanguage
– City
• ID, Name, CountryCode, District, Population
– Country
• Code, Name, Continent, Region, SurfaceArea, IndepYear, Population, LifeExpectancy, GNP, GNPOld, LocalName, GovernmentForm, HeadOfState, Capital, Code2
– CountryLanguage
• CountryCode, Language, IsOfficial, Percentage
– The three tables are "connected" by the country code: Country.Code matches City.CountryCode and CountryLanguage.CountryCode.
Joins
• Find all cities in Estonia
SELECT City.Name
FROM City, Country
WHERE Country.Name = 'Estonia'
AND City.CountryCode = Country.Code ;
• Find all countries where Dutch is the official language
SELECT Country.Name
FROM Country, CountryLanguage
WHERE CountryLanguage.CountryCode = Country.Code
AND CountryLanguage.Language = 'Dutch'
AND CountryLanguage.isOfficial = 'T' ;
Join Semantics – Nested Loops
• Find all cities in Estonia
SELECT City.Name FROM City, Country
WHERE Country.Name = 'Estonia'
AND City.CountryCode = Country.Code
• Is equivalent to
For each tuple t1 in City:
For each tuple t2 in Country:
If the WHERE clause is satisfied:
Accumulate <t1, t2> into a result set
Project City.Name from the accumulated result set
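The nested-loop reading can be written out directly. This is a sketch in plain Python over a few made-up sample tuples (not the actual contents of the World database); the function name `cities_in` is hypothetical.

```python
# Toy relations: made-up sample rows, not the real World database.
CITY = [  # (Name, CountryCode)
    ("Tallinn", "EST"),
    ("Tartu", "EST"),
    ("Helsinki", "FIN"),
]
COUNTRY = [  # (Code, Name)
    ("EST", "Estonia"),
    ("FIN", "Finland"),
]

def cities_in(country_name):
    """Evaluate the join exactly as the nested-loop semantics describe."""
    accumulated = []
    for t1 in CITY:                      # for each tuple t1 in City
        for t2 in COUNTRY:               # for each tuple t2 in Country
            name_matches = t2[1] == country_name   # Country.Name = '...'
            codes_match = t1[1] == t2[0]           # City.CountryCode = Country.Code
            if name_matches and codes_match:       # WHERE clause satisfied
                accumulated.append((t1, t2))
    # Project City.Name from the accumulated result set.
    return [t1[0] for t1, _t2 in accumulated]

print(cities_in("Estonia"))
```

This describes what the query means, not how a database executes it: a real optimizer is free to pick any plan that yields the same rows.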
Join Semantics – Relational Algebra
• Find all cities in Estonia
SELECT City.Name
FROM City, Country
WHERE Country.Name = 'Estonia'
AND City.CountryCode = Country.Code
• Is equivalent to
$$\pi_{A_1}\big(\sigma_{B_1 = \text{'Estonia'} \;\mathrm{AND}\; A_2 = B_2}(A \times B)\big)$$
Where A = City, B = Country,
A1 = City.Name, A2 = City.CountryCode, B1 = Country.Name, B2 = Country.Code
Self-Joins
• Find all districts in Kenya that have more than one city
SELECT distinct c1.district
FROM city c1, city c2, country
WHERE c1.name != c2.name
AND c1.district = c2.district
AND country.code = c1.countrycode
AND country.code = c2.countrycode
AND country.name = 'kenya';
– The same table (city) gets used with two names, c1 and c2
Set Operators
• Find all districts in Kenya that have exactly one city
( SELECT distinct city.district
FROM city, country
WHERE country.code = city.countrycode
AND country.name = 'kenya' )
EXCEPT
( SELECT distinct c1.district
FROM city c1, city c2, country
WHERE c1.name != c2.name
AND c1.district = c2.district
AND country.code = c1.countrycode
AND country.code = c2.countrycode
AND country.name = 'kenya' );
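• Both sides must yield relations with the same schema (the same attributes, in the same order)
• EXCEPT can be replaced by UNION or INTERSECT for the other set operations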
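A runnable sketch of the self-join and the EXCEPT query, assuming Python's built-in sqlite3 module and a few made-up sample rows (not the real World database). The self-join compares c1.district with c2.district explicitly so that the two distinct cities are required to share a district. SQLite does not accept parentheses around the operands of EXCEPT, so the two SELECTs are joined without them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE country (code TEXT, name TEXT);
CREATE TABLE city (name TEXT, district TEXT, countrycode TEXT);
-- Made-up sample rows for illustration only.
INSERT INTO country VALUES ('KEN', 'Kenya');
INSERT INTO city VALUES
  ('Nakuru',  'Rift Valley', 'KEN'),
  ('Eldoret', 'Rift Valley', 'KEN'),
  ('Mombasa', 'Coast',       'KEN'),
  ('Kisumu',  'Nyanza',      'KEN');
""")

# Districts with more than one city: two different cities, same district.
MULTI_CITY = """
SELECT DISTINCT c1.district
FROM city c1, city c2, country
WHERE c1.name != c2.name
  AND c1.district = c2.district
  AND country.code = c1.countrycode
  AND country.code = c2.countrycode
  AND country.name = 'Kenya'
"""

# All districts of the country.
ALL_DISTRICTS = """
SELECT DISTINCT city.district
FROM city, country
WHERE country.code = city.countrycode
  AND country.name = 'Kenya'
"""

multi = sorted(row[0] for row in conn.execute(MULTI_CITY))
# Exactly one city = all districts EXCEPT those with more than one city.
single = sorted(
    row[0] for row in conn.execute(ALL_DISTRICTS + " EXCEPT " + MULTI_CITY)
)

print("more than one city:", multi)
print("exactly one city:", single)
```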
Subqueries
• A different way to structure queries (without using joins)
SELECT ___________________
FROM _____Subquery 3____
WHERE _____Subquery 1____
_____Subquery 2____
Subqueries Returning Scalars
• Find all cities in Estonia
SELECT City.Name
FROM City, Country
WHERE Country.Name = 'Estonia'
AND City.CountryCode = Country.Code
• Can also be written as
SELECT Name
FROM City
WHERE CountryCode =
(SELECT Code FROM Country WHERE Name = 'Estonia')
• The two forms are equivalent except when…
Conditions Returning Relations
• Find all countries where Dutch is the official language
SELECT Country.Name
FROM Country, CountryLanguage
WHERE CountryLanguage.CountryCode = Country.Code
AND CountryLanguage.Language = 'Dutch'
AND isOfficial = 'T' ;
• Can also be written as
SELECT Name FROM Country
WHERE Code IN
( SELECT CountryCode FROM CountryLanguage
WHERE Language = 'Dutch' AND isOfficial = 'T' );
Conditions Returning Tuples
• Find all countries where Dutch is the official language
SELECT Name FROM Country
WHERE Code IN
( SELECT CountryCode FROM CountryLanguage
WHERE Language = 'Dutch' AND isOfficial = 'T' );
• Can also be written as
SELECT Name FROM Country
WHERE (Code, 'T') IN
( SELECT CountryCode, isOfficial FROM CountryLanguage
WHERE Language = 'Dutch' );
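The join form, the IN form, and the tuple form can be compared side by side. This is a sketch assuming Python's built-in sqlite3 module (the tuple form needs SQLite 3.15 or later) and a handful of made-up sample rows; the helper `names` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Country (Code TEXT, Name TEXT);
CREATE TABLE CountryLanguage (CountryCode TEXT, Language TEXT, IsOfficial TEXT);
-- Made-up sample rows for illustration only.
INSERT INTO Country VALUES
  ('NLD', 'Netherlands'), ('BEL', 'Belgium'), ('CAN', 'Canada');
INSERT INTO CountryLanguage VALUES
  ('NLD', 'Dutch', 'T'), ('BEL', 'Dutch', 'T'),
  ('BEL', 'French', 'T'), ('CAN', 'Dutch', 'F');
""")

JOIN_FORM = """
SELECT Country.Name
FROM Country, CountryLanguage
WHERE CountryLanguage.CountryCode = Country.Code
  AND CountryLanguage.Language = 'Dutch'
  AND CountryLanguage.IsOfficial = 'T'
"""

# The subquery returns a one-column relation; IN tests membership in it.
IN_FORM = """
SELECT Name FROM Country
WHERE Code IN
  (SELECT CountryCode FROM CountryLanguage
   WHERE Language = 'Dutch' AND IsOfficial = 'T')
"""

# The subquery returns two-column tuples; IN compares whole tuples.
TUPLE_FORM = """
SELECT Name FROM Country
WHERE (Code, 'T') IN
  (SELECT CountryCode, IsOfficial FROM CountryLanguage
   WHERE Language = 'Dutch')
"""

def names(query):
    """Run a query and return its single output column, sorted."""
    return sorted(row[0] for row in conn.execute(query))

join_names = names(JOIN_FORM)
in_names = names(IN_FORM)
tuple_names = names(TUPLE_FORM)
print(join_names, in_names, tuple_names)
```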
Subqueries in FROM clauses
• Total population of all countries with Dutch as the official language
SELECT SUM(DutchCountry.Population) AS TotalPopulation
FROM ( SELECT Name, Population FROM Country
WHERE Code IN
( SELECT CountryCode FROM CountryLanguage
WHERE Language = 'Dutch' AND isOfficial = 'T' )
) AS DutchCountry;
Cross Joins
• Populations of cities in Finland relative to Aruba & Singapore
SELECT
city.name as City,
city.population as Population,
cntry.name as Country,
(city.population * 100 / cntry.population) as 'Percent'
FROM
(SELECT * FROM City WHERE CountryCode = 'fin') AS city
CROSS JOIN
(SELECT * FROM Country WHERE Code='abw' OR Code='sgp')
AS cntry;
Theta Joins
• Cross Join with a condition
– The most common form of JOIN
• All cities in Finland with a population more than double that of Aruba
SELECT
cty.name as City,
cty.population as Population,
cntry.name as Country,
(cty.population * 100 / cntry.population) as 'Percent'
FROM
( SELECT * FROM City WHERE CountryCode = 'fin') AS cty
JOIN (SELECT * FROM Country WHERE Code='abw') AS cntry
ON cty.population > 2*cntry.population;
Outer Joins
• Selecting elements of a table regardless of whether they are present in the other table.
• Cities starting with 'TOK' and countries starting with 'J'
SELECT c.*, r.name as Country
FROM
(select * from city where city.name like 'tok%') as c
LEFT OUTER JOIN
(select * from country where country.code like 'j%') as r
ON (c.countrycode=r.code);
• Yields 6 cities, 5 in Japan and Tokat in Turkey
• What if we had done RIGHT OUTER JOIN?
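A sketch of the outer join on a few made-up sample rows, assuming Python's built-in sqlite3 module. RIGHT OUTER JOIN only exists in recent SQLite releases, so the second query swaps the operands of a LEFT OUTER JOIN to get the mirrored result instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE city (name TEXT, countrycode TEXT);
CREATE TABLE country (code TEXT, name TEXT);
-- Made-up sample rows for illustration only.
INSERT INTO city VALUES ('Tokyo', 'JPN'), ('Tokat', 'TUR'), ('Osaka', 'JPN');
INSERT INTO country VALUES ('JPN', 'Japan'), ('JAM', 'Jamaica'), ('TUR', 'Turkey');
""")

CITIES = "(SELECT * FROM city WHERE name LIKE 'tok%') AS c"
COUNTRIES = "(SELECT * FROM country WHERE code LIKE 'j%') AS r"

# Every selected city is kept; the country is NULL when nothing matches.
left_rows = conn.execute(
    f"SELECT c.name, r.name FROM {CITIES} LEFT OUTER JOIN {COUNTRIES} "
    "ON c.countrycode = r.code ORDER BY c.name"
).fetchall()

# Mirrored: every selected country is kept; the city is NULL when nothing matches.
mirrored_rows = conn.execute(
    f"SELECT r.name, c.name FROM {COUNTRIES} LEFT OUTER JOIN {CITIES} "
    "ON c.countrycode = r.code ORDER BY r.name"
).fetchall()

print(left_rows)
print(mirrored_rows)
```

In the first result Tokat survives with no country, because Turkey's code does not start with 'j'; in the mirrored result Jamaica survives with no city.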
Review and Contrast Joins
• MySQL does not implement FULL OUTER JOIN
– How can we get it if we need it?
• Are CROSS JOIN and FULL OUTER JOIN the same thing?
• Table A has 3 rows, table B has 5 rows.
– How many rows does A CROSS JOIN B have?
– How many rows does A LEFT OUTER JOIN B have?
– How about A RIGHT OUTER JOIN B?
– A FULL OUTER JOIN B?
– A INNER JOIN B?
Reading Assignment
• Section 6.4
• Section 6.5
– Keep timing considerations in mind
• SQL completely evaluates the query before making any changes
Transactions
• ACID
– Atomicity
• A set of database operations must be carried out atomically: either all of them are done or none are. E.g., during a money transfer,
– If money is taken out of one account, it must be added to the other
– Consistency
• Enforce constraints on types, values, foreign keys
• Maintain relationships among data elements (see Atomicity)
– Isolation
• Each transaction must appear to be executed as if no other transaction is executing at the same time.
– Durability
• Once committed, the change is permanent.
Detour: Transaction Scenario
• Real Time Bank (RTB) is an on-line bank.
– RTB executes money transfers as soon as requests are entered
– RTB shows up-to-the-minute account balances
– Transactions that would create a negative balance are denied
• Scenario
– Initially, Alice has $250, Bob has $100, Cathy has $150
– Transactions:
1. Alice pays Bob $200
2. Bob pays Cathy $150
3. Cathy pays Alice $250
• Interesting aside: only transaction order 1, 2, 3 will succeed
– At a Nightly Processing Bank, transaction order would be irrelevant
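The claim about transaction order can be checked by brute force. The sketch below, in plain Python, tries all six orderings of the three transfers against the starting balances from the scenario and denies any transfer that would overdraw the payer; the function name `all_succeed` is hypothetical.

```python
from itertools import permutations

# Starting balances and transfers from the scenario.
INITIAL = {"Alice": 250, "Bob": 100, "Cathy": 150}
TRANSFERS = {
    1: ("Alice", "Bob", 200),
    2: ("Bob", "Cathy", 150),
    3: ("Cathy", "Alice", 250),
}

def all_succeed(order):
    """Return True if every transfer in this order leaves no negative balance."""
    balances = dict(INITIAL)
    for number in order:
        payer, payee, amount = TRANSFERS[number]
        if balances[payer] < amount:   # would create a negative balance: denied
            return False
        balances[payer] -= amount
        balances[payee] += amount
    return True

successful_orders = [
    order for order in permutations(TRANSFERS) if all_succeed(order)
]
print(successful_orders)
```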
Transaction Atomicity
• Work by example: Alice pays Bob $200
BEGIN TRANSACTION
UPDATE Accounts
SET balance = balance - 200
WHERE Owner = 'Alice'
IF (0 > SELECT balance FROM Accounts WHERE Owner = 'Alice',
ROLLBACK TRANSACTION ) -- Note: Pidgin SQL Syntax
UPDATE Accounts
SET balance = balance + 200
WHERE Owner = 'Bob'
COMMIT TRANSACTION
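Since the IF above is pidgin SQL, here is one way the same transfer might be written so that it actually runs. It is a sketch assuming Python's built-in sqlite3 module, where a connection used as a context manager commits on success and rolls back if an exception escapes; the function name `transfer` and the table layout are assumptions.

```python
import sqlite3

def transfer(conn, payer, payee, amount):
    """Move `amount` from payer to payee atomically; True if committed."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE Owner = ?",
                (amount, payer),
            )
            (balance,) = conn.execute(
                "SELECT balance FROM Accounts WHERE Owner = ?", (payer,)
            ).fetchone()
            if balance < 0:
                # Abort: the debit above is undone by the rollback.
                raise ValueError("insufficient funds")
            conn.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE Owner = ?",
                (amount, payee),
            )
        return True
    except ValueError:
        return False

conn = sqlite3.connect(":memory:")
with conn:
    conn.execute("CREATE TABLE Accounts (Owner TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany(
        "INSERT INTO Accounts VALUES (?, ?)", [("Alice", 250), ("Bob", 100)]
    )

first = transfer(conn, "Alice", "Bob", 200)    # succeeds
second = transfer(conn, "Alice", "Bob", 200)   # denied and rolled back
balances = dict(conn.execute("SELECT Owner, balance FROM Accounts"))
print(first, second, balances)
```

After the denied second transfer, neither account has changed: the debit and the credit happen together or not at all.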
Transaction Isolation
• Isolation levels and the problems they leave behind:
– READ UNCOMMITTED
• Dirty Read – data of an uncommitted transaction visible to others
– READ COMMITTED: only committed data is visible
• Non-repeatable Read – a transaction re-reads some data and finds that it has changed because another transaction committed
– REPEATABLE READ: place locks on all data that are used in the transaction
• Phantom Read – a transaction re-executes a subquery returning a set of rows and finds a different set of rows
– SERIALIZABLE: As if all transactions occur in a completely isolated fashion
• Can be too restrictive to support a high transaction volume
• Note: Not every database offers each isolation level.
Choose the isolation level with care!
CS 542 Database Management Systems
Database Logic – The Foundation of Datalog
About Datalog
• Intellectual debt to Prolog, the logic programming language
• Responsible for addition of recursion to SQL-99.
– Extends SQL but still leaves it Turing-incomplete
• Introductory example:
– Facts:
• Par(sally, john), Par(martha, mary), Par(mary, peter), Par(john, peter)
– Rules:
• Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
• Cousin(x, y) <- Sib(x, y)
• Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)
– It follows that Cousin(sally, martha)
Why Data Logic?
• Why is SQL not sufficient?
– Deductive rules express things that go in both FROM and WHERE clauses
– Allow for stating general requirements that are more difficult to state correctly in SQL
– Allow us to take advantage of research in logic programming and AI
The Formalism of Rules
• The Head is true if all the subgoals are true
• The rule applies for all values of its arguments
• A variable appearing in the head is distinguished ; otherwise it is nondistinguished.
Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y)
– Head = consequent, a single subgoal: Ancestor(x, y)
– Read the symbol <- as "if"
– Body = antecedent = AND of subgoals: Parent(x, z) AND Ancestor(z, y)
Interpreting Rules
• The head is true for given values of the distinguished variables if there exist values of the non-distinguished variables that make all subgoals of the body true.
• For a rule to be safe, every variable must appear in some non-negated subgoal of the body
• Unsafe examples:
IDB/EDB
• Convention: Predicates begin with a capital, variables begin with lowercase
– e.g., Ancestor (x, y)
• Fact predicates are atoms represented as relations
– If a tuple exists, that fact is true
– Otherwise, false
– A predicate representing a stored relation is called an extensional database (EDB).
• Subgoals of a rule may be facts or may themselves be rules
– EDB when it is a fact
– Intensional database (IDB) when it is a “derived relation”
• Rule heads are always IDBs
Example EDB relations:
father: (john, tony), (peter, mary)
mother: (mary, bob)
Computing IDB Relations Bottom-up
• As long as there is no negation of IDB subgoals, each IDB relation grows with each iteration
– At least, it does not shrink
• Since relations are finite, the loop eventually terminates
• For some rules, it is impossible to guarantee that the loop will terminate.
– Considered unsafe
empty out all IDB relations
REPEAT
FOR (each IDB predicate p) DO
evaluate p using current
values of all relations;
UNTIL (no IDB relation is changed)
Rule                              Why unsafe?
isHappy(x) <- isRich(y)           We know y but the possibilities for x are infinite
Bachelor(x) <- NOT isMarried(x)   Negated, may remove x
IsCheap(x) <- x < 10              Infinite possibilities
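The REPEAT ... UNTIL loop can be sketched in plain Python for the Sib and Cousin rules from the introductory example. Relations are modeled as sets of tuples, Par is the EDB, and the function names are hypothetical. Because these rules contain no negation, each pass can only add tuples, so the loop stops once a pass changes nothing.

```python
# EDB facts from the introductory example: Par(child, parent).
PAR = {
    ("sally", "john"), ("martha", "mary"),
    ("mary", "peter"), ("john", "peter"),
}

def evaluate_sib(par):
    """Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y"""
    return {(x, y) for (x, p) in par for (y, q) in par if p == q and x != y}

def evaluate_cousin(par, sib, cousin):
    """Cousin(x, y) <- Sib(x, y)
    Cousin(x, y) <- Par(x, xp) AND Par(y, yp) AND Cousin(xp, yp)"""
    derived = set(sib)
    derived |= {
        (x, y)
        for (x, xp) in par
        for (y, yp) in par
        if (xp, yp) in cousin
    }
    return derived

def bottom_up(par):
    sib, cousin = set(), set()            # empty out all IDB relations
    while True:                           # REPEAT
        new_sib = evaluate_sib(par)       # evaluate each IDB predicate using
        new_cousin = evaluate_cousin(par, new_sib, cousin)  # current values
        if new_sib == sib and new_cousin == cousin:
            return sib, cousin            # UNTIL no IDB relation is changed
        sib, cousin = new_sib, new_cousin

sib, cousin = bottom_up(PAR)
print(sorted(sib))
print(sorted(cousin))
```

This is naive evaluation: every pass recomputes each relation from scratch, which is simple but does redundant work.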
Computing IDB Relations Top-Down (p1)
• EDB: Par(c,p) = p is a parent of c.
• Generalized cousins: people with common ancestors one or more generations back:
Sib(x,y) <- Par(x,p) AND Par(y,p) AND x<>y
Cousin(x,y) <- Sib(x,y)
Cousin(x,y) <- Par(x,xp) AND Par(y,yp)
AND Cousin(xp,yp)
• Form a dependency graph whose nodes = IDB predicates.
• Arc X ->Y if and only if there is a rule with X in the head and Y in the body.
• Cycle = recursion; no cycle = no recursion.
Computing IDB Relations Top-down (p2)
• The recursion eventually terminates unless:
– A distinguished variable
1. does not appear in a subgoal
2. only appears in a negated subgoal
3. only appears in an arithmetic subgoal
– Same 3 conditions for variables in an arithmetic subgoal
– Same 3 conditions for variables in a negated subgoal
for IDB predicate p(x,y, …)
FOR EACH subgoal of p DO
IF subgoal is IDB, recursive call;
IF subgoal is EDB, look up
Rule                              Why unsafe?
isHappy(x) <- isRich(y)           We know x but the possibilities for y are infinite
Bachelor(x) <- NOT isMarried(x)   Negated, may result in infinite recursion
IsCheap(x) <- x < 10              It's safe!
Safe Rules
• A rule is safe if:
1. Each distinguished variable,
2. Each variable in an arithmetic subgoal, and
3. Each variable in a negated subgoal,
also appears in a nonnegated,
relational subgoal.
• Safe rules prevent infinite results.
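The three conditions translate into a short check. The sketch below is plain Python with a hypothetical rule encoding: the head is a list of variable names, and each subgoal is a pair of a kind ('relational', 'negated', or 'arithmetic') and its variables.

```python
def is_safe(head_vars, subgoals):
    """Check the safety condition for one rule.

    subgoals: list of (kind, variables), where kind is 'relational',
    'negated', or 'arithmetic' (a hypothetical encoding for this sketch).
    """
    # Variables bound by some nonnegated, relational subgoal.
    bound = set()
    for kind, variables in subgoals:
        if kind == "relational":
            bound |= set(variables)

    # 1. distinguished variables, 2. variables in arithmetic subgoals,
    # 3. variables in negated subgoals must all be bound.
    must_be_bound = set(head_vars)
    for kind, variables in subgoals:
        if kind in ("arithmetic", "negated"):
            must_be_bound |= set(variables)

    return must_be_bound <= bound

# isHappy(x) <- isRich(y): x never appears in the body.
happy = is_safe(["x"], [("relational", ["y"])])

# Bachelor(x) <- NOT isMarried(x): x appears only in a negated subgoal.
bachelor = is_safe(["x"], [("negated", ["x"])])

# Ancestor(x, y) <- Parent(x, z) AND Ancestor(z, y)
ancestor = is_safe(["x", "y"], [("relational", ["x", "z"]),
                                ("relational", ["z", "y"])])

# Sib(x, y) <- Par(x, p) AND Par(y, p) AND x <> y
sibling = is_safe(["x", "y"], [("relational", ["x", "p"]),
                               ("relational", ["y", "p"]),
                               ("arithmetic", ["x", "y"])])

print(happy, bachelor, ancestor, sibling)
```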
Evaluating Datalog Programs
• As long as there is no recursion, we can pick an order to evaluate the IDB predicates, so that all the predicates in the body of its rules have already been evaluated.
• If an IDB predicate has more than one rule, each rule contributes tuples to its relation.
Expressive Power of Datalog
• Without recursion, Datalog can express all and only the queries of core relational algebra.
– The same as SQL select-from-where, without aggregation and grouping.
• But with recursion, Datalog can express more than these languages.
• Yet still not Turing-complete.
SQL Rule Definitions & Usage
• Definition of Datalog Rules:
WITH
[RECURSIVE] <RuleName> (<arguments>)
AS <query>;
• Invocation of Datalog Rules:
<SQL query about EDB, IDB>
SQL Recursion Example (p1)
• Find Sally's cousins
– Using Recursive definition introduced earlier
– Par (child, parent) is the EDB
• Expected SQL Query
SELECT y
FROM Cousin
WHERE x = 'Sally';
• But first, we need to define the IDB Cousin
SQL Recursion Example (p2)
• WITH Clause (non-recursive)
WITH Sib(x, y) AS
SELECT p1.child, p2.child
FROM Par p1, Par p2
WHERE p1.parent = p2.parent
AND p1.child <> p2.child;
• WITH Clause (recursive)
RECURSIVE Cousin(x, y) AS
(SELECT * FROM Sib)
UNION
(SELECT p1.child, p2.child
FROM Par p1, Par p2, Cousin
WHERE p1.parent = Cousin.x
AND p2.parent = Cousin.y);
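The pieces can be put together into one runnable statement. This is a sketch assuming Python's built-in sqlite3 module, whose dialect differs slightly from the SQL-99 form above: RECURSIVE follows WITH once, each definition is parenthesized, and definitions are separated by commas. The Par facts are the ones from the introductory Datalog example, which are in lowercase.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Par (child TEXT, parent TEXT)")
# EDB facts from the introductory example: Par(child, parent).
conn.executemany(
    "INSERT INTO Par VALUES (?, ?)",
    [("sally", "john"), ("martha", "mary"),
     ("mary", "peter"), ("john", "peter")],
)

COUSIN_QUERY = """
WITH RECURSIVE
  Sib(x, y) AS (
    SELECT p1.child, p2.child
    FROM Par p1, Par p2
    WHERE p1.parent = p2.parent AND p1.child <> p2.child
  ),
  Cousin(x, y) AS (
    SELECT x, y FROM Sib
    UNION
    SELECT p1.child, p2.child
    FROM Par p1, Par p2, Cousin
    WHERE p1.parent = Cousin.x AND p2.parent = Cousin.y
  )
SELECT y FROM Cousin WHERE x = ?
"""

sally_cousins = [row[0] for row in conn.execute(COUSIN_QUERY, ("sally",))]
print(sally_cousins)
```

UNION (rather than UNION ALL) discards tuples that were already derived, which is what lets the recursion reach a fixed point.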
Next meeting
• January 31
• Sections 7.1 – 7.3
• Sections 8.1, 8.3 – 8.4
• Discussion of presentation topic proposals