Page 1: ISOM MIS3150 Data and Information Management Query Languages –SQL Arijit Sengupta.

ISOM

MIS3150 Data and Information Management

Query Languages –SQL

Arijit Sengupta

Structure of this semester

[Figure: MIS3150 course roadmap. Numbered stages: 0. Intro, 1. Design, 2. Querying, 3. Advanced Topics, 4. Applications. Topic boxes: Database Fundamentals, Conceptual Modeling, Relational Model, Normalization, Query Languages, Advanced SQL, Transaction Management, Data Mining, Java DB Applications (JDBC). Audience labels: Newbie, Users, Designers, Developers, Professionals.]

Today’s Buzzwords

• Query Languages
• Formal Query Languages
• Procedural and Declarative Languages
• Relational Algebra
• Relational Calculus
• SQL
• Aggregate Functions
• Nested Queries

Objectives

At the end of the lecture, you should
• Get a formal as well as practical perspective on query languages
• Have a background on query language basics (how they came about)
• Be able to write simple SQL queries from the specification
• Be able to look at SQL queries and understand what they are supposed to do
• Be able to write complex SQL queries involving nesting
• Execute queries on a database system

Set Theory Basics

• A set: a collection of distinct items with no particular order

• Set description:
  $\{ b \mid b \text{ is a Database Book} \}$
  $\{ c \mid c \text{ is a city with a population of over a million} \}$
  $\{ x \mid 1 < x < 10 \text{ and } x \text{ is a natural number} \}$

• Most basic set operation:
  Membership: $x \in S$ (read as "x belongs to S") if x is in the set S

Other Set Operations

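• Addition, deletion (note that adding an item that is already in the set does not change it)

• Set mathematics:
  Union: $R \cup S = \{ x \mid x \in R \text{ or } x \in S \}$
  Intersection: $R \cap S = \{ x \mid x \in R \text{ and } x \in S \}$
  Set difference: $R - S = \{ x \mid x \in R \text{ and } x \notin S \}$
  Cross-product: $R \times S = \{ \langle x, y \rangle \mid x \in R \text{ and } y \in S \}$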

• You can combine set operations much like arithmetic operations: $R - (S \cup T)$

• Usually no well-defined precedence
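
The set operations on this slide can be tried directly with Python's built-in `set` type. This is only a small sketch; the id values in `R` and `S` are made up for illustration.

```python
# Two small sets of student ids (illustrative values, not from the slides).
R = {22, 31, 58}
S = {28, 31, 44, 58}

union = R | S            # x in R or x in S
intersection = R & S     # x in R and x in S
difference = R - S       # x in R and x not in S
cross = {(x, y) for x in R for y in S}  # every ordered pair <x, y>

# Adding an item that is already present does not change the set.
R_again = R | {22}

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(len(cross))
```

Because there is no agreed precedence among set operators, it is safest to parenthesize combined expressions explicitly, as in `R - (S | {1})`.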

Relational Query Languages

• Query languages: Allow manipulation and retrieval of data from a database.

• Relational model supports simple, powerful QLs:
  Strong formal foundation based on logic.
  Allows for much optimization.

• Query Languages != programming languages!
  QLs not expected to be "Turing complete".
  QLs not intended to be used for complex calculations.
  QLs support easy, efficient access to large data sets.

Formal Relational Query Languages

Two mathematical Query Languages form the basis for “real” languages (e.g. SQL), and for implementation:

Relational Algebra: More operational, very useful for representing execution plans.

Relational Calculus: Lets users describe what they want, rather than how to compute it. (Non-operational, declarative.)

Understanding Algebra & Calculus is key to understanding SQL, query processing!

Structured Query Language

• Need for SQL
  Operations on Data Types
  Definition, Manipulation
  Operations on Sets
  Declarative (calculus) vs. Procedural (algebra)

• Evolution of SQL
  SEQUEL .. SQL_92 .. SQL_93
  SQL Dialects

Does SQL treat Relations as ‘Sets’?

Preliminaries

• A query is applied to relation instances, and the result of a query is also a relation instance.
  Schemas of input relations for a query are fixed (but the query will run regardless of instance!)
  The schema for the result of a given query is also fixed! Determined by definition of query language constructs.

• Positional vs. named-field notation:
  Positional notation easier for formal definitions, named-field notation more readable.
  Both used in SQL

Example Instances

• Students, Registers, Courses relations for our examples.

S1 (Students)
sid  sname   GPA  age
22   dustin  3.5  25.0
31   lubber  3.8  25.5
58   rusty   4.0  23.0

S2 (Students)
sid  sname   GPA  age
28   yuppy   3.9  24.0
31   lubber  3.8  25.5
44   guppy   3.5  25.5
58   rusty   4.0  23.0

R1 (Registers)
sid  cid  semester
22   101  Fall 99
58   103  Spring 99

C1 (Courses)
cid  cname     dept
101  Database  CIS
103  Internet  ECI

Relational Algebra

• Basic operations:
  Selection ($\sigma$): Selects a subset of rows from relation.
  Projection ($\pi$): Deletes unwanted columns from relation.
  Cross-product ($\times$): Allows us to combine two relations.
  Set-difference ($-$): Tuples in reln. 1, but not in reln. 2.
  Union ($\cup$): Tuples in reln. 1 or in reln. 2 (or both).

• Additional operations:
  Intersection, join, division, renaming: Not essential, but (very!) useful.

• Since each operation returns a relation, operations can be composed! (Algebra is "closed".)

Projection

$\pi_{sname, gpa}(S2)$

sname   GPA
yuppy   3.9
lubber  3.8
guppy   3.5
rusty   4.0

$\pi_{age}(S2)$

age
24.0
25.5
23.0

• Deletes attributes that are not in projection list.

• Schema of result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.

• Projection operator has to eliminate duplicates! (Why??)
  Note: real systems typically don't do duplicate elimination unless the user explicitly asks for it. (Why not?)

Vertical Slices

• Projection: Specifying Elements

No Specification: List all information about Students

select *
from STUDENT;

Algebra: (Student)

Conditional: List IDs, names, and addresses of all students

select StudentID, name, address
from STUDENT;

Algebra: $\pi_{StudentID, name, address}(Student)$

Algebra: projection $\pi_{\langle A_1, A_2, \ldots, A_m \rangle}(R)$

Does SQL treat Relations as ‘Sets’?

What are the different salaries we pay to our employees?

select salary
from EMPLOYEE;

OR is the following better?

select DISTINCT salary
from EMPLOYEE;
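
To see the difference between the two queries, here is a minimal sketch using Python's built-in `sqlite3` module. The `EMPLOYEE` rows are made up for illustration; only the `salary` column name comes from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (name TEXT, salary INTEGER)")
# Hypothetical rows: two employees share the same salary.
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("Ann", 50000), ("Bob", 60000), ("Cy", 50000)],
)

# Plain SELECT keeps duplicates: the result is a bag (multiset), not a set.
all_salaries = [row[0] for row in conn.execute(
    "SELECT salary FROM EMPLOYEE ORDER BY salary")]

# DISTINCT asks for duplicate elimination, giving set semantics.
distinct_salaries = [row[0] for row in conn.execute(
    "SELECT DISTINCT salary FROM EMPLOYEE ORDER BY salary")]

print(all_salaries)       # [50000, 50000, 60000]
print(distinct_salaries)  # [50000, 60000]
```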

Selection

$\sigma_{gpa > 3.8}(S2)$

sid  sname  GPA  age
28   yuppy  3.9  24.0
58   rusty  4.0  23.0

$\pi_{sname, gpa}(\sigma_{gpa > 3.8}(S2))$

sname  GPA
yuppy  3.9
rusty  4.0

• Selects rows that satisfy selection condition.

• No duplicates in result! (Why?)

• Schema of result identical to schema of (only) input relation.

• Result relation can be the input for another relational algebra operation! (Operator composition.)
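
The selection and the composed selection-then-projection can be reproduced in SQL. The sketch below loads the S2 instance from the Example Instances slide into an in-memory SQLite database; the choice of SQLite and the `ORDER BY sid` clause (added only to make the output order predictable) are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE S2 (sid INTEGER, sname TEXT, gpa REAL, age REAL)")
conn.executemany(
    "INSERT INTO S2 VALUES (?, ?, ?, ?)",
    [(28, "yuppy", 3.9, 24.0), (31, "lubber", 3.8, 25.5),
     (44, "guppy", 3.5, 25.5), (58, "rusty", 4.0, 23.0)],
)

# Selection: keep only the rows with gpa > 3.8 (all columns survive).
selected = conn.execute(
    "SELECT * FROM S2 WHERE gpa > 3.8 ORDER BY sid").fetchall()

# Composition: select first, then project onto sname and gpa.
projected = conn.execute(
    "SELECT sname, gpa FROM S2 WHERE gpa > 3.8 ORDER BY sid").fetchall()

print(selected)
print(projected)
```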

Horizontal Slices

• Restriction: Specifying Conditions

Unconditional: List all students

select *
from STUDENT;

Algebra: (Student)

Conditional: List all students with GPA > 3.0

select *
from STUDENT
where GPA > 3.0;

Algebra: $\sigma_{GPA > 3.0}(Student)$

Algebra: selection or restriction $\sigma_{\text{condition}}(R)$

Specifying Conditions

List all students in ...
select *
from STUDENT
where city in ('Boston', 'Atlanta');

List all students in ...
select *
from STUDENT
where zip not between 60115 and 60123;

Pattern Matching

'%'  any string with n characters, n >= 0
'_'  any single character
x    exact sequence of string x

List all CIS 3200 level courses.
select *
from COURSE
where course# like ? ;

List all CIS courses.
select *
from COURSE
where course# like 'CIS%';
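
The two wildcards can be tried out with a short SQLite sketch. The column is named `cnum` here instead of `course#` to avoid quoting, and the course codes (including their `CIS3210`-style format) are hypothetical. Under that assumed format, `'CIS32__'` is one plausible pattern for the 3200-level exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (cnum TEXT, title TEXT)")
# Hypothetical course codes; the real format of course# is not given.
conn.executemany(
    "INSERT INTO COURSE VALUES (?, ?)",
    [("CIS3200", "Databases"), ("CIS3210", "Networks"),
     ("CIS4100", "Security"), ("MIS3150", "Data Management")],
)

# '%' matches any string of zero or more characters.
all_cis = [row[0] for row in conn.execute(
    "SELECT cnum FROM COURSE WHERE cnum LIKE 'CIS%' ORDER BY cnum")]

# '_' matches exactly one character, so 'CIS32__' needs two more characters.
cis_32xx = [row[0] for row in conn.execute(
    "SELECT cnum FROM COURSE WHERE cnum LIKE 'CIS32__' ORDER BY cnum")]

print(all_cis)
print(cis_32xx)
```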

Missing or Incomplete Information

• List all students whose address or GPA is missing:

select *
from STUDENT
where Address is null or GPA is null;
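
A short SQLite sketch shows why `is null` is needed: comparing with `= NULL` never evaluates to true, so it returns no rows. The table layout and rows below are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT (StudentID TEXT, Name TEXT, Address TEXT, GPA REAL)")
# Hypothetical rows; None is stored as SQL NULL.
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?)",
    [("s1", "Jose", "Boston", 3.2),
     ("s2", "Alice", None, 3.9),
     ("s3", "Tom", "Atlanta", None)],
)

# IS NULL is the test for missing values.
missing = [row[0] for row in conn.execute(
    "SELECT StudentID FROM STUDENT "
    "WHERE Address IS NULL OR GPA IS NULL ORDER BY StudentID")]

# '= NULL' evaluates to unknown for every row, so nothing qualifies.
with_equals = conn.execute(
    "SELECT StudentID FROM STUDENT WHERE Address = NULL").fetchall()

print(missing)      # ['s2', 's3']
print(with_equals)  # []
```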

Horizontal and Vertical

Query: List the IDs, names, and addresses of all students who have GPA > 3.0 and a date of birth before Jan 1, 1980.

select StudentID, Name, Address
from STUDENT
where GPA > 3.0 and DOB < '1-Jan-80'
order by Name DESC;

Algebra: $\pi_{StudentID, name, address}(\sigma_{GPA > 3.0 \wedge DOB < \text{'1-Jan-80'}}(STUDENT))$

Calculus: $\{ t.StudentID, t.name, t.address \mid t \in Student \wedge t.GPA > 3.0 \wedge t.DOB < \text{'1-Jan-80'} \}$

Order by sorts the result; here in descending (DESC) order.

Note: The default order is ascending (ASC) as in:
order by Name;
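
The combined query can be run as follows. This is a sketch under two assumptions: the rows are invented, and dates are stored as ISO `YYYY-MM-DD` strings (so that string comparison matches date order in SQLite) rather than in the `'1-Jan-80'` format shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT "
    "(StudentID TEXT, Name TEXT, Address TEXT, GPA REAL, DOB TEXT)")
# Hypothetical rows; DOB uses ISO dates so that '<' compares chronologically.
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [("s1", "Jose", "Boston", 3.5, "1979-05-01"),
     ("s2", "Alice", "Atlanta", 3.9, "1978-11-20"),
     ("s3", "Tom", "Boston", 2.8, "1979-01-15"),
     ("s4", "Zed", "Miami", 3.6, "1981-03-03")],
)

# Horizontal slice (WHERE) and vertical slice (column list) in one query.
result = conn.execute(
    "SELECT StudentID, Name, Address FROM STUDENT "
    "WHERE GPA > 3.0 AND DOB < '1980-01-01' "
    "ORDER BY Name DESC").fetchall()

print(result)
```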

Union, Intersection, Set-Difference

• All of these operations take two input relations, which must be union-compatible:
  Same number of fields.
  `Corresponding' fields have the same type.
• What is the schema of result?

$S1 \cup S2$

sid  sname   gpa  age
22   dustin  3.5  25.0
31   lubber  3.8  25.5
58   rusty   4.0  23.0
44   guppy   3.5  25.5
28   yuppy   3.9  24.0

$S1 \cap S2$

sid  sname   gpa  age
31   lubber  3.8  25.5
58   rusty   4.0  23.0

$S1 - S2$

sid  sname   gpa  age
22   dustin  3.5  25.0

Union

List students who live in Atlanta or have GPA > 3.0

select StudentID, Name, DOB, Address
from STUDENT
where Address = 'Atlanta'
union
select StudentID, Name, DOB, Address
from STUDENT
where GPA > 3.0;

Can we perform a Union on any two Relations?

Union Compatibility

Two relations, A and B, are union-compatible if
  A and B contain the same number of attributes, and
  the corresponding attributes of the two have the same domains

Examples

CIS-Student (ID: Did; Name: Dname; Address: Daddr; Grade: Dgrade);

Senior-Student (SName: Dname; S#: Did; Home: Daddr; Grade: Dgrade);

Course (C#: Dnumber; Title: Dstr; Credits: Dnumber)

Are CIS-Student and Senior-Student union compatible?
Are CIS-Student and Course union compatible?

What happens if we have duplicate tuples?
What will be the column names in the resulting Relation?

Union, Intersect, Minus

select CUSTNAME, ZIP
from CUSTOMER
where STATE = 'MA'
UNION
select SUPNAME, ZIP
from SUPPLIER
where STATE = 'MA'
ORDER BY 2;

select CUSTNAME, ZIP
from CUSTOMER
where STATE = 'MA'
INTERSECT
select SUPNAME, ZIP
from SUPPLIER
where STATE = 'MA'
ORDER BY 2;

select CUSTNAME, ZIP
from CUSTOMER
where STATE = 'MA'
MINUS
select SUPNAME, ZIP
from SUPPLIER
where STATE = 'MA'
ORDER BY 2;

[Figure: three Venn diagrams of sets A and B, one for each of UNION, INTERSECT, and MINUS.]
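
The three set queries can be compared side by side in SQLite. Two caveats for this sketch: the customer and supplier rows are invented, and SQLite spells set difference as `EXCEPT`, whereas `MINUS` is the Oracle keyword used on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CUSTOMER (CUSTNAME TEXT, ZIP TEXT, STATE TEXT);
CREATE TABLE SUPPLIER (SUPNAME TEXT, ZIP TEXT, STATE TEXT);
-- Hypothetical rows: 'Acme' is both a customer and a supplier.
INSERT INTO CUSTOMER VALUES
  ('Acme', '02139', 'MA'), ('Bolt', '02457', 'MA'), ('Cord', '10001', 'NY');
INSERT INTO SUPPLIER VALUES
  ('Acme', '02139', 'MA'), ('Dyna', '01002', 'MA');
""")

def combine(set_operator):
    """Run the slide's query with the given set operator, sorted by column 2."""
    sql = f"""
        SELECT CUSTNAME, ZIP FROM CUSTOMER WHERE STATE = 'MA'
        {set_operator}
        SELECT SUPNAME, ZIP FROM SUPPLIER WHERE STATE = 'MA'
        ORDER BY 2
    """
    return conn.execute(sql).fetchall()

union_rows = combine("UNION")
intersect_rows = combine("INTERSECT")
minus_rows = combine("EXCEPT")  # SQLite's name for MINUS

print(union_rows)
print(intersect_rows)
print(minus_rows)
```

Note that `UNION` removes the duplicate `('Acme', '02139')` row; `UNION ALL` would keep both copies.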

Cross-Product

• Each row of S1 is paired with each row of R1.
• Result schema has one field per field of S1 and R1, with field names `inherited' if possible.
  Conflict: Both S1 and R1 have a field called sid.

(sid)  sname   GPA  Age   (sid)  cid  semester
22     dustin  3.5  25.0  22     101  Fall 99
22     dustin  3.5  25.0  58     103  Spring 99
31     lubber  3.8  25.5  22     101  Fall 99
31     lubber  3.8  25.5  58     103  Spring 99
58     rusty   4.0  23.0  22     101  Fall 99
58     rusty   4.0  23.0  58     103  Spring 99

Renaming operator: $\rho(C(1 \rightarrow sid1, 5 \rightarrow sid2), S1 \times R1)$

Joins

• Condition Join: $R \bowtie_c S = \sigma_c(R \times S)$

• Result schema same as that of cross-product.
• Fewer tuples than cross-product, might be able to compute more efficiently
• Sometimes called a theta-join.

$S1 \bowtie_{S1.sid < R1.sid} R1$

(sid)  sname   GPA  age   (sid)  cid  Semester
22     dustin  3.5  25.0  58     103  Spring 99
31     lubber  3.8  25.5  58     103  Spring 99

Joins

• Equi-Join: A special case of condition join where the condition c contains only equalities.

• Result schema similar to cross-product, but only one copy of fields for which equality is specified.

• Natural Join: Equijoin on all common fields.

$S1 \bowtie_{sid} R1$

sid  sname   GPA  age   cid  semester
22   dustin  3.5  25.0  101  Fall 99
58   rusty   4.0  23.0  103  Spring 99
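
The cross-product, condition join, and equi-join on S1 and R1 can be checked against the tables above with a small SQLite sketch. The data is the S1 and R1 instance from the Example Instances slide; the use of SQLite and the `ORDER BY` clauses are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE S1 (sid INTEGER, sname TEXT, gpa REAL, age REAL);
CREATE TABLE R1 (sid INTEGER, cid INTEGER, semester TEXT);
INSERT INTO S1 VALUES
  (22, 'dustin', 3.5, 25.0), (31, 'lubber', 3.8, 25.5), (58, 'rusty', 4.0, 23.0);
INSERT INTO R1 VALUES (22, 101, 'Fall 99'), (58, 103, 'Spring 99');
""")

# Cross-product: every S1 row paired with every R1 row (3 x 2 rows).
cross_count = conn.execute("SELECT COUNT(*) FROM S1, R1").fetchone()[0]

# Condition (theta) join: a selection applied to the cross-product.
theta = conn.execute(
    "SELECT S1.sid, R1.sid FROM S1, R1 "
    "WHERE S1.sid < R1.sid ORDER BY S1.sid").fetchall()

# Equi-join on sid, keeping a single copy of the join column.
equi = conn.execute(
    "SELECT S1.sid, sname, cid FROM S1 JOIN R1 ON S1.sid = R1.sid "
    "ORDER BY S1.sid").fetchall()

print(cross_count)
print(theta)
print(equi)
```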

Find names of students who have taken course #103

Solution 1: $\pi_{sname}((\sigma_{cid = 103}(Registers)) \bowtie Students)$

Solution 2:
  $\rho(Temp1, \sigma_{cid = 103}(Registers))$
  $\rho(Temp2, Temp1 \bowtie Students)$
  $\pi_{sname}(Temp2)$

Solution 3: $\pi_{sname}(\sigma_{cid = 103}(Registers \bowtie Students))$

Connecting/Linking Relations

List information about all students and the classes they are taking

Student
ID   Name   ***
s1   Jose   ***
s2   Alice  ***
s3   Tome   ***
***  ***    ***

Class
Emp#  ID   C#       ***
e1    s1   BA 201   ***
e3    s2   CIS 300  ***
e2    s3   CIS 304  ***
***   ***  ***      ***

What can we use to connect/link Relations?
Join: Connecting relations so that relevant tuples can be retrieved.

Join

Cartesian Product: $R1 \times R2$

Student: 30 tuples
Class: 4 tuples

Total number of tuples in the Cartesian product? (match each tuple of Student to every tuple of Class)

Select tuples having identical Student IDs.
Expected number of such tuples: Join Selectivity

Join Forms

• General Join Forms
  Equijoin (=)
  Operator Dependent (x > y, <>, ...)
• Natural Join
• Outer Join
  Left
  Right
  Full

Equijoin:
select s.*, c.*
from STUDENT s, CLASS c
where s.StudentID = c.SID;

Outer join (Oracle (+) notation):
select s.*, c.*
from STUDENT s, CLASS c
where s.StudentID = c.SID (+);

[Figure: two R1/R2 join diagrams.]
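
The difference between an inner equijoin and a left outer join can be seen in a small SQLite sketch. The `(+)` marker on the slide is Oracle-specific; this example uses the standard `LEFT OUTER JOIN` syntax instead. The rows and the `Cno` column name are assumptions loosely modeled on the Student and Class tables shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (StudentID TEXT, Name TEXT);
CREATE TABLE CLASS (SID TEXT, Cno TEXT);
-- Hypothetical rows: student s3 is not enrolled in any class.
INSERT INTO STUDENT VALUES ('s1', 'Jose'), ('s2', 'Alice'), ('s3', 'Tom');
INSERT INTO CLASS VALUES ('s1', 'BA 201'), ('s2', 'CIS 300');
""")

# Inner equijoin: students without a matching class row disappear.
inner = conn.execute(
    "SELECT s.StudentID, c.Cno FROM STUDENT s, CLASS c "
    "WHERE s.StudentID = c.SID ORDER BY s.StudentID").fetchall()

# Left outer join: every student is kept; missing class data becomes NULL.
left = conn.execute(
    "SELECT s.StudentID, c.Cno FROM STUDENT s "
    "LEFT OUTER JOIN CLASS c ON s.StudentID = c.SID "
    "ORDER BY s.StudentID").fetchall()

print(inner)
print(left)
```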

Find names of students who have taken a CIS course

• Information about departments only available in Courses; so need an extra join:

$\pi_{sname}((\sigma_{dept = \text{'CIS'}}(Courses)) \bowtie Registers \bowtie Students)$

A more efficient solution:

$\pi_{sname}(\pi_{sid}((\pi_{cid}(\sigma_{dept = \text{'CIS'}}(Courses))) \bowtie Registers) \bowtie Students)$

A query optimizer can find this given the first solution!

Find students who have taken an MIS or a CS course

• Can identify all MIS or CS courses, then find students who have taken one of these courses:

$\rho(Temp1, \sigma_{dept = \text{'MIS'} \vee dept = \text{'CS'}}(Courses))$

$\pi_{sname}(Temp1 \bowtie Registers \bowtie Students)$

Can also define Temp1 using union! (How?)
What happens if $\vee$ is replaced by $\wedge$ in this query?

Find students who have taken a CIS and an ECI Course

• Previous approach won't work! Must identify students who have taken CIS courses, students who have taken ECI courses, then find the intersection (note that sid is a key for Students):

$\rho(Temp1, \pi_{sid}((\sigma_{dept = \text{'CIS'}}(Courses)) \bowtie Registers))$

$\rho(Temp2, \pi_{sid}((\sigma_{dept = \text{'ECI'}}(Courses)) \bowtie Registers))$

$\pi_{sname}((Temp1 \cap Temp2) \bowtie Students)$
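
The intersection approach translates fairly directly into SQL. The sketch below uses the example Students and Courses instances plus one extra, hypothetical Registers row (58 taking course 101) so that at least one student has taken both a CIS and an ECI course. It also runs the naive single-condition query to show why that approach returns nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER, sname TEXT, gpa REAL, age REAL);
CREATE TABLE Registers (sid INTEGER, cid INTEGER);
CREATE TABLE Courses (cid INTEGER, cname TEXT, dept TEXT);
INSERT INTO Students VALUES
  (22, 'dustin', 3.5, 25.0), (31, 'lubber', 3.8, 25.5), (58, 'rusty', 4.0, 23.0);
INSERT INTO Courses VALUES (101, 'Database', 'CIS'), (103, 'Internet', 'ECI');
-- The (58, 101) row is a hypothetical addition for this example.
INSERT INTO Registers VALUES (22, 101), (58, 103), (58, 101);
""")

# Naive attempt: no single course row has dept = 'CIS' AND dept = 'ECI'.
naive = conn.execute("""
    SELECT DISTINCT s.sname
    FROM Students s
    JOIN Registers r ON r.sid = s.sid
    JOIN Courses c ON c.cid = r.cid
    WHERE c.dept = 'CIS' AND c.dept = 'ECI'
""").fetchall()

# Temp1 INTERSECT Temp2, then join back to Students via sid.
both = [row[0] for row in conn.execute("""
    SELECT sname FROM Students
    WHERE sid IN (
        SELECT r.sid FROM Registers r JOIN Courses c ON c.cid = r.cid
        WHERE c.dept = 'CIS'
        INTERSECT
        SELECT r.sid FROM Registers r JOIN Courses c ON c.cid = r.cid
        WHERE c.dept = 'ECI'
    )
    ORDER BY sname
""")]

print(naive)  # []
print(both)   # ['rusty']
```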

Relational Calculus

• Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC).

• Calculus has variables, constants, comparison ops, logical connectives and quantifiers.
  TRC: Variables range over (i.e., get bound to) tuples.
  DRC: Variables range over domain elements (= field values).
  Both TRC and DRC are simple subsets of first-order logic.

• Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that make the formula evaluate to true.

Find students with GPA > 3.7 who have taken a CIS Course

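TRC:

$\{ t \mid t \in Students \wedge t.GPA > 3.7 \wedge \exists r \, (r \in Registers \wedge r.sid = t.sid \wedge \exists c \, (c \in Courses \wedge c.cid = r.cid \wedge c.dept = \text{'CIS'})) \}$

DRC:

$\{ \langle I, N, G, A \rangle \mid \langle I, N, G, A \rangle \in Students \wedge G > 3.7 \wedge \exists Ir, Cr, S \, (\langle Ir, Cr, S \rangle \in Registers \wedge Ir = I \wedge \exists C, CN, D \, (\langle C, CN, D \rangle \in Courses \wedge C = Cr \wedge D = \text{'CIS'})) \}$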

Find students who have taken all CIS courses

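DRC:

$\{ \langle I, N, G, A \rangle \mid \langle I, N, G, A \rangle \in Students \wedge \forall C, CN, D \, ((\langle C, CN, D \rangle \in Courses \wedge D = \text{'CIS'}) \Rightarrow \exists Ir, Cr, S \, (\langle Ir, Cr, S \rangle \in Registers \wedge I = Ir \wedge Cr = C)) \}$

TRC:

$\{ t \mid t \in Students \wedge \forall c \, ((c \in Courses \wedge c.dept = \text{'CIS'}) \Rightarrow \exists r \, (r \in Registers \wedge r.sid = t.sid \wedge r.cid = c.cid)) \}$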

How will you do this with Relational Algebra?
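
SQL has no direct "for all" keyword, so one common way to express this kind of query is a double negation: a student qualifies if there is no CIS course that the student has not registered for. The sketch below is one such rewriting, not the algebra answer the slide asks for. It uses the example Students instance; the second CIS course (102) and the Registers rows are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER, sname TEXT);
CREATE TABLE Registers (sid INTEGER, cid INTEGER);
CREATE TABLE Courses (cid INTEGER, cname TEXT, dept TEXT);
INSERT INTO Students VALUES (22, 'dustin'), (31, 'lubber'), (58, 'rusty');
-- Course 102 and the Registers rows are made up for this example.
INSERT INTO Courses VALUES
  (101, 'Database', 'CIS'), (102, 'Networks', 'CIS'), (103, 'Internet', 'ECI');
INSERT INTO Registers VALUES (22, 101), (22, 102), (58, 101), (58, 103);
""")

# "Taken all CIS courses" == "no CIS course exists that the student lacks".
took_all_cis = [row[0] for row in conn.execute("""
    SELECT s.sname
    FROM Students s
    WHERE NOT EXISTS (
        SELECT 1 FROM Courses c
        WHERE c.dept = 'CIS'
          AND NOT EXISTS (
              SELECT 1 FROM Registers r
              WHERE r.sid = s.sid AND r.cid = c.cid))
    ORDER BY s.sname
""")]

print(took_all_cis)  # ['dustin']
```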

Monotonic and Non-Monotonic Queries

• Monotonic queries: queries for which the size of the result either increases or stays the same as the size of the inputs increases. The result size never decreases.

• Non-monotonic queries: queries for which it is possible that the size of the result will DECREASE when the size of the input increases.

• Examples of each?

• Which of the algebra operations is non-monotonic?
• What does this signify?
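
A tiny Python sketch with built-in sets illustrates the distinction, using made-up values: growing an input of a union can only grow the result, while growing the second input of a set difference can shrink it.

```python
# Illustrative inputs; S_large is S_small with one more element.
R = {1, 2, 3}
S_small = {3}
S_large = {2, 3}

# Union is monotonic: a bigger input never gives a smaller result.
union_small = R | S_small
union_large = R | S_large

# Set difference is non-monotonic in its second input.
diff_small = R - S_small
diff_large = R - S_large

print(len(union_small), len(union_large))  # 3 3
print(len(diff_small), len(diff_large))    # 2 1
```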

Summaries and Aggregates

Calculate the average GPA
select avg(GPA)
from STUDENT;

Find the lowest GPA
select min(GPA) as minGPA
from STUDENT;

How many CIS majors?
select count(StudentId)
from STUDENT
where major = 'CIS';

Discarding duplicates
select avg(distinct GPA)
STUDENT
where major = 'CIS'
(is the above query correct?)

Aggregate Functions

COUNT (attr) - a simple count of values in attr
SUM (attr) - sum of values in attr
AVG (attr) - average of values in attr
MAX (attr) - maximum value in attr
MIN (attr) - minimum value in attr

Take effect after all the data is retrieved from the database
Applied to either the entire resulting relation or groups
Can't be involved in any query qualifications (where clause)

Would the following query be permitted?
select StudentId
from STUDENT
where GPA = max (GPA);
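
One way to explore this question is to try the query in a real system. In the sketch below, SQLite rejects the aggregate inside the `where` clause, and a subquery that computes the maximum first is used as a workaround. The STUDENT rows are made up, and other database systems may word the error differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentId TEXT, GPA REAL)")
# Hypothetical rows; two students tie for the highest GPA.
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?)",
    [("s1", 3.2), ("s2", 4.0), ("s3", 4.0)],
)

# The aggregate in the WHERE clause is rejected when the query is compiled.
try:
    conn.execute("SELECT StudentId FROM STUDENT WHERE GPA = max(GPA)").fetchall()
    rejected = False
except sqlite3.OperationalError:
    rejected = True

# Workaround: compute the aggregate in a subquery, then compare against it.
top_students = [row[0] for row in conn.execute(
    "SELECT StudentId FROM STUDENT "
    "WHERE GPA = (SELECT max(GPA) FROM STUDENT) ORDER BY StudentId")]

print(rejected)      # True
print(top_students)  # ['s2', 's3']
```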

Grouping Results Obtained

Show all students enrolled in each course.
select cno, StudentID
from REGISTRATION
group by cno;
Is this grouping OK?

Calculate the average GPA of students by county.
select county, avg (GPA) as CountyGPA
from STUDENT
group by county;

Calculate the enrollment of each class.
select cno, year, term, count (StudentID) as enroll
from REGISTRATION
group by cno, year, term;

Selections on Groups

Show all CIS courses that are full.
select cno, count (StudentID)
from REGISTRATION
group by cno
having count (StudentID) > 29;
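
Grouping and a selection on groups can be tried on a tiny table. In this sketch the REGISTRATION rows are invented, and the "full" threshold is lowered from 29 to 2 so that the effect is visible with only a handful of rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REGISTRATION (cno TEXT, StudentID TEXT)")
# Hypothetical enrollments: three students in CIS300, one in BA201.
conn.executemany(
    "INSERT INTO REGISTRATION VALUES (?, ?)",
    [("CIS300", "s1"), ("CIS300", "s2"), ("CIS300", "s3"), ("BA201", "s1")],
)

# GROUP BY produces one row per course with its enrollment count.
enrollment = conn.execute(
    "SELECT cno, count(StudentID) FROM REGISTRATION "
    "GROUP BY cno ORDER BY cno").fetchall()

# HAVING filters whole groups after aggregation (threshold 2 instead of 29).
full = conn.execute(
    "SELECT cno, count(StudentID) FROM REGISTRATION "
    "GROUP BY cno HAVING count(StudentID) > 2 ORDER BY cno").fetchall()

print(enrollment)
print(full)
```

`where` filters individual rows before grouping, while `having` filters groups after the aggregate has been computed, which is why the count condition belongs in `having`.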

Grouping Results after Join

Calculate the average GPA of each class

select course#, avg (GPA)
from STUDENT S, CLASS C
where S.StudentID = C.SID
group by course#;

Nesting Queries

SELECT attribute(s)
FROM relation(s)
WHERE attr [not] {in | comparison operator | exists}
( query statement(s) );

List names of students who are taking "BA201"
select Name
from Student
where StudentID in
( select StudentID
from REGISTRATION
where course# = 'BA201');

Sub Queries

List all students enrolled in CIS courses
select name
from STUDENT
where StudentId in
(select StudentId
from REGISTRATION
where cno like 'CIS%');

List all courses taken by Student (Id 1011)
select cname
from COURSE
where cnum = any
(select cno
from REGISTRATION
where StudentId = 1011);

Sub Queries

Who received the highest grade in CIS 8140
select StudentId
from REGISTRATION
where cno = 'CIS 8140' and
grade >= all
(select grade
from REGISTRATION
where cno = 'CIS 8140');

List all students enrolled in CIS courses.
select name
from STUDENT S
where exists
(select *
from REGISTRATION
where StudentId = S.StudentId
and cno like 'CIS%');
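
The `in` and `exists` formulations of "students enrolled in CIS courses" should return the same rows, which the sketch below checks on invented data. Because SQLite does not implement the quantified `>= all` comparison, the highest-grade query is rewritten here with `max` in a subquery; that rewriting and the course number `CIS304` are assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (StudentId INTEGER, name TEXT);
CREATE TABLE REGISTRATION (StudentId INTEGER, cno TEXT, grade REAL);
-- Hypothetical students and registrations.
INSERT INTO STUDENT VALUES (1011, 'Jose'), (1012, 'Alice'), (1013, 'Tom');
INSERT INTO REGISTRATION VALUES
  (1011, 'CIS300', 3.0), (1012, 'BA201', 4.0),
  (1013, 'CIS304', 3.7), (1011, 'CIS304', 3.3);
""")

# Uncorrelated subquery with IN.
with_in = [row[0] for row in conn.execute("""
    SELECT name FROM STUDENT
    WHERE StudentId IN
      (SELECT StudentId FROM REGISTRATION WHERE cno LIKE 'CIS%')
    ORDER BY name
""")]

# Correlated subquery with EXISTS: the inner query refers to S.StudentId.
with_exists = [row[0] for row in conn.execute("""
    SELECT name FROM STUDENT S
    WHERE EXISTS
      (SELECT * FROM REGISTRATION
       WHERE StudentId = S.StudentId AND cno LIKE 'CIS%')
    ORDER BY name
""")]

# Highest grade in one course, written with max instead of ">= all".
top_in_cis304 = [row[0] for row in conn.execute("""
    SELECT StudentId FROM REGISTRATION
    WHERE cno = 'CIS304'
      AND grade = (SELECT max(grade) FROM REGISTRATION WHERE cno = 'CIS304')
""")]

print(with_in)
print(with_exists)
print(top_in_cis304)
```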

Relational Views

• Relations derived from other relations.
• Views have no stored tuples.
• Are useful to provide multiple user views.

[Figure: View 1, View 2, ..., View N defined over Base Relation 1 and Base Relation 2.]

• What level in the three layer model do views belong to?
• Which kind of independence do they support?

View Creation

Create View view-name [ ( attr [ , attr ] ...) ]
AS subquery
[ with check option ] ;

DROP VIEW view-name;

Create a view containing the student ID, Name, Age and GPA for those who are qualified to take 300 level courses, i.e., GPA >= 2.0.
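
One possible answer to this exercise is sketched below in SQLite. The view is named `Qualified_Student` with an underscore because a hyphen is not valid in an unquoted identifier, and the STUDENT layout and rows are assumptions. The `with check option` clause is left out since SQLite does not support it. The second query shows that the view has no stored tuples: a row added to the base table appears in the view automatically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENT "
    "(StudentID TEXT, Name TEXT, Age INTEGER, GPA REAL, Address TEXT)")
# Hypothetical rows: s2 is below the 2.0 threshold.
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)",
    [("s1", "Jose", 21, 3.2, "Boston"), ("s2", "Alice", 22, 1.8, "Atlanta")],
)

# The view stores only its defining query, not the matching rows.
conn.execute("""
    CREATE VIEW Qualified_Student AS
    SELECT StudentID, Name, Age, GPA
    FROM STUDENT
    WHERE GPA >= 2.0
""")

before = [row[0] for row in conn.execute(
    "SELECT StudentID FROM Qualified_Student ORDER BY StudentID")]

# Insert into the base relation; the view reflects it on the next query.
conn.execute("INSERT INTO STUDENT VALUES ('s3', 'Tom', 20, 3.7, 'Miami')")

after = [row[0] for row in conn.execute(
    "SELECT StudentID FROM Qualified_Student ORDER BY StudentID")]

print(before)  # ['s1']
print(after)   # ['s1', 's3']
```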

View Options

• With Check Option enforces the query condition for insertion or update

To enforce the GPA >=2.0 condition on all new student tuples inserted into the view

• A view may be derived from multiple base relations

Create a view that includes student IDs, student names and their instructors’ names for all CIS 300 students.

View Retrieval

Queries on views are written the same way as those on base relations.

Queries on views are expanded into queries on their base relations.

select Name, Instructor-Name
from CIS300-Student
where Name = Instructor-Name;

View: Update

Update on a view actually changes its base relation(s)!

update Qualified-Student
set GPA = GPA-0.1
where StudentID = 's3';

insert into Qualified-Student
values ( 's9', 'Lisa', 4.0 );

insert into Qualified-Student
values ( 's10', 'Peter', 1.7 );

Why are some views not updateable?
What type of views are updateable?

Non-monotonic queries – again!

• Need to use either MINUS or NOT EXISTS!

• Find courses where no student has gpa over 3.5

• Find students who have taken all courses that Joe has taken

• How would you solve these?
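
As a hedged sketch of the first exercise, the query below uses `NOT EXISTS` on the example Students, Registers, and Courses instances. The extra registration inserted at the end is hypothetical; it shows the non-monotonic behavior, since adding an input row removes a row from the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Students (sid INTEGER, sname TEXT, gpa REAL);
CREATE TABLE Registers (sid INTEGER, cid INTEGER);
CREATE TABLE Courses (cid INTEGER, cname TEXT);
INSERT INTO Students VALUES (22, 'dustin', 3.5), (31, 'lubber', 3.8), (58, 'rusty', 4.0);
INSERT INTO Registers VALUES (22, 101), (58, 103);
INSERT INTO Courses VALUES (101, 'Database'), (103, 'Internet');
""")

# Courses for which no registered student has gpa over 3.5.
QUERY = """
    SELECT c.cname
    FROM Courses c
    WHERE NOT EXISTS (
        SELECT 1
        FROM Registers r JOIN Students s ON s.sid = r.sid
        WHERE r.cid = c.cid AND s.gpa > 3.5)
    ORDER BY c.cname
"""

before = [row[0] for row in conn.execute(QUERY)]

# Hypothetical new registration: lubber (gpa 3.8) joins course 101.
conn.execute("INSERT INTO Registers VALUES (31, 101)")

after = [row[0] for row in conn.execute(QUERY)]

print(before)  # ['Database']
print(after)   # []
```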

Summary

• SQL is a low-complexity, declarative query language

• The good thing about being declarative is that internally the query can be changed automatically for optimization

• Good thing about being low-complexity?
  No SQL query ever goes into an infinite loop
  No SQL query will ever take an indefinite amount of space to get the solution
• Can be used for highly complex problems!

