Top Banner
SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3) Peter M. Radcliffe Senior Analyst College of Liberal Arts March 8 th , 2007
30

SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

Jan 13, 2016

Download

Documents

kapila

SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3). Peter M. Radcliffe Senior Analyst College of Liberal Arts March 8 th , 2007. Agenda. Advanced Topics in SQL Queries Having clause Case/Decode statement Subqueries Wrap-up Q&A. SQL Skeleton. - PowerPoint PPT Presentation
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SQL 101:An Introduction to writing queries

with SQL at the University of Minnesota (Day 3)

Peter M. RadcliffeSenior Analyst

College of Liberal ArtsMarch 8th, 2007

Page 2: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

Agenda

• Advanced Topics in SQL Queries 1.Having clause

2.Case/Decode statement

3.Subqueries

• Wrap-up Q&A

Page 3: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SQL Skeleton

• Recall our skeleton SQL Query from previous classes:

SELECT [*, columns, or aggregate functions]

FROM [table name(s)]ON statement [criteria for joining tables]

WHERE [conditions, including comparison functions =,<,<>,etc.]AND/OR/IN statements [additional conditions]

IS (NOT) NULL statements [working with null values]

GROUP BY [columns]

HAVING [conditions, including comparison functions]

ORDER BY [columns]

Page 4: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

HAVING Clause

• HAVING is the equivalent of the WHERE clause for aggregates

• While the WHERE clause operates on the individual rows of data, HAVING operates on the aggregates created in the SELECT clause

• HAVING can use the same relational arguments that can be used in the WHERE clause

• HAVING can use multiple criteria joined by AND & OR statements, just as in the WHERE clause

Page 5: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

HAVING Examples

• To check for groups with multiple records: HAVING count(*) > 1

• To find students who have enrolled for up to 4 terms HAVING count(term) <= 4

• To find courses with high average grades HAVING avg(grd_pts_per_unit) > 3.5

• To find students with a low probability of retention HAVING avg(retained) < .75 Retained in this example is a calculated 0/1 column

Page 6: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

Example Query with HAVING

• Departments with large numbers of PO transactionsSELECT a.area_class, a.area, c.area_name30,

count(*) AS records, sum(a.dollar_amt) AS total

FROM dwfs_tran_hist a

INNER JOIN dwfs_coa_area c

ON a.area = c.area

WHERE a.area_class = 'TCLA‘

AND a.fisc_yr = '2007’

AND a.trans_cd = 'PO’

AND a.acct_type = '21'

GROUP BY a.area_class, a.area, c.area_name30

HAVING count(*) > 100

ORDER BY a.area_class, a.area, c.area_name30

Page 7: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

HAVING Example Results

Area Class Area Area Name PO’s Amount

TCLA 447 CLA EXTERNAL RELATIONS 115 $349,407.02

TCLA 449 MINNESOTA POPULATION CTR 149 $244,071.48

TCLA 465 ECONOMICS 146 $36,801.61

TCLA 466 ENGLISH LANGUAGE & LIT 106 $225,045.76

TCLA 476 CENTER FOR WRITING 155 $85,758.61

TCLA 481 MUSIC, SCHOOL OF 171 $369,233.88

TCLA 485 PSYCHOLOGY 208 $385,595.67

TCLA 867 CLA TECHNOLOGY 297 $949,458.63

Page 8: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE Statement

• CASE statements replace existing values from a column with new values supplied in the CASE syntax

• Since the Data Warehouse is a read-only environment, the values in the tables themselves cannot be changed, rather the results produced by the query are changed

• Typically, the modified values are placed in a new column, but it is possible to assign the new column the same name as the old one, effectively substituting the new column for the old one

Page 9: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE Statement

• CASE statements can be used in several different ways: To “recode” values in a column to a desired new value To assign values to missing data (similar to NVL command) or

set valid values to missing (e.g. change 0 to NULL) To assign labels to values (e.g. change the “010” acad_level

code to “Freshman”) To collapse values into a smaller set for grouping (e.g. set both

“010” and “015” acad_level codes to “Freshman”) In conjunction with aggregation commands to create conditional

sums, averages, and counts

Page 10: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE Syntax

Each CASE statement has the following components:1. CASE (to invoke the command)

2. The name of the source column

3. WHEN (to identify the value or values to change)

4. The actual value or range of values to change

5. THEN (to identify the new values)

6. The actual new values to replace the old values

7. ELSE (optional, to handle values not listed in WHEN)

8. The actual value to assign when ELSE condition used

9. END (to terminate the command)

10.AS [new_column_name] to label the new column

Page 11: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE Syntax

• There are two versions of the syntax for CASE statements, depending on the number of values being recoded

• When a single value is being changed, the syntax is:CASE column WHEN old_value THEN new_value ELSE

else_value END AS new_name

• When a set of values or a more complex condition is being used, the syntax is:CASE WHEN column {operator} old_value THEN new_value ELSE

else_value END AS new_name

Page 12: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE Syntax Examples

• Single Value:

CASE acad_level_bot WHEN ‘010’ THEN ‘Freshman’ ELSE ‘Other’ END AS freshman_id

• Multiple Values

CASE WHEN acad_level_bot IN (‘010’,’015’) THEN ‘Freshman’ ELSE ‘Other’ END AS freshman_id

Page 13: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE: Multiple Recodes

• Frequently, there are multiple values in the original column to be set to new values in the new column

• To accomplish this, repeat the WHEN and THEN commands, but keep single CASE, ELSE, and END commands

• Order of WHEN and column name still depends on the complexity of the conditions used to identify the target values in the original column

Page 14: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE: Multiple Recode Example 1

CASE acad_level_bot

WHEN ‘010’ THEN ‘Freshman’

WHEN ‘015’ THEN ‘Freshman’

WHEN ‘020’ THEN ‘Sophomore’

WHEN ‘025’ THEN ‘Sophomore’

WHEN ‘030’ THEN ‘Junior’

WHEN ‘035’ THEN ‘Junior’

WHEN ‘040’ THEN ‘Senior’

WHEN ‘045’ THEN ‘Senior’

ELSE ‘Error’

END AS acad_level

Page 15: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE: Multiple Recode Example 2

CASE WHEN a.total_sat_score >= 1600 then 36 WHEN a.total_sat_score >= 1580 and a.total_sat_score < 1600 THEN 35 WHEN a.total_sat_score >= 1520 and a.total_sat_score < 1580 THEN 34 WHEN a.total_sat_score >= 1470 and a.total_sat_score < 1520 THEN 33 WHEN a.total_sat_score >= 1420 and a.total_sat_score < 1470 THEN 32 WHEN a.total_sat_score >= 1380 and a.total_sat_score < 1420 THEN 31 WHEN a.total_sat_score >= 1340 and a.total_sat_score < 1380 THEN 30 WHEN a.total_sat_score >= 1300 and a.total_sat_score < 1340 THEN 29<…lines removed…> WHEN a.total_sat_score >= 500 and a.total_sat_score < 560 then 11 WHEN a.total_sat_score < 500 then 0 ELSE 99 END AS sat_act_equiv

Page 16: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE: Multiple Recode Example 3

CASE WHEN home_loc_cd in ('003','019','037','053','123','139','163') THEN 'Metro' WHEN (home_loc_cd between '000' and '200' and home_loc_cd not in

('003','019','037','053','123','139','163')) THEN 'GreaterMN' WHEN home_loc_cd = 'MN' THEN 'GreaterMN' WHEN home_loc_cd = 'WI' THEN 'Wisconsin' WHEN home_loc_cd in ('ND','SD') THEN 'Dakotas' WHEN home_loc_cd in

('AK','AL','AR','AS','AZ','CA','CO','CT','DC','DE','FL','GA','GU', 'HI','IA','ID','IL','IN','KY','KS','LA','MA','MD','ME','MI','MO','MS', 'MT','NC','NE','NH','NJ','NM','NV','NY','OH','OK','OR','PA','PR','RI', 'SC','TN','TT','TX','UT','VA','VI','VT','WA','WV','WY','USA') THEN 'USA'

WHEN home_loc_cd IS NULL THEN 'Missing' ELSE 'Foreign' END AS region

Page 17: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE and Aggregates

• One of the limitations of the aggregate functions in SELECT and GROUP BY clauses we discussed earlier is that it is not possible to produce summaries at different levels of aggregation

• CASE statements, when combined with aggregate functions, allow for the creation of “conditional” aggregate functions, where records are only summarized if they meet certain conditions, therefore allowing summaries to be created at multiple levels of aggregation simultaneously

Page 18: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE and Aggregates

• To integrate CASE statements into aggregate functions, place the entire CASE statement, minus the “AS” column naming, inside the aggregate function, as below:

• (Conditional Sum)

SUM(CASE term WHEN ‘1069’ THEN tuition_net_instr ELSE 0 END) AS Fall_2006_Tuition

• (Conditional Count)

SUM(CASE acad_career WHEN ‘UGRD’ THEN 1 ELSE 0 END) AS ug_total

Page 19: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE and Aggregates Example

SELECT acad_career,

SUM(CASE acad_group_distr WHEN ‘TCLA’ THEN tuition_net_acad_prog ELSE 0 END) as

reg_tuition,

SUM(CASE area_class WHEN ‘TCLA’ THEN tuition_net_instr ELSE 0 END) as instr_tuition

FROM ps_dwta_stdnt_crse

WHERE institution = ‘UMNTC’

AND term = ‘1069’

GROUP BY acad_career

ORDER BY acad_career DESC

Page 20: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

CASE and Aggregates Results

• Query produced conditional summaries for tuition generated by CLA students and CLA courses

• Where acad_group_distr was not ‘TCLA’ the tuition generated was set to zero, so non-CLA students did not contribute to the total

• Where area_class was not ‘TCLA’, the tuition generated was also set to zero, so non-CLA courses did not contribute to the total

Academic Career Registration Tuition Instructional Tuition

Undergraduate $12,940,852 $36,735,807

Graduate $1,878,437 $6,197,437

Page 21: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES

• SUBQUERIES, like UNIONS, are ways of creating compound queries that produce a single set of results.

• There are multiple ways to accomplish almost everything in SQL.

• In theory, any result that can be produced through a SUBQUERY can also be produce through a properly formed set of table JOINS.

• For some questions, however, SUBQUERIES are the easiest method.

Page 22: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES

• As their name suggests, SUBQUERIES are secondary, dependent queries attached to the main query.

• SUBQUERIES can have all of the same components as the primary query, and can return either a single record or a set of records.

• While it is not absolutely necessary, SUBQUERIES often have JOIN conditions to match records in the primary query, and are often based on SELF JOINS, pulling data from the same source as the primary query.

Page 23: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES & Current Records

• When working with the University of Minnesota Data Warehouse, one of the most powerful uses of SUBQUERIES is to find the most current record, using effective dates and effective sequences.

• Many tables in the Data Warehouse have a “current” flag of some form that allows the most current record to be identified with a simple WHERE clause statement (e.g. WHERE curr_flg = ‘Y’).

• The current flag, however, does not work if you want to reconstruct the state at some earlier point in time.

Page 24: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES: Locations

• SUBQUERIES can be placed anywhere in a query, including SELECT, WHERE, or HAVING clauses, or embedded inside another SUBQUERY.

• The most common location, however, is in the WHERE clause, and that will be the focus of the examples that follow.

• In the WHERE clause, a SUBQUERY allows information from a different table (or another part of the same table) to be used as a criterion for selecting rows.

Page 25: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES: Basic Structure

SELECT columns

FROM tables

WHERE conditions

(SELECT columns

FROM tables

WHERE conditions)

GROUP BY grouping_columns

HAVING conditions

ORDER BY ordering_columns

Page 26: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES: Example 1SELECT DISTINCT a.emplid, a.term, a.subject, a.catalog_nbr FROM ps_dwsa_stdnt_enrl_hs a WHERE a.institution = 'UMNTC'

AND a.acad_career = 'UGRD' AND a.emplid in (SELECT DISTINCT b.emplid

FROM ps_dwsa_stdnt_enrl_hs b WHERE b.subject = 'CHIC‘ AND b.catalog_nbr = '1112')

AND a.term > (SELECT MAX(c.term) FROM ps_dwsa_stdnt_enrl_hs c WHERE a.emplid = c.emplid AND c.subject = 'CHIC' AND c.catalog_nbr = '1112')

Page 27: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES: Example Results

Emplid Term Subject Catalog #

1234567 1069 APEC 1001

1234567 1069 RHET 3266

1234567 1073 ECON 3701

7654321 1069 CHEM 1021

4567123 1069 ARTH 3009

4567123 1069 CHIC 3375

6745231 1069 CHIC 3275

6745231 1069 FOST 3203

6745231 1069 FOST 3320

6745231 1069 SPAN 3512

Page 28: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES: Example 2SELECT a.emplid, a.area, a.org,

sum(a.ytd_exp_amt+a.enc_amt) as amount FROM ps_dwpy_monthly_sum_07 a WHERE a.area = '465'

AND a.org = '1506' AND a.obj in ('7000','7005','7010','7015') AND a.acctg_pd = (SELECT MAX(b.acctg_pd)

FROM ps_dwpy_monthly_sum_07 b WHERE a.emplid = b.emplid

AND a.area = b.area AND a.org = b.org AND a.obj = b.obj)

GROUP BY a.emplid, a.area, a.org, a.obj HAVING sum(a.ytd_exp_amt+a.enc_amt) > 0 ORDER BY a.emplid, a.area, a.org, a.obj

Page 29: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

SUBQUERIES: Example 2 Results

Emplid Area Org Amount Paid

1234567 465 1506 $17,500.00

2345671 465 1506 $680.00

3456712 465 1506 $17,500.00

4567123 465 1506 $16,130.52

5671234 465 1506 $2,167.56

6712345 465 1506 $80,000.12

7123456 465 1506 $17,500.08

7654321 465 1506 $35,000.00

6543217 465 1506 $18,057.00

5432176 465 1506 $680.00

Page 30: SQL 101: An Introduction to writing queries with SQL at the University of Minnesota (Day 3)

Wrap-up Q&A

Questions?

Thank you!