Using Relational Databases and SQL Steven Emory Department of Computer Science California State University, Los Angeles Chapter 6 Set Functions

Dec 17, 2015


Lucy Pearson

Using Relational Databases and SQL

Steven Emory
Department of Computer Science
California State University, Los Angeles

Chapter 6: Set Functions


Topics for Today

Set (Aggregate) Functions

GROUP BY Clause

HAVING Clause


Set Functions

Definition

A set function, or group aggregate function, is a function that operates on a group of rows and returns a single value for that group


Aggregate Functions

Aggregate/Non-aggregate similarities

Both take some kind of input

Both perform operations using the input

Both have a single output

Aggregate/Non-aggregate differences

Input to an aggregate function is a group of data

Input to a non-aggregate function is a single item

Aggregate functions may not be nested

Aggregate functions do not alter any table data


Examples

Function Example:

SELECT LEFT(Title, 1)
FROM Movies;

Set Function Example:

SELECT MPAA, COUNT(MPAA)
FROM Movies
GROUP BY MPAA;
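To make the contrast concrete, here is a minimal sketch that runs both kinds of query against an in-memory SQLite database from Python. The three sample rows are hypothetical, and because SQLite has no LEFT() function, substr(Title, 1, 1) is used as a stand-in for MySQL's LEFT(Title, 1).

```python
import sqlite3

# Hypothetical sample data; the real Movies table is not shown in the slides.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Title TEXT, MPAA TEXT)"
)
conn.executemany(
    "INSERT INTO Movies (Title, MPAA) VALUES (?, ?)",
    [("Avatar", "PG-13"), ("Inception", "PG-13"), ("Alien", "R")],
)

# Non-aggregate function: one output value per input row.
# substr(Title, 1, 1) stands in for MySQL's LEFT(Title, 1).
first_letters = conn.execute(
    "SELECT substr(Title, 1, 1) FROM Movies ORDER BY MovieID"
).fetchall()

# Set (aggregate) function: one output value per group of rows.
counts = conn.execute(
    "SELECT MPAA, COUNT(MPAA) FROM Movies GROUP BY MPAA ORDER BY MPAA"
).fetchall()

print(first_letters)  # [('A',), ('I',), ('A',)]
print(counts)         # [('PG-13', 2), ('R', 1)]
```

The first query returns as many rows as the table holds; the second returns one row per MPAA rating.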


Aggregate Functions

There are only 5 general aggregate functions

COUNT(*), COUNT(fieldname)

AVG(fieldname)

MIN(fieldname)

MAX(fieldname)

SUM(fieldname)


COUNT

COUNT(*)

Counts the number of rows in a table

Includes rows that contain NULLs (every row is counted)

-- This query returns 6.
SELECT COUNT(*) AS 'Number of Movies'
FROM Movies;

COUNT(fieldname)

Counts the number of rows in which fieldname is not NULL

Excludes NULLs (doesn't count them)

-- This query also returns 6, since no MovieID is NULL.
SELECT COUNT(MovieID) AS 'Number of Movies'
FROM Movies;
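The difference between the two forms of COUNT only shows up when a field contains NULLs. The sketch below uses a hypothetical three-row Movies table in SQLite, with one missing Runtime, to show both counts side by side.

```python
import sqlite3

# Hypothetical data: the third movie has no recorded runtime (NULL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Title TEXT, Runtime INTEGER)"
)
conn.executemany(
    "INSERT INTO Movies (Title, Runtime) VALUES (?, ?)",
    [("Movie A", 100), ("Movie B", 120), ("Movie C", None)],
)

# COUNT(*) counts rows; COUNT(Runtime) counts non-NULL Runtime values.
count_rows, count_runtime = conn.execute(
    "SELECT COUNT(*), COUNT(Runtime) FROM Movies"
).fetchone()

print(count_rows, count_runtime)  # 3 2
```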


AVG

AVG(fieldname)

Averages all the data under fieldname

Excludes NULLs (doesn't count NULL as 0).

-- Averages all movie runtimes.
SELECT AVG(Runtime) AS 'Average Runtime'
FROM Movies;
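A short sketch of what "doesn't count NULL as 0" means in practice, using hypothetical runtimes in SQLite. COALESCE appears only to show what the average would be if NULL were treated as 0; it is not part of the slide's query.

```python
import sqlite3

# Hypothetical data: runtimes of 100 and 140 minutes, plus one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Runtime INTEGER)")
conn.executemany(
    "INSERT INTO Movies (Runtime) VALUES (?)", [(100,), (140,), (None,)]
)

# AVG skips the NULL row entirely: (100 + 140) / 2.
avg_runtime = conn.execute("SELECT AVG(Runtime) FROM Movies").fetchone()[0]

# For comparison only: forcing NULL to 0 gives (100 + 140 + 0) / 3.
avg_null_as_zero = conn.execute(
    "SELECT AVG(COALESCE(Runtime, 0)) FROM Movies"
).fetchone()[0]

print(avg_runtime)       # 120.0
print(avg_null_as_zero)  # 80.0
```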


MIN and MAX

MIN(fieldname)

Returns the minimum value under fieldname

-- Returns the minimum movie runtime.
SELECT MIN(Runtime) AS 'Shortest Runtime'
FROM Movies;

MAX(fieldname)

Returns the maximum value under fieldname

-- Returns the maximum movie runtime.
SELECT MAX(Runtime) AS 'Longest Runtime'
FROM Movies;


SUM

SUM(fieldname)

Sums all the data under fieldname

Excludes NULLs (doesn't count NULL as 0).

-- Sums all of the movie runtimes.
SELECT SUM(Runtime) AS 'Total Runtime'
FROM Movies;
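The sketch below runs SUM, MIN, and MAX together over hypothetical runtimes in SQLite. It also shows one consequence of NULL handling that is easy to miss: when no rows contribute a value, SUM returns NULL (None in Python) rather than 0.

```python
import sqlite3

# Hypothetical data: runtimes of 100 and 140 minutes, plus one NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Runtime INTEGER)")
conn.executemany(
    "INSERT INTO Movies (Runtime) VALUES (?)", [(100,), (140,), (None,)]
)

# All three aggregates ignore the NULL runtime.
total, shortest, longest = conn.execute(
    "SELECT SUM(Runtime), MIN(Runtime), MAX(Runtime) FROM Movies"
).fetchone()

# No row passes this filter, so SUM has nothing to add up and yields NULL.
empty_total = conn.execute(
    "SELECT SUM(Runtime) FROM Movies WHERE Runtime > 1000"
).fetchone()[0]

print(total, shortest, longest)  # 240 100 140
print(empty_total)               # None
```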


Filtering Aggregate Calculations

To exclude items from being aggregated, you may use the WHERE clause.

Example: Count the number of PG-13 movies.

SELECT COUNT(*)
FROM Movies
WHERE MPAA = 'PG-13';

Example: Count the number of rated R movies.

SELECT COUNT(*)
FROM Movies
WHERE MPAA = 'R';
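As a runnable sketch of the two filtered counts, the code below builds a hypothetical six-row Movies table (five PG-13, one R, matching the counts used in these slides) and runs the same WHERE-filtered COUNT(*) once per rating. The count_by_rating helper is illustrative only.

```python
import sqlite3

# Hypothetical data: five PG-13 movies and one R movie.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, MPAA TEXT)")
conn.executemany(
    "INSERT INTO Movies (MPAA) VALUES (?)",
    [("PG-13",), ("PG-13",), ("R",), ("PG-13",), ("PG-13",), ("PG-13",)],
)


def count_by_rating(rating):
    """Count only the rows that survive the WHERE filter."""
    query = "SELECT COUNT(*) FROM Movies WHERE MPAA = ?"
    return conn.execute(query, (rating,)).fetchone()[0]


pg13_count = count_by_rating("PG-13")
r_count = count_by_rating("R")
print(pg13_count, r_count)  # 5 1
```

Each rating needs its own query here, which is the limitation the following slides address.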


Mixing Field Types

Can we calculate both with a single query?

+-------+----------+
| MPAA  | COUNT(*) |
+-------+----------+
| PG-13 |        5 |
| R     |        1 |
+-------+----------+
2 rows in set (0.01 sec)

Well, we would need to mix non-aggregated fieldnames with aggregated ones

-- Example: What does this do? Does it work? No!
SELECT MPAA, COUNT(MPAA)
FROM Movies;


Grouping Tables

Solution: You can divide the table into groups.

-- Groups the movies table by MPAA rating.
SELECT MPAA
FROM Movies
GROUP BY MPAA;

-- Groups and counts movies by MPAA rating.
SELECT MPAA, COUNT(MPAA)
FROM Movies
GROUP BY MPAA;


How GROUP BY Works

GROUP BY begins by sorting the table based on the grouping attribute (in our case, MPAA)

If any aggregates are present, GROUP BY causes each aggregate to be applied per-group rather than per-table

GROUP BY then condenses the table so that each group only appears once in the table (if listed) and displays any aggregated group values along with it
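The three steps above can be imitated in plain Python as a conceptual sketch. This is a model of the description, not of how any database engine is implemented (real engines may group by hashing instead of sorting), and the six rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical (Title, MPAA) rows standing in for the Movies table.
movies = [
    ("Movie A", "PG-13"),
    ("Movie B", "R"),
    ("Movie C", "PG-13"),
    ("Movie D", "PG-13"),
    ("Movie E", "PG-13"),
    ("Movie F", "PG-13"),
]

# Step 1: sort the rows on the grouping attribute (MPAA).
by_rating = sorted(movies, key=itemgetter(1))

# Steps 2 and 3: apply the aggregate to each group, then emit one
# condensed row per group instead of one row per movie.
grouped = [
    (rating, sum(1 for _ in rows))  # COUNT(*) for this group
    for rating, rows in groupby(by_rating, key=itemgetter(1))
]

print(grouped)  # [('PG-13', 5), ('R', 1)]
```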


GROUP BY Example


Grouping on Multiple Fields

GROUP BY can use multiple fieldnames (similar to how you can sort using multiple fieldnames)

-- Example: Report the number of movies by MPAA rating and year of release.
SELECT MPAA, YEAR(ReleaseDate), COUNT(*)
FROM Movies
GROUP BY MPAA, YEAR(ReleaseDate);

In a SELECT clause that contains one or more aggregates, you should only list table attributes that are also listed in the GROUP BY clause.
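Here is a small sketch of grouping on two fields, run in SQLite with hypothetical release dates. SQLite has no YEAR() function, so strftime('%Y', ReleaseDate) stands in for MySQL's YEAR(ReleaseDate); note that it returns the year as text.

```python
import sqlite3

# Hypothetical data: ratings and ISO-formatted release dates.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, MPAA TEXT, ReleaseDate TEXT)"
)
conn.executemany(
    "INSERT INTO Movies (MPAA, ReleaseDate) VALUES (?, ?)",
    [
        ("PG-13", "2009-12-18"),
        ("PG-13", "2009-05-08"),
        ("PG-13", "2010-07-16"),
        ("R", "2010-02-19"),
    ],
)

# One output row per distinct (rating, year) combination.
# strftime('%Y', ...) stands in for MySQL's YEAR(...).
per_rating_year = conn.execute(
    """
    SELECT MPAA, strftime('%Y', ReleaseDate), COUNT(*)
    FROM Movies
    GROUP BY MPAA, strftime('%Y', ReleaseDate)
    ORDER BY 1, 2
    """
).fetchall()

print(per_rating_year)
# [('PG-13', '2009', 2), ('PG-13', '2010', 1), ('R', '2010', 1)]
```

Both non-aggregated expressions in the SELECT list also appear in the GROUP BY clause.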


Filtering Based on Aggregates

Can we use aggregate functions in the WHERE clause?

-- List all genres that have an average movie runtime of over 2 hours.
SELECT Genre, COUNT(*), AVG(Runtime)
FROM Movies JOIN XRefGenresMovies USING(MovieID)
WHERE AVG(Runtime) > 120
GROUP BY Genre;

The answer is no, because WHERE filters rows before aggregation! We need something that filters after!
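The rejection is easy to observe. The sketch below tries an aggregate inside WHERE against SQLite, using a hypothetical single table that already carries a Genre column in place of the Movies/XRefGenresMovies join. SQLite raises sqlite3.OperationalError; MySQL reports its own error code and message, so only the fact of the failure carries over.

```python
import sqlite3

# Hypothetical simplified table; the slides join Movies to XRefGenresMovies.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Genre TEXT, Runtime INTEGER)")
conn.executemany(
    "INSERT INTO Movies (Genre, Runtime) VALUES (?, ?)",
    [("Drama", 150), ("Drama", 130), ("Comedy", 90)],
)

bad_query = """
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies
    WHERE AVG(Runtime) > 120
    GROUP BY Genre
"""

try:
    conn.execute(bad_query)
    rejected = False
except sqlite3.OperationalError as exc:
    # The exact wording of the message depends on the database engine.
    rejected = True
    print("Query rejected:", exc)

print(rejected)  # True
```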


The HAVING Clause

Solution is to use the HAVING clause

Example:

-- List all genres that have an average movie runtime of over 2 hours.
SELECT Genre, COUNT(*), AVG(Runtime)
FROM Movies JOIN XRefGenresMovies USING(MovieID)
GROUP BY Genre
HAVING AVG(Runtime) > 120;


How HAVING Works

In previous example:

This is calculated first...

SELECT Genre, COUNT(*), AVG(Runtime)
FROM Movies JOIN XRefGenresMovies USING(MovieID)
GROUP BY Genre;

Then the result is filtered using the HAVING clause...

SELECT Genre, COUNT(*), AVG(Runtime)
FROM Movies JOIN XRefGenresMovies USING(MovieID)
GROUP BY Genre
HAVING AVG(Runtime) > 120;
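The two stages can be checked against each other. This sketch runs the grouped query with and without HAVING in SQLite, then confirms that filtering the grouped rows by hand gives the same answer. The table contents are hypothetical, and XRefGenresMovies is assumed to hold a Genre column directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, Runtime INTEGER)")
# Assumption: the cross-reference table stores the genre name itself.
conn.execute("CREATE TABLE XRefGenresMovies (MovieID INTEGER, Genre TEXT)")
conn.executemany(
    "INSERT INTO Movies (MovieID, Runtime) VALUES (?, ?)",
    [(1, 150), (2, 130), (3, 90), (4, 100)],
)
conn.executemany(
    "INSERT INTO XRefGenresMovies (MovieID, Genre) VALUES (?, ?)",
    [(1, "Drama"), (2, "Drama"), (3, "Comedy"), (4, "Comedy"),
     (1, "Action"), (4, "Action")],
)

base = """
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM Movies JOIN XRefGenresMovies USING(MovieID)
    GROUP BY Genre
"""

# Stage 1: every group with its aggregates.
grouped = conn.execute(base + " ORDER BY Genre").fetchall()

# Stage 2: HAVING keeps only the groups that pass the test.
filtered = conn.execute(
    base + " HAVING AVG(Runtime) > 120 ORDER BY Genre"
).fetchall()

print(grouped)   # [('Action', 2, 125.0), ('Comedy', 2, 95.0), ('Drama', 2, 140.0)]
print(filtered)  # [('Action', 2, 125.0), ('Drama', 2, 140.0)]

# Filtering the stage 1 rows by hand reproduces the HAVING result.
by_hand = [row for row in grouped if row[2] > 120]
```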


How HAVING Works

So in other words:

WHERE filters per row (BEFORE aggregation)

HAVING filters per group (AFTER aggregation)

Since HAVING filters on groups:

You cannot use just any fieldname you want in the SELECT or HAVING clause of an aggregate query; outside of an aggregate function, you can only use the ones you choose to group by

Example on next page...


Having Examples

This works:

SELECT Genre, COUNT(*), AVG(Runtime)
FROM Movies JOIN XRefGenresMovies USING(MovieID)
GROUP BY Genre
HAVING AVG(Runtime) > 120;

This doesn't work:

SELECT Genre, COUNT(*), AVG(Runtime)
FROM Movies JOIN XRefGenresMovies USING(MovieID)
GROUP BY Genre
HAVING AVG(Runtime) > Runtime;

HAVING only sees group attributes and aggregates.


Having Examples

Why doesn't it work?

Because Runtime is an attribute of a movie and not an attribute of a group. You can only use group attributes and aggregate functions in a HAVING clause.

Since Genre is an attribute of the aggregated group (Genre is listed in the GROUP BY clause), we can use it in the HAVING clause.

SELECT Genre, COUNT(*), AVG(Runtime)
FROM Movies JOIN XRefGenresMovies USING(MovieID)
GROUP BY Genre
HAVING (AVG(Runtime) > 120 AND Genre <> 'Horror');
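A brief sketch of a HAVING clause that combines an aggregate test with a test on the grouping attribute, run in SQLite. The GenreRuntimes table is a hypothetical stand-in for the joined Movies and XRefGenresMovies tables. The failing query from the earlier slide is not reproduced here, because SQLite is more permissive than MySQL about non-grouped columns and would not raise the same error.

```python
import sqlite3

# Hypothetical flattened table standing in for the join in the slides.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE GenreRuntimes (Genre TEXT, Runtime INTEGER)")
conn.executemany(
    "INSERT INTO GenreRuntimes (Genre, Runtime) VALUES (?, ?)",
    [("Drama", 150), ("Drama", 130), ("Horror", 125),
     ("Horror", 135), ("Comedy", 90)],
)

# Aggregate test only: Drama (140.0) and Horror (130.0) both pass.
long_genres = conn.execute(
    """
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM GenreRuntimes
    GROUP BY Genre
    HAVING AVG(Runtime) > 120
    ORDER BY Genre
    """
).fetchall()

# Aggregate test plus a test on the grouping attribute Genre.
long_non_horror = conn.execute(
    """
    SELECT Genre, COUNT(*), AVG(Runtime)
    FROM GenreRuntimes
    GROUP BY Genre
    HAVING (AVG(Runtime) > 120 AND Genre <> 'Horror')
    ORDER BY Genre
    """
).fetchall()

print(long_genres)      # [('Drama', 2, 140.0), ('Horror', 2, 130.0)]
print(long_non_horror)  # [('Drama', 2, 140.0)]
```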


HAVING Summary

So in a HAVING clause:

You can use aggregate functions

You can use constant values

You can use grouping attributes

Anything else and...

Happy error time!

The most common errors are “ERROR 1111 (HY000): Invalid use of group function” and “ERROR 1054 (42S22): Unknown column 'column name' in having clause”.


An Advanced HAVING Problem

List the country and average age of all (movie-related) people born in that country, for only those countries that have an average person age greater than 50. Remember that nobody ever says “I'm 52.3948279 years old!” Always truncate ages to zero decimal places.


Solution

SELECT BirthCountry,
       TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) AS 'Average Age'
FROM People
GROUP BY BirthCountry
HAVING TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) > 50
   AND BirthCountry IS NOT NULL;


Solution

Note that you may also define an alias for the aggregate function in MySQL and use it in the HAVING clause

SELECT BirthCountry,
       TRUNCATE(AVG(TRUNCATE(DATEDIFF(CurDate(), BirthDate)/365, 0)), 0) AS AverageAge
FROM People
GROUP BY BirthCountry
HAVING AverageAge > 50
   AND BirthCountry IS NOT NULL;
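Both forms of the solution can be compared in a SQLite sketch, which also accepts a column alias in HAVING. Several substitutions are assumptions made for the sake of a repeatable example: the People rows are hypothetical, a fixed reference date of 2015-01-01 replaces CurDate(), julianday() differences replace DATEDIFF(), and CAST(... AS INTEGER) replaces TRUNCATE(..., 0).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE People (PersonID INTEGER PRIMARY KEY, BirthDate TEXT, BirthCountry TEXT)"
)
# Hypothetical people; the last one has an unknown birth country.
conn.executemany(
    "INSERT INTO People (BirthDate, BirthCountry) VALUES (?, ?)",
    [
        ("1940-06-15", "USA"),
        ("1950-03-01", "USA"),
        ("1980-01-01", "UK"),
        ("1990-01-01", "UK"),
        ("1930-01-01", None),
    ],
)

# Age in whole years as of the fixed reference date (stand-in for CurDate()).
AGE = "CAST((julianday('2015-01-01') - julianday(BirthDate)) / 365 AS INTEGER)"
AVG_AGE = f"CAST(AVG({AGE}) AS INTEGER)"

# Form 1: repeat the whole aggregate expression in HAVING.
repeated = conn.execute(
    f"""
    SELECT BirthCountry, {AVG_AGE} AS AverageAge
    FROM People
    GROUP BY BirthCountry
    HAVING {AVG_AGE} > 50 AND BirthCountry IS NOT NULL
    """
).fetchall()

# Form 2: refer to the alias defined in the SELECT list.
aliased = conn.execute(
    f"""
    SELECT BirthCountry, {AVG_AGE} AS AverageAge
    FROM People
    GROUP BY BirthCountry
    HAVING AverageAge > 50 AND BirthCountry IS NOT NULL
    """
).fetchall()

print(repeated)  # [('USA', 69)]
print(aliased)   # [('USA', 69)]
```

The NULL-country group has an average age above 50 but is removed by the BirthCountry IS NOT NULL test, and the UK group fails the age test.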


Aggregating Distinct Values

A normal SELECT DISTINCT query filters out duplicates in a second pass

Aggregates are computed in the first pass, so if a field contains duplicate values, and you aggregate on that field, SELECT DISTINCT WILL NOT filter out duplicate values from being aggregated.

The solution is to use the DISTINCT keyword in the aggregate function:

SELECT COUNT(DISTINCT MPAA)FROM Movies;


Aggregating Distinct Values

Example:

-- Returns 6 since there are 6 movies.
SELECT COUNT(MPAA)
FROM Movies;

-- Returns 6 since there are 6 movies and 6 is unique.
SELECT DISTINCT COUNT(MPAA)
FROM Movies;

-- Returns 2 since only PG-13 and R rated movies are currently in the database.
SELECT COUNT(DISTINCT MPAA)
FROM Movies;
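The three queries can be run side by side in SQLite as a quick check. The six rows below are hypothetical, chosen to match the counts in the comments (five PG-13 movies and one R movie).

```python
import sqlite3

# Hypothetical data: six movies with two distinct ratings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movies (MovieID INTEGER PRIMARY KEY, MPAA TEXT)")
conn.executemany(
    "INSERT INTO Movies (MPAA) VALUES (?)",
    [("PG-13",), ("PG-13",), ("PG-13",), ("PG-13",), ("PG-13",), ("R",)],
)

# Plain count of non-NULL ratings.
plain = conn.execute("SELECT COUNT(MPAA) FROM Movies").fetchall()

# DISTINCT applied to the result rows: there is one row, so nothing changes.
distinct_rows = conn.execute("SELECT DISTINCT COUNT(MPAA) FROM Movies").fetchall()

# DISTINCT applied inside the aggregate: duplicates are dropped before counting.
distinct_values = conn.execute("SELECT COUNT(DISTINCT MPAA) FROM Movies").fetchall()

print(plain)            # [(6,)]
print(distinct_rows)    # [(6,)]
print(distinct_values)  # [(2,)]
```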