SQL Unit 9 Correlated Subqueries

1

SQL Unit 9Correlated Subqueries

Kirk Scott

2

3

4

• 9.1 Correlated Subqueries• 9.2 EXISTS• 9.3 NOT EXISTS• 9.4 Double NOT EXISTS

5

9.1 Correlated Subqueries

6

• 1. Recall how subqueries with IN work. • There is an inner query and an outer query. • The inner query is run first, producing a set of

results. • The outer query is run second, and the results

it produces depend a row-by-row, yes or no test, of whether a field value is in the results of the inner query.

7

• A row would be included in the outer query results only if the value of one of its fields matched a value in the results of the inner query.

• The connection between the two queries depends on the matching of the field in the WHERE clause of the outer query with the field in the SELECT clause of the inner query.

8

• Here is an example that finds the names of salespeople who sold cars:

• • SELECT name• FROM Salesperson• WHERE spno IN• (SELECT spno• FROM Carsale)

9

• 2. The following example illustrates the syntax of a correlated subquery.

• There is a join in the inner query that involves a field from the table of the inner query and a field from the table of the outer query.

• This is what is meant by correlation:

10

• SELECT name• FROM Salesperson• WHERE spno IN• (SELECT spno• FROM Carsale• WHERE Salesperson.spno = Carsale.spno)

11

• You would never actually write a query like this intentionally, because the correlation in this case is redundant.

• This query gives exactly the same results as the previous example.

• However, the query is syntactically correct and it illustrates some of the important aspects of correlation.

12

• Consider the join. • Recall that for inline views, the outer query

may make use of fields defined in the inner query.

• On the other hand, for queries with IN, the outer query can’t make use of fields defined in the inner query.

13

• In a correlated subquery, the fields of the outer table are available to the inner query.

• In this example, the inner query can make use of the field Salesperson.spno in the join.

• The only thing to keep in mind is that if you are joining on fields that have the same name, then those names have to be qualified with the names of their tables in order to tell them apart.

14

• There is another thing to notice about correlated subqueries.

• The term correlated means that the inner and outer queries are not independent of each other.

• Unlike other nested queries, the inner query does not run to completion first, followed by the outer query.

• The inner query cannot run to completion independently of the outer query because the inner query contains a join which involves the outer table.

15

• 3. The next example illustrates a correlated subquery which does not have the redundancy of the previous example.

• This is a type of subquery with an operator which also includes correlation.

• In words, the query will find the names of salespeople who sold at least two cars.

• This query works by comparing the results of a correlated subquery using COUNT with the constant value 2:

16

• SELECT name• FROM Salesperson• WHERE• (SELECT COUNT(*)• FROM Carsale• WHERE Salesperson.spno = Carsale.spno) >=

2

17

• In order to understand what the previous query accomplishes, it might be useful to look at a query that does approximately the same thing using syntax learned earlier.

• The following query finds the names of salespeople who sold more than two cars, and also finds the counts of the numbers of cars they sold by doing a join and using GROUP BY with HAVING:

18

• SELECT name, COUNT(*)• FROM Salesperson, Carsale• WHERE Salesperson.spno = Carsale.spno• GROUP BY spno HAVING COUNT(*) >= 2

19

• Here is an alternative that does approximately the same thing using an inline view.

• It only makes use of the Carsale table, so it gives spno instead of name:

• • SELECT spno, salescount• FROM• (SELECT spno, COUNT(*) AS salescount• FROM Carsale• GROUP BY spno)• WHERE salescount >= 2

20

• The inline view plays the role of the table in the FROM clause of the outer query.

• The fields of this inline view are available to the outer query.

• Since the view is unnamed, a new field, like the COUNT, has to be given an alias if it is to be referred to elsewhere in the query.

21

• The examples get a little messy.• It’s clear that there may be more than one way

to write a query that accomplishes something more complicated than “SELECT * FROM Table”.

• Keep in mind that the point of this section is that one of the alternative ways involves a correlated subquery.

• The other ways which we’ve seen before are now just sideshows.

22

9.2 EXISTS

23

• 1. The most common use of correlated subqueries is with the keyword EXISTS.

• This is the general form of an EXISTS query:• • SELECT blah• FROM blah• WHERE EXISTS• (A correlated subquery goes here. • A join is needed to connect the inner and outer

queries.)

24

• It may be helpful to compare an EXISTS query with an IN query.

• The keyword EXISTS immediately follows WHERE in the query.

• No field in the outer query table is specified. • The link between the inner and outer queries

arises from the join condition in the inner query alone.

25

• 2. Suppose you would like to find the names of salespeople where there exist sales records for those salespeople in the Carsale table.

• This would be the solution using the keyword EXISTS:• • SELECT name• FROM Salesperson• WHERE EXISTS• (SELECT *• FROM Carsale• WHERE Salesperson.spno = Carsale.spno)

26

• The join in the inner query is the only connection between the inner and the outer query.

• It is essential. • The WHERE clause of the outer query doesn't

specify a field from the outer table. • It goes directly to EXISTS.

27

• The inner query does a SELECT *. • There is no need to match fields from the

inner and outer table in the SELECT, so you can select anything you want to. SELECT * is used because you only have to type the one symbol, *.

• It suggests whole records from the Carsale table, but keep in mind that none of the fields are actually needed in the query.

28

• 3. Not surprisingly, what you can do with an EXISTS query you can also do with an IN query or a join query.

• Here is the IN query that corresponds to the foregoing EXISTS query:

• • SELECT name• FROM Salesperson• WHERE spno IN• (SELECT spno• FROM Carsale)

29

• Here is the corresponding join query:• • SELECT DISTINCT name• FROM Salesperson, Carsale• WHERE Salesperson.spno = Carsale.spno

30

• None of the alternatives is either better or worse than the others.

• It's personal choice whether to use EXISTS, IN, or a join.

• Remember that with the subqueries it's not possible to select fields from the inner table to appear as results in the outer query.

• If you want to do that, then you have to use a join.

31

• 4. Here is another straightforward example of an EXISTS query.

• Like the earlier example, it could also be done with an IN or a join query.

• Those alternatives are not shown. • This example is given without further

explanation. • You should be able to figure out what it does:

32

• SELECT vin, make, model, year• FROM Car• WHERE EXISTS• (SELECT *• FROM Carsale• WHERE Car.vin = Carsale.vin)

33

• 5. The following query does not contain a syntactical error, but it is wrong.

• It is an EXISTS query without correlation:• • SELECT name• FROM Salesperson• WHERE EXISTS• (SELECT *• FROM Carsale)

34

• Recall that IN was a set operator. • This example highlights the fact that EXISTS is

a kind of logical operator. • Either something exists or it doesn't, either

"EXISTS" is true or it's false. • Without the correlation, either Carsale

contains records or it doesn't.

35

• If it does, then the inner query is true, and the outer query will simply select the names of all salespeople.

• If, by chance, the Carsale table contained no records, then the inner query would return false, and the outer query would return nothing, even though there may be salespeople in the Salesperson table.

36

• Remember that a join query without a joining condition gives drastically wrong results

• —namely a Cartesian product. • An EXISTS query without a joining condition,

that is, without correlation, does not give such drastically wrong results.

• But if written without a join it is almost certainly not the query that was intended.

37

• 6. Just as it was possible to join a table with itself, it's also possible to write a correlated subquery with EXISTS where the table of the inner query is the same as the table of the outer query.

• This example will be based on the Salesperson table.

• Here are its contents again, as a reminder:

38

Salespersonspno name addr city state phone bossno commrate

111 Fred Flintstone Anchorage AK 723-6666 333 0.03222 Wile E. Coyote 456 Karluk Anchorage AK 724-7777 333333 Bugs Bunny 789 Otis Anchorage AK 725-8888 0.05

444 Rocky the Squirrel 345 Tudor Anchorage AK 727-3333 333 0.05

555 Yosemite Sam 678 Muldoon Anchorage AK 525-2222 333 0.03

39

• Suppose you'd like to find all of the salespeople who are bosses.

• You can translate this verbally into an EXISTS query as follows:

• Find all salespeople where there exists some other salesperson whose bossno matches the first salesperson's spno.

40

• If you're going to open the same table more than once, then you'll need to give it an alias each time.

• In this case the aliases will correspond to opening the table as a salesperson who is a boss and opening the table as a salesperson who has a boss.

41

• In the query below, the alias A corresponds to opening the table as a salesperson who is a boss.

• This is who you're trying to find information about, and it appears in the outer query.

• The alias B corresponds to opening the table as an employee.

• This appears in the inner query.

42

• The joining condition determines which role the A and B play in the query.

• Matching on A.spno = B.bossno means that A is playing the role of boss.

• Where A and B appear in the query depends on whether you are trying to pull information out of A or information out of B.

43

• SELECT *• FROM Salesperson AS A• WHERE EXISTS• (SELECT *• FROM Salesperson AS B• WHERE A.spno = B.bossno)

44

• Looking back at the table contents, it's easy to see that the only salesperson who is a boss is Bugs Bunny.

• Looking at the logic of the query, spno 333, Bugs Bunny's spno, is the only spno where there are other salesperson records in the table that have this value as their bossno.

• As pointed out with an earlier example, in general, it would also be possible to do this query using IN or a join.

45

• 7. The last example of a plain EXISTS query is slightly less realistic than some of the others, but it does give a chance to use a non-equijoin as the correlating condition.

• Verbally, this query could be described as finding those cars that are so cheap that their stickerprices are lower than the dealercosts of some other cars in the Car table.

46

• This is another example where one table, namely the Car table, is opened twice and has to be aliased.

• One opening of the table, in the outer query, corresponds to the very cheap cars.

• The opening in the inner query corresponds to the ones that are not so cheap:

47

• SELECT *• FROM Car AS A• WHERE EXISTS• (SELECT *• FROM Car AS B• WHERE A.stickerprice < B.dealercost)

48

9.3 NOT EXISTS

49

• 1. NOT EXISTS is the negation of EXISTS and its meaning is straightforward.

• If you think back to NOT IN, if IN produced a certain set of results, then NOT IN would produce the complement of that set.

• Similarly, if EXISTS would hold true for a certain set of tuples, then NOT EXISTS would hold false for that set.

50

• Put another way, NOT EXISTS would specify those tuples for which the condition did not hold true.

• If you wanted to find the names of salespeople who didn't sell a car, if there were any that didn't sell a car, you could do it this way:

51

• SELECT name• FROM Salesperson• WHERE NOT EXISTS• (SELECT *• FROM Carsale• WHERE Salesperson.spno = Carsale.spno)

52

• This produces the same results as this query, which you've seen before:

• • SELECT name• FROM Salesperson• WHERE spno NOT IN• (SELECT spno• FROM Carsale)

53

• This means that once again, the user has a choice of what kind of query to use in order to find the desired results.

• Remember that the kinds of queries with negation shown above cannot be accomplished with an inequality join.

54

• The join query shown below is not equivalent and it is unlikely that a situation would arise where you would need the results generated by this query.

• They are similar to a Cartesian product:

• SELECT name• FROM Salesperson, Carsale• WHERE Salesperson.spno <> Carsale.spno

55

• 2. Here is another straightforward example of a NOT EXISTS query.

• An equivalent NOT IN query could be written, but an inequality join would not be equivalent.

• Those alternatives are not shown for this example.

• You should be able to figure out what this query does:

56

• SELECT vin, make, model, year• FROM Car• WHERE NOT EXISTS• (SELECT *• FROM Carsale• WHERE Car.vin = Carsale.vin)

57

9.4 Double NOT EXISTS

58

• In the previous unit, all combinations of nesting and negating IN queries were examined:

• IN/IN• IN/NOT IN• NOT IN/IN• NOT IN/NOT IN

59

• All of these combinations are also possible with nested, negated EXISTS queries.

• However, double NOT EXISTS is the quintessential case, and it will be considered directly.

• If you can understand double NOT EXISTS, then understanding the other cases should not be a problem.

60

• 1. Queries that are doubly nested with NOT EXISTS are the most logically complex queries that will be dealt with here.

• In theory you could nest even deeper than that, but it is difficult to understand beyond double nesting, and it's hard to think of a use for such complicated queries.

• Although not simple, doubly nested NOT EXISTS queries do have a specific and useful meaning.

61

• The keyword EXISTS in SQL corresponds to what is known as the existential quantifier in mathematical logic.

• In a logical proposition, EXISTS is signified by the phrase "there exists".

• It addresses the question of whether or not there exists at least one case where a particular condition holds.

62

• There is another quantifier in mathematical logic, and it is known as the universal quantifier.

• The phrase that signifies this quantifier in a logical proposition is "for all".

• It addresses the question of whether or not a particular condition holds for all cases.

63

• There is no keyword or keyword phrase in SQL such as "FOR ALL".

• In other words, SQL does not have a universal quantifier.

• It turns out that double NOT EXISTS is the logical equivalent of universal quantification.

64

• For better or worse, doubly nested NOT EXISTS queries are the classic way to accomplish universal quantification in SQL.

• You may recall that in analyzing the results of nested IN queries, it frequently came down to a question of, “Are all included or are some included?”

• This means that universal quantification can also be accomplished with IN/NOT IN queries

65

• The reality is that doubly nested NOT EXISTS may be the most straightforward way of dealing with the question of all vs. not all.

• If you prefer to try and deal with this using IN queries, that’s OK.

• You will need to know how to write both IN and EXISTS queries.

• On assignments or tests, if a question is specifically about the concept of “for all”, it is highly likely that the clearest route to a solution will be double NOT EXISTS.

66

• The logical equivalence of "for all" and double NOT EXISTS will not be proven.

• Here is a verbal explanation of the equivalence: • Suppose I am interested in this proposition: • "Condition X holds in all cases." • I claim that this proposition is equivalent: • "There are no cases in which condition X does

not hold."

67

• Both in math and the English language "two negatives yield a positive".

• На русском языке это не совсем так. • You're warned in English class not to write

sentences with double negatives because they are hard to understand.

• They are also hard to understand in SQL, but you have no choice.

68

• If you can master the meaning of a double NOT EXISTS query, you can do universal quantification successfully.

• Even if the verbal explanation is not very clear, the examples which follow will empirically demonstrate how double NOT EXISTS is equivalent to "for all".

69

• 2. In greater detail, this is the structure of the kinds of queries needed in order to accomplish what is under discussion here:

• A. The overall query consists of three smaller queries:

• An outer query, a query inside that one, and another inside the second one.

• This is double nesting.

70

• B. The two nested inner queries are introduced by NOT EXISTS.

• C. All three queries are correlated. – In other words, the overall query is hooked

together like a three-way join.

71

• D. In a simple example, the table of the outer query is joined to the table of the innermost query, and the table of the middle query is also joined to the table of the innermost query. – Needless to say, if this plan is to be realized, there

have to be corresponding fields in the respective tables.

72

• E. In a simple example the joining conditions both appear in the innermost query.

73

• 3. Before giving a doubly nested NOT EXISTS query, a reminder of the relationships between the Customer, Carsale, and Salesperson tables is given here in the form of a straightforward three-way join:

• • SELECT Customer.name, salesprice, Salesperson.name• FROM Customer, Carsale, Salesperson• WHERE Customer.custno = Carsale.custno• AND Carsale.spno = Salesperson.spno

74

• 4. Suppose I'm interested in a query that could be described in this way:

• Find the customers who bought cars from all of the salespeople.

• The "from all" in the verbal description signals universal quantification.

75

• If the first description is ambiguous, maybe this description is more specific:

• Find the customers who have bought at least one car from each of the salespeople.

76

• Finally, it might be helpful in trying to write the query to describe it in the following way.

• This is tortured English, but it closely parallels the logic of the query:

• Find the customers, such that there does not exist a salesperson that they haven't bought a car from.

77

• Put another way: • You're interested in the customers where

there is no salesperson that the customers haven't bought a car from.

• Enough word twisting. • The query follows.

78

• SELECT *• FROM Customer• WHERE NOT EXISTS• (SELECT *• FROM Salesperson• WHERE NOT EXISTS• (SELECT *• FROM Carsale• WHERE Customer.custno = Carsale.custno• AND Carsale.spno = Salesperson.spno))

79

• This query illustrates all of the points made earlier.

• Three tables are opened. • It is doubly nested with NOT EXISTS.

80

• The two joining conditions that link the three tables together on their corresponding fields are in the innermost query.

• The logic of the query as seen when reading its parts in order follows the logic of the doubly negated expression of the meaning of the query in the English language.

81

• There is one more thing worth noting about this example query and universal quantification queries in general:

• Quite often they identify exceptional cases. • How likely is it that there are any customers

who have bought cars from all salespeople? • It is pretty unlikely.

82

• If there are none, then this query will produce no results.

• This is not a sign that the query is wrong. • It simply means that you have asked for an

exceptional case, and there is no such case. • When writing and testing queries of this kind,

do not be surprised if the query returns no results.

83

• There is another approach to analyzing and understanding a query like this.

• In general, a doubly nested query can involve three tables.

• Those three tables are likely to consist of one entity in a many-to-many relationship with another, with a table in the middle.

• An E-R diagram of this situation follows:

84

Entity1 Table in the middle Entity2

85

• The important thing to notice about the structure of the double NOT EXISTS query is this:

• Reading from left to right in the diagram, you have Entity 1, Table in the middle, Entity 2.

• The order that the tables appear in the query is Entity 1, Entity 2, Table in the middle.

• The two joining conditions, which both involve the Table in the middle, occur in the innermost subquery, along with the Table in the middle.

86

• 5. It is easy to make another example by turning the previous query around.

• Suppose you'd like to find the salespeople who have sold cars to all customers.

• Again, there are various ways to rephrase it. • One way which is close to the logic of the SQL

query is: • Find the salespeople where there does not exist a

customer that the salesperson hasn't sold a car to.

87

• It should be apparent that the order of tables in the query makes a big difference.

• Turning the query around logically results in the tables appearing in this query in the opposite order to their appearance in the previous one.

• The joining conditions remain the same. • Here is the SQL for this query:

88

• SELECT *• FROM Salesperson• WHERE NOT EXISTS• (SELECT *• FROM Customer• WHERE NOT EXISTS• (SELECT *• FROM Carsale• WHERE Salesperson.spno = Carsale.spno• AND Carsale.custno = Customer.custno))

89

• If anything, this query may be even more unlikely to have a result than the previous example.

• Even if the query doesn't return a result, that doesn't mean that it's wrong.

90

• 6. The parts of the example can be rearranged again, resulting in a different query, which illustrates other things.

• Here is a query where the Customer table comes first, the Carsale table comes second, and the Salesperson table comes third:

91

• SELECT *• FROM Customer• WHERE NOT EXISTS• (SELECT *• FROM Carsale• WHERE Customer.custno = Carsale.custno• AND NOT EXISTS• (SELECT *• FROM Salesperson• WHERE Carsale.spno = Salesperson.spno))

92

• There are several points to be made about this query.

• The first point is syntactical. • Unlike the previous examples, one of the joining

conditions is in the first subquery and the other is in the second subquery.

• This query won't work in Microsoft Access SQL if the condition in the first subquery is put together with the condition in the second subquery.

93

• Look at the tables involved in the first condition:

• They are Customer and Carsale. • If this condition is moved into the second

subquery, it will be inside a nest where neither of these tables exist.

• This doesn't cause an error message to be generated, but this condition will be ignored.

94

• The second point has to do with the meaning of the query.

• In words, it might be translated as: • Find the customers where there is no carsale to

that customer where there isn't a salesperson for the carsale.

• In other words, find the customers where all of the purchases they made were made from salespeople.

95

• Unlike the previous examples, you would expect to get results from this query since you would expect most customers to have made their purchases from salespeople.

• Depending on the data in the database, there may be some customer who at some time has bought a car without a salesperson being recorded for the sale.

• If so, that customer would not be in the results of the query.

96

• If you look at the E-R diagram again, you see that the order of the tables in the query agrees with the order of the table, reading form left to right in the diagram.

97

Entity1 Table in the middle Entity2

98

• Keep in mind that if the tables are ordered in this way, the meaning of the query is drastically changed.

• Also keep in mind that the placement of the joining conditions in the query is also changed.

99

• That leads to the third point. • It is possible to write queries with both EXISTS

and NOT EXISTS. • Suppose you wanted to find any customers

who had made at least one purchase without a salesperson

• Suppose you also wanted to exclude those customers who only made purchases from salespeople

100

• Then you could rewrite the query above using EXISTS for the first subquery and NOT EXISTS for the second.

• It is worth noting that such a query would not be restricted to only those customers who made all of their purchases without salespeople.

• The query is shown on the next overhead.

101

• SELECT *• FROM Customer• WHERE EXISTS• (SELECT *• FROM Carsale• WHERE Customer.custno = Carsale.custno• AND NOT EXISTS• (SELECT *• FROM Salesperson• WHERE Carsale.spno = Salesperson.spno))

102

• In general, these queries can be arbitrarily complex.

• There can be both EXISTS and NOT EXISTS; • the joining conditions may be equality or not

equality; • you can nest more than two levels, and so on. • The challenge comes from understanding

what these arbitrarily complex queries mean.

103

• 7. Here is one last example. • It is of less practical consequence than the

previous examples, but it illustrates some useful points.

• If you understand how it works, it is a sign that you understand the relationships between the different subqueries of a double NOT EXISTS query.

104

• Suppose you are interested in finding out information similar to one of the previous queries, such as customers who bought from all salespeople.

• Suppose however, that you don't need the names of those customers in the results, just their custno's.

• All of the information needed for the query is in one table, the Carsale table, but it would have to be opened three times, with a different alias for each of its three appearances in the query.

105

• One time it would be opened because it contained customer information, another time because it contained salesperson information, and the last time because it contained information about car sales.

• The only way you can tell what role an opening of a table is playing in the query is by which of its fields is used in the joining condition.

• Here is an example:

106

• SELECT custno• FROM Carsale AS A• WHERE NOT EXISTS• (SELECT *• FROM Carsale AS B• WHERE NOT EXISTS• (SELECT *• FROM Carsale AS C• WHERE A.custno = C.custno• AND B.spno = C.spno))

107

• Carsale opened as A is playing the role of the Customer table.

• You know this not just because custno is selected from it, but because of the joining condition A.custno = C.custno.

• The joining field from A is the field which relates to customers, custno.

108

• Carsale opened as B is playing the role of the Salesperson table.

• You know this because the joining field from B is the field which relates to salespeople, spno.

109

• Carsale opened as C is playing the role of itself, namely the table in the middle of the many-to-many relationship between customers and salespeople.

• You know this because C appears in both of the joining conditions, once on custno and once on spno.

• This query finds the custno's of customers who have bought a car from all of the salespeople.

110

• There is one last point to be made about this example.

• It is not exactly equivalent to the version of the query where the Customer and Salesperson tables were actually used.

• This query finds those queries who bought a car from every salesperson who actually sold a car.

111

• The previous version found customers who literally bought a car from all salespeople.

• If, by chance, there was a salesperson in the Salesperson table who hadn’t even sold one car, then the previous version of the query would not produce any results.

112

The End

SQL Unit 9 Correlated Subqueries

Documents