Top Banner
SQL
39

SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Mar 28, 2015

Download

Documents

Jacob Wigley
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript
Page 1: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

SQL

Page 2: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

What is SQL?• DML• DDL • Partially declarative• Based on the algebra via the tuple calculus, and therefore

its core has an elegant set-theoretic foundations• Provides a sound foundation for mathematically precise

optimization• Has very simple structure; optimized for moving large

numbers of tuples at once• Set-based, retrieval-based

Page 3: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

What are relational databases?

Tables and SQL, a programming language engineered for

high volume data applications

“must always be correct” data management apps

transaction-based applications

non-network, non-object based data

Page 4: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Chapter 3: The syntax of SQL

SELECT select_list [FROM table_source] [WHERE search_condition] [ORDER BY order_by_list] [LIMIT row_limit]

Page 5: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Details of the Select clause

SELECT [ALL|DISTINCT] column_specification [[AS] result_column] [, column_specification [[AS] result_column]]

All columns in a base table

Column name in a base table

Calculation

Function

Page 6: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Details of Select, continued• Select * means all columns (attributes)• Using arithmetic

• SELECT invoice_total - payment_total – credit_total AS balance_due

• Using a function• SELECT CONCAT(first_name, ' ', last_name) AS full_name

• Renaming a column• SELECT invoice_number AS "Invoice Number",

invoice_date AS Date, invoice_total AS Total

FROM invoices

Page 7: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Operator precedence

Operator Name Order of precedence * Multiplication 1 / Division 1 DIV Integer division 1 % (MOD) Modulo (remainder) 1 + Addition 2 - Subtraction 2

Page 8: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Calculations & concatenation & strings

SELECT invoice_total, payment_total, credit_total,

invoice_total - payment_total - credit_total AS balance_due

FROM invoices

SELECT vendor_city, vendor_state, CONCAT(vendor_city, vendor_state)

FROM vendors

SELECT vendor_contact_first_name,vendor_contact_last_name CONCAT(LEFT(vendor_contact_first_name, 1),

LEFT(vendor_contact_last_name, 1)) AS initials

FROM vendors

Page 9: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Date function, round functionSELECT invoice_date,

DATE_FORMAT(invoice_date, '%m/%d/%y') AS 'MM/DD/YY',

DATE_FORMAT(invoice_date, '%e-%b-%Y') AS 'DD-Mon-YYYY'

FROM invoices

SELECT invoice_date, invoice_total,

ROUND(invoice_total) AS nearest_dollar,

ROUND(invoice_total, 1) AS nearest_dime

FROM invoices

Page 10: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Where clause formatWHERE [NOT] search_condition_1 {AND|OR}

[NOT] search_condition_2 ...

WHERE NOT (invoice_total >= 5000

OR NOT invoice_date <= '2011-08-01')

WHERE invoice_total < 5000

AND invoice_date <= '2011-08-01'

Page 11: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

In phrase and nested selectsWHERE test_expression [NOT] IN ({subquery|expression_1 [, expression_2]...})

WHERE vendor_id IN

(SELECT vendor_id

FROM invoices

WHERE invoice_date = '2011-07-18')

Page 12: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Like clauseWHERE vendor_city LIKE 'SAN%'

Cities that will be retrieved

“San Diego”, “Santa Ana”

WHERE vendor_name LIKE 'COMPU_ER%'

Vendors that will be retrieved

“Compuserve”, “Computerworld”

Page 13: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Order By clauseSELECT vendor_name,

CONCAT(vendor_city, ', ', vendor_state, ' ',

vendor_zip_code) AS address

FROM vendors

ORDER BY vendor_name

Page 14: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Chapter 4: Manipulating multiple tables

• Join• Equijoin• Natural join• Outer join• Self join• N-way join

Page 15: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Explicit versus implicit joins

SELECT vendor_name, invoice_number, invoice_date,

line_item_amount, account_description

FROM vendors v

JOIN invoices i

ON v.vendor_id = i.vendor_id

JOIN invoice_line_items li

ON i.invoice_id = li.invoice_id

JOIN general_ledger_accounts gl

ON li.account_number = gl.account_number

WHERE invoice_total - payment_total - credit_total > 0

ORDER BY vendor_name, line_item_amount DESC

SELECT invoice_number, vendor_name

FROM vendors v, invoices i

WHERE v.vendor_id = i.vendor_id

ORDER BY invoice_number

Page 16: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Left outer join and natural joinSELECT vendor_name, invoice_number, invoice_total

FROM vendors LEFT JOIN invoices

ON vendors.vendor_id = invoices.vendor_id

ORDER BY vendor_name

SELECT invoice_number, vendor_name

FROM vendors

NATURAL JOIN invoices

ORDER BY invoice_number

Page 17: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Union operatorSELECT 'Active' AS source, invoice_number,

invoice_date, invoice_total

FROM active_invoices

WHERE invoice_date >= '2011-06-01'

UNION

SELECT 'Paid' AS source, invoice_number,

invoice_date, invoice_total

FROM paid_invoices

WHERE invoice_date >= '2011-06-01'

ORDER BY invoice_total DESC

Page 18: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Chapter 5: Aggregates in queriesAVG([ALL|DISTINCT] expression)

SUM([ALL|DISTINCT] expression)

MIN([ALL|DISTINCT] expression)

MAX([ALL|DISTINCT] expression)

COUNT([ALL|DISTINCT] expression)

COUNT(*) – counts nulls

Page 19: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Issues with aggregate data• This is when we intentionally create a new form of “object”

and shift to ones that are deliberately not related to individual tuples

• It’s selections and projections that often unintentionally lose a perspective on object identity

Page 20: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Aggregate examplesSELECT COUNT(*) AS number_of_invoices,

SUM(invoice_total – payment_total – credit_total)

AS total_due

FROM invoices

WHERE invoice_total – payment_total – credit_total > 0

SELECT 'After 1/1/2011' AS selection_date,

COUNT(*) AS number_of_invoices, ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,

SUM(invoice_total) AS total_invoice_amt

FROM invoices

WHERE invoice_date > '2011-01-01‘

SELECT 'After 1/1/2011' AS selection_date,

COUNT(*) AS number_of_invoices,

MAX(invoice_total) AS highest_invoice_total,

MIN(invoice_total) AS lowest_invoice_total

FROM invoices

WHERE invoice_date > '2011-01-01'

Page 21: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Aggregate examples, continuedSELECT MIN(vendor_name) AS first_vendor,

MAX(vendor_name) AS last_vendor,

COUNT(vendor_name) AS number_of_vendors

FROM vendors

SELECT COUNT(DISTINCT vendor_id) AS number_of_vendors,

COUNT(vendor_id) AS number_of_invoices,

ROUND(AVG(invoice_total), 2) AS avg_invoice_amt,

SUM(invoice_total) AS total_invoice_amt

FROM invoices

WHERE invoice_date > '2011-01-01'

Page 22: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Having clause (with group by)SELECT select_list

FROM table_source

[WHERE search_condition]

[GROUP BY group_by_list]

[HAVING search_condition]

[ORDER BY order_by_list]

Page 23: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Group by and Having: complexities• Group by: how to vertically group tuples• Having: which groups will be included in the final result• Note: All of this is after the Where clause is applied• Group by: based on columns or expressions that have

columns in them• If there are any calculations done in the Select clause, this

happens after the Group by clause; i.e., it is performed for each group that results from the Group by

• Group by can be nested if you specify more than one column

• Order by operators are performed after the Having

Page 24: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Having example, group by exampleSELECT vendor_id, ROUND(AVG(invoice_total), 2)

AS average_invoice_amount

FROM invoices

GROUP BY vendor_id

HAVING AVG(invoice_total) > 2000

ORDER BY average_invoice_amount DESC

(Note: 2 is the number of decimals in the result.)

SELECT vendor_name,

COUNT(*) AS invoice_qty,

ROUND(AVG(invoice_total),2) AS invoice_avg

FROM vendors JOIN invoices

ON vendors.vendor_id = invoices.vendor_id

WHERE invoice_total > 500

GROUP BY vendor_name

ORDER BY invoice_qty DESC

Page 25: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

These are the same…SELECT invoice_date,

COUNT(*) AS invoice_qty,

SUM(invoice_total) AS invoice_sum

FROM invoices

GROUP BY invoice_date

HAVING invoice_date BETWEEN '2011-05-01' AND '2011-05-31'

AND COUNT(*) > 1

AND SUM(invoice_total) > 100

ORDER BY invoice_date DESC

SELECT invoice_date, COUNT(*) AS invoice_qty,

SUM(invoice_total) AS invoice_sum

FROM invoices

WHERE invoice_date BETWEEN '2011-05-01' AND '2011-05-31'

GROUP BY invoice_date

HAVING COUNT(*) > 1 AND SUM(invoice_total) > 100

ORDER BY invoice_date DESC

Page 26: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

We will shift our focus a bit…• We’ll run the actually queries that appear in chapters 5

onward• The slides will contain overview material and not the

actual queries

Page 27: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Chapter 6: subqueries• Often used to pass an aggregate value to a parent query• Often a good way to book-keep what might have been a

very complex WHERE clause, with perhaps a multiway join

• A good way to make a query look more readable to someone who uses it later

• Subqueries can also be reused in other queries• Note: you cannot use the SELECT attributes from the

embedded query in the parent query unless it directly references the appropriate table(s) in the outer FROM clause

Page 28: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Subqueries, continued• Important tool: IN operator, which is a set “element of”

operator (written with an epsilon), and you can negate the IN (not IN)

• Often you use SOME or ALL or ANY, if you want to return multiple values (i.e., multiple tuples, perhaps with only one attribute each)

• The default if you don’t use one of the operators above is to return a single value

• ALL is a set “for all elements of” operator (written with an upside down A)

• ANY is a set “there exists” operator (written with a backward E)

Page 29: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Subqueries, continued• The default is that a subquery only executes once, but

you can use a “correlated” query so that it will run once for each row processed by the parent query. • This breaks the “execute from the inside out” paradigm on an

uncorrelated subquery.• EXISTS is often used with correlated queries

• You can put a subquery in a HAVING, FROM, or SELECT clause as well• But such queries get messy and we will skip this for now.

• It’s a good idea to write and test subqueries independently whenever possible, unless they are trivial

Page 30: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Examples from chapter 6• 7: passing a single value• 6: ANY• 3: IN• 8: NOT EXISTS• 5: ALL

Page 31: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Chapter 7: changing the database state• This is when we need a transaction protocol• Updates must never overlap with each other or with read-

only queries• Read-only queries can overlap• But we want to increase through put by supporting as

much “concurrency” as possible• Each transaction has the potential to update the DB state

Page 32: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

2 Phase Transactions• Each SQL program is within a begin and end transaction

pair• Each transaction has its own workspace for DB items it is

going to update• Any transactions that overlap in execution time will appear

to have run in some serial order• This is done by transactions requesting read and write locks

(also known as shared and exclusive locks)• Read locks can be shared with other readers• Write locks cannot be shared with readers or writers• All locks held until the end of the transaction• They are released and then the changes that a transaction

has made are moved to the DB

Page 33: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Serializability of transactions• The net effect is that the transactions that overlap in

execution time appear to have run in some serial order• Transactions can be undone by throwing away the local

store (conceptually, at least)• The write period at the end of the transaction must be

atomic• The two phases:

• Request read, write, upgrade locks (and wait on locks) and process

• Release locks and move updates to the DB

• There is a notion of “serializability” which means that the actual schedule of executed steps corresponds to some serial order of running the transactions

Page 34: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Interesting facts on transactions…• Various legal schedules might produce different results• A crash during phase two can lead the database

inconsistent• The transaction manager uses a lot of overhead

resources handling locks• We still need to be able to roll the database back and

rerun transaction logs • User must control the nature of overlapping transactions

or there might be very little true concurrency• In a distributed database, the lock manager is a

bottleneck because all components of the database must move in lockstep

Page 35: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Updating data: changing the DB stateINSERT INTO invoices VALUES

(115, 97, '456789', '2011-08-01', 8344.50, 0, 0, 1, '2011-08-31', NULL);

INSERT INTO invoices VALUES

(116, 97, '456701', '2011-08-02', 270.50, 0, 0, 1, '2011-09-01', NULL),

(117, 97, '456791', '2011-08-03', 4390.00, 0, 0, 1, '2011-09-02', NULL),

(118, 97, '456792', '2011-08-03', 565.60, 0, 0, 1, '2011-09-02', NULL);

Page 36: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Updating tables, continuedUSE ex;

INSERT INTO color_sample (color_number)

VALUES (606);

INSERT INTO color_sample (color_name)

VALUES ('Yellow');

INSERT INTO color_sample

VALUES (DEFAULT, DEFAULT, 'Orange');

INSERT INTO color_sample

VALUES (DEFAULT, 808, NULL);

INSERT INTO color_sample

VALUES (DEFAULT, DEFAULT, NULL);

Page 37: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Updating tables, continuedINSERT INTO invoice_archive

SELECT *

FROM invoices

WHERE invoice_total - payment_total - credit_total = 0;

INSERT INTO invoice_archive

(invoice_id, vendor_id, invoice_number, invoice_total, credit_total,

payment_total, terms_id, invoice_date, invoice_due_date)

SELECT

invoice_id, vendor_id, invoice_number, invoice_total, credit_total,

payment_total, terms_id, invoice_date, invoice_due_date

FROM invoices

WHERE invoice_total - payment_total - credit_total = 0;

Page 38: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Updating, continued

UPDATE invoicesSET payment_date = '2011-09-21', payment_total = 19351.18WHERE invoice_number = '97/522';

UPDATE invoicesSET terms_id = 1WHERE vendor_id = 95;

UPDATE invoicesSET credit_total = credit_total + 100WHERE invoice_number = '97/522';

Page 39: SQL. What is SQL? DML DDL Partially declarative Based on the algebra via the tuple calculus, and therefore its core has an elegant set-theoretic foundations.

Updating, continuedDELETE FROM general_ledger_accounts

WHERE account_number = 306;

DELETE FROM invoice_line_items

WHERE invoice_id = 78 AND invoice_sequence = 2;

DELETE FROM invoice_line_items

WHERE invoice_id = 12;

DELETE FROM invoice_line_items

WHERE invoice_id IN

(SELECT invoice_id

FROM invoices

WHERE vendor_id = 115);