Haas MFE SAS Workshop Lecture 3: Peng Liu http:// faculty.haas.berkeley.edu/peliu/computing Haas School of Business, Berkeley, MFE 2006
Mar 28, 2015
Haas MFE SAS WorkshopLecture 3:
Peng Liu http://faculty.haas.berkeley.edu/peliu/computing
Haas School of Business, Berkeley, MFE 2006
SAS SQL
Peng Liu http://faculty.haas.berkeley.edu/peliu/computing
Haas School of Business, Berkeley, MFE 2006
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 3
PROC SQL - What
What does SQL can do? SelectingOrdering/sortingSubsetting RestructuringCreating table/viewJoining/MergingTransforming variablesEditing
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 4
PROC SQL - Why
The Advantage of using SQLCombined functionalityFaster for smaller tablesSQL code is more portable for non-SAS
applicationsNot require presorting Not require common variable names to
join on. (need same type , length)
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 5
Selecting Data
PROC SQL;
SELECT DISTINCT rating FROM MFE.MOVIES;
QUIT;
The simplest SQL code, need 3 statements By default, it will print the resultant query, use
NOPRINT option to suppress this feature. Begin with PROC SQL, end with QUIT; not RUN; Need at least one SELECT… FROM statement DISTINCT is an option that removes duplicate rows
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 6
Ordering/Sorting Data
PROC SQL ;
SELECT *
FROM MFE.MOVIES
ORDER BY category;
QUIT;
Remember the placement of the SAS statements has no effect; so we can put the middle statement into 3 lines
SELECT * means we select all variables from dataset MFE.MOVIES
Put ORDER BY after FROM. We sort the data by variable “category”
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 7
Subsetting Data- Character searching in WHERE
PROC SQL;
SELECT title, category
FROM MFE.MOVIES
WHERE category CONTAINS 'Action';
QUIT;
Use comma (,) to separate selected variables CONTAINS in WHERE statement only for character
variables Also try WHERE UPCASE(category) LIKE '%ACTION%';
Use wildcard char. Percent sign (%) with LIKE operator.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 8
Subsetting Data- Phonetic Matching in WHERE
PROC SQL;
SELECT title, category, rating
FROM MFE.MOVIES
WHERE category =* 'Drana';
QUIT;
Always Put WHERE after FROM
Sounds like operator =* Search movie title for the phonetic variation of
“drama”, also help possible spelling variations
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 9
Case Logic - reassigning/recategorize
PROC SQL;
SELECT title, rating,
CASE rating
WHEN 'G' THEN ‘General' ELSE 'Other'
END AS level
FROM MFE.MOVIES;
QUIT;
The order of each statement is important CASE …END AS should in between SELECT and FROM Note there is , after the variables you want to select Use WHEN … THEN ELSE… to redefine variables Rename variable from “rating” to “level”
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 10
Creating New Data - Create Table
PROC SQL;
CREATE TABLE ACTION AS
SELECT title, category
FROM MFE.MOVIES
WHERE category CONTAINS 'Action';
QUIT;
CREATE TABLE … AS can always be in front of SELECT … FROM statement to build a sas file.
In SELECT, the results of a query are converted to an output object (printing). Query results can also be stored as data. The CREATE TABLE statement creates a table with the results of a query. The CREATE VIEW statement stores the query itself as a view. Either way, the data identified in the query can beused in later SQL statements or in other SAS steps.
Produce a new dataset (table) ACTION in work directory, no printing
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 11
Creating New Data - Create View
PROC SQL;
CREATE VIEW G_MOVIES AS
SELECT title, category, rating
FROM MFE.MOVIES
WHERE rating = 'G'
ORDER BY title;
SELECT * FROM G_MOVIES;
QUIT;
First step-creating a view,no output is produced; then display the desired output results
Use ; to separate two block of code inside of proc sql When a table is created, the query is executed and the resulting
data is stored in a file. When a view is created, the query itself is stored in the file. The data is not accessed at all in the process of creating a view.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 12
Join Tables (Merge datasets) - Cartesian Join
PROC SQL;
SELECT *
FROM MFE.CUSTOMERS, MFE.MOVIES;
QUIT;
Terminology: Join (Merge) datasets (tables) No prior sorting required – one advantage over DATA MERGE Use comma (,) to separate two datasets in FROM Without WHERE, all possible combinations of rows from each
tables is produced, all columns are included Turn on the HTML result option for better display:Tool/Options/Preferences…/Results/ check Create HTML/OK
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 13
Join Tables (Merge datasets) - Inner Join using WHERE
PROC SQL;
SELECT *
FROM MFE.MOVIES, MFE.ACTORS
WHERE MOVIES.title = ACTORS.title;
QUIT; Use WHERE to specify connecting columns (title) table1.matchvar = table2.matchvar Produce rows that have same movie title The matching variable can be of different name
different datasets
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 14
Join Tables (Merge datasets) - Inner Join using WHERE (Cont.)
PROC SQL;
SELECT M.title, M.rating, A.actor_leading
FROM MFE.MOVIES M, MFE.ACTORS A
WHERE MOVIES.title = ACTORS.title;
QUIT;
Short-cut for table names Can be used in SELECT and WHERE statements Need to be declared in FROM statement
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 15
Join Tables (Merge datasets) - Join three tables
PROC SQL;
SELECT C.cust_no,
M.title,M.rating, M.category,
A.actor_leading
FROM MFE.CUSTOMERS C,
MFE.MOVIES2 M,
MFE.ACTORS A
WHERE C.cust_no = M.cust_no AND M.title = A.title;
QUIT;
Use AND in WHERE statement to specify two matching conditions
Produce rows that satisfies all the conditions Note: We use MOVIES2 in this example Can join up to 32 tables in one SQL code
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 16
Join Tables (Merge datasets) - Inner Joins using ON
PROC SQL;
SELECT M.title, rating,actor_leading
FROM MFE.MOVIES M
INNER JOIN MFE.ACTORS A
ON M.TITLE = A.TITLE;
QUIT;
Same result as using where WHERE is used to select rows from inner joins ON is used to select rows from outer or inner
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 17
Join Tables (Merge datasets) - Left Outer Joins
PROC SQL;
SELECT MOVIES.title, actor_leading, rating
FROM MFE.MOVIES
LEFT JOIN
MFE.ACTORS
ON MOVIES.title = ACTORS.title;
QUIT;
Resulting output contains all rows for which the SQL expression, referenced in the ON clause, matches both tables and all rows from LEFT table (MOVIES) that did not match any row in the right (ACTORS) table.
Essentially the rows from LEFT table are preserved and captured exactly as they stored in the table itself, regardless if a match exists.
Need to specify a table name for the matching variable in SELECT
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 18
Join Tables (Merge datasets) - Right Outer Joins
PROC SQL;
SELECT ACTORS.title, actor_leading, rating
FROM MFE.MOVIES
RIGHT JOIN
MFE.ACTORS
ON MOVIES.title = ACTORS.title;
QUIT;
Resulting output contains all rows for which the SQL expression, referenced in the ON clause, matches both tables and all rows from RIGHT table (ACTORS) that did not match any row in the right (MOVIES) table.
Essentially the rows from RIGHT table are preserved and captured exactly as they stored in the table itself, regardless if a match exists.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 19
Join Tables (Concatenating) - Outer Union
PROC SQL;
SELECT * FROM MFE.CUSTOMERS
OUTER UNION
SELECT * FROM MFE.MOVIES;
QUIT;
SQL performs OUTER UNION, similar to DATA steps with a SET statement to Concatenate datasets.
The result contains all the rows produced by the first table-expression followed by all the row produced by the second table-expression.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 20
Transforming Data- Creating new Variables
/*Creating new variables*/
PROC SQL;
SELECT title, length, category, year, rating,
2006-year AS age
FROM MFE.MOVIES;
QUIT;
You can create new variables within SELECT statement, the name of new variable follows after AS.
Note the order of the express is reversed
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 21
Transforming Data- Summarizing Data using SQL functions
PROC SQL;
SELECT *,
COUNT(title) AS notitle,
MAX(year) AS most_recent,
MIN(year) AS earliest,
SUM(length) AS total_length,
NMISS(rating) AS nomissing
FROM MFE.MOVIES
GROUP BY rating;
QUIT;
Simple summarization functions available All function can be operated in GROUPs
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 22
Editing Data – Insert observations.
PROC SQL NOPRINT;
INSERT INTO MFE.CUSTOMERS
VALUES(1 'Peng');
INSERT INTO MFE.CUSTOMERS
SET Cust_no=2,Name='Sasha';
QUIT;
There are two ways of inserting observations into a table. Data type should be the same.
VALUES( ) new values are separated by space. SET column name = newly assigned values, delimited
by commas.
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 23
Editing Data – Deleting rows and Dropping columns
/*Deleting rows*/
PROC SQL;
DELETE
FROM MFE.MOVIES
WHERE length LE 100;
QUIT;
/*Droping variables*/
PROC SQL;
CREATE TABLE NEW (DROP=rating) AS
SELECT *
FROM MFE.MOVIES;
QUIT;
Deleting columns can be done in SELECT or in DROP on created table
Haas School of Business, Berkeley, MFE 2006 Peng Liu AND Alexander Vedrashko 24
Editing Data– Update observations
/*Updating Observation*/
PROC SQL NOPRINT;
UPDATE MFE.CUSTOMERS
SET Name='Liu'
WHERE Cust_no=1;
QUIT; UPDATE … SET… WHERE Find the observation and set new value If more than one observations satisfies the condition,
all are updated with the new data in SET statement