Introduction to SAS Statistical Package
Biostatistics 140.632
Lecture 1

Instructor: Lucy Meoni [email protected]
Teaching Assistant: Sorina Eftim [email protected]
Lecture/Lab: Room 3017
Web site: www.biostat.jhsph.edu/bstcourse/bio632/default.htm
E-mail: [email protected] (to submit exercises)

Using the PC labs
SAS Version 9.0
• requires basic Windows skills
• set up a class folder on your media (thumb drive, diskette)
• download files from the website for lab into the class folder
• bring thumb drive or diskette to class

Text: 'The Little SAS Book', 3rd edition
Other references:
• SAS online documentation
• Online tutor
• SAS system help
• Many, many SAS manuals
• SAS website www.sas.com

WHAT IS SAS?
Integrated system of software products
• began as software package for statistical analysis
• data management
• reporting and graphics
• analytic
• etc.

COURSE OBJECTIVES
• to introduce and develop skills in SAS, a statistical package used in research data analysis
• to develop the skills necessary to create and modify a SAS data set and perform statistical analyses
• SAS programs
• SAS windowing environment
• SAS tables (data sets)
• SAS libraries
• Creating SAS tables
  − IMPORT wizard
  − StatTransfer
• Temporary vs permanent files
SAS
• uses statements to write a series of instructions called a SAS program
• not command-line driven (unlike STATA)
• statements are written using the SAS language (a programming language that you use to manage your data)
• sequence of statements executed in order
• SAS procedures are software tools for data analysis and reporting
The SAS Programming Process
Define the Need
Create a SAS Program
Enter the SAS Program Code
Process the SAS Program Code
Review the Results
Debug or Modify
Introduction to SAS Programs
SAS Programs
A SAS program is a sequence of steps that the user submits for execution.
DATA steps are typically used to create SAS data sets.
PROC steps are typically used to process SAS data sets (that is, generate reports and graphs, edit data, and sort data).
[Diagram: Raw Data → DATA Step → SAS Data Set; SAS Data Set → PROC Step → Report]
SAS Program
LIBNAME mylib 'd:\temp\sasclass';

DATA Step
DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

PROC Steps
PROC PRINT DATA=newclass;

PROC MEANS DATA=newclass;
  CLASS gender;
  VAR bmi;
RUN;
Step Boundaries
SAS steps begin with either a
• DATA statement
• PROC statement.
SAS detects the end of a step when it encounters
• a RUN statement (for most steps)
• a QUIT statement (for some procedures)
• the beginning of another step (DATA statement or PROC statement).
Step Boundaries: Lecture1.sas
DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

PROC PRINT DATA=newclass;

PROC MEANS DATA=newclass;
  CLASS gender;
  VAR bmi;
RUN;
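The step-boundary rules can be sketched with a program that ends every step explicitly. The data set name work.demo is a hypothetical example, and PROC DATASETS is used here only as one procedure that stays active until a QUIT statement.

```sas
* DATA step: ends at its RUN statement;
DATA work.demo;
  x = 1;
RUN;

* PROC step: ends at its RUN statement;
PROC PRINT DATA=work.demo;
RUN;

* PROC DATASETS keeps running after RUN and ends at QUIT;
PROC DATASETS LIBRARY=work;
  CONTENTS DATA=demo;
RUN;
QUIT;
```

Ending each step with its own RUN statement means no step depends on the next DATA or PROC statement to mark its boundary, which matters for the last step submitted in an interactive session.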
Running SAS Programs
Objectives
– Invoke the SAS System and include a SAS program into your session.
– Submit a program and browse the results.
– Navigate the SAS windowing environment.
SAS Windowing Environment
Interactive windows enable you to interface with SAS.
Navigating the SAS Windowing Environment
The SAS Programming Process
Define the Need
Create a SAS Program
Enter the SAS Program Code
Process the SAS Program Code
Review the Results
Debug or Modify
Entering and Executing SAS Code
Objectives
• Enter SAS program code in the SAS windowing environment and execute the program.
Enter the SAS Program Code
Once the planning and coding effort for a SAS program is complete, the SAS program code must be entered into the computer to process and to test the program.
Windowing Mode
Windowing mode is a facility that enables you to enter and execute SAS programs and view the results in an interactive environment. An interactive environment permits the program to be processed immediately when submitted for execution.
The SAS windowing environment is made up of a collection of windows.
There are three primary windows in the windowing environment.
The Enhanced Program Editor window enables SAS program code to be
• entered from the keyboard
• submitted for execution.
SAS program elements are color-coded, including procedures, keywords, numeric and string constants, and undefined keywords.
The Log window displays
• the SAS program code submitted for execution
• messages from SAS indicating the status of the program execution.
The Output window
• displays reports generated by the SAS program.
Commands are used to navigate among the various windows of the SAS windowing environment and are used to execute a program.
Depending upon the operating environment, commands can be issued by
• selecting from a pull-down menu
• typing the command
• using function keys (F1 - F12)
• clicking on a tool button.
Entering and Executing SAS Code
This demonstration illustrates entering SAS program code into the SAS Windowing Environment and executing the program.
SAS Windowing Environment
• windowing system for editing and executing SAS programs
• interactive full screen
• collection of windows for editing programs, executing programs and displaying results
• five basic SAS windows
Starting SAS
• Start SAS from the START button, PROGRAMS, The SAS System for Windows V9 system
• interactive full screen
• Enhanced Editor window, LOG window, and Explorer windows appear
• activate window by clicking within the window
ENHANCED PROGRAM EDITOR
• text editor
• write and edit programs
• submit programs (use SUBMIT icon)
• save program statements to file with extension .sas
• asterisk (*) appears in title bar to indicate file has not been saved
• multiple windows possible
ENHANCED PROGRAM EDITOR
• an ASCII editor that uses visual aids to help you write and debug your SAS programs
• SAS program elements are color-coded, including procedures, keywords, informats and formats, dates, numeric and string constants, macro keywords, undefined keywords, and more
• use File, Open on the menu to read in an existing SAS program file
LOG Window
• contains the compilation and execution results of DATA and PROC steps
• contains submitted program statements
• messages from SAS about compilation and execution: notes, warnings, errors
• save contents of this window to a file with extension .log (File, Save menu)
• clear window by clicking NEW icon on Toolbar
LOG WINDOW CONTENTS
• DATA Step
  – name of the SAS data set read and the number of observations and variables in the data set
  – name of the SAS data set created and the number of observations and variables in the new SAS data set
LOG WINDOW
PROC Steps
• index with page numbers for “successful” procedures
OUTPUT WINDOW
• printable results from procedures
• save contents of this window to a file with extension .lst
• empty if program did not run (CHECK LOG window for errors)
• indexed in the RESULTS window
• clear window by clicking on NEW icon on Toolbar
RESULTS WINDOW
• table of contents for OUTPUT window
• lists each part of your results in an outline form
• possible to save and/or print sections of results by right-clicking on section
When you execute a SAS program, the output generated by SAS is divided into two major parts:
SAS log contains information about the processing of the SAS program, including any warning and error messages.
SAS output contains reports generated by SAS procedures and DATA steps.
Submitting a SAS Program
Running a SAS Program
lecture1.sas
• This demonstration illustrates how to start a SAS session, include and submit a SAS program, and browse the results.
SAS Log
1    LIBNAME mylib 'd:\temp\sasclass';
NOTE: Libref MYLIB was successfully assigned as follows:
      Engine:        V9
      Physical Name: d:\temp\sasclass
2    DATA newclass; SET mylib.class;
3    BMI=(weight*.454)/((height*.0254)**2);
4
NOTE: There were 5 observations read from the data set MYLIB.CLASS.
NOTE: The data set WORK.NEWCLASS has 5 observations and 7 variables.
5    PROC PRINT DATA=newclass;
6
NOTE: There were 5 observations read from the data set WORK.NEWCLASS.
7    proc means DATA=newclass;
8    var bmi ;
9    class gender;
10   run;
NOTE: There were 5 observations read from the data set WORK.NEWCLASS.
11
12   RUN;
Objectives
– Define the components of a SAS data set.
– Define a SAS variable.
– Identify a missing value and a SAS date value.
– State the naming conventions for SAS data sets and variables.
– Explain SAS syntax rules.
– Investigate a SAS data set using the CONTENTS and PRINT procedures.
SAS Data Sets
[Diagram: Data Entry, External File, and Other Software Files pass through a Conversion Process to become a SAS Data Set, which consists of a Descriptor Portion and a Data Portion]
SAS Data Sets
SAS data sets have a descriptor portion and a data portion.

Descriptor Portion
General data set information:
• data set name
• data set label
• date/time created
• storage information
• number of observations
Information for each variable:
• Name
• Type
• Length
• Position
• Format
• Informat
• Label

Data Portion
Browsing the Descriptor Portion
• The descriptor portion of a SAS data set contains
  – general information about the SAS data set (such as data set name and number of observations)
  – variable attributes (name, type, length, position, informat, format, label).
• The CONTENTS procedure displays the descriptor portion of a SAS data set.
Browsing the Descriptor Portion
• General form of the CONTENTS procedure:
PROC CONTENTS DATA=SAS-data-set;
RUN;
• Example:
proc contents data=work.newclass;
run;
The CONTENTS Procedure

Data Set Name   WORK.NEWCLASS                           Observations           5
Member Type     DATA                                    Variables              7
Engine          V9                                      Indexes                0
Created         Friday, March 18, 2005 04:02:40 PM      Observation Length     64
Last Modified   Friday, March 18, 2005 04:02:40 PM      Deleted Observations   0
Protection                                              Compressed             NO
Data Set Type                                           Sorted                 NO
Label

Alphabetic List of Variables and Attributes

#   Variable   Type   Len   Format    Informat
7   BMI        Num    8
2   baseage    Num    8     BEST12.   F12.
3   gender     Num    8     BEST12.   F12.
6   height     Num    8     BEST12.   F12.
1   lname      Char   15    $F15.     $F15.
4   race       Num    8     BEST12.   F12.
5   weight     Num    8     BEST12.   F12.
SAS Data Sets: Data Portion
The data portion of a SAS data set is a rectangular table of character and/or numeric data values.
[Figure: sample data table with the character values labeled]
SAS Variable Values
There are two types of variables:
• Character variables contain any value: letters, numbers, special characters, and blanks. Character values are stored with a length of 1 to 32,767 bytes. One byte equals one character.
• Numeric variables are stored as floating point numbers in 8 bytes of storage by default. Eight bytes of floating point storage provide space for 16 or 17 significant digits. You are not restricted to 8 digits.
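As a minimal sketch of the two variable types, the following program creates one character and one numeric variable and then displays their attributes. The data set name work.types and the values are hypothetical; the LENGTH statement is assumed here as the way to set the character length.

```sas
DATA work.types;
  LENGTH lname $ 15;       * character variable stored in 15 bytes;
  lname  = 'SMITH';
  weight = 150;            * numeric variable stored in 8 bytes by default;
RUN;

* The Type and Len columns show Char 15 and Num 8;
PROC CONTENTS DATA=work.types;
RUN;
```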
SAS Data Set and Variable Names
SAS names
• can be 32 characters long
• can be uppercase, lowercase, or mixed-case
• must start with a letter or underscore. Subsequent characters can be letters, underscores, or numeric digits.
Valid SAS Names
• Select the valid default SAS names.
data5mon
5monthsdata
five months data
fivemonthsdata
data#5
five_month_data
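Valid SAS Names
• Answers, applying the naming rules:
data5mon - valid
5monthsdata - invalid (starts with a digit)
five months data - invalid (contains blanks)
fivemonthsdata - valid
data#5 - invalid (contains the special character #)
five_month_data - valid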
SAS Date Values
• SAS stores date values as numeric values.
• A SAS date value is stored as the number of days between January 1, 1960, and a specific date.

Date        Stored value   Displayed value
01JAN1959   -365           01/01/1959
01JAN1960   0              01/01/1960
01JAN1961   366            01/01/1961
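The stored values in the table can be checked with a short program, sketched here under the assumption that date constants of the form 'ddMONyyyy'D and the MMDDYY10. format are used. The data set and variable names are hypothetical.

```sas
DATA work.dates;
  before = '01JAN1959'D;   * stored as -365;
  origin = '01JAN1960'D;   * stored as 0;
  after  = '01JAN1961'D;   * stored as 366 because 1960 is a leap year;
  shown  = after;          * same stored value with a display format;
  FORMAT shown MMDDYY10.;
RUN;

PROC PRINT DATA=work.dates;
RUN;
```

The first three variables print as plain numbers, while shown prints as 01/01/1961 because a format changes only how the value is displayed, not what is stored.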
Missing Data Values

LastName   FirstName   JobTitle   Salary
TORRES     JAN         Pilot      50000
LANGKAMM   SARAH       Mechanic   80000
SMITH      MICHAEL     Mechanic   .
WAGSCHAL   NADJA       Pilot      77500
TOERMOEN   JOCHEN                 65000

A value must exist for every variable for each observation. Missing values are valid values.
A numeric missing value is displayed as a period.
A character missing value is displayed as a blank.
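A small sketch of both kinds of missing value follows. The data set name work.missing_demo is hypothetical, and the MISSING function is an addition not shown in the lecture; it is used here only to flag missing values.

```sas
DATA work.missing_demo;
  LENGTH JobTitle $ 10;
  JobTitle = ' ';                      * character missing value (blank);
  Salary   = .;                        * numeric missing value (period);
  title_missing  = MISSING(JobTitle);  * 1 when the value is missing;
  salary_missing = MISSING(Salary);    * 1 when the value is missing;
RUN;

PROC PRINT DATA=work.missing_demo;
RUN;
```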
• The PRINT procedure displays the data portion of a SAS data set.
• By default, PROC PRINT displays
  – all observations
  – all variables
  – an Obs column on the left side.
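The defaults listed above can be compared with a customized listing. This is a sketch on a hypothetical two-row data set, work.small; the NOOBS option and VAR statement are standard PROC PRINT features that the lecture has not introduced at this point.

```sas
DATA work.small;
  INPUT lname $ gender weight;
  DATALINES;
SMITH 1 150
JONES 2 130
;
RUN;

* Default listing with all observations, all variables, and an Obs column;
PROC PRINT DATA=work.small;
RUN;

* NOOBS drops the Obs column and VAR selects and orders variables;
PROC PRINT DATA=work.small NOOBS;
  VAR lname weight;
RUN;
```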
SAS Data Set Terminology
SAS documentation and text in the SAS windowing environment use the following terms interchangeably:

SAS Data Set   SAS Table
Variable       Column
Observation    Row
SAS Syntax Rules
SAS statements
• usually begin with an identifying keyword
• always end with a semicolon.

DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

PROC PRINT DATA=newclass;

PROC MEANS DATA=newclass;
  CLASS gender;
  VAR bmi;
RUN;
SAS Syntax Rules
SAS statements
• usually begin with an identifying keyword
• always end with a semicolon.

DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);
RUN;

PROC PRINT DATA=newclass;
RUN;

PROC MEANS DATA=newclass;
  CLASS gender;
  VAR bmi;
RUN;
SAS Syntax Rules
SAS statements are free-format.
• One or more blanks or special characters can be used to separate words.
• They can begin and end in any column.
• A single statement can span multiple lines.
• Several statements can be on the same line.

Unconventional Spacing
DATA newclass; SET mylib.class;BMI=(weight*.454)/((height*.0254)**2);
PROC PRINT DATA=newclass;PROC MEANS DATA=newclass;CLASS gender; VAR bmi; RUN;
SAS Syntax Rules
Good spacing makes the program easier to read.

Conventional Spacing
DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

PROC PRINT DATA=newclass;

PROC MEANS DATA=newclass;
  CLASS gender;
  VAR bmi;
RUN;
SAS Comments
• Type * to begin a comment.
• Type your comment text.
• Type ; to end the comment.

* Create work.newclass data set and add bmi;
DATA newclass;
  SET mylib.class;
  BMI=(weight*.454)/((height*.0254)**2);

* Produce listing report of newclass ;
PROC PRINT DATA=newclass;
RUN;
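For comparison, here is a sketch showing the asterisk style from the slide next to the block style delimited by /* and */. The block style is not covered in the lecture but is standard SAS; the data set work.demo is hypothetical.

```sas
* Statement-style comment that starts with an asterisk and ends with a semicolon;

/* Block-style comment that may span
   several lines and contain semicolons; like this one */

DATA work.demo;
  x = 1;   /* a block comment can follow a statement on the same line */
RUN;
```

A statement-style comment ends at the first semicolon, so text that itself contains a semicolon is safer in the block style.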
Getting Familiar with SAS® Data Sets
SAS Data Libraries
Objectives
– Explain the concept of a SAS data library.
– State the difference between a permanent library and a temporary library.
– Use the CONTENTS procedure to investigate a SAS data library.
SAS Data Libraries
A SAS data library is a collection of SAS files that are recognized as a unit by SAS.
Directory-based systems: a SAS data library is a directory.
SAS Data Libraries
You can think of a SAS data library as a drawer in a filing cabinet and a SAS data set as one of the file folders in the drawer.
[Diagram: filing cabinet with drawers labeled LIBRARIES and folders labeled FILES]
Assigning a Libref
Regardless of which host operating system you use, you identify SAS data libraries by assigning each a library reference name (libref).
SAS Data Libraries
When you invoke SAS, you automatically have access to a temporary and a permanent SAS data library.
• work - temporary library
• sasuser - permanent library
You can create and access your own permanent libraries.
• ia - permanent library
Assigning a Libref
• You can use the LIBNAME statement to assign a libref to a SAS data library.
• General form of the LIBNAME statement:
LIBNAME libref 'SAS-data-library' <options>;
Rules for naming a libref:
• must be 8 characters or less
• must begin with a letter or underscore
• remaining characters are letters, numbers, or underscores.
Assigning a Libref
• Examples:
libname ia 'c:\workshop\winsas\prog1';
libname sasclass 'd:\temp\sasclass';
A libref exists only during the current SAS session.
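A libref can also be inspected and released within the session. The sketch below assumes the folder d:\temp\sasclass from the lecture exists; the _ALL_ keyword, the NODS option, and the CLEAR form of LIBNAME are standard SAS features not shown on the slides.

```sas
* Assign the libref for the current session (path taken from the lecture);
LIBNAME sasclass 'd:\temp\sasclass';

* List the members of the library without printing each descriptor;
PROC CONTENTS DATA=sasclass._ALL_ NODS;
RUN;

* Remove the libref when it is no longer needed;
LIBNAME sasclass CLEAR;
```

Clearing the libref does not delete any files; it only removes the connection between the name and the folder.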
Making the Connection
• When you submit the LIBNAME statement, a connection is made between a libref in SAS and the physical location of files on your operating system.
Windows: 'c:\workshop\winsas\prog1'
Two-level SAS Filenames
Every SAS file has a two-level name: libref.filename
• The first name (libref) refers to the library.
• The second name (filename) refers to the file in the library.
The data set ia.sales is a SAS file in the ia library.
[Diagram: libraries sasuser, work, and ia, with the file sales in ia]
Temporary SAS Filename
• The libref work can be omitted when you refer to a file in the work library. The default libref is work if the libref is omitted.
• work.employee and employee refer to the same file.
• Files in the work library are deleted when the SAS session ends.
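The equivalence of one-level and two-level names in the work library can be sketched as follows. The variable id is a hypothetical placeholder.

```sas
* Two-level name in the temporary work library;
DATA work.employee;
  id = 1;
RUN;

* One-level name refers to the same data set because work is the default;
PROC PRINT DATA=employee;
RUN;
```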
SAS Files
SAS data sets and other files are stored in SAS data libraries.
[Diagram: libraries SASUSER, WORK, and MYLIB]
SAS Data Libraries
• A SAS data library is a collection of SAS files that are recognized as a unit by SAS on your operating environment.
• WORK - temporary library
• SASUSER - permanent library
You can create and access your own permanent libraries.
• mylib - permanent library
The LIBNAME Statement
• LIBNAME statement establishes the library reference (or libref), which is an alias for the SAS data library.
• global statement - in effect for the entire SAS session until replaced
Reading a SAS Data Set
• To create a SAS data set using a SAS data set as input, you must use a
  – DATA statement to start a DATA step and name the SAS data set being created (output data set: newclass2)
  – SET statement to identify the SAS data set being read (input data set: mylib.class2).
• To create a variable, you must use an assignment statement. Here it uses the values of the variables Weight and Height and assigns the result of the calculation to the variable BMI.
Reading a SAS Data Set
• By default, the SET statement reads all of the observations and all of the variables from the input SAS data set.
General form of a DATA step:
DATA output-SAS-data-set;
  SET input-SAS-data-set;
  additional SAS statements
RUN;
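The general form can be tried end to end without the course files. In this sketch the input data set work.class2 and its two rows are made up for illustration, and weight in pounds and height in inches are assumed because the BMI formula multiplies by .454 and .0254.

```sas
* Hypothetical input data with weight in pounds and height in inches;
DATA work.class2;
  INPUT lname $ weight height;
  DATALINES;
SMITH 150 65
JONES 180 70
;
RUN;

* Read every observation and variable and then add BMI;
DATA work.newclass2;
  SET work.class2;
  BMI = (weight*.454) / ((height*.0254)**2);
RUN;

PROC PRINT DATA=work.newclass2;
RUN;
```

The output data set keeps lname, weight, and height and gains the new variable BMI, which comes out at roughly 25 for the first row.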
Assignment Statements
• An assignment statement
  – evaluates an expression
  – assigns the resulting value to a variable.
General form of an assignment statement:
variable=expression;
Define the Variable
new_variable_name = expression;
[Diagram: the expression is evaluated to a value (EVALUATE), and that value is assigned to the new variable (ASSIGN)]
Name the New Variable
Rules for naming SAS variables:
• 1 to 32 characters in length
• start with a letter (A through Z) or an underscore (_)
• continue with any combination of numbers, letters, or underscores
• can be stored in mixed-case.
SAS Expressions
An expression contains operands and operators that form a set of instructions that produce a value.
Operands are
• variable names
• constants.
Operators are
• symbols that request arithmetic calculations
• SAS functions.
EXAMPLES
• x=3; assigns 3 to X for all observations
• y=age/10; assigns the value of age divided by 10 to Y for each observation
• Clinic='Boston'; assigns the character constant Boston to the variable clinic for each observation
• bmi=wgtkg/(htm**2); assigns the result of the calculation to the new variable bmi
NOTE: X, Y, and bmi are numeric variables; clinic is a character variable.
Using Operators
Selected operators for basic arithmetic calculations in an assignment statement:

Operator   Action            Example        Priority
+          Addition          Sum=x+y;       III
-          Subtraction       Diff=x-y;      III
*          Multiplication    Mult=x*y;      II
/          Division          Divide=x/y;    II
**         Exponentiation    Raise=x**y;    I
-          Negative prefix   Negative=-x;   I
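The priority column can be illustrated with a few constant expressions. This is a sketch only; DATA _NULL_ and the PUT statement are used here as a convenient way to write results to the log and are not part of the lecture.

```sas
DATA _NULL_;
  a = 2 + 3 * 4;     * multiplication before addition gives 14;
  b = (2 + 3) * 4;   * parentheses are evaluated first and give 20;
  c = 2 * 3 ** 2;    * exponentiation before multiplication gives 18;
  d = 10 - 4 / 2;    * division before subtraction gives 8;
  PUT a= b= c= d=;   * writes a=14 b=20 c=18 d=8 to the log;
RUN;
```

The BMI formula in the lecture uses parentheses around both the numerator and the denominator for the same reason: they make the intended order of evaluation explicit.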
Compiling the DATA Step

PDV
Lname $ 15 | Baseage N 8 | Bdate N 8 | Race N 8 | Weight N 8 | Height N 8

libname mylib 'SAS-data-library';
data newclass2;
  set mylib.class2;
  BMI=(weight*.454)/((height*.0254)**2);
run;
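One way to look at the contents of the program data vector (PDV) during execution is to write it to the log. The sketch below uses a made-up one-row input data set, work.class2, and the PUT _ALL_ statement, which is an addition not shown in the lecture.

```sas
DATA work.class2;
  INPUT lname $ weight height;
  DATALINES;
SMITH 150 65
;
RUN;

DATA work.newclass2;
  SET work.class2;
  BMI = (weight*.454) / ((height*.0254)**2);
  PUT _ALL_;   * writes every PDV variable plus _N_ and _ERROR_ to the log;
RUN;
```

The log line lists the variables read by SET first and BMI after them, matching the order in which the compiler adds them to the PDV.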
Compiling the DATA Step

PDV
lname $ 15 | baseage N 8 | Gender N 8 | Race N 8 | Weight N 8 | Height N 8 | BMI N 8

libname mylib 'SAS-data-library';
data newclass2;
  set mylib.class2;
  BMI=(weight*.454)/((height*.0254)**2);