Page 1: SAS Handout 1.0

Handout: SAS Version: SAS/Handout/0608/1.0

Date: 30-06-08

Cognizant 500 Glen Pointe Center West

Teaneck, NJ 07666

Ph: 201-801-0233 www.cognizant.com


Handout - SAS

TABLE OF CONTENTS

Introduction
  About this Module
  Target Audience
  Module Objectives
  Pre-requisite
Session 02: Introduction to SAS / Getting Started
  Learning Objectives
  Introduction to SAS Programming Language
  BASE SAS Software
  Why SAS?
  Multi Vendor Architecture (MVA)
  Applications
  Overview of SAS Products
  Getting Started
  Steps of a SAS Program
  DATA Step vs. PROC Step
  Flow Diagram of a SAS Program
  Data types in SAS
  Summary
  Test your Understanding
Session 03: Getting Started
  Learning Objectives
  Missing Value Representation in SAS
  SAS Programming Rules
  Rules for Creating Variable Names
  My First SAS Program
  SAS Windowing Environment
  Try It Out
  Summary
  Test your Understanding
Session 04: Basic Concepts
  Learning Objectives

Page 2 ©Copyright 2007, Cognizant Technology Solutions, All Rights Reserved

C3: Protected


  _N_ & _ERROR_
  Program Data Vector (PDV)
  DATA Step's Built-in Observation Loop
  SAS Program Flow of Execution
  Reading from External File
  Try It Out
  Summary
  Test your Understanding
Session 05: Basic Concepts/Working with the DATA Step
  Learning Objectives
  Variable Declaration
  Reading the same record more than once
  Scope of DATA and PROC Steps
  Operators in SAS
  Commenting in SAS
  SAS Data Libraries
  Reading a SAS Dataset
  Try It Out
  Summary
  Test your Understanding
Session 07: Working with the DATA step
  Learning Objectives
  Dataset Options and Options Statement
  SAS Informats & Formats
  Working with SAS Date and Time
  Styles of input
  Writing to an external file
  Try It Out
  Summary
  Test your Understanding
Session 09: SAS Procedures
  Learning Objectives
  SAS Procedures
  PROC PRINT
  PROC CONTENTS
  PROC SORT


  PROC FORMAT
  PROC DATASETS
  Try It Out
  Summary
  Test your Understanding
Session 11: SAS Programming Concepts
  Learning Objectives
  Retaining Variable Values
  Automatic Variables
  Titles and Footnotes
  Conditional Processing
  Iterative Processing
  Conditional Iterative Processing
  Other Data Step statements
  Try It Out
  Summary
  Test your Understanding
Session 13: SAS Programming Concepts/Built-in Functions in SAS
  Learning Objectives
  SAS ODS
  Arrays in SAS
  Arithmetic Functions
  String Functions
  Try It Out
  Summary
  Test your Understanding
Session 16: Built-in Functions in SAS / Merging and Combining SAS Data Sets
  Learning Objectives
  Date Time Functions
  Combining Vertically
  Concatenating
  Interleaving
  Combining Horizontally
  One-to-one reading
  One-to-one merging
  Match merging


  Updating
  Performing JOINS in DATA Step
  Try It Out
  Summary
  Test your Understanding
Session 18: Statistical Procedures
  Learning Objectives
  PROC FREQ
  Multi-Threaded Processing
  PROC MEANS
  PROC SUMMARY
  PROC REPORT
  Try It Out
  Summary
  Test your Understanding
Session 20: PROC SQL
  Learning Objectives
  PROC SQL Basics
  The SELECT Statement and its Clauses
  Creating Output Tables
  Summarizing & Grouping Data
  Querying Multiple Tables
  Limiting the number of rows to be read and displayed
  Using Operators in PROC SQL
  Calculated Values
  Enhancing Query Output
  CONCLUSION
  Try It Out
  Summary
  Test your Understanding
Session 22: Introduction to MACROS
  Learning Objectives
  SAS Macro
  Advantages of the SAS Macro Facility
  Macro variables
  Automatic and User defined macro variables


  Macro Processor and the flow of execution
  Creating macro variables at run time
  Try It Out
  Summary
  Test your Understanding
Session 23: Introduction to MACROS
  Learning Objectives
  Macro Programs
  Using Macro Parameters
  Scope of Macro variables
  System Options
  Conditional execution in Macro
  Iterative processing in Macro
  Built-in Macro Functions
  Try It Out
  Summary
  Test your Understanding
Session 25: Help on SAS
  Learning Objectives
  Debugging SAS Programs
  Creating Efficient SAS Code
  Summary
  Test your Understanding
References
  Websites
  Books
STUDENT NOTES


Introduction

About this Module

This handout:
- Introduces the SAS programming language
- Explains the basic concepts in Base SAS
- Touches on the advanced concepts in Base SAS

Target Audience

Entry Level Trainees

Module Objectives

After completing this module, you will be able to:
- Explain the SAS language
- Describe the basic concepts in SAS
- Work with the DATA step
- Explain procedures in SAS
- Explain SAS programming concepts
- Describe built-in functions in SAS
- Work with SAS data sets
- Work with statistical procedures
- Work with PROC SQL
- Describe macros

Pre-requisite

Trainees need a basic knowledge of at least one programming language.


Session 02: Introduction to SAS / Getting Started

Learning Objectives

After completing this session, you will be able to:
- Describe the SAS programming language
- Explain the Multi Vendor Architecture
- List the applications of SAS
- List the different SAS products
- Explain what a SAS data set is
- Explain the steps of a SAS program
- Describe the data types in SAS

Introduction to SAS Programming Language

The SAS System began as a software system for data analysis and statistical work. Since then, SAS has evolved and established itself in diverse fields. Today, the SAS System's analysis tools range from simple statistics to specialized analysis for econometrics and forecasting, statistical design, computer performance evaluation, operations research and clinical data management. SAS originally stood for "Statistical Analysis System"; the acronym is no longer expanded, and the product is simply called SAS. SAS was developed in the early 1970s and is produced by SAS Institute Inc., North Carolina. It is one of the most widely used statistical software packages and a powerful tool for data warehousing and data mining, the fields where it finds its widest application. It is also widely used in banking, finance, drug development, clinical research and the pharmaceutical industry. SAS provides a complete application development environment for four basic, data-centric tasks:

- ACCESS data
- MANAGE data
- ANALYZE data
- PRESENT data


These tasks are described below:
- Access data from almost any source and in any format.
  o You can read and write data ranging from text or CSV (comma-separated values) files to powerful databases such as Oracle and DB2.
- Manage the contents of the data.
  o SAS manages the contents of the data and stores them in a special form called a SAS data set.
- Perform different kinds of analysis on the data.
  o You can use the SAS programming language or the built-in programs (procedures) to analyze the data.
- Present the analyzed reports in a variety of formats.
  o Finally, you can present the reports in a variety of formats, including text and graphics.

Many software applications are either totally menu-driven or totally command-driven ("enter a command, see the result"). Base SAS software is neither. With Base SAS software, you use statements to write a series of instructions called a SAS program, which communicates with the SAS System. This module introduces Base SAS programming concepts.
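As a first illustration of "statements that form a SAS program", here is a minimal sketch (not taken from the handout) with one DATA step and one PROC step. The data set name `work.products`, its variables and values are made up, and in-stream DATALINES stand in for a real CSV file; to read an external file you would name its path on the INFILE statement instead.

```sas
/* DATA step: read comma-separated values into a SAS data set.
   work.products and its values are hypothetical; DATALINES stands in
   for an external .csv file. */
data work.products;
  infile datalines dsd;            /* DSD: comma-delimited, quotes removed */
  input prodid name :$20. price;   /* :$20. reads names up to 20 characters */
  datalines;
1,"Pen",10.5
2,"Notebook",45
3,"Stapler",120
;
run;

/* PROC step: display the data set as a simple report */
proc print data=work.products;
run;
```

The DATA step builds the data set and the PROC step reports on it; this split between creating data and analyzing or presenting it runs through every SAS program.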

BASE SAS Software

The SAS System is an integrated system of software products. Its core is Base SAS software, which consists of:
- SAS language - a programming language that you use to manage your data
- SAS procedures - software tools for data analysis and reporting
- Macro facility - a tool for extending and customizing SAS programs and for reducing repetitive code
- Output Delivery System (ODS) - a system that delivers output in a variety of easy-to-access formats, such as Microsoft Word, Microsoft Excel, PDF, HTML and SAS data sets
- SAS windowing environment - an interactive graphical user interface that lets you easily code, run and test your SAS programs
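To make two of these components concrete, the following sketch (an illustration, not part of the original handout) defines a small macro that generates the same PROC PRINT code for any data set and uses ODS to write the output to an HTML file in addition to the default destination. The macro name `show`, its parameters and the file name `class_report.html` are hypothetical; `sashelp.class` is a sample data set shipped with SAS. The HTML file is placed in the WORK library's folder so the example does not depend on a particular directory.

```sas
/* Macro facility: write the PROC PRINT code once and reuse it.
   The macro name SHOW and its parameters are illustrative. */
%macro show(ds, n=5);
  proc print data=&ds(obs=&n);
    title "First &n observations of &ds";
  run;
%mend show;

/* ODS: also send the output of the steps below to an HTML file,
   placed in the WORK library's folder */
ods html path="%sysfunc(pathname(work))" file="class_report.html";

%show(sashelp.class)        /* prints 5 rows (the default N=5) */
%show(sashelp.class, n=3)   /* prints 3 rows */

ods html close;   /* close the HTML destination and finish the file */
title;            /* clear the title for later steps */
```

The macro removes the repetition of near-identical PROC steps, while ODS changes where the output goes without changing the procedures themselves.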

Why SAS?

- The SAS System enables you to access data in almost any format, no matter where or how the data are physically stored.
- It can access data stored in different databases, as well as data on different computers, through engines.
- You can use its data management facility to update, combine, rearrange, edit or subset data before analysis.
- Its power, flexibility and ease of use give you strategic control of all your data processing needs.
- The SAS System has a collection of ready-to-use programs, called procedures, for analyzing data and presenting it in a variety of formats according to the user's requirements.


- It also has many procedures for statistical analysis, and it provides an extensive inventory of application development tools.
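The data management and analysis points above can be sketched in a few steps. This is an illustrative example with made-up data (`work.q1`, `work.q2` and their sales figures are hypothetical): a DATA step combines two data sets and subsets them with WHERE, PROC SORT rearranges the result, and PROC MEANS, one of the ready-to-use procedures, summarizes it.

```sas
/* Two small data sets to combine (hypothetical values) */
data work.q1;
  input region $ sales;
  datalines;
East 120
West 95
;
run;

data work.q2;
  input region $ sales;
  datalines;
East 130
West 80
;
run;

/* Combine and subset in one DATA step */
data work.all_sales;
  set work.q1 work.q2;   /* stack (concatenate) the two data sets */
  where sales > 90;      /* keep only rows with sales above 90 */
run;

/* Rearrange: order the rows by region */
proc sort data=work.all_sales;
  by region;
run;

/* Analyze: summary statistics per region.
   Expected: East N=2 Mean=125 Sum=250; West N=1 Mean=95 Sum=95 */
proc means data=work.all_sales n mean sum;
  by region;
  var sales;
run;
```

All three operations (combine, subset, rearrange) happen before the analysis step, which is the workflow the bullet points describe.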

Multi Vendor Architecture (MVA)

MVA makes SAS platform-independent: it allows applications to run in more than one computing environment. SAS applications work the same, look the same and produce the same results regardless of your hardware or operating system. This is possible because the SAS System has a layered structure called the Multi Vendor Architecture (MVA). It consists of a host-specific component, written specifically for each environment, and a portable component that gives SAS the same look and feel everywhere. You can develop SAS applications in one environment and run them in other environments, generally without changes.

Applications

Applications of SAS are diverse. Some of the fields where SAS is applied are listed below:

- Data warehousing and data mining
- Clinical research and trials, for developing and testing drugs
- Banking and pharmaceuticals
- Statistical and mathematical analysis
- Business forecasting and decision support
- Operations research and project management
- Report writing and graphics
- Application development
- Analysis ranging from simple statistics to specialized work in econometrics and forecasting, statistical design and operations research

Overview of SAS Products

SAS Institute licenses many different products. Most of them are integrated, so you do not have to convert data sets or start another program to use the other products. The following is a partial list of SAS products with brief descriptions. You must have Base SAS software installed on your system to run most of these products.

- Base SAS - Includes DATA step programming for data access and data manipulation, and reporting with simple statistical and utility procedures. It must be installed on your system to run most of the other SAS products.
- SAS/ACCESS - Allows you to access data used by other software packages. You can read and, in some cases, write data in their native formats without leaving SAS. Most popular database software is supported, and each database has its own SAS/ACCESS product.


- SAS/AF - Allows you to write your own interactive SAS applications. Applications written with SAS/AF software give users quick and easy access to information without their needing to know the SAS language.
- SAS/ASSIST - A menu-driven front end to SAS software. You make choices from menus, and SAS writes the program for you. Programs can be stored for later use.
- SAS/CONNECT - Connects computers running SAS software. Data can be shared between the computers, and programs developed on one computer or operating environment can be transferred to another for processing.
- SAS Enterprise Guide - Provides a graphical user interface to SAS. It is a Windows-only product, but it can be used to access SAS servers on other systems.
- SAS Enterprise Miner - A data mining tool that is a complete product in itself. It provides an easy-to-use front end to the SEMMA (Sample, Explore, Modify, Model, Assess) process for business users.
- SAS/GRAPH - Produces high-resolution plots, charts and maps.
- SAS/MDDB Server - Allows you to save data in multidimensional database (MDDB) formats for use with online analytical processing (OLAP), also known as slicing and dicing your data.
- SAS/STAT - Provides statistical analysis through a number of procedures, covering analysis of variance, regression, multivariate analysis and categorical data analysis.
- SAS/Warehouse Administrator - Simplifies the creation and maintenance of data warehouses.
- SAS Enterprise Business Intelligence Server - Includes both a suite of business intelligence (BI) tools and a platform that provides uniform access to data. The product is positioned to compete with popular reporting tools such as Business Objects and Cognos. SAS describes SAS® Business Intelligence as giving you information when you need it, in the format you need: whereas other vendors provide business intelligence mainly as historical reports that offer hindsight but limited insight, SAS Business Intelligence is meant to help you understand the past, monitor the present and predict outcomes as your business moves ahead.


SAS/ETL (Extraction, Transformation and Loading): Extracts, cleanses, transforms, loads, and manages data from a single environment. SAS provides integrated ETL capabilities that enable organizations to extract, transform, and load data from across the enterprise to create consistent, accurate information.

SAS is a modular product. BASE SAS is the core module that every installation requires; once it is installed, you can add whichever additional modules you need to extend SAS's functionality. For example, the SAS/STAT module adds statistical analysis capabilities, SAS/GRAPH adds high-resolution graphics, and so forth.

Getting Started

SAS Datasets

A SAS data set is SAS's own way of storing data. Before you can analyze your data and produce a report with SAS software, the data must be in a special form that the SAS System can understand. This form is called a SAS data set. It consists of two portions:

- Descriptor information
- Data values

Descriptor Information:

The descriptor information describes the contents of the SAS data set to the SAS System. It contains information such as:

- Data set name
- Date created/modified
- Version of the SAS System used to create it
- Number of variables and observations
- Information about each variable: name, data type, length, position within the data set, etc.

Data Values:

The Data values or the Data portion contains the actual data that have been collected. The data is organized into a rectangular structure containing rows called observations and columns called variables. An observation is a collection of data values that usually relate to a single object. A variable is the set of data values that describe a given characteristic.
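The two portions can be inspected separately. The following is a minimal sketch with a made-up three-row data set named EMP: PROC CONTENTS reports the descriptor portion, and PROC PRINT lists the data portion.

```sas
/* Create a small illustrative data set (the values are made up) */
DATA EMP;
   INPUT EMPID NAME $ SAL;
   DATALINES;
111 RAMESH 1000
222 KUMAR 2000
333 RANI 3000
;
RUN;

/* Descriptor portion: data set name, dates, number of observations */
/* and variables, and each variable's name, type, and length        */
PROC CONTENTS DATA = EMP;
RUN;

/* Data portion: the observations (rows) and variables (columns)    */
PROC PRINT DATA = EMP;
RUN;
```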


Steps of a SAS Program

SAS programs are constructed from two basic building blocks: the DATA step and the PROC step.

DATA Step:

The DATA step reads data from almost any source, ranging from text or CSV files to databases such as Oracle and DB2. In a DATA step you can:

- Combine existing SAS data sets
- Transform and analyze the data
- Write programming statements to modify the data
- Write the processed data to a SAS data set or an external file

PROC Step:

PROC stands for procedure. The PROC step reads only SAS data sets, not other files. It takes a SAS data set, analyzes the data, and generates results or reports. It can also produce results in graphical form, such as graphs and charts, and it can write the results to an output SAS data set as well.

There can be any number of DATA or PROC steps in a SAS program. A typical program starts with a DATA step to create a SAS data set and then passes the data set to a PROC step for processing. Here is a simple program that converts miles to kilometers in a DATA step and prints the results with a PROC step:
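A minimal sketch of such a program, assuming a data set named DISTANCE, variables MILES and KM, and made-up distances; it uses the conversion factor 1 mile = 1.609344 km:

```sas
/* DATA step: read miles and compute kilometers */
DATA DISTANCE;
   INPUT MILES;
   KM = MILES * 1.609344;   /* 1 mile = 1.609344 kilometers */
   DATALINES;
1
26.2
100
;
RUN;

/* PROC step: print the data set created above */
PROC PRINT DATA = DISTANCE;
RUN;
```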


DATA Step vs. PROC Step

The following table differentiates a DATA step from a PROC step:

| DATA step | PROC step |
|---|---|
| Starts with the keyword DATA | Starts with the keyword PROC |
| Ends with RUN | Ends with RUN (some procedures, such as PROC SQL, end with QUIT) |
| Reads and modifies data | Performs a specific analysis or function |
| Input: data from any source | Input: SAS data sets only |
| Output: SAS data set or external file | Output: reports or SAS data sets |

Flow Diagram of a SAS Program

1. Raw data is given as input to the SAS DATA step.
2. The DATA step reads the data using SAS statements and creates a SAS data set as output.
3. The created SAS data set is given as input to the SAS PROC step.
4. The PROC step generates the reports.


Data types in SAS

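There are only two data types in SAS:

- NUMERIC: By default, a variable is treated as numeric.
- CHARACTER: A character variable is identified by a $ after its name (for example, in an INPUT or LENGTH statement).

Numeric variables are stored in 8 bytes by default. A character variable read with list input also gets a default length of 8 bytes; a character variable created in an assignment statement takes the length of the first value assigned to it.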
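A short sketch of the two types and the default length, with made-up values: NAME carries a $ and is therefore character, while AGE does not, so it is numeric. Because NAME is read with list input and has no LENGTH statement, it gets the default length of 8 bytes, so the longer value is truncated.

```sas
DATA PEOPLE;
   INPUT NAME $ AGE;      /* NAME: character ($), AGE: numeric */
   DATALINES;
CHANDRASEKAR 30
RANI 25
;
RUN;
/* NAME has length 8, so CHANDRASEKAR is stored as CHANDRAS.    */
/* A LENGTH NAME $ 12; statement before INPUT would prevent it. */
```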

Summary

- SAS provides a complete application development environment that caters to the four basic tasks: accessing, managing, analyzing, and presenting data.
- The SAS System is an integrated system of software products, and the core of the SAS System is BASE SAS software.
- MVA makes SAS platform independent: it allows applications to run on more than one computing environment.
- SAS is used in a wide range of fields. SAS licenses many different products, and most of them are integrated, so you don't have to convert data sets or start another program to use the other products.
- Base SAS is the core software.
- A SAS data set is SAS's own way of storing data. It consists of two portions: descriptor information and data values.
- SAS programs are constructed from two basic building blocks: the DATA step and the PROC step.
- There are only two data types in SAS: numeric and character.

Test your Understanding

1. List some of the fields where SAS is used.
2. What are the two portions of a SAS data set?
3. What are the steps of a SAS program?
4. Does SAS have a data type for storing date values?
5. List some of the SAS products.
6. What is MVA, and what is its purpose?


Session 03: Getting Started

Learning Objectives

After completing this session, you will be able to:

- Explain missing value representation in SAS
- Explain SAS programming rules
- Code your first SAS program
- Describe the SAS windowing environment

Missing Value Representation in SAS

Missing values are SAS's equivalent of NULL values. A variable is assigned a missing value when it is not populated, or when a value that cannot be interpreted as a number (such as a character value) is assigned to a numeric variable.

- A character missing value is represented by blanks (spaces).
- A numeric missing value is represented by a period (.).

In the following data set, the value of SALARY is missing in the 3rd observation (a period, the numeric missing value), and the value of NAME is missing in the 4th observation (blanks, the character missing value).

Dataset Name: WORK.EMP; Number of Observations: 4; Number of Variables: 4; Date Created: 03/25/2008; Date Modified: 03/25/2008; Sorted: NO

| EMPID | NAME | GENDER | SALARY |
|---|---|---|---|
| 111 | RAMESH | M | 1000 |
| 222 | KUMAR | M | 2000 |
| 333 | SHANTHI | F | . |
| 444 | | F | 4000 |
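A sketch that builds this data set and checks the missing values, assuming list input and the values from the listing above. With list input, a single period in the data is read as missing for both numeric and character variables, which is how the blank NAME in the 4th record is supplied. The MISSING function works for both data types.

```sas
DATA EMP;
   INPUT EMPID NAME $ GENDER $ SALARY;
   /* A lone period is read as missing: SALARY becomes . (numeric) */
   /* and NAME becomes blank (character)                           */
   DATALINES;
111 RAMESH M 1000
222 KUMAR M 2000
333 SHANTHI F .
444 . F 4000
;
RUN;

/* Count the missing values and write the counts to the log */
DATA _NULL_;
   SET EMP END = LAST;
   IF MISSING(SALARY) THEN NMISS_SAL + 1;   /* sum statement: retained counter */
   IF NAME = ' ' THEN NMISS_NAME + 1;
   IF LAST THEN PUTLOG NMISS_SAL= NMISS_NAME=;
RUN;
```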


SAS Programming Rules

The rules for writing SAS programs are listed below.

- Every SAS statement must end with a semicolon (;).
- SAS statements are not case-sensitive (they can be in upper or lower case).
- SAS is a free-format language:
  - A SAS statement may begin in any column.
  - Several SAS statements may appear on the same line.
  - A SAS statement can flow over multiple lines; that is, you can begin a statement on one line and continue it on another.
- SAS keywords can be used as variable names. Example: DATA Data; Run = 1; Run; Here DATA is the keyword and Data is the data set name, Run = 1 creates a variable named Run, and the final Run; ends the step. SAS is intelligent enough to tell a variable from a keyword.
- Declaring variables is not required, but it is good practice to declare them.
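A small sketch, with a made-up data set name and variables, that exercises these rules: mixed case, several statements on one line, and one statement spread over three lines.

```sas
data FreeForm; X = 1; y = 2;   /* several statements on one line */
   Total =                     /* one statement over three lines */
      x +
      Y;
RUN;                           /* case does not matter: run = RUN */
```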

Rules for Creating Variable Names

The rules for naming SAS data sets and variables are the same. A name:

- must be 1 to 32 characters in length
- must start with a letter (A-Z) or an underscore (_)
- can continue with any combination of numbers, letters, and underscores; no other special characters are allowed

The default length of character and numeric variables is 8 bytes. A character variable can hold up to 32,767 characters of data.

My First SAS Program

    DATA EMP;
       INPUT EMPID NAME $ SAL;
       OUTPUT EMP;
       DATALINES;
    111 RAMESH 1000
    222 KUMAR 2000
    333 RANI 3000
    ;
    RUN;

    PROC PRINT DATA = EMP;
    RUN;


Explanation:

INPUT statement:

The INPUT statement reads data lines (observations) and assigns values to the SAS variables that correspond to the data fields. Since NAME is a character variable, it is followed by $.

DATALINES statement:

Use the DATALINES statement with an INPUT statement to read data entered in the program rather than from an external file.

The DATALINES statement marks the end of the DATA step statements and the beginning of the input data values.

DATALINES assumes that the data follows immediately; that is, the data is "in-stream" (within the program).

You can also use the CARDS statement instead of DATALINES; the two statements work the same way.

Guidelines:

1. DATALINES must be the last statement in the DATA step; that is, place the DATALINES statement directly before the first data line. When the compiler comes across the DATALINES; statement, it reads the subsequent lines as data rather than source code.
2. Terminate the data with a semicolon on a new line.

OUTPUT statement:

Writes the values of the variables EMPID, NAME, and SAL to the data set EMP.

PROC PRINT:

The PRINT procedure prints the contents (data portion) of the SAS data set.

SAS Windowing Environment

SAS is designed to be easy to use. It provides windows for accomplishing all the basic SAS tasks. When you first start SAS, five main windows open:

- Program Editor (or Editor)
- Log window
- Output window
- Explorer
- Results

Program Editor

Use the Program Editor window to enter, edit, and submit SAS programs. SAS color-codes the different parts of a program. SAS program files use the extension .sas.


Log window

The Log window displays:

- messages about the SAS session
- how the SAS program was executed
- notes, errors, or warnings issued during the execution of a SAS program
- the time taken by the SAS System to process the program

Log files use the extension .log.

Output window

The Output window displays the output of the SAS programs that we submit. Output (listing) files use the extension .lst.


If we create HTML output, it is opened in the Results Viewer window, which is SAS's internal browser.

Explorer window

The Explorer window gives easy access to SAS files and libraries. Use this window to:

- view and manage SAS files
- create new SAS libraries and SAS files
- open any SAS file
- perform most file management tasks, such as moving, copying, and deleting files
- create file shortcuts


Results window:

The Results window is a table of contents for your Output window. The results tree lists each part of your results in outline form. It helps us navigate and manage the output from the SAS programs that we submit; we can view, save, and print individual items of output.


Try It Out

Problem Statement

Write a program to read the raw data stored as in-stream data and create a SAS System data set called ALL that contains the following variables, in the order listed: ID, HR (heart rate), SBP (systolic blood pressure), and DBP (diastolic blood pressure). Input data:

    A1 68 130 80
    B3 101 148 86
    C2 . . 72
    D1 72 140 88

Code

    DATA ALL;
       INPUT ID $ HR SBP DBP;
       OUTPUT ALL;
       DATALINES;
    A1 68 130 80
    B3 101 148 86
    C2 . . 72
    D1 72 140 88
    ;
    RUN;

Refer to file 3.1.sas for a soft copy of the program code.

How It Works

The INPUT statement reads the values of ID, HR, SBP, and DBP, in that order, and stores them in the PDV.

The OUTPUT statement writes the values of the user-defined variables to the data set ALL.

Summary

Character missing value is represented by spaces Numeric missing value is represented by a period SAS Programming Rules Rules for creating Variable Names SAS is designed to be easy to use. It provides windows for accomplishing all the basic

SAS tasks we need to do.


Test your Understanding

1. How do you read in the variables that you need?
2. How are numeric and character missing values represented internally?


Session 04: Basic Concepts

Learning Objectives

After completing this session, you will be able to:

- Explain the automatic variables
- Describe the PDV concept
- Explain the DATA step's built-in observation loop
- Describe the SAS program flow of execution
- Read from an external file

_N_ & _ERROR_

During DATA step execution, SAS creates two automatic variables: _N_ and _ERROR_.

_N_:

The _N_ variable counts the number of times the DATA step begins to iterate. It is initially set to 1 and is incremented by 1 on each iteration. When each iteration reads one record, it behaves like a record counter: while the first record is read, _N_ is 1; while the nth record is read, _N_ is n; and so on.

| DATA step iteration no. | _N_ value |
|---|---|
| 1 | 1 |
| 2 | 2 |
| ... | ... |
| 10 | 10 |
| n | n |

_ERROR_:

The _ERROR_ variable signals the occurrence of an error caused by the data during execution. It is 0 by default and is set to 1 when an error occurs. For example, a data error occurs when the INPUT statement tries to read a character value into a numeric variable; _ERROR_ is then set to 1.
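A minimal sketch that makes both automatic variables visible, using made-up data in which the second record contains the character value ABC where the numeric SAL is expected:

```sas
DATA EMP;
   INPUT EMPID NAME $ SAL;
   PUTLOG _N_= _ERROR_= SAL=;   /* written to the log, not to the data set */
   DATALINES;
111 RAMESH 1000
222 KUMAR ABC
333 RANI 3000
;
RUN;
```

In the log, the second iteration shows _ERROR_=1 and SAL=. because ABC cannot be read as a number, and SAS also writes an invalid-data note. On the third iteration _ERROR_ is back to 0. Neither automatic variable is stored in EMP.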

Program Data Vector (PDV)

PDV is a temporary memory area where the values of variables are stored during execution time. It contains all the variables created in the Data step statements and the two automatic variables _N_ & _ERROR_.


Initially, all variable values are set to missing except _N_ and _ERROR_: _N_ is set to 1 and _ERROR_ to 0. All variables are marked as either KEEP or DROP. The automatic variables _N_ and _ERROR_ are always dropped, so they are never written to the output data set.

When the program executes an OUTPUT statement, or reaches the end of a DATA step that has no explicit OUTPUT statement:

- All values in the PDV, except those marked to be dropped, are written as a single observation to the output data set.
- Control returns to the DATA statement to begin the next iteration.
- All variables are reset to missing except _N_ and _ERROR_, and _N_ is incremented by 1.

Input buffer: If the program reads from an external source, SAS creates a temporary buffer called the input buffer, which holds the current record being processed. Its default length is 256 characters, and it can be changed with the LRECL= option.

Understanding the PDV: Consider the program below.

    DATA EMP;
       INPUT EMPID NAME $ SAL;
       NEWSAL = SAL + 100;
       OUTPUT EMP;
       DATALINES;
    111 RAMESH 1000
    222 KUMAR 2000
    333 RANI 3000
    ;
    RUN;

When the above program is submitted, SAS allocates memory for the input buffer and the PDV.


Step 1:

During program execution, SAS reads the first record and stores it in the input buffer. The input pointer is positioned at the beginning of the input buffer. The following figure shows the position of the input pointer in the input buffer before SAS reads the data.

Step 2:

The INPUT statement then reads data values from the record in the input buffer and writes them to the PDV, where they become variable values.

After reading the first value, the input pointer moves to the beginning of the next value in the input buffer; from there, the INPUT statement reads the value for the second variable, and so on.

The figure below illustrates the process.

Step 3:

After the INPUT statement reads a value for each variable, the next statement is executed.

SAS computes a value for the variable NEWSAL from SAL and writes it to the PDV. All the programming statements read and write the values of variables from the PDV.


Step 4:

When SAS encounters the OUTPUT statement, or executes the last statement in the current DATA step, all the values in the PDV except those marked as DROP are written as a single observation to the data set EMP.

Step 5:

Before reading the next record, all the variables are set to missing except the automatic variables _N_ and _ERROR_. _N_ is incremented by 1, so it becomes 2; _ERROR_ is 0, since no error occurred.
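One way to watch the PDV during these steps is to write its contents to the log with PUTLOG _ALL_, which lists every variable, including _N_ and _ERROR_. A minimal sketch based on the program above, with two PUTLOG statements added for illustration:

```sas
DATA EMP;
   PUTLOG 'Top of step:   ' _ALL_;   /* PDV just reset: values missing */
   INPUT EMPID NAME $ SAL;
   NEWSAL = SAL + 100;
   PUTLOG 'Before OUTPUT: ' _ALL_;   /* PDV filled for this record     */
   OUTPUT EMP;
   DATALINES;
111 RAMESH 1000
222 KUMAR 2000
333 RANI 3000
;
RUN;
```

The "Top of step" line appears four times in the log: after the third record, SAS starts a fourth iteration, resets the PDV, and then stops when the INPUT statement finds no more data.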

DATA Step's Built-in Observation Loop

The code inside the DATA step is repeated to read multiple records. The iteration continues until the end of the file is reached.

At the end of every iteration of the observation loop:

- The values of all variables in the PDV are written to the data set.
- Control returns to the top of the DATA step.
- The next iteration proceeds.

At the beginning of every iteration of the observation loop:

- The values of all variables in the PDV are set to missing, except the automatic variables.
- _N_ is incremented by 1.
- _ERROR_ is reset to 0.

SAS Program Flow of Execution

The SAS System processes the DATA step in two phases: the compilation phase and the execution phase.


Compilation:

During the compilation phase:

- SAS checks the syntax of the SAS statements.
- It establishes an area of memory called the input buffer, if reading an external source/file.
- It allocates memory for the Program Data Vector (PDV).
- It assigns the required attributes to variables, such as data type, length, and position.
- It builds the descriptor portion of the new data set.
- It converts the SAS code into uppercase.

Execution:


- The DATA step begins with a DATA statement. Each time the DATA statement executes, a new iteration of the DATA step begins, and the _N_ automatic variable is incremented by 1.
- SAS sets the variables in the program data vector (PDV) to missing.
- SAS reads a data record from a raw data file into the input buffer and then stores the values in the PDV.
- SAS executes any subsequent programming statements for the current record.
- When it encounters an OUTPUT statement, or reaches the end of the DATA step, SAS writes an observation to the SAS data set.
- The system automatically returns to the top of the DATA step.
- The same steps continue until there are no more records to read.

Control flow in DATA step:

During compilation, SAS builds the descriptor portion of the data set EMP. At the beginning of execution:

- SAS reads the first record from the raw file.
- The record passes through every statement in the DATA step.
- When SAS encounters an OUTPUT statement, or reaches the end of the DATA step, the values of the variables are written to the data set EMP as observation one.

When it reaches the RUN statement, control goes back to the beginning of the DATA step to read the subsequent records:

- SAS reinitializes the PDV.
- It then checks whether a record is available to read. Since one is, SAS reads the second record, which follows the same steps as above.
- Similarly, SAS reads the third record and writes it to the data set.
- SAS then checks for the next record in the input file. Since none is available, SAS terminates the current DATA step, and control passes to the steps that follow it.


Reading from External File

INFILE Statement

Purpose: Identifies an external (raw data) file. The INFILE statement specifies the source of the data read by the INPUT statement. If it is omitted, SAS treats the data as in-stream data and reads it from the lines that follow DATALINES.

General form:

    INFILE 'raw-data-file' <options>;

where:

- raw-data-file points to the raw data file being read
- options affect how SAS reads the raw data file

Example:

    DATA TEMP;
       INFILE 'C:\SAS\SASFILES\FILE.TXT';
       INPUT NAME $ SAL;
    RUN;

Instead of hard-coding the full path of the raw data file, we can create a file reference to the input file with the FILENAME statement.

FILENAME statement

General form:

    FILENAME fileref 'filename';

where fileref is a name that you associate with an external file; it creates a logical link (shortcut) to the file. Example:

    FILENAME INP 'C:\SAS\SASFILES\FILE.TXT';
    DATA TEMP;
       INFILE INP;
       INPUT NAME $ SAL;
    RUN;

The following options of the INFILE statement affect how the data is read from the external file:


LRECL (Logical RECord Length):

- It specifies the maximum length of a record in the data file.
- Its value sets the length of the input buffer.
- Its default is 256 bytes (in the SAS releases this handout was written for; newer releases may use a larger default).

If your input records are up to 1000 characters long and you do not use the LRECL option, SAS reads only the first 256 characters of each record and ignores the rest. In that case, set LRECL=1000, for example: INFILE INP LRECL=1000;

DLM option (Delimiter)

This option is useful when the values of variables in the input file are separated by a delimiter other than a blank. For example, the data values below are separated by commas, so ',' is the delimiter:

    111,RAMESH,1000
    222,KUMAR,2000

The INFILE statement with the DLM option looks like this:

    INFILE INP DLM = ',';
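A self-contained sketch of FILENAME, INFILE, LRECL=, and DLM= working together. Instead of C:\SAS\SASFILES\FILE.TXT, it assumes a fileref INP pointing to a temporary file (FILENAME INP TEMP), which a DATA _NULL_ step first fills with two comma-separated records:

```sas
/* Hypothetical fileref: TEMP points INP at a temporary file so the */
/* sketch runs anywhere; for a real file you would write, e.g.,     */
/* FILENAME INP 'C:\SAS\SASFILES\FILE.TXT';                         */
FILENAME INP TEMP;

/* Fill the temporary file with two comma-separated records */
DATA _NULL_;
   FILE INP;
   PUT '111,RAMESH,1000';
   PUT '222,KUMAR,2000';
RUN;

/* Read the records back: LRECL= sets the input buffer length, */
/* DLM= makes the comma the delimiter                          */
DATA TEMP;
   INFILE INP LRECL = 1000 DLM = ',';
   INPUT EMPID NAME $ SAL;
RUN;
```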

DSD (Delimiter Sensitive Data)

The DSD option:

- sets the default delimiter to a comma
- treats consecutive delimiters as a missing value
- enables SAS to read values with embedded delimiters if the value is enclosed in double quotes

For example, if the DSD option is used while reading the record below, the value of NAME is treated as missing:

    111,,1000

Now suppose the value of NAME in the first observation is RAMESH, KUMAR; that is, it contains an embedded delimiter (,). When the following data line:

    111,RAMESH, KUMAR,1000

is read with DLM=',' but without DSD, the variables get these values: EMPID = 111, NAME = RAMESH, SAL = . (missing value).

This happens because SAS treats the comma in RAMESH, KUMAR as a delimiter, taking RAMESH as the value for NAME and KUMAR as the value for SAL. Since SAL is a numeric variable, SAS assigns it a missing value.

If the DSD option is used and the data value containing the embedded delimiter is enclosed in double quotes, SAS treats "RAMESH, KUMAR" as the value for NAME and removes the enclosing quotes before storing the value. Example:

    111,"RAMESH, KUMAR",1000

gives EMPID = 111, NAME = RAMESH, KUMAR, SAL = 1000. (NAME must be given a length of at least 13, for example with LENGTH NAME $ 20;, otherwise list input truncates it to the default 8 characters.)
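A minimal sketch of both DSD behaviors, reading made-up in-stream records with INFILE DATALINES DSD: the quoted value keeps its embedded comma, and the two consecutive commas in the second record produce a missing NAME. The LENGTH statement gives NAME room for the 13-character value.

```sas
DATA EMP;
   LENGTH NAME $ 20;           /* long enough for RAMESH, KUMAR      */
   INFILE DATALINES DSD;       /* comma delimiter, quotes honored    */
   INPUT EMPID NAME $ SAL;
   DATALINES;
111,"RAMESH, KUMAR",1000
222,,2000
;
RUN;
```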

END=

The END= option creates and names a temporary variable that acts as an end-of-file indicator. General form:

    INFILE fileref END=variable;

The temporary variable is set to 1 only when the INPUT statement reads the last record from the input file; for all other records it is 0. Being temporary, it is not written to the output data set.
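A minimal sketch that uses END= to produce a one-line summary after the last record. It writes three made-up records to a temporary file through a hypothetical fileref RAW, so that INFILE reads an external file as in the text; LASTREC, NRECS, and TOTAL are illustrative names.

```sas
FILENAME RAW TEMP;                 /* hypothetical temporary file      */
DATA _NULL_;
   FILE RAW;
   PUT '111 RAMESH 1000';
   PUT '222 KUMAR 2000';
   PUT '333 RANI 3000';
RUN;

DATA SUMMARY (KEEP = NRECS TOTAL);
   INFILE RAW END = LASTREC;       /* LASTREC = 1 only on the last record */
   INPUT EMPID NAME $ SAL;
   TOTAL + SAL;                    /* sum statement: running total        */
   IF LASTREC THEN DO;
      NRECS = _N_;                 /* one record per iteration here       */
      OUTPUT;                      /* write a single summary observation  */
   END;
RUN;
```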

RECFM

Specifies the record format of the external file. Usually, the SAS System reads a line of data until a carriage return is encountered. However, sometimes more than one fixed-length record (records with same LRECL) occurs in a single line without carriage return characters. In this case, the option RECFM=F (fixed) needs to be specified to read the data. The default value is RECFM=V (variable) and it considers one record per line.

FLOWOVER / MISSOVER / TRUNCOVER

Consider the Data below,


Here,

Each line should contain four data values: last name, first name, employee ID, and job code. The grayed-out area denotes the actual line lengths.

Program (where <OPTIONS> stands for the INFILE option being tested):

    DATA Test;
       INFILE "d:\infile\emplist.dat" <OPTIONS>;
       INPUT lastn $1-21 Firstn $ 22-31 Empid $32-36 Jobcode $37-45;
    RUN;

The code was submitted using different options on the INFILE statement.

FLOWOVER

Causes the INPUT statement to jump to the next record if it doesn’t find values for all variables in the current record/line.

This is the default option. Program:

    DATA Test;
       INFILE "d:\infile\emplist.dat" FLOWOVER;
       INPUT lastn $1-21 Firstn $ 22-31 Empid $32-36 Jobcode $37-45;
    RUN;

Contents of the Dataset Test (using FLOWOVER):


The INPUT statement expects the data for Jobcode in columns 37-45, but the data value in the 2nd record extends only to column 41. So the data is considered incomplete, and the INPUT statement goes to the next record and takes 'SMITH' as the value for Jobcode.

MISSOVER

If SAS reaches the end of the line without finding values for all fields, variables without values are set to missing. Program:

    DATA Test;
       INFILE "d:\infile\emplist.dat" MISSOVER;
       INPUT lastn $1-21 Firstn $ 22-31 Empid $32-36 Jobcode $37-45;
    RUN;

Contents of the Dataset Test (using MISSOVER):

The value of Job code in the 2nd record is only 5 chars, but the program is expecting 9 chars. Since the value of Jobcode in the 2nd record is incomplete, SAS assigns a missing value.

TRUNCOVER

This option acts like MISSOVER, but it also uses a partial value to fill the first unfilled variable.

Program:

    DATA Test;
       INFILE "d:\infile\emplist.dat" TRUNCOVER;
       INPUT lastn $1-21 Firstn $ 22-31 Empid $32-36 Jobcode $37-45;
    RUN;


Contents of the Dataset Test (using TRUNCOVER):

With the TRUNCOVER option in place, SAS reads all the columns and all the observations correctly. The value of Jobcode in the 2nd record is only 5 characters, but the program expects 9 characters, so the data is incomplete. In this case:

- MISSOVER assigns a missing value to Jobcode.
- TRUNCOVER assigns the partial value (only 5 characters) to the unfilled variable Jobcode.
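The difference can be reproduced on a small scale. This sketch assumes a hypothetical fileref CODES pointing to a temporary file (FILENAME ... TEMP) with two made-up records, the second of which stops two columns short of the CODE field (columns 11-15). In-stream DATALINES are avoided here because SAS typically pads in-stream lines with blanks, which would hide the short record.

```sas
FILENAME CODES TEMP;                /* hypothetical temporary file     */
DATA _NULL_;
   FILE CODES;
   PUT 'SMITH     AB123';           /* 15 columns: complete record     */
   PUT 'JONES     AB1';             /* 13 columns: CODE field is short */
RUN;

/* MISSOVER: a partial value at the end of a short record -> missing */
DATA MISS;
   INFILE CODES MISSOVER;
   INPUT NAME $ 1-10 CODE $ 11-15;
RUN;

/* TRUNCOVER: the partial value is kept */
DATA TRUNC;
   INFILE CODES TRUNCOVER;
   INPUT NAME $ 1-10 CODE $ 11-15;
RUN;
```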

Try It Out

Problem Statement

Write a program to read the raw data stored in an external file C:\SASFILES\VITAL.CSV and create a SAS System data set called ALL. VITAL contains the following variables, in the order listed: ID, HR (heart rate), SBP (systolic blood pressure), and DBP (diastolic blood pressure). Create a file reference to the external file. Contents of external file VITAL:

    A1,68,130,80
    B3,101, 148,86
    C2,.,.,72
    D1, 72, 140 , 88


Code

    FILENAME INP 'C:\SASFILES\VITAL.CSV';
    DATA ALL;
       INFILE INP DLM = ',';
       INPUT ID $ HR SBP DBP;
       OUTPUT ALL;
    RUN;

Refer to file 4.1.sas for a soft copy of the program code.

How It Works

This is similar to problem 3.1, but it uses a FILENAME statement to create a file reference to the external file VITAL.CSV. Since the input file is a CSV (comma-separated values) file, the DLM option is used.

Summary

- _N_ and _ERROR_ are automatic variables created by SAS during program execution.
- The PDV is a temporary memory area where the values of variables are stored during execution.
- The code inside the DATA step is repeated to read multiple records; the iteration continues until the end of the file is reached.
- The SAS System processes the DATA step in two phases: compilation and execution.
- The INFILE statement specifies the source of the data read by the INPUT statement.
- The options of the INFILE statement affect how data is read from the external file.

Test your Understanding

1. What SAS statements would you code to read an external raw data file in a DATA step?
2. Are you familiar with special input delimiters? How are they used?
3. If reading a variable-length file with fixed input, how would you prevent SAS from reading the next record if the last variable didn't have a value?
4. What is the Program Data Vector (PDV)? What are its functions?
5. At compile time, when a SAS data set is read, what items are created?
6. What is _N_?
7. What does the RUN statement do?
8. Why is SAS considered self-documenting?
9. What is the purpose of _ERROR_?


Session 05: Basic Concepts/Working with the DATA Step

Learning Objectives

After completing this session, you will be able to:

- Declare a variable in SAS
- Read the same record more than once
- Describe the scope of DATA and PROC steps
- Explain operators in SAS
- Explain commenting in SAS
- Explain SAS data libraries
- Read SAS data sets

Variable Declaration

Variables can be declared using either of the following statements: LENGTH or ATTRIB.

LENGTH:

If not specified, SAS assigns a default length of 8 bytes to character and numeric variables. Using the LENGTH statement, you can explicitly assign the length and data type of variables. General form:

    LENGTH variable-name <$> length-specification ...;

Example:

    LENGTH Name $ 20 Age 8;

ATTRIB:

Using the ATTRIB statement, you can associate the following attributes with variables in a single statement:

- Length and data type
- Label
- Informat
- Format

General form:

    ATTRIB variable-name attributes;


Attributes:

- LENGTH=<$>length specifies the length of the variable; $ indicates a character variable.
- LABEL='label' associates a label with a variable.
- INFORMAT=informat associates an informat with a variable.
- FORMAT=format associates a format with a variable.

Example:

    ATTRIB Name LENGTH=$20 LABEL='Name of Employee';

Note: Labels, informats, and formats are discussed later.
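A minimal sketch, with made-up values, that declares variables with both statements before the INPUT statement reads them. Declaring NAME with a length of 20 keeps the 12-character value CHANDRASEKAR intact, which the default length of 8 would truncate.

```sas
DATA EMP;
   LENGTH NAME $ 20 AGE 8;            /* NAME: character, 20 bytes; AGE: numeric */
   ATTRIB DEPT LENGTH = $10
               LABEL  = 'Department of Employee';
   INPUT NAME $ AGE DEPT $;
   DATALINES;
CHANDRASEKAR 30 FINANCE
;
RUN;
```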

Reading same record more than once

By default each INPUT statement reads a separate record from the Input file.

Single Trailing @

The single trailing @ holds a raw data record in the input buffer until SAS executes an INPUT statement with no trailing @, or until it reaches the bottom of the DATA step.

The Double Trailing @

The double trailing @ holds the raw data record across iterations of the DATA step until the end of the line is reached.
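A minimal sketch of both line-holding specifiers, using made-up data: the single trailing @ lets the program read a record type first and then decide how to read the rest of the same record; the double trailing @@ reads several ID/SCORE pairs from one data line, creating one observation per pair.

```sas
/* Single trailing @: read the record type, hold the record, */
/* then choose the rest of the INPUT based on that type      */
DATA SALES;
   INPUT TYPE $ 1 @;              /* record stays in the input buffer */
   IF TYPE = 'A' THEN INPUT @3 AMOUNT;
   ELSE INPUT @3 AMOUNT DISCOUNT;
   DATALINES;
A 100
B 200 10
;
RUN;

/* Double trailing @@: several observations on one data line */
DATA PAIRS;
   INPUT ID SCORE @@;             /* record held across iterations */
   DATALINES;
1 90 2 85 3 77
;
RUN;
```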


Single Trailing @ Versus Double Trailing @

The table below lists the difference between Single Trailing @ Versus Double Trailing @

Scope of DATA and PROC Steps

The scope of a DATA step begins with the keyword DATA and ends with any of the following:

- the keyword RUN
- the beginning of another DATA step (DATA keyword)
- the beginning of a PROC step (PROC keyword)
- the end of the program
- the keyword ENDSAS (ENDSAS terminates the current SAS program or session)
- a CARDS or CARDS4 statement
- a DATALINES or DATALINES4 statement (DATALINES4 or CARDS4 is used to read input values when the data contain semicolons; the data is then terminated by a line of four semicolons, ;;;;)

Implicit OUTPUT statement: By default, every DATA step contains an implicit OUTPUT statement at the end. This tells the SAS System to write the current observation to the data set at the end of every iteration.


The presence of an explicit OUTPUT statement turns off the implicit one.

Fig 1: Observations are written to both data sets, DATA1 and DATA2.

Fig 2: Because an explicit OUTPUT statement is present, observations are not written to the data set DATA2.
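The Fig 2 situation can be sketched as follows (DATA1 and DATA2 match the figure captions; the variable X and its values are hypothetical):

```sas
data data1 data2;
  input x;
  output data1;   /* explicit OUTPUT: the implicit OUTPUT at the end is turned off */
  datalines;
1
2
3
;
run;
```

DATA1 receives three observations and DATA2 none. Deleting the OUTPUT DATA1; statement restores the implicit OUTPUT, and both data sets would then receive all three observations (the Fig 1 case).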

Operators in SAS

Operators in SAS are classified into arithmetic, comparison and logical operators, plus a few other operators.

Arithmetic Operators


Comparison Operators

Example of the IN operator: the expression VNUM in (1,20,55,79,100,500) is true if the value of the variable VNUM is found in the given list.

Logical Operators

Other operators:
|| - concatenation
>< - minimum
<> - maximum

Concatenation (||) operator: joins two character strings. Example: name = 'Jacob' || 'son';

MIN (><) and MAX (<>) operators: return the minimum or maximum of two values. Example:
x = a >< b; /* x is the minimum of a and b */
x = a <> b; /* x is the maximum of a and b */
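A compact sketch exercising one operator from each group; the data set OPS and the variables A and B are illustrative only. Comparison and logical expressions evaluate to 1 (true) or 0 (false), so they can be assigned to variables:

```sas
data ops;
  length name $ 8;
  a = 7;
  b = 3;
  total  = a + b;              /* arithmetic: 10 */
  power  = a ** 2;             /* exponentiation: 49 */
  lo     = a >< b;             /* MIN operator: 3 */
  hi     = a <> b;             /* MAX operator in a DATA step: 7 */
  name   = 'Jacob' || 'son';   /* concatenation: 'Jacobson' */
  inlist = (a in (1, 7, 20));  /* comparison with IN: 1 (true) */
  both   = (a > b and b > 0);  /* logical AND: 1 (true) */
run;
```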


Commenting in SAS

There are two styles of comments in SAS: multi-line comments and single-line comments.

Multi-line comments:

To write a comment that spans several lines, enclose it between /* and */. Example: /* INPUT NAME $ SAL; SAL = SAL + 1000; */

Single-line comments:

For a single-line comment, start the comment with an asterisk (*) and end it with a semicolon. Example: *SAL = SAL + 1000;

SAS Data Libraries

A SAS data library is a collection of one or more SAS files. It is simply a directory or folder in which SAS data sets and other SAS files are stored. You can think of a SAS data library as a drawer in a filing cabinet, and of each data set as a file folder in the drawer.

General form of a SAS data set name: a data set name has two levels, <Library Name>.Dataset Name

Where:

Library Name represents where the data set is stored. The library name is optional; if you omit it, the data set is stored in the default library WORK.


WORK is a system-defined temporary library. All the data sets stored in the WORK library are deleted at the end of the session. To create a permanent data set, create it in a user-defined SAS library. Example: Clinic.Admit refers to the data set Admit stored in the user-defined library Clinic.

LIBNAME Statement

This statement assigns a libref to a user-defined library. General Form: LIBNAME libref 'SAS-data-library'; Where,

SAS-data-library - is the path of the directory (on disk or other secondary storage) in which the SAS data files are stored.

libref - is a short name (1 to 8 characters) that creates a logical link (shortcut) to the SAS-data-library.

Example: LIBNAME ABC 'C:\SAS\SASFILES';

Just as you assign a fileref by using a FILENAME statement, you assign a libref by using a LIBNAME statement. Filerefs perform the same function as librefs: they temporarily point to a storage location for data. However, librefs reference SAS data libraries, whereas filerefs reference external files.
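A runnable sketch of the two-level naming rule. To keep it self-contained, it points the libref CLINIC at the folder that already holds WORK (through the PATHNAME function); that is an assumption made only for the demo, since anything written there is still removed when the session ends. In practice you would point CLINIC at a permanent folder such as the C:\SAS\SASFILES path used above.

```sas
/* demo only: CLINIC shares the WORK folder so the sketch runs anywhere */
libname clinic "%sysfunc(pathname(work))";

data clinic.admit;   /* two-level name: stored through the CLINIC libref */
  input id $ fee;
  datalines;
A01 100
A02 250
;
run;

data admit_copy;     /* one-level name: stored as WORK.ADMIT_COPY (temporary) */
  set clinic.admit;
run;
```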


Reading a SAS Dataset

SET Statement:

The SET statement reads observations from an existing data set. General Form: SET SAS-data-set <(options)>;

The SET statement points to the SAS data set(s) to be read. Data set options in the SET statement affect how the data is read.

KEEP / DROP:

By default, SAS writes all variables to the output data set. Using the data set options KEEP= and DROP=, you can make SAS write only specific variables to the output data set. KEEP: the KEEP= option names the variables you want to read from a data set. Example: SET EMP (KEEP = ID NAME);

This statement reads only the variables ID and NAME from the data set EMP. DROP: the DROP= option names the variables you want to omit from the data set. Example: SET EMP (DROP = SAL);

This statement reads all the variables except SAL from the data set EMP. You can also use the KEEP=/DROP= options on the output data set. Example: DATA ALL (KEEP = ID NAME); SET EMP; RUN;
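The following sketch contrasts the two placements (the data values are hypothetical). KEEP= on the input data set limits what is read; KEEP= on the output data set limits what is written, so variables needed only for calculations can still be read:

```sas
data emp;
  input id name $ sal dept $;
  datalines;
1 JAMES 12000 IT
2 MARTINA 9000 HR
;
run;

/* KEEP= on the input data set: only ID, NAME and SAL are read */
data raise1;
  set emp (keep = id name sal);
  newsal = sal * 1.1;
run;

/* KEEP= on the output data set: all variables are read (so SAL is available),
   but only ID, NAME and NEWSAL are written to RAISE2 */
data raise2 (keep = id name newsal);
  set emp;
  newsal = sal * 1.1;
run;
```

RAISE1 ends up with four variables (ID, NAME, SAL and NEWSAL), while RAISE2 has three.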

The table below shows the difference between using the KEEP=/DROP= options on the input data set and on the output data set.


FIRSTOBS and OBS

Use FIRSTOBS=n to begin processing at the nth observation. Use OBS=n to stop processing after the nth observation. Default values: FIRSTOBS = 1, OBS = MAX.

MAX refers to the last observation in a data set. Data set options such as FIRSTOBS= and OBS= must be enclosed in parentheses after the data set name. Example: DATA ALL; SET EMP (FIRSTOBS = 100 OBS = 300); RUN;

This reads 201 observations (observations 100 through 300) from the data set EMP. Alternative approach: you can also achieve the same result as FIRSTOBS and OBS by using _N_ with an IF condition. Example: DATA ALL; SET EMP; IF _N_ >= 100 AND _N_ <= 300 THEN OUTPUT ALL; RUN;
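To compare the two approaches, this sketch builds a hypothetical 500-observation data set and subsets it both ways. Both results contain observations 100 through 300 (201 observations), but the log reports how many observations each step read from EMP500:

```sas
data emp500;             /* hypothetical test data: IDs 1 to 500 */
  do id = 1 to 500;
    output;
  end;
run;

data all1;               /* data set options: only observations 100-300 are read */
  set emp500 (firstobs = 100 obs = 300);
run;

data all2;               /* _N_ approach: all 500 observations are read */
  set emp500;
  if _n_ >= 100 and _n_ <= 300 then output all2;
run;
```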

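Which method is more efficient, and why?

END=: the END=variable option of the SET statement creates a temporary variable whose value is 1 while the last observation is being read and 0 otherwise; the variable is not written to the output data set.

WHERE statement: filters the observations read from the input data set.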


Example: SET EMP; WHERE SAL > 10000;

The above code reads only the observations whose SAL value is greater than 10000.
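A sketch combining END= with a WHERE statement (the data values are hypothetical): the step totals the salaries above 10000 and writes a single summary observation when the last observation is read. The name LAST is arbitrary; any valid variable name can follow END=.

```sas
data emp;
  input id sal;
  datalines;
1 8000
2 15000
3 22000
;
run;

data high_total (keep = count total);
  set emp end = last;   /* LAST is 1 only while the final observation is read */
  where sal > 10000;    /* only observations with SAL > 10000 are read */
  count + 1;            /* sum statements are retained across iterations */
  total + sal;
  if last then output;  /* one summary observation at the end */
run;
```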

Try It Out

Problem Statement 1

Write a program to read the raw file and create a data set named EMP. Note that the data values are in two different layouts, and RECTYP in column 1 specifies the layout type:
If RECTYP = A: Name ID Sal
If RECTYP = B: ID Sal Name
Verify whether all the data values were read correctly by printing the contents of the data set EMP.

A ABISHEK 12345 10000
B 67890 20000 DAVID
A KANNAN 12367 30000
B 67456 40000 KUMAR

Code

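/* INP is a fileref for the raw file; assign it first, e.g. FILENAME INP 'path-to-raw-file'; */
DATA EMP;
   INFILE INP;
   INPUT RECTYP $ @;          /* read the record type and hold the record */
   IF RECTYP = 'A' THEN DO;
      INPUT NAME $ ID SAL;    /* layout A */
   END;
   ELSE DO;
      INPUT ID SAL NAME $;    /* layout B */
   END;
RUN;

PROC PRINT DATA = EMP;
RUN;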

Refer File Name: 5.1.sas to obtain soft copy of the program code


Problem Statement 2

Read the data set EMP (created in problem 5.1) into a new data set EMPNEW and store it in a user-defined library LIB. Also follow the conditions below:

1. Read only the observations whose salary value is >= 20000.
2. Drop the variable RECTYP.

Verify the contents of the new dataset.

Code

LIBNAME LIB 'C:\SASFILES\';
DATA LIB.EMPNEW;
   SET EMP (DROP = RECTYP);
   WHERE SAL >= 20000;
RUN;

PROC PRINT DATA = LIB.EMPNEW;   /* verify the new data set */
RUN;

Refer File Name: 5.2.sas to obtain soft copy of the program code

How It Works

The LIBNAME statement assigns the libref LIB to the folder C:\SASFILES\, so LIB.EMPNEW is a permanent data set. The DROP= option drops the variable RECTYP while reading. The WHERE statement selects only the observations whose SAL >= 20000.

Summary

- Variables can be declared using the LENGTH or ATTRIB statement.
- The single trailing @ holds the current record for the next INPUT statement.
- The double trailing @@ holds the record until all the values in the current record have been read.
- Every DATA step has an implicit OUTPUT statement at the end.
- The implicit OUTPUT statement is turned off if an explicit OUTPUT statement is present.
- Operators in SAS are classified into arithmetic, comparison, logical and other operators.
- There are two styles of comments in SAS.
- A SAS data library is a collection of one or more SAS files.
- The SET statement reads an existing data set; options in the SET statement affect how the data is read.


Test your Understanding

1. What is the purpose of the trailing @ and the @@? How would you use them?
2. If you have a data set that contains 100 variables but you need only five of them, what code forces SAS to use only those variables?
3. How do you control the number of observations and/or variables read or written?
4. How would you create multiple observations from a single observation?


Session 07: Working with the DATA step

Learning Objectives

After completing this session, you will be able to:
- Use data set options and the OPTIONS statement
- Work with SAS informats and formats
- Work with SAS date and time values
- Explain the styles of input
- Write to an external file

Dataset Options and Options Statement

Options Statement:

The OPTIONS statement changes SAS system options. The changes remain in effect for the rest of the job or session, or until they are changed again. The OPTIONS statement can appear anywhere in a program except within data lines (after a CARDS or DATALINES statement). General form: OPTIONS <options>;

Where options specifies one or more system options to be changed. The OPTIONS statement can also control the appearance of the output.

NUMBER | NONUMBER and DATE | NODATE

The following OPTIONS statement suppresses the printing of both page numbers and the date and time in the output. Example: OPTIONS NONUMBER NODATE;


PAGESIZE=

The PAGESIZE= option specifies how many lines each page of output should contain.

LINESIZE=

The LINESIZE= option specifies how many characters each output line should contain.

FIRSTOBS & OBS

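The FIRSTOBS= and OBS= options can also be used in the OPTIONS statement. Example: OPTIONS OBS = 100;

After this statement, every subsequent step reads at most 100 observations (or records) from its input, so data sets created from existing data contain at most 100 observations. Use OPTIONS OBS = MAX; to restore the default.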

PAGENO=

By default, page numbers start at 1 and continue sequentially throughout the SAS session. To reset the page number, or to start numbering from a different number, use this option. In the following example, the output pages are numbered sequentially beginning with 3. Example: OPTIONS PAGENO=3;
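A short sketch of the options discussed above; the values 60 and 100 are arbitrary choices for illustration. PROC OPTIONS with OPTION= writes the current setting of one option to the log, which is a convenient way to confirm a change:

```sas
options nonumber nodate pagesize=60 linesize=100 pageno=3;

proc options option=linesize;   /* writes the current LINESIZE value to the log */
run;
```

System options stay in effect for the rest of the session; in particular, reset OBS with OPTIONS OBS = MAX; after testing with a small OBS value.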

SAS Informats & Formats

SAS Informat

An informat is used to read data in non-standard form. An informat is a pattern or instruction that SAS uses to read data values into a variable: SAS interprets the raw value according to the informat and converts it into a standard character or numeric value. General form: <$>informat<w>.<d> where $ marks a character informat, w is the total field width, d is the number of decimal places (used only when the raw value has no decimal point), and the period is required.

For example, the raw value $1,234.56 contains special characters ($ and ,), so SAS cannot read it as a standard number. To read such non-standard values, use an informat (DOLLAR9.2 in this case) that tells SAS the input data is in a particular form. SAS then understands the pattern of the data and converts it into a standard numeric value before assigning it to the variable: the raw value $1,234.56, read with DOLLAR9.2, is stored as 1234.56.


A few informats:
- w.d - reads standard numeric data
- $w. - reads standard character data
- $CHARw. - reads character data, preserving blanks
- DOLLARw.d - reads a numeric value, removing embedded commas, blanks, dollar signs, percent signs or right parentheses
- COMMAw.d - similar to DOLLARw.d

Example:

Raw Data Value   Informat   SAS Data Value
1234567          8.         1234567
1234567          8.2        12345.67
1234.567         8.3        1234.567
' JAMES'         $8.        'JAMES '
' JAMES'         $CHAR8.    ' JAMES'
$12,567          COMMA7.0   12567

$CHARw. preserves the leading blanks, whereas $w. removes them.

Format

A format is used to write data in non-standard form. A format is a pattern or instruction that SAS uses to write data values in the output. The general form of a format is the same as that of an informat, and many formats have the same names as informats, but they work in the opposite direction. For example, to display the value 1234.56 as $1,234.56 in a report, use the DOLLAR9.2 format (1234.56 written with DOLLAR9.2 appears as $1,234.56).

A few formats:
- w.d - writes standard numeric data
- $w. - writes standard character data
- $CHARw. - writes standard character data
- DOLLARw.d - writes a standard numeric value in dollar form (dollar sign, commas and d decimal places) in the output or report
- COMMAw.d - similar to the DOLLARw.d format, but without the '$' sign


Key concept: formats change the external representation of the values of variables stored in SAS data sets. The internal value remains the same, but how we see it outside the data set is controlled by the format we associate with the variable.

FORMAT and INFORMAT statements: these statements associate a format or an informat with a variable. General Form: FORMAT variable(s) format; INFORMAT variable(s) informat;

Example: FORMAT DOB DATE9.; INFORMAT SAL COMMA9.2;
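The round trip described above, sketched with in-stream data (the data set PAY and the second value are hypothetical): the DOLLAR9.2 informat turns the raw text into a number, and the DOLLAR9.2 format controls only how that number is displayed.

```sas
data pay;
  input @1 amt dollar9.2;   /* informat: '$1,234.56' is stored as 1234.56 */
  format amt dollar9.2;     /* format: 1234.56 is displayed as $1,234.56 */
  datalines;
$1,234.56
$250.00
;
run;

data _null_;
  set pay;
  put amt= amt= best12.;    /* formatted value, then the stored number */
run;
```

The log shows the formatted value next to the stored number, illustrating that the format changes only the external representation.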

Working with SAS Date and Time

SAS stores the Date and Time values in Numeric form.

Date

SAS stores a date value as an integer: the number of days between January 1, 1960 and the specified date. Dates before January 1, 1960 are negative. SAS can represent dates from A.D. 1582 to A.D. 20,000.

Time

SAS stores a time value as the number of seconds since midnight of the current day. SAS time values are independent of the date.


Datetime

A datetime value combines the date and time in a single value: SAS stores it as the number of seconds between midnight, January 1, 1960 and the specified date and time. SAS reads and displays date, time and datetime values through informats and formats, and many informats and formats are available for date, time and datetime values. Some commonly used formats:

SAS Date Value   Format      Displayed Value
0                MMDDYY8.    01/01/60
30               DDMMYY10.   31/01/1960
30               YYMMDD10.   1960-01-31
1                DATE7.      02JAN60
1                DATE9.      02JAN1960
-1               WORDDATE.   December 31, 1959
366              WEEKDATE.   Sunday, January 1, 1961

YEARCUTOFF= option: this system option specifies the first year of the 100-year span used by informats and functions. SAS uses it to determine the century of a date when the year is specified with two digits. The default value is 1920, and it can be changed with the OPTIONS statement.

How it works: when a two-digit year value is read, SAS interprets it based on the 100-year span that starts with the YEARCUTOFF= value.


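Example: with YEARCUTOFF=1920, the span is 1920 to 2019, so a two-digit year of 94 is read as 1994 and a two-digit year of 08 is read as 2008.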
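A sketch that makes the 100-year span visible. YEARCUTOFF=1920 is set explicitly so that the result does not depend on the site default; the data set name YEARS is hypothetical:

```sas
options yearcutoff = 1920;            /* 100-year span: 1920-2019 */

data years;
  d1 = input('10/21/94', mmddyy8.);   /* 94 falls in the span -> 1994 */
  d2 = input('06/30/08', mmddyy8.);   /* 08 -> 2008 */
  y1 = year(d1);
  y2 = year(d2);
  format d1 d2 date9.;
run;
```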

DATE & TIME functions:

Function   Typical Use                 Result
TODAY      dt = today();               today's date as a SAS date value
TIME       tt = time();                current time as a SAS time value
DATETIME   dtm = datetime();           current date and time as a SAS datetime value
DAY        day = day(date);            day of the month (1-31)
HOUR       hh = hour(tt);              hour (0-23) of a SAS time or datetime value
WEEKDAY    wkday = weekday(date);      day of the week (1-7, 1 = Sunday) of the date value
MONTH      month = month(date);        month (1-12) of the date value
MDY        date = mdy(mon, day, yr);   SAS date value built from mon, day and yr
DATEPART   dt = datepart(dtm);         SAS date value taken from a SAS datetime value
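A sketch exercising the functions above on a fixed date, so that the results can be checked against the format table (366 is January 1, 1961). DHMS, which builds a datetime value from a date, hour, minute and second, is not in the table and is used here only to create a datetime to work with:

```sas
data dates;
  d     = mdy(1, 1, 1961);      /* SAS date value 366 */
  dt    = dhms(d, 13, 30, 0);   /* datetime: 01JAN1961 13:30:00 */
  wkday = weekday(d);           /* 1 = Sunday, ..., 7 = Saturday */
  mon   = month(d);
  dom   = day(d);
  hh    = hour(dt);             /* HOUR accepts a time or datetime value */
  back  = datepart(dt);         /* back to the date value 366 */
  now   = datetime();           /* changes on every run */
  format d back date9. dt now datetime20.;
run;
```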

Styles of input

SAS provides three styles of input: list input, column input and formatted input.

List input

List input uses a scanning method for locating data values. Example:
DATA EMP;
   length name $ 13;   /* longer than the default length of 8 */
   input Empid name $ Sal;
   cards;
111 LawrenceJames 2000
222 Martina 3000
;
RUN;


For the list input style:
- Data values must be separated by at least one blank or other defined delimiter.
- Character values cannot contain embedded blanks when the file is delimited by blanks.
- Fields must be read in order.
- Data must be in standard numeric or character form; simple list input does not use informats.
- Missing values must be represented by a period (.).
- Character variables get a default length of 8, so only the first 8 characters of a longer value are stored. This can be overridden with the LENGTH statement.

Column Input

Column input enables you to read standard data values that are aligned in columns in the data records. To use column input, each data value must occupy the same columns in every record and be in standard numeric or character form. Example:
data scores;
   input Empid 1-10 Name $ 11-25 Sal 27-35;
   cards;
111       LawrenceJames   2000
222       Martina         3000
333       George          4000
;
run;

Features:

- Character values can contain embedded blanks and can be from 1 to 32,767 characters long.
- No period is required for missing data.
- Input values can be read in any order, regardless of their position in the record.
- Values do not need to be separated by blanks or other delimiters.
- Leading and trailing blanks within the field are ignored.
- Data must be in the same columns on all input lines.
- Use the TRUNCOVER option on the INFILE statement to ensure that SAS handles data values of varying lengths appropriately.

Formatted Input

Formatted input combines the flexibility of informats with many of the features of column input. Formatted input is typically used with pointer controls that let you control the position of the input pointer in the input buffer when you read data. This is the most widely used style of input.


General Form: INPUT @n variable-name informat. ...; Where,
- @n moves the pointer to the starting column of the field.
- variable-name names the SAS variable being created.
- informat specifies how many columns to read and how to convert the raw data into a SAS value.

In both examples below, NAME occupies columns 1-15, SCORE1 columns 21-25 and SCORE2 columns 33-37.

Example 1 (relative pointer controls; +n moves the pointer n columns forward):
data scores;
   input name $15. +5 score1 comma5. +7 score2 comma5.;
   cards;
James               1,000       1,220
Martina             1,100       1,210
;
run;

Example 2 (absolute pointer controls):
data scores;
   input @1 name $15. @21 score1 comma5. @33 score2 comma5.;
   cards;
James               1,000       1,220
Martina             1,100       1,210
;
run;

Features:

- Can read data in non-standard form.
- Character values can contain embedded blanks and can be from 1 to 32,767 characters long.
- No period is required for missing data.
- With pointer controls positioning the pointer, input values can be read in any order, regardless of their positions in the record.

Writing to an external file

INFILE and INPUT statements are used to read from an external file. Similarly, FILE and PUT statements are used to write observations to an external file. The FILE statement specifies the external output file that will be created.


General Form: FILE 'output-file' <options>;

Where,

output-file - identifies the external file to be written (a quoted path or a fileref)
options - affect how SAS writes the external file

The following options, used with the INFILE statement, can also be used with the FILE statement: DLM, DSD, LRECL, RECFM

Example:
DATA TEMP;
   SET EMP;
   FILE OUT;   /* OUT is a fileref assigned earlier with a FILENAME statement */
   PUT @1 NAME $CHAR10. @15 EMPID 5. @25 SAL DOLLAR10.2;
RUN;

The above program creates both an output file and a data set. But the goal of this program is to create only a raw data file, not a SAS data set, so listing a data set name in the DATA statement is inefficient.

Using the _NULL_ keyword: using the keyword _NULL_ as the data set name causes SAS to execute the DATA step without writing observations to a data set; no data set is created. The same program can be rewritten to create only the output file. Example:
DATA _NULL_;
   SET EMP;
   FILE OUT;
   PUT @1 NAME $CHAR10. @15 EMPID 5. @25 SAL DOLLAR10.2;
RUN;
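A self-contained version of the _NULL_ example. The EMP values are hypothetical, and the fileref OUT is assigned with FILENAME OUT TEMP, which creates a temporary file for the session; to keep the output, point OUT at a real path instead.

```sas
filename out temp;   /* temporary external file for this session */

data emp;            /* hypothetical input data */
  input empid name $ sal;
  datalines;
101 JAMES 2500
102 MARTINA 3100.5
;
run;

data _null_;         /* no data set is created */
  set emp;
  file out;
  put @1 name $char10. @15 empid 5. @25 sal dollar10.2;
run;
```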


Try It Out

Problem Statement

Given the raw data file description and sample data below, write a SAS DATA step to read this data file and create a SAS data set called FIRESTATION. Use formatted input; do not use ending columns. Also print the values of DATE in DATE9. format and AMOUNT in COMMA13.2 format.

Description of the file FIRE.TXT:

Variable Name   Starting Column   Length   Format     Description
-------------------------------------------------------------------------
CALL_NO         1                 3        Numeric    Call number
DATE            5                 8        MM/DD/YY   Date of service
TRUCKS          14                2        Numeric    Number of trucks
ALARM           17                1        Numeric    Number of alarms
AMOUNT          19                13       Numeric    Amount spent

Actual data in file FIRE.TXT:

001 10/21/94 03 2 $12,300.00
002 10/23/94 01 1 $456,678.00
003 11/01/94 11 3 123,456.89

Code

DATA FIRESTATION;
   INFILE 'FIRE.TXT' TRUNCOVER;   /* TRUNCOVER: read a short last field instead of moving to the next record */
   INPUT @1  CALL_NO 3.
         @5  DATE    MMDDYY8.
         @14 TRUCKS  2.
         @17 ALARM   1.
         @19 AMOUNT  DOLLAR13.2;
RUN;

PROC PRINT DATA = FIRESTATION;
   FORMAT DATE DATE9. AMOUNT COMMA13.2;
RUN;

Refer File Name: 7.1.sas to obtain soft copy of the program code


How It Works

The informats MMDDYY8. and DOLLAR13.2 convert the date and amount values into standard numeric form and assign them to the variables.

The FORMAT statement applies the DATE9. and COMMA13.2 formats to the variables DATE and AMOUNT respectively, so the output appears in the specified formats.

Without the FORMAT statement, SAS prints the values of DATE and AMOUNT in standard numeric form (DATE appears as the number of days since January 1, 1960).

Problem Statement

Convert the contents of the data set FIRESTATION (created in problem 7.1) to a CSV (comma-separated values) file. Apply the following formats to the variables:
DATE - DATE9. format
AMOUNT - COMMA13.2 format

Code

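DATA _NULL_;
   SET FIRESTATION;                 /* read the data set created in problem 7.1 */
   FORMAT CALL_NO 3. DATE DATE9. TRUCKS 2. ALARM 1. AMOUNT COMMA13.2;
   FILE 'OUT.CSV' DLM = ',' DSD;    /* DSD quotes values that contain a comma */
   PUT CALL_NO DATE TRUCKS ALARM AMOUNT;
RUN;

Refer File Name: 7.2.sas to obtain soft copy of the program code

How It Works

The SET statement reads the observations of FIRESTATION. The FORMAT statement associates the formats with the variables. The DLM = ',' option separates the values with commas, and the DSD option encloses any value that contains a comma (such as AMOUNT in COMMA13.2 format) in quotation marks, so the file remains a valid comma-separated-values file. The _NULL_ keyword suppresses the creation of a data set.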


Summary

- The OPTIONS statement changes SAS system options.
- An informat is used to read data in non-standard form.
- A format is used to write data in non-standard form.
- SAS stores date and time values as numbers.
- Based on the YEARCUTOFF= value, SAS determines the century of a date whose year is specified with two digits.
- The different styles of input are list, column and formatted input.
- The FILE statement specifies the external output file that will be created.
- _NULL_ lets a DATA step run without creating a data set.

Test your Understanding

1. What is the difference between an informat and a format?
2. Name three informats or formats.
3. Which statement do you code to tell SAS that it is to write to an external file?
4. Which statement do you code to write the record to the file?
5. If you do not want a DATA step to create a data set, how would you code the DATA statement?
6. Which statement sets selection criteria for the data and can be coded in any step?
7. Approximately what date is represented by the SAS date value of 730?
8. Create a program for the following requirement:

Read the raw data below into a SAS data set. The fields are prodid, prodname and quantity:

4014 furniture   108
5785 carpet      322
3743 elect goods  488
3298 television  467

The prodid field starts at position 1 and is numeric. The prodname field starts at position 6 and is character. The quantity field starts at position 18 and is numeric.


Session 09: SAS Procedures

Learning Objectives

After completing this session, you will be able to work with the following procedures:
- PRINT
- CONTENTS
- SORT
- FORMAT
- DATASETS

SAS Procedures

Procedures are a library of built-in programs (utilities) for processing data sets and displaying results. A PROC step begins with the keyword PROC and consists of a group of SAS statements that call and execute a procedure, usually with a SAS data set as input; most procedures read SAS data sets rather than raw files. The procedures analyze the data and generate output such as reports, charts, graphs and data sets. Most SAS procedures work with the data portion of the data set.

PROC PRINT

The PRINT procedure prints the observations in a SAS data set, using all or some of the variables. It can be controlled by the following statements and options. General Form: PROC PRINT <options>; <statements>; RUN;

Where,

Statements:
VAR variable-list;
BY variable-list;
SUM variable-list;
LABEL variable='label' ...;

Options:
DATA=SAS-data-set - specifies the SAS data set to use.
DOUBLE - writes a blank line between observations.
LABEL - uses variable labels as column headings (the variable name is the default heading).
SPLIT='split-character' - PROC PRINT breaks a column heading when it reaches the split character and continues the heading on the next line.

Statements:

VAR:

Selects the variables that appear in the report and determines their order. If VAR is not used, SAS prints the values of all the variables.

BY:

Produces a separate section of the report for each BY group. The data set must be sorted by the BY variable(s) before the BY statement is used.

LABEL:

The LABEL statement assigns labels to variables. The LABEL option must be used with PROC PRINT to print the labels.

SUM:

Totals the values of the specified numeric variables.

Sample Program:
PROC PRINT DATA = EMP LABEL SPLIT = '*';   /* EMP must already be sorted by DEPTID */
   VAR EMPID NAME SALARY;
   BY DEPTID;
   SUM SALARY;
   LABEL EMPID  = 'Employee * ID'
         DEPTID = 'Department * ID'
         NAME   = 'Name of * Employee'
         SALARY = 'Salary of * Employee';
RUN;


Sample Output: (figure not reproduced; its callouts point to the labels, the SUM total and the BY DEPTID groups)

PROC CONTENTS

PROC PRINT prints the data portion of a data set, whereas PROC CONTENTS prints the descriptor portion. PROC CONTENTS describes the structure of the data set, displaying information at the data set level and at the variable level.

Data set level: the following information is shown at the data set level:
- name
- SAS version number
- creation/modification date
- number of observations
- number of variables
- file size (bytes)


Variable level: the following information is shown for each variable:
- name
- type
- length
- format
- position
- label

General Form: PROC CONTENTS DATA = <dataset> <options>;

Options:
NOPRINT - suppresses the printed output.
POSITION - also lists the variables in order of their position in the data set (by default, variables are listed alphabetically).

Example: proc contents data=test1; run;

Sample Program: PROC CONTENTS DATA = EMP; RUN;
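A sketch running PROC CONTENTS on a small hypothetical EMP data set. The second step uses the OUT= option (not covered above), which saves the variable-level information as a data set with one observation per variable; combined with NOPRINT, no report is produced.

```sas
data emp;                          /* hypothetical data */
  input empid name $ salary;
  datalines;
101 JAMES 2500
102 MARTINA 3100
;
run;

proc contents data=emp position;   /* adds a list of variables in creation order */
run;

proc contents data=emp noprint out=emp_meta;
run;
```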

Sample Output:


PROC SORT

This procedure sorts the observations in a SAS data set by one or more variables. It either replaces the original data set or, with OUT=, writes the sorted observations to a new one. By default, it sorts in ascending order. General form: PROC SORT DATA=SAS-data-set <OUT=SAS-data-set> <options>; BY <DESCENDING> BY-variable(s); RUN;

Where,
- DATA= specifies the input data set.
- BY-variable(s) in the required BY statement specify one or more variables whose values are used to sort the data.
- DESCENDING in the BY statement sorts the observations in descending order of the variable that follows it.
- OUT= specifies the output data set that contains the sorted data.

We may also use the following options:
- NODUPKEY - eliminates observations with duplicate BY values, keeping the first observation of each BY group.
- DUPOUT= - writes the observations eliminated by NODUPKEY to a separate data set. This option is available only from SAS 9.1.

Example:
proc sort data=transfer nodupkey out=lib.trans;
   by empno;
run;
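A sketch of the SORT options above on a hypothetical TRANSFER data set in which employee 103 appears twice. NODUPKEY keeps the first observation for each EMPNO and DUPOUT= captures the one that was dropped; the second step shows DESCENDING, which applies only to the variable that follows it:

```sas
data transfer;                  /* hypothetical data: EMPNO 103 appears twice */
  input empno dept $;
  datalines;
103 HR
101 IT
103 FIN
102 IT
;
run;

proc sort data=transfer out=trans_sorted nodupkey dupout=trans_dups;
  by empno;                     /* one observation per EMPNO is kept */
run;

proc sort data=transfer out=by_dept;
  by descending dept empno;     /* DESCENDING applies to DEPT only */
run;
```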

PROC FORMAT

This procedure creates user-defined formats and informats for character and numeric variables. General form:
PROC FORMAT;
   VALUE format-name range1='label1' range2='label2' ...;
RUN;

Where, format-name names the format that you are creating.


The format name:
- can be up to 32 characters long
- must begin with a dollar sign ($) if the format applies to character data
- cannot be the name of an existing SAS format
- cannot end with a number
- does not end with a period in the VALUE statement, but is followed by a period when it is used

range specifies one or more variable values. label is a text string enclosed in quotation marks.

Example:
proc format;
   value $grade 'A'='Good'
                'B'-'D'='Fair'
                'F'='Poor'
                'I','U'='See Instructor'
                other='Miscoded';
run;

The keyword OTHER works like an ELSE: it matches every value not covered by the other ranges. To create a user-defined informat, use the keyword INVALUE instead of VALUE. Usually, however, user-defined informats are not needed for reading data values.
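A sketch applying the $GRADE. format from the example (the GRADES data set is hypothetical). The FORMAT statement changes only how GRADE is displayed, while the PUT function returns the formatted text as a new character value:

```sas
proc format;
  value $grade 'A'       = 'Good'
               'B' - 'D' = 'Fair'
               'F'       = 'Poor'
               'I', 'U'  = 'See Instructor'
               other     = 'Miscoded';
run;

data grades;
  input student $ grade $;
  format grade $grade.;              /* stored values stay 'A', 'C', 'X' */
  label_text = put(grade, $grade.);  /* formatted text as a new variable */
  datalines;
S1 A
S2 C
S3 X
;
run;
```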

PROC DATASETS

The DATASETS procedure is used to manage SAS files in a SAS data library. With PROC DATASETS, you can:
- list the SAS files that are contained in a SAS library
- copy SAS files from one SAS library to another
- rename SAS files
- delete SAS files
- modify attributes of SAS data sets and of the variables within them
- create and delete indexes on SAS data sets

PROC DATASETS is interactive: it stays active until it reaches a QUIT statement (or the next DATA or PROC step), so end it with QUIT. Examples:

1. Prints the descriptor portion of all the data sets in the WORK library.
proc datasets lib=WORK;
   contents data=_all_;
quit;


2. Copies all SAS files from the WORK library to the PERM library.
proc datasets;
   copy in=WORK out=PERM;
quit;

3. Deletes the EMP data set from the PERM library and changes the name of the DEPTA data set to DEPTB.
proc datasets library=PERM;
   delete EMP;
   change DEPTA = DEPTB;
quit;

MODIFY statement: this statement in the DATASETS procedure changes specific data set or variable attributes. It lets you specify formats, informats and labels, rename variables, and create and delete indexes. The MODIFY statement works on only one data set at a time. The following example modifies the data set INCOME in the COMPANY library by:
- renaming the variable OLD to NEW
- adding a label to the variable NEW
- setting a format for the variable INCOME

Example:
PROC DATASETS LIBRARY=COMPANY;
   MODIFY income;
      RENAME old=new;
      LABEL new='originally called old';
      FORMAT income comma11.2;
RUN;
QUIT;
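A self-contained version of the MODIFY example, run against a hypothetical INCOME data set in WORK rather than the COMPANY library. As in the example above, the LABEL statement refers to the variable by its new name; NOLIST suppresses the library directory listing.

```sas
data income;                       /* hypothetical data */
  input old income;
  datalines;
1 52000.5
2 61000
;
run;

proc datasets library=work nolist;
  modify income;
    rename old = new;
    label new = 'originally called old';
    format income comma11.2;
  run;
quit;
```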

The MODIFY statement in the DATASETS procedure can also create an index on an existing SAS data set. Without an index, SAS searches a data set by reading the observations sequentially and checking every value.


INDEX: The MODIFY statement in DATASETS procedure is also used to generate an index on an existing SAS dataset. Index is used to quickly search a record from a large dataset For Example, you have to search a table based on the column ‘Name’ and it does not have an index. In this case SAS begins with the first row and reads through all rows in the table.

An index is a SAS file that stores unique values for a specified column in an order, and includes information about the location of those values in the table that enable you to access a row directly, by value. For example, suppose you have created an index on column ‘Name’. Using the index, SAS will access the required row(s) directly, without having to read all the other rows.

Creating an index is useful when you:

- filter observations with a WHERE statement
- merge with another data set
- perform an equijoin in PROC SQL

This example uses the DATASETS procedure to create a simple index:

    proc datasets library=INDRAILWAY;
      modify TRNTKT;
      index create PNRNO / UNIQUE NOMISS;
    run;
    quit;


In this example, a simple index is created on the PNRNO variable of the TRNTKT data set in the INDRAILWAY library:

- INDEX CREATE specifies that an index is to be created; PNRNO is the index (key) variable.
- UNIQUE specifies that the key variable values must be unique within the SAS data set.
- NOMISS specifies that no index entries are built for observations with missing key variable values.
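A minimal, self-contained sketch of the same steps. It builds a small hypothetical TRNTKT table in the WORK library (the INDRAILWAY library is not assumed to exist, and the PNR numbers, names and fares are invented), creates the unique index, and then reads one ticket with a WHERE statement. OPTIONS MSGLEVEL=I asks SAS to write a note to the log when it uses an index; for a table this small SAS may still decide a sequential read is cheaper.

```sas
/* Hypothetical ticket data; all values are invented */
data work.trntkt;
  input pnrno passenger $ fare;
  datalines;
1001 RAVI 450
1002 ANITA 620
1003 JOHN 380
;
run;

/* Simple index on PNRNO: keys must be unique, missing keys are not indexed */
proc datasets library=work nolist;
  modify trntkt;
  index create pnrno / unique nomiss;
run;
quit;

options msglevel=i;   /* report index usage in the log */

/* A WHERE on the key variable is a candidate for an index lookup */
data work.one_ticket;
  set work.trntkt;
  where pnrno = 1002;
run;
```

Because of UNIQUE, a later attempt to add a second observation with PNRNO 1002 to TRNTKT would be rejected.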

Try It Out

Problem Statement

You have a SAS data set called HTWT which contains the variables ID, HEIGHT and WEIGHT. HEIGHT and WEIGHT are to be grouped as follows:

HEIGHT groupings: 0 to 36 = 1; 37 to 48 = 2; 49 to 60 = 3; > 60 = 4
WEIGHT groupings: 0 to 100 = 1; 101 to 200 = 2; > 200 = 3

When printing:

- group the observations by the values of HEIGHT
- total the values of WEIGHT
- apply the label 'Employee ID' to the variable ID

Also verify the descriptor portion of the data set.

Code

    PROC FORMAT;
      VALUE HTFMT   0-36    = '1'
                   37-48    = '2'
                   49-60    = '3'
                   61-HIGH  = '4';
      VALUE WTFMT   0-100   = '1'
                  101-200   = '2'
                  201-HIGH  = '3';
    RUN;

    PROC SORT DATA = HTWT;
      BY HEIGHT;
    RUN;

    PROC PRINT DATA = HTWT LABEL;
      BY HEIGHT;
      SUM WEIGHT;
      LABEL ID = 'Employee ID';
      FORMAT HEIGHT HTFMT. WEIGHT WTFMT.;
    RUN;

    PROC CONTENTS DATA = HTWT;
    RUN;

Refer to file 9.1.sas for a soft copy of the program code.

How It Works

- The BY statement groups the observations by HEIGHT. Because BY-group processing needs sorted data, the data set is first sorted by HEIGHT.
- The FORMAT statement applies the user-defined formats to HEIGHT and WEIGHT.
- The SUM statement adds up the values of WEIGHT across the observations.
- The LABEL statement applies the label to the variable ID. PROC PRINT displays labels only when the LABEL option is specified in the PROC PRINT statement.

Summary

- PROC PRINT prints the observations in a SAS data set, using all or some of the variables.
- PROC CONTENTS prints the descriptor portion of a data set.
- PROC SORT sorts the observations in a SAS data set by one or more variables and either replaces the existing data set or writes a new one.
- PROC FORMAT lets you define your own formats or informats for character or numeric variables.
- The DATASETS procedure manages SAS files in SAS data libraries; its MODIFY statement can also create an index on an existing SAS data set.

Test your Understanding

1. Code a PROC SORT on a data set containing State, District and County as the primary variables, along with several numeric variables.

2. How would you delete observations with duplicate keys?


Session 11: SAS Programming Concepts

Learning Objectives

After completing this session, you will be able to:

- Retain Variable Values
- Explain Automatic Variables
- Describe Titles and Footnotes
- Differentiate Conditional Processing and Iterative Processing

Retaining Variable Values

By default, at the beginning of each iteration of the DATA step SAS resets the variables created by INPUT and assignment statements to missing, so their values from the previous iteration are not kept. (Variables read with a SET statement are retained automatically.) The RETAIN statement overrides this default behavior.

RETAIN statement:

The RETAIN statement keeps the value of a variable in the PDV across iterations of the DATA step. If no initial value is specified, the retained variable is set to missing before the first iteration of the DATA step.

General form: RETAIN variable-name <initial-value> …;

Example: The following statement initializes the variable TOTSAL to 0 and makes it keep its current value across iterations:

    RETAIN TOTSAL 0;

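Example:

    DATA ALL;
      RETAIN TOTSAL 0;             /* start at 0, keep the value across iterations */
      SET EMP END = EOF;           /* EOF is set to 1 when the last observation is read */
      TOTSAL = TOTSAL + SAL;       /* accumulate the running total */
      IF EOF = 1 THEN OUTPUT ALL;  /* write only the final total */
    RUN;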

The dataset ALL has one observation and it contains the Total Salary of all the employees in the dataset EMP.
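A small runnable sketch of what RETAIN changes, using a hypothetical three-row EMP data set (the salaries are invented). KEPT is retained and accumulates; NOTKEPT is created by an assignment statement, so SAS resets it to missing at the start of every iteration and it never accumulates.

```sas
/* Hypothetical EMP data; salaries are invented */
data work.emp;
  input id sal;
  datalines;
1 1000
2 2500
3 1500
;
run;

data work.running;
  set work.emp;
  retain kept 0;                /* set to 0 once, then carried forward */
  kept = kept + sal;            /* running total: 1000, 3500, 5000 */
  notkept = sum(notkept, sal);  /* reset to missing each iteration, so it equals SAL */
run;

proc print data=work.running noobs;
run;
```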


SUM statement:

When you create an accumulating variable, the sum statement is an alternative to the RETAIN statement. It is a shortcut: a single sum statement does the work of a RETAIN statement plus an accumulating assignment statement.

General form of the sum statement: variable + expression;

Example:

    TOTSAL + SAL;

In this example, SAS:

- creates the variable TOTSAL, if it is a new variable, and initializes it to 0
- automatically retains the value of TOTSAL
- adds the value of SAL to TOTSAL, ignoring missing values of SAL
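Missing values are the practical difference between the two approaches. In this sketch (hypothetical data, with the second salary missing), the RETAIN-plus-assignment total becomes missing as soon as a missing SAL is added and stays missing, while the sum statement simply skips it.

```sas
/* Hypothetical salaries; the second one is missing */
data work.emp2;
  input id sal;
  datalines;
1 1000
2 .
3 1500
;
run;

data work.totals;
  set work.emp2;
  retain plus_total 0;
  plus_total = plus_total + sal;  /* + operator: missing once SAL is missing */
  sum_total + sal;                /* sum statement: starts at 0, retained, skips missing */
run;
```

After the last observation PLUS_TOTAL is missing and SUM_TOTAL is 2500.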

Automatic Variables

Finding the First and Last Observations in a Group: When you use a BY statement with a SET statement, the DATA step creates two temporary variables for each BY variable:

FIRST.variable    LAST.variable

Their values are either 1 or 0: FIRST.variable is 1 for the first observation in each BY group, and LAST.variable is 1 for the last observation in each BY group. These variables are not written to the output data set.

The BY statement in the DATA step enables you to process your data in groups. Before you use a BY statement, the input data set must be sorted (or indexed) by the BY variable. The following DATA step and table show the values of the automatic variables.


Example:

    DATA temp;
      SET all;
      BY dept;
    RUN;

    Dept      Salary   FIRST.Dept   LAST.Dept
    APTOPS     20000       1            0
    APTOPS    100000       0            0
    APTOPS     50000       0            1
    FINANCE    25000       1            0
    FINANCE    20000       0            0
    FINANCE    23000       0            0
    FINANCE    27000       0            1
    SALES      10000       1            0
    SALES      12000       0            1

Example: Find the department-wise total salary of all the employees in the data above.

The problem can be divided into three steps:

1. Set the accumulating variable to 0 at the start of each BY group.
2. Increment the accumulating variable with a sum statement (which retains it automatically).
3. Output only the last observation of each BY group.
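The three steps translate directly into one DATA step. This sketch rebuilds the sample departments and salaries from the table above in a WORK data set named ALL and assumes the salary variable is called SALARY.

```sas
/* Sample data from the table above */
data work.all;
  input dept $ salary;
  datalines;
APTOPS 20000
APTOPS 100000
APTOPS 50000
FINANCE 25000
FINANCE 20000
FINANCE 23000
FINANCE 27000
SALES 10000
SALES 12000
;
run;

proc sort data=work.all;          /* BY-group processing needs sorted input */
  by dept;
run;

data work.dept_total;
  set work.all;
  by dept;
  if first.dept then totsal = 0;  /* step 1: reset at the start of each group */
  totsal + salary;                /* step 2: sum statement, automatically retained */
  if last.dept then output;       /* step 3: keep only the last row of each group */
  keep dept totsal;
run;
```

Resetting TOTSAL at FIRST.DEPT is what stops one department's total from carrying into the next. The result has three rows: APTOPS 170000, FINANCE 95000 and SALES 22000.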


Titles and Footnotes

To make your report more meaningful and self-explanatory, you can specify TITLE and FOOTNOTE statements. They are similar to the header and footer in MS Word: the text in the TITLE and FOOTNOTE statements appears at the top and bottom of every output page, respectively. You can specify up to 10 TITLE and 10 FOOTNOTE statements.

General form, TITLE and FOOTNOTE statements:

    TITLE<n> 'text';
    FOOTNOTE<n> 'text';

where n is a number from 1 to 10 that specifies the title or footnote line, and 'text' is the actual title or footnote to be displayed.

Example:

    PROC PRINT DATA = EMP;
      TITLE2 'Start of PROC PRINT Report';
      TITLE4 'Contents of the Dataset EMP';
      FOOTNOTE3 'End of PROC PRINT Report';
    RUN;

Canceling Titles and Footnotes: TITLE and FOOTNOTE statements are global statements. After you define a title or footnote, it remains in effect until you modify it, cancel it, or end the SAS session. A null TITLEn or FOOTNOTEn statement cancels line n and all higher-numbered lines:

    TITLE<n>;
    FOOTNOTE<n>;

To cancel all titles or footnotes, specify a null TITLE1 or FOOTNOTE1 statement:

    TITLE1;
    FOOTNOTE1;
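A short sketch of how titles stay in effect across steps until they are canceled. It uses the SASHELP.CLASS sample table that ships with SAS; the title and footnote texts are only illustrative.

```sas
title1 'Employee Report';
title2 'Quarter 1';
footnote1 'Source: SASHELP.CLASS';

proc print data=sashelp.class(obs=3);
run;                    /* printed with both titles and the footnote */

title2;                 /* cancels TITLE2 and above; TITLE1 stays in effect */
proc print data=sashelp.class(obs=3);
run;

title1;                 /* null TITLE1 cancels all titles */
footnote1;              /* null FOOTNOTE1 cancels all footnotes */
```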


Conditional Processing

The IF statement has several forms:

Type 1: Simple IF statement

    IF <condition> THEN <statement>;

Type 2: IF-THEN-ELSE statement

    IF <condition> THEN <true-block statement>;
    ELSE <false-block statement>;

Type 3: IF-THEN-ELSE-IF ladder

    IF <condition1> THEN <condition1 true-block statement>;
    ELSE IF <condition2> THEN <condition2 true-block statement>;
    ELSE <false-block statement>;

If a block contains more than one statement, group them in a DO-END group (a DO group that executes once; it is not a loop):

    IF <condition> THEN DO;
      <true-block statements>;
    END;
    ELSE DO;
      <false-block statements>;
    END;
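A runnable sketch combining the ladder (Type 3) with DO groups. The scores, grade boundaries and bonus amounts are hypothetical. A missing score is smaller than any number in SAS, so it would fall through to grade 'C' and 'FAIL'.

```sas
/* Hypothetical scores */
data work.graded;
  input id score;
  length grade $ 1 status $ 4;
  if score >= 80 then grade = 'A';          /* IF-THEN-ELSE-IF ladder */
  else if score >= 60 then grade = 'B';
  else grade = 'C';

  if score >= 40 then do;                   /* DO group: several statements per branch */
    status = 'PASS';
    bonus = 10;
  end;
  else do;
    status = 'FAIL';
    bonus = 0;
  end;
  datalines;
1 85
2 65
3 30
;
run;
```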

SELECT-CASE:

You can also use SELECT groups in DATA steps to perform conditional processing. A SELECT group is similar to the switch-case statement in the C language.

General form, SELECT group:

    SELECT <(expression)>;
      WHEN (expression-1) statement;
      ...
      WHEN (expression-n) statement;
      <OTHERWISE statement;>
    END;


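where:

- the expression after SELECT is optional; if it is given, its value is compared with the expression in each WHEN statement, and if it is omitted, each WHEN expression is evaluated as a condition
- the statement of the first WHEN that matches (or is true) is executed, and the remaining WHEN statements are skipped
- OTHERWISE gives the statement to execute when no WHEN matches; if nothing matches and there is no OTHERWISE, SAS stops the DATA step with an error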

Example: The following code assigns a value to the variable Title based on the value of designation:

    select (designation);
      when ("PAT") Title = "Programmer Analyst Trainee";
      when ("PA")  Title = "Programmer Analyst";
      when ("A")   Title = "Associate";
      when ("SA")  Title = "Senior Associate";
      otherwise    Title = "Manager";
    end;
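A self-contained sketch of both SELECT forms. The designation codes come from the example above; the input rows, the variable name JOB_TITLE and the score bands in the second step are hypothetical. The LENGTH statement makes sure the longest title fits.

```sas
/* Form 1: SELECT expression compared with each WHEN value */
data work.roles;
  input designation $;
  length job_title $ 30;
  select (designation);
    when ('PAT') job_title = 'Programmer Analyst Trainee';
    when ('PA')  job_title = 'Programmer Analyst';
    when ('A')   job_title = 'Associate';
    when ('SA')  job_title = 'Senior Associate';
    otherwise    job_title = 'Manager';
  end;
  datalines;
PAT
SA
M
;
run;

/* Form 2: no SELECT expression, so each WHEN holds its own condition */
data work.bands;
  input score;
  length band $ 6;
  select;
    when (score >= 80) band = 'HIGH';
    when (score >= 50) band = 'MEDIUM';
    otherwise          band = 'LOW';
  end;
  datalines;
90
55
10
;
run;
```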

Subsetting IF statement: The subsetting IF statement causes the DATA step to continue processing only those raw data records or observations that meet the condition specified in the IF statement. General form:

    IF condition;

- If the condition is true, SAS continues executing the DATA step.
- If the condition is false, SAS stops processing the current observation and returns to the top of the DATA step. In particular, the observation currently being formed in the PDV is not written out.

Example:

    DATA PASS;
      INPUT ID M1 M2 M3;
      TOT = M1 + M2 + M3;
      IF TOT > 150;             /* output obs only if TOT > 150 */
      CARDS;
    1 50 60 80
    2 40 60 30
    3 70 80 90
    ;
    RUN;

Only two observations are written to the data set: ID 1 (TOT = 190) and ID 3 (TOT = 240).


Iterative Processing

Iterative statements are used to:

- perform repetitive calculations
- eliminate redundant code
- execute SAS code conditionally

DO Loop Processing

The statements within a DO loop execute a specific number of times or until a specific condition stops the loop.

Iterative DO:

TYPE 1: DO index-variable=start TO stop <BY increment>;

where:

- start specifies the initial value of the index variable
- stop specifies the ending value of the index variable
- increment optionally specifies a positive or negative number that controls how the index variable changes; if no increment is specified, the index variable is incremented by 1

This iterative DO statement executes the statements between DO and END repeatedly, based on the value of the index variable.

Example 1:

    do i=1 to 12 by 4;
      <statements>;
    end;

Example 2: do m=3.5 to 2.5 by -0.05;

Example 3: do k = Begindate to Today() by 7;

TYPE 2: DO index-variable=item-1, <…item-n>; Item-1 through item-n can be either all numeric or all character constants or they can be variables. The DO loop is executed once for each value in the list.


Examples for Type 2:

1: do Month = 'JAN', 'FEB', 'MAR';

2: do Fib = 1, 2, 3, 4;

3: do i = var1, var2, var3;
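A minimal runnable sketch of both iterative DO forms, writing one observation per pass so the values the index variable takes can be inspected. The data set names are arbitrary.

```sas
/* Type 1: I takes the values 1, 5 and 9 */
data work.type1;
  do i = 1 to 12 by 4;
    output;                /* one observation per pass through the loop */
  end;
  /* after the loop ends, I holds 13, the first value past STOP */
run;

/* Type 2: one pass for each item in the list */
data work.type2;
  length month $ 3;
  do month = 'JAN', 'FEB', 'MAR';
    output;
  end;
run;
```

Neither step reads any input, so each executes once.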

Conditional Iterative Processing:

We can use the DO WHILE and DO UNTIL statements to stop a loop when a condition is met, rather than when the index variable exceeds a specific value.

DO WHILE

The DO WHILE statement executes the statements in a DO loop while a condition is true. General form:

    DO WHILE (expression);
      <additional SAS statements>
    END;

Expression is evaluated at the top of the loop. The statements in the loop never execute if the expression is initially false.
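A sketch of DO WHILE applied to the same investment calculation as the DO UNTIL sample program in this section (a 5000 deposit and 7.5% interest per year). CAPITAL starts at 0 because a variable used in a sum statement is initialized to 0, so the test is true on the first pass. The second step shows the top-of-loop test skipping the body entirely; its variables are hypothetical.

```sas
data work.invest_while;
  do while (Capital <= 20000);   /* tested at the top of every pass */
    Year + 1;
    Capital + 5000;
    Capital + (Capital * .075);
    output;
  end;
run;

data work.never;
  x = 100;
  do while (x < 10);             /* false at the top, so the body never runs */
    passes + 1;
  end;
run;
```

The first step stops after year 4, when CAPITAL reaches about 24041.96; in the second, PASSES stays 0.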

DO UNTIL

The DO UNTIL statement executes the statements in a DO loop until a condition is true. General form:

    DO UNTIL (expression);
      <additional SAS statements>
    END;

The expression is evaluated at the bottom of the loop, so the statements in the loop are executed at least once.

Sample Program:

    data invest;
      do until (Capital > 20000);
        Year + 1;                     /* count the years */
        Capital + 5000;               /* add this year's deposit */
        Capital + (Capital * .075);   /* add 7.5% interest */
        output;                       /* one observation per year */
      end;
    run;

    proc print data=invest noobs;
    run;

Sample Output:

Iterative DO + Conditional DO: The DO WHILE and DO UNTIL statements can be combined with the iterative DO statement. General form:

    DO index-variable=start TO stop <BY increment> WHILE | UNTIL (expression);
      <additional SAS statements>
    END;

The loop ends as soon as either the index variable passes stop or the WHILE/UNTIL condition ends it, so this is one way to avoid an infinite DO WHILE or DO UNTIL loop.

Sample Program:

    data invest;
      do year = 1 to 10 until (Capital > 20000);
        Capital + 5000;
        Capital + (Capital * .075);
        output;
      end;
    run;

    proc print data=invest noobs;
    run;


Sample Output:

Other Data Step statements

KEEP and DROP Statements

The KEEP and DROP statements are similar to the KEEP= and DROP= data set options. General form, DROP and KEEP statements:

DROP variable(s); KEEP variable(s);

Where, variable(s) identifies the variables to drop or keep.

The DROP statement excludes the specified variables from the output data set, and the KEEP statement includes only the specified variables. DROP and KEEP statements can appear anywhere in the DATA step, and they apply to every data set the step creates.

Example:
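As an illustration, a minimal sketch using a hypothetical STAFF data set (all values invented). Both steps compute BONUS; the DROP step keeps it, while the KEEP step writes only ID and SAL. The KEEP statement is placed at the top of its step to show that its position does not matter.

```sas
/* Hypothetical staff data; all values are invented */
data work.staff;
  input id name $ sal dept $;
  datalines;
1 ASHA 1200 HR
2 RAJ 900 IT
;
run;

/* DROP: every variable except DEPT is written, including BONUS */
data work.no_dept;
  set work.staff;
  bonus = sal * 0.1;
  drop dept;
run;

/* KEEP: only ID and SAL are written, even though BONUS is still computed */
data work.id_sal;
  keep id sal;         /* compile-time statement: position in the step does not matter */
  set work.staff;
  bonus = sal * 0.1;
run;
```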

DELETE statement

The DELETE statement deletes observations from the data set being created. General form:

    IF condition THEN DELETE;

If the condition is true, SAS stops processing the current observation and returns to the top of the DATA step. In particular, the observation currently being formed in the PDV is not written out.


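Example:

    DATA EMP;
      INPUT ID NAME $ SAL;
      IF SAL >= 1000 THEN DELETE;   /* drop high earners from EMP */
      SAL = SAL + 500;              /* reached only when the observation is kept */
      /* the data lines below are illustrative values */
      DATALINES;
    1 ASHA 800
    2 RAJ 1500
    ;
    RUN;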

The SAL = SAL + 500 statement is executed only when SAL is less than 1000 (a missing SAL also counts as less than 1000).

PUT STATEMENT

When the PUT statement is used without a FILE statement, it writes the values of variables to the SAS log. General form:

    PUT <variable list> <format specifier>;

To write the values to the Output window instead, place a FILE PRINT; statement before the PUT statement.

Special SAS names (shortcuts):

- _NUMERIC_ refers to all the numeric variables in a data set
- _CHARACTER_ refers to all the character variables in a data set
- _ALL_ refers to all the character and numeric variables in a data set
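A short sketch of PUT with named output, the special names, and FILE PRINT. It reads the first two rows of the SASHELP.CLASS sample table; the formats chosen in the last PUT are only illustrative.

```sas
/* PUT without a FILE statement writes to the log */
data _null_;
  set sashelp.class(obs=2);
  put name= age=;          /* named output: variable name, '=', value */
  put _numeric_;           /* all numeric variables */
  put _all_;               /* all variables plus _N_ and _ERROR_ */
run;

/* FILE PRINT sends the PUT output to the Output window instead */
data _null_;
  file print;
  set sashelp.class(obs=2);
  put name $8. age 3.;     /* formatted output */
run;
```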

Try It Out

Problem Statement

You have a SAS data set DIET which contains the variables ID, DATE and WEIGHT, with four records per ID. The task is to create a new SAS data set DIET2 from DIET which contains only one record per subject, each record containing the subject ID and the mean weight for that subject.

As an additional "learning experience," rewrite the code using a sum statement (not a SUM function.)

The sample data of data set DIET is shown below:

Data Set DIET

    ID   DATE       WEIGHT
    1    10/01/92   155
    1    10/08/92   158
    1    10/15/92   158
    1    10/22/92   158
    2    09/02/92   200
    2    09/09/92   198
    2    09/16/92   196
    2    09/23/92   202

Code

    PROC SORT DATA = DIET;
      BY ID;
    RUN;

    DATA DIET2;
      SET DIET;
      BY ID;
      RETAIN MEAN_WT;
      IF FIRST.ID THEN MEAN_WT = WEIGHT;   /* start a new subject */
      ELSE MEAN_WT = MEAN_WT + WEIGHT;     /* accumulate the weights */
      IF LAST.ID THEN DO;
        MEAN_WT = MEAN_WT / 4;             /* four records per ID */
        OUTPUT;
      END;
      KEEP ID MEAN_WT;                     /* one record: ID and mean weight */
    RUN;

    /**** The solution using a sum statement looks like this ****/
    DATA DIET2;
      SET DIET;
      BY ID;
      IF FIRST.ID THEN MEAN_WT = WEIGHT;
      ELSE MEAN_WT + WEIGHT;               /* sum statement: implies RETAIN */
      IF LAST.ID THEN DO;
        MEAN_WT = MEAN_WT / 4;
        OUTPUT;
      END;
      KEEP ID MEAN_WT;
    RUN;

Refer to file 11.1.sas for a soft copy of the program code.

How It Works

- The BY statement processes the observations in groups and creates the automatic variables FIRST.ID and LAST.ID.
- The automatic variables and the RETAIN statement are used to calculate the mean WEIGHT of each subject.
- An alternative way to solve this problem is to use a sum statement instead of the RETAIN statement.
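Both solutions divide by 4 because the problem guarantees four records per ID. A hedged variation, if that assumption might not hold, is to count the non-missing weights with a second sum statement. The sketch rebuilds DIET from the sample data above; the helper variables TOTAL and N are hypothetical names.

```sas
data work.diet;
  input id date :mmddyy8. weight;
  format date mmddyy8.;
  datalines;
1 10/01/92 155
1 10/08/92 158
1 10/15/92 158
1 10/22/92 158
2 09/02/92 200
2 09/09/92 198
2 09/16/92 196
2 09/23/92 202
;
run;

proc sort data=work.diet;
  by id;
run;

data work.diet2;
  set work.diet;
  by id;
  if first.id then do;               /* reset both accumulators for a new subject */
    total = 0;
    n = 0;
  end;
  total + weight;                    /* sum statement: retained, missing WEIGHT ignored */
  if not missing(weight) then n + 1; /* count only the weights actually added */
  if last.id then do;
    mean_wt = total / n;
    output;
  end;
  keep id mean_wt;
run;
```

With the sample data, MEAN_WT is 157.25 for ID 1 and 199 for ID 2.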


Summary

- The RETAIN statement retains the value of a variable in the PDV across iterations of the DATA step.
- The sum statement is a shortcut for the RETAIN statement plus an accumulating assignment.
- FIRST.BY-variable and LAST.BY-variable identify the first and last observation in each BY group.
- The text in TITLE and FOOTNOTE statements appears at the top and bottom of every page.
- SAS provides several conditional statements: IF-THEN-ELSE, SELECT groups and the subsetting IF.
- A DO loop is used to perform repetitive calculations.
- The KEEP and DROP statements are similar to the KEEP= and DROP= data set options.
- The DELETE statement deletes observations from the data set being created.
- The PUT statement writes the values of variables to the SAS log (or to the file named in a FILE statement).

Test your Understanding

1. For what purpose would you use the RETAIN statement?

2. What is the purpose of the ODS statement?

3. Explain the FIRST. and LAST. variables.

4. Write a SAS program, using as few DATA steps as possible, that produces the following output. The input and output data sets are as follows:

    Input dataset    Output dataset
    Marks            Marks    Sum
    15               15       15
    28               28       43
    78               78       121
    35               35       156
    90               90       246
    67               67       313
    87               87       400


Session 13: SAS Programming Concepts/Built-in Functions in SAS

Learning Objectives

After completing this session, you will be able to:

- Explain the SAS ODS concepts
- Describe SAS Arrays
- Work with Arithmetic and String Functions

SAS ODS

SAS Output Delivery System (ODS): ODS was designed to overcome the limitations of traditional SAS output. It lets output from the DATA step and SAS procedures be presented in a more useful and attractive way, in a variety of formats such as HTML, XLS, PDF and RTF. To start delivering output to an ODS destination, the general syntax is:

    ODS <output-format> <options>;

To stop delivering output to that destination:

    ODS <output-format> CLOSE;

where output-format is the output destination, and options can specify, among other things, the location of the output file. The output file is created in the specified location and opened by SAS in a separate window called the Results Viewer.

HTML:

    ODS HTML FILE = "C:\SASFILES\TEST.HTML";
      <SAS procedures>
    ODS HTML CLOSE;

All output from any procedure that runs between the "ODS HTML ...;" and "ODS HTML CLOSE;" statements is sent to that ODS destination.

XLS (Excel file): writing the HTML destination to a file with an .xls extension produces a file that Excel can open:

    ODS HTML FILE = "C:\SASFILES\TEST.XLS";
      <SAS procedures>
    ODS HTML CLOSE;


RTF: RTF stands for Rich Text Format, which MS Word supports.

    ODS RTF FILE = "C:\SASFILES\TEST.RTF";
      <SAS procedures>
    ODS RTF CLOSE;

PDF:

    ODS PDF FILE = "C:\SASFILES\TEST.PDF";
      <SAS procedures>
    ODS PDF CLOSE;
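A sketch that sends one report to two ODS destinations at once. The WORK library's directory is used as the output location only so the example runs without a C:\SASFILES folder, and the file names class.html and class.pdf are arbitrary; the report itself uses the SASHELP.CLASS sample table.

```sas
%let outdir = %sysfunc(pathname(work));   /* directory behind the WORK library */

ods html file="&outdir/class.html";
ods pdf  file="&outdir/class.pdf";

title 'Students (first 5 rows)';
proc print data=sashelp.class(obs=5) noobs;
run;
title;

ods pdf close;     /* closing a destination finishes and releases its file */
ods html close;
```

Output produced while both destinations are open goes to both, which is why the CLOSE statements matter.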

Arrays in SAS

Arrays in SAS are different from arrays in other programming languages. A SAS array is a temporary grouping of variables under a single name, and it exists only for the duration of the current DATA step. An array is not a variable. Each variable in an array is called an element and is identified by a subscript that represents its position in the array. When you use an array reference, SAS substitutes the corresponding variable for the reference.

Why use SAS arrays?

- to repeat an action or set of actions on each variable in a group
- to create many variables with the same attributes
- to write shorter programs
- to compare variables
- to perform table lookups

General Form: ARRAY array-name {subscript} <$><length> <array-elements> <(initial-value-list)>;

The ARRAY statement defines the elements in an array. These elements will be processed as a group.

You can refer to elements of the array by the array name and subscript. The ARRAY statement:

- must contain all numeric or all character elements
- must define an array before the array name is referenced
- creates variables if they do not already exist in the PDV

Example: ARRAY Contrib{4} Qtr1 Qtr2 Qtr3 Qtr4;

Here, Qtr1 Qtr2 Qtr3 Qtr4 are the existing variables. Contents of the PDV are given below along with the ARRAY references.


The array CONTRIB groups the variables Qtr1, Qtr2, Qtr3 and Qtr4, and each variable can be accessed with the array name and a subscript; for example, Contrib{2} refers to Qtr2.

Example: Suppose a data set EMP has 50 numeric variables and every missing value must be recoded to 99. Without an array, you would need to repeat the following statement 50 times in the DATA step, once per variable:

    IF Variable = . THEN Variable = 99;

With an array, the program becomes much shorter:

    Data All;
      Set EMP;
      array nvar(*) _numeric_;           /* all numeric variables read from EMP */
      do i=1 to dim(nvar);
        if nvar(i) = . then nvar(i) = 99;
      end;
      drop i;                            /* the loop counter is not needed in ALL */
    Run;

where:

- nvar(*): the asterisk tells SAS to determine the number of elements from the variable list
- DIM( ): an array function that returns the number of elements in an array
- _numeric_: a keyword that refers to all the numeric variables defined in the DATA step at that point
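A runnable version of the recoding idea with a small hypothetical data set (IDs and scores invented). Because the ARRAY statement comes before the loop counter I is created, _NUMERIC_ covers only ID, Q1, Q2 and Q3, so DIM(NVAR) is 4 here.

```sas
/* Hypothetical scores with some missing values */
data work.scores_raw;
  input id q1 q2 q3;
  datalines;
1 10 . 30
2 . . 5
;
run;

data work.scores_fixed;
  set work.scores_raw;
  array nvar{*} _numeric_;          /* ID, Q1, Q2, Q3: the numerics defined so far */
  do i = 1 to dim(nvar);
    if missing(nvar{i}) then nvar{i} = 99;
  end;
  drop i;
run;
```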


Built-in Functions in SAS: SAS has a large number of built-in functions. Broadly, they can be classified as:

- Arithmetic functions
- String functions
- Date and time functions

Each of these functions is described below.

Arithmetic Functions

INT

Returns the integer portion of the argument, truncating toward zero. Syntax: INT(argument)

    Example          Result
    X=INT(2.1)       X=2
    X=INT(-2.4)      X=-2
    X=INT(3)         X=3
    X=INT(-1.6)      X=-1

MAX

Returns the largest of the non-missing values. Syntax: MAX(argument,argument…)

    Example                 Result
    X1 = MAX(2,6,.)         X1=6.00000
    X2 = MAX(2,-3,1,-1)     X2=2.00000
    X3 = MAX(3,.,-3)        X3=3.00000
    X4 = MAX(OF X1-X3)      X4=6.00000

The OF keyword includes all the variables from X1 to X3, that is, X1, X2 and X3.

MIN

Returns the smallest of non-missing values. Syntax: MIN(argument,argument…)


    Example                 Result
    X1 = MIN(2,.,6)         X1 = 2.00000
    X2 = MIN(2,-3,1,-1)     X2 = -3.00000
    X3 = MIN(0,4)           X3 = 0.00000
    X4 = MIN(OF X1-X3)      X4 = -3.00000

SUM

Returns the sum of the non-missing arguments. Syntax: SUM(argument,argument...)

    Example                     Result
    X1 = SUM(4,9,3,8)           X1 = 24.00000
    X2 = SUM(14,9,13,8,.)       X2 = 44.00000
    X3 = SUM(OF X1-X2)          X3 = 68.00000

MEAN

Returns the average of the non-missing values. Syntax: MEAN(argument,argument…)

    Example                 Result
    X1 = MEAN(2,.,.,6)      X1 = 4.00000
    X2 = MEAN(1,2,3,2)      X2 = 2.00000
    X3 = MEAN(OF X1-X2)     X3 = 3.00000

MOD

Returns the remainder when argument1 is divided by argument2; a nonzero result has the same sign as argument1. Syntax: MOD(argument1,argument2)

    Example             Result
    X=MOD(6,3)          X=0.00000
    X=MOD(10,3)         X=1.00000
    X=MOD(11,3.5)       X=0.50000
    X=MOD(10,-3)        X=1.00000


ROUND

The ROUND function returns a value rounded to the nearest round-off unit. Syntax: ROUND(argument,<round-off unit>)

where round-off unit is numeric and non-negative. If round-off unit is not provided, the argument is rounded to the nearest integer.

    Example                    Result
    X=ROUND(223.456)           X=223.00000
    X=ROUND(223.456,1)         X=223.00000
    X=ROUND(223.456,.01)       X=223.46000
    X=ROUND(223.456,100)       X=200.00000

CEIL

The CEIL function returns the smallest integer greater than or equal to the argument. Syntax: NewVar = CEIL(argument);

Example: X=CEIL(4.4); gives X=5

FLOOR

The FLOOR function returns the greatest integer less than or equal to the argument. Syntax: NewVar=FLOOR(argument);

Example: Y=FLOOR(3.6); gives Y=3
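A quick runnable check of several arithmetic functions in one DATA step. It also shows two points that the examples above only imply: INT and FLOOR differ for negative numbers, and the SUM function ignores missing values while the + operator does not. The variable names are arbitrary.

```sas
data work.arith;
  int_neg   = int(-2.4);            /* -2: INT truncates toward zero */
  floor_neg = floor(-2.4);          /* -3: FLOOR goes to the next lower integer */
  ceil_neg  = ceil(-2.4);           /* -2 */
  max_val   = max(2, 6, .);         /* 6: missing values are ignored */
  sum_fn    = sum(14, 9, 13, 8, .); /* 44: SUM function ignores missing */
  plus_op   = 14 + 9 + 13 + 8 + .;  /* missing: the + operator propagates missing */
  mean_val  = mean(2, ., ., 6);     /* 4 */
  mod_val   = mod(10, -3);          /* 1: sign follows the first argument */
  rnd       = round(223.456, 100);  /* 200 */
run;
```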


String Functions

You can convert data types either implicitly, by letting the SAS System do it for you, or explicitly with these functions:

- INPUT: character-to-numeric conversion
- PUT: numeric-to-character conversion

SAS automatically converts a character value to a numeric value when the character value is used in a numeric context, such as:

- assignment to a numeric variable
- an arithmetic operation
- a logical comparison with a numeric value
- a function that takes numeric arguments

When it does so, SAS writes a note about the conversion to the log.

Explicit Conversion:

INPUT

The INPUT function is used primarily for converting character values to numeric values. Syntax: NumVar = INPUT(source,informat);

    Example                                          Result
    CVar1='32000';     NVar1=input(CVar1,5.);        NVar1 = 32000
    CVar2='32,000';    NVar2=input(CVar2,comma6.);   NVar2 = 32000
    CVar3='03may2008'; NVar3=input(CVar3,date9.);    NVar3 = 17655

17655 is the SAS date value for 03MAY2008, that is, the number of days since 1 January 1960.

PUT:

Converts numeric values to character values, writing them with a specific format. Syntax: CharVar = PUT(source,format);

    Example                                      Result
    NVar1=614;   CVar1=put(NVar1,3.);            CVar1 = '614'
    NVar2=55000; CVar2=put(NVar2,dollar7.);      CVar2 = '$55,000'
    NVar3=366;   CVar3=put(NVar3,date9.);        CVar3 = '01JAN1961'


The values of CVar1 to CVar3 are stored as character strings; the quotes around the results only indicate that the values are character.
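A runnable check of the explicit conversions above plus one implicit conversion. The variable names follow the examples; IMPLICIT is a hypothetical name for the automatic-conversion case, which SAS also reports with a note in the log.

```sas
data work.convert;
  cvar2 = '32,000';
  nvar2 = input(cvar2, comma6.);   /* 32000: the COMMA informat ignores the comma */
  cvar3 = '03may2008';
  nvar3 = input(cvar3, date9.);    /* 17655: days since 01JAN1960 */
  cvar4 = put(55000, dollar7.);    /* '$55,000' */
  cvar5 = put(366, date9.);        /* '01JAN1961' */
  implicit = '12' * 2;             /* automatic character-to-numeric conversion: 24 */
run;
```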

LENGTH

Returns the length of a character argument, not counting trailing blanks. Syntax: LENGTH(argument)

    Example                     Result
    len = LENGTH('ABCDEF');     len = 6

RIGHT

The RIGHT function returns its argument right-aligned: trailing blanks are moved to the start of the value. Syntax: RIGHT(argument)

Example: a = 'due date   '; b = RIGHT(a);

Variable b holds the string '   due date', shifted right three positions, with leading blanks instead of trailing blanks.

LEFT

Left-aligns a SAS character expression: leading blanks are moved to the end of the value. Syntax: LEFT(argument)

Example: a = '   due date'; b = LEFT(a);

These statements produce the character string 'due date   ', shifted left three positions, with trailing blanks instead of leading blanks.

TRIM

The TRIM function removes trailing blanks from its argument. If the argument is blank, TRIM returns one blank.


Syntax: TRIM(argument)

Example:

    part1 = 'apple ';
    part2 = 'sauce';
    noblank  = TRIM(part1) || part2;    /* 'applesauce'  */
    hasblank = part1 || part2;          /* 'apple sauce' */

TRIM does not remove leading blanks. To remove both leading and trailing blanks, combine the LEFT and TRIM functions:

    Variable = TRIM(LEFT(argument));

STRIP

Strips leading and trailing blanks from a character variable or character string. STRIP was introduced in SAS 9.

Example: a = '   due date   '; b = STRIP(a);

Variable b contains the character string 'due date' without leading or trailing blanks.

LOWCASE

Converts all letters in its argument to lowercase; digits and special characters are unchanged. Syntax: NewVal=LOWCASE(argument);

    Example                             Result
    a = 'STRONG'; b = LOWCASE(a);       b = 'strong'

UPCASE

Converts all letters in its argument to uppercase; digits and special characters are unchanged. Syntax: NewVal=UPCASE(argument);

    Example                              Result
    a = 'cognizant'; b = UPCASE(a);      b = 'COGNIZANT'


PROPCASE

Converts text to “proper” case:

- the first character of each “word” is in uppercase
- all other characters are in lowercase

Syntax: NewVar = PROPCASE(char_var);

    Example                                Result
    a = 'cognizant'; b = PROPCASE(a);      b = 'Cognizant'

COMPRESS

Removes specific characters from character expressions. Syntax: COMPRESS(source<,characters-to-remove>)

where:

- source specifies a SAS character expression
- characters-to-remove specifies the character or characters to remove from the source expression; if this argument is omitted, COMPRESS removes blanks

    Example                                   Result
    a = ' AB C D '; b = COMPRESS(a);          b = 'ABCD'
    p = 'AB CDE';   q = COMPRESS(p, 'D');     q = 'AB CE'

REPEAT

Returns a character value consisting of the first argument repeated n + 1 times. Syntax: REPEAT(argument,n)

    Example                           Result
    a = 'abc'; b = REPEAT(a,3);       b = 'abcabcabcabc'
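A runnable check of the blank-handling and case functions covered so far. The input strings reuse the examples above; 'cognizant technology' and the result variable names are hypothetical.

```sas
data work.strings;
  raw        = '   due date   ';
  length_raw = length(raw);                       /* 11: trailing blanks not counted */
  stripped   = strip(raw);                        /* 'due date' */
  trimmed    = trim(left(raw));                   /* same text, the pre-V9 way */
  up         = upcase('cognizant');               /* 'COGNIZANT' */
  proper     = propcase('cognizant technology');  /* 'Cognizant Technology' */
  squeezed   = compress(' AB C D ');              /* 'ABCD': blanks removed */
  no_d       = compress('AB CDE', 'D');           /* 'AB CE' */
  rep        = repeat('abc', 3);                  /* 'abcabcabcabc': n + 1 copies */
run;
```

SAS comparisons ignore trailing blanks, which is why a padded variable such as STRIPPED still compares equal to 'due date'.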


SUBSTR

The SUBSTR function is used to extract or insert characters; when SUBSTR appears on the left side of an assignment, it replaces characters in place. Syntax: NewVar = SUBSTR(string,start<,length>);

    Example                                        Result
    date = '06MAY89'; month = SUBSTR(date,3,3);    month = 'MAY'

Example: Extract two characters from Location, starting at position 11:

    NewVar = SUBSTR(Location, 11, 2);

INDEX

The INDEX function searches a source string for a specified substring and returns its location. Syntax: Position = INDEX(source-string, sub-string);

The INDEX function returns:

- the starting position of the first occurrence of sub-string within source-string, if it is found
- 0, if sub-string is not found

Example:

    Example                                             Result
    a = 'ABC.DEF (X=Y)'; b = 'X=Y'; x = INDEX(a,b);     x = 10

INDEXC

This function is similar to the INDEX function, but the characters of each excerpt are treated individually: INDEXC returns the position of the first character in the source that appears in any of the excerpts. If none of these characters is found in the source, INDEXC returns 0.


The INDEX function searches for a whole character string in a source string, whereas the INDEXC function searches for individual characters.

Syntax: INDEXC(source,excerpt-1<,…excerpt-n>)

    Example                                                      Result
    a = 'ABC.DEF (X=Y)'; x = INDEXC(a,'0123456789',';( )=.');    x = 4

The first matching character is the period in position 4.
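A runnable check of SUBSTR (both extracting and replacing) together with INDEX and INDEXC. The PHONE value and the 'Z' search are hypothetical additions; the other inputs come from the examples above.

```sas
data work.search;
  date  = '06MAY89';
  month = substr(date, 3, 3);               /* extract: 'MAY' */

  phone = '080-1234';
  substr(phone, 4, 1) = ' ';                /* SUBSTR on the left replaces: '080 1234' */

  a  = 'ABC.DEF (X=Y)';
  x1 = index(a, 'X=Y');                     /* 10: where the whole string starts */
  x2 = index(a, 'Z');                       /* 0: not found */
  x3 = indexc(a, '0123456789', ';( )=.');   /* 4: first of any listed character */
run;
```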

TRANSLATE

TRANSLATE replaces specific characters in a character expression. Syntax: TRANSLATE(source, to, from)

where:

- source specifies the SAS expression that contains the original character value
- to specifies the characters that TRANSLATE uses as substitutes
- from specifies the characters that you want TRANSLATE to replace

The values of to and from correspond character by character: TRANSLATE changes the first character of from to the first character of to, and so on. If to has fewer characters than from, TRANSLATE changes the extra from characters to blanks. If to has more characters than from, TRANSLATE ignores the extra to characters.

    Example                              Result
    d = TRANSLATE('xyzw','ab','vw');     d = 'xyzb'

Here v would become a and w becomes b; 'xyzw' contains no v, so only the w changes.

TRANWRD

The TRANWRD function replaces or removes all occurrences of a given word (or pattern of characters) within a character string. Syntax: NewVal=TRANWRD(source,target,replacement);

Example:

    Dessert = 'Pumpkin pie';
    Dessert = TRANWRD(Dessert, 'Pumpkin', 'Apple');

Result: Dessert = 'Apple pie'

VERIFY

Returns the position of the first character in the source string that is not in the check-string, or 0 if every character is found. Syntax: VERIFY(source,check-string);

    Example                                     Result
    part1 = 'apple'; check = 'abcdef';          x = 2
    x = VERIFY(part1,check);

In this case, the second character of 'apple', 'p', is not present in the check-string 'abcdef', so its position is returned in the variable x.

SCAN

The SCAN function returns the nth word of a character value. It is used to extract words from a character value when the words are separated by delimiters. Syntax: NewVar = SCAN(source-string, n <,delimiters>);

Where,

source-string: Specifies the character variable or expression to scan.
n: Specifies which word to read.
<delimiters>: Specifies the delimiter characters, enclosed in quotation marks (for example ' '). If no delimiters are specified, SAS uses a default set that includes the blank and the following characters (the exact set depends on whether the platform uses ASCII or EBCDIC):

. < ( + | & ! $ * ) ; ^ - / , % > \

Example: Phrase = 'software and services'; Second = SCAN(Phrase,2,' ');

Result: Second = 'and'
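This sketch shows explicit delimiter lists and what SCAN returns when n is larger than the number of words. The name value and the variable names are hypothetical.

```sas
data work.scan_demo;
   length full $ 20 last first extra second $ 10;
   full   = 'Smith, John A';
   last   = scan(full, 1, ',');    /* 'Smith'                         */
   first  = scan(full, 2, ', ');   /* 'John': comma and blank delimit */
   extra  = scan(full, 4, ', ');   /* blank: there is no fourth word  */
   second = scan('software and services', 2, ' ');  /* 'and'          */
run;
```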


The “CAT” Functions: The Version 9 family of CAT functions reduces complexity when concatenating strings.

CAT CATT CATS CATX

Function   What It Does

CAT   Concatenates two or more character strings, leaving leading and trailing blanks unchanged. Equivalent to the concatenation operator ( || ).

CATS   Same as CAT, but strips both leading and trailing blanks from each argument before concatenation.

CATT   Same as CAT, but trims trailing blanks from each argument before concatenation.

CATX   Concatenates two or more character strings, stripping leading and trailing blanks from each argument and inserting one or more user-specified separator characters between them.

Syntax: For the CAT, CATS and CATT functions: CAT(string-1, string-2 <,string-n>)

Where string-1, string-2 <,string-n> are the character strings to be concatenated.

For the CATX function: CATX(separator, string-1, string-2 <,string-n>)

Where separator is one or more characters, placed in single or double quotation marks, that are inserted between the concatenated strings. Example:


A = "Micky"; B = " Mouse ";

Function   Usage   Result
CAT   CAT_FN = CAT(A,B)   CAT_FN = "Micky Mouse "
CATS   CATS_FN = CATS(A,B)   CATS_FN = "MickyMouse"
CATT   CATT_FN = CATT(A,B)   CATT_FN = "Micky Mouse"
CATX   CATX_FN = CATX(":",A,B)   CATX_FN = "Micky:Mouse"
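The table can be reproduced with the DATA step below. WORK.CAT_DEMO and CONCAT_OP are illustrative names. The LENGTH statement gives the result variables room. As usual in SAS comparisons, trailing blanks are ignored.

```sas
data work.cat_demo;
   a = 'Micky';     /* length 5                             */
   b = ' Mouse ';   /* length 7, leading and trailing blank */
   length cat_fn cats_fn catt_fn catx_fn concat_op $ 20;
   cat_fn    = cat(a, b);        /* 'Micky Mouse ' */
   cats_fn   = cats(a, b);       /* 'MickyMouse'   */
   catt_fn   = catt(a, b);       /* 'Micky Mouse'  */
   catx_fn   = catx(':', a, b);  /* 'Micky:Mouse'  */
   concat_op = a || b;           /* same as CAT    */
run;
```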

Try It Out

Problem Statement 1

You have a SAS data set SCORES, which contains an ID variable and a variable called STRING which holds five 1-digit scores.

Write a SAS program to read this data set and create a new data set which contains an ID and five numeric variables X1 to X5, where the X's are each of the digits in STRING.

Following are some sample data from data set SCORES:

ID STRING
1 12345
2 13243
3 53421
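The solutions read an existing data set named SCORES. The minimal sketch below builds it from the sample data so that the programs can be run as written. The LENGTH statement, which stores STRING as a 5-character value, is an assumption about how SCORES is defined.

```sas
data scores;
   length id 8 string $ 5;
   input id string $;
   datalines;
1 12345
2 13243
3 53421
;
run;
```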

Code

/* Solution without arrays: */
DATA NEW;
   SET SCORES;
   X1 = INPUT(SUBSTR(STRING,1,1),1.);
   X2 = INPUT(SUBSTR(STRING,2,1),1.);
   X3 = INPUT(SUBSTR(STRING,3,1),1.);
   X4 = INPUT(SUBSTR(STRING,4,1),1.);
   X5 = INPUT(SUBSTR(STRING,5,1),1.);
   KEEP ID X1-X5;
RUN;

/* Solution using arrays: */
DATA NEW;
   SET SCORES;
   ARRAY X[5] X1-X5;
   DO POINTER = 1 TO 5;
      X[POINTER] = INPUT(SUBSTR(STRING,POINTER,1),1.);
   END;
   KEEP ID X1-X5;
RUN;

Refer to file 13.1.sas for a soft copy of the program code.

How It Works

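Without an array, the same statement has to be repeated once for each new variable. X1-X5 is a numbered range list that refers to the variables X1 through X5; because they do not already exist, SAS creates them. The INPUT function with the 1. informat converts each one-character substring of STRING to a numeric value. In the array solution, the DO loop index POINTER selects both the character position in STRING and the array element.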

Problem Statement 2

You have clinical data in a SAS data set called CLINICAL which contains information on patient visits.

Included in the data set are patient ID, DATE, BILLING (billing number), and DX (diagnosis code).

You also have a list of DX codes and their descriptions. Using the following CLINICAL data and the list of DX codes and descriptions, create a new data set, NEW, which contains all the variables in CLINICAL plus a new variable (DESCRIP) which contains the DX description.

Use PROC FORMAT and a PUT function as in Example 2 to solve this problem.

Code

PROC FORMAT;
   VALUE DXCODE 1 = 'Cold'
                2 = 'Flu'
                3 = 'Asthma'
                4 = 'Chest Pain'
                5 = 'Maternity'
                6 = 'Diabetes';
RUN;

DATA CLINICAL;
   INFILE 'CLINICAL';
   INPUT ID DATE : MMDDYY8. BILLING DX;
RUN;

DATA NEW;
   SET CLINICAL;
   DESCRIP = PUT(DX, DXCODE.);
RUN;

Refer to file 13.2.sas for a soft copy of the program code.

How It Works

Create a format, DXCODE., for the values of DX. Then assign the description of each DX code to the new variable DESCRIP by using the PUT function with that format. PUT always returns a character value.
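The format can be checked on its own, without the CLINICAL file, by formatting each code in a loop. This is a sketch; WORK.DX_CHECK is an illustrative name.

```sas
proc format;
   value dxcode 1 = 'Cold'       2 = 'Flu'
                3 = 'Asthma'     4 = 'Chest Pain'
                5 = 'Maternity'  6 = 'Diabetes';
run;

/* One row per DX code, with its description from the format */
data work.dx_check;
   length descrip $ 10;
   do dx = 1 to 6;
      descrip = put(dx, dxcode.);
      output;
   end;
run;
```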

Problem Statement 3

You have a raw data file called TEMPER which contains temperature measurements taken at one-hour intervals.

Each raw data line contains several pairs of the variables HOUR (hour of the day) and TEMP (temperature).

All temperatures are in degrees Fahrenheit unless they are written in the form nC (the number n followed by a C, no spaces), in which case they are expressed in degrees Celsius.

In addition, a value of N was coded when a temperature was not obtained. Write a SAS program to read this data file, express all temperatures in degrees Fahrenheit, and convert each N to a numeric missing value.

Hint: The conversion from Celsius to Fahrenheit is F = 9*C/5 + 32. Some sample records from file TEMPER are as follows:

1 68 2 67 3 N 4 20C 5 72 6 23C 7 75 8 N

Code

DATA TEMP;
   INFILE 'TEMPER';
   INPUT HOUR DUMMY $ @@;
   IF DUMMY = 'N' THEN TEMP_F = .;
   ELSE IF INDEX(DUMMY,'C') NE 0 THEN
      TEMP_F = 9*INPUT(SUBSTR(DUMMY,1,LENGTH(DUMMY)-1),5.)/5 + 32;
   ELSE TEMP_F = INPUT(DUMMY,5.);
   DROP DUMMY;
RUN;

Refer to file 13.3.sas for a soft copy of the program code.

How It Works

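Because each line holds several HOUR/TEMP pairs, the double trailing @@ is used to keep reading from the same line across iterations of the DATA step. The INDEX function checks whether 'C' appears in the value of DUMMY. If it does, the numeric part (everything except the final C) is extracted with SUBSTR, converted to a number with INPUT, and converted to Fahrenheit with the given formula. Otherwise, the value of DUMMY is converted to numeric directly.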
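To try the program without an external TEMPER file, the INFILE statement can be replaced by DATALINES holding the sample records. This sketch works under that assumption and uses the same logic as the solution. The comments give the values expected for a few hours.

```sas
data temp;
   input hour dummy $ @@;   /* @@ keeps reading pairs from the same line */
   if dummy = 'N' then temp_f = .;                    /* hour 3: missing */
   else if index(dummy, 'C') ne 0 then                /* hour 4: 68      */
      temp_f = 9 * input(substr(dummy, 1, length(dummy) - 1), 5.) / 5 + 32;
   else temp_f = input(dummy, 5.);                    /* hour 5: 72      */
   drop dummy;
   datalines;
1 68 2 67 3 N 4 20C
5 72 6 23C 7 75 8 N
;
run;
```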

Problem Statement 4

You have an instream raw data file of patient hospital stays with the following file layout:

Starting Column   Length   Format      Description
1                 3        character   Subject ID
4                 6        mmddyy      Admission date
10                6        mmddyy      Discharge date
16                8        mmddyyyy    Date of birth

Here are some sample data:

00101059201079210211946
00211129211159209011955
00305129206099212251899
00401019301079304051952

a) Write a program to create a SAS data set called DATES1, and list the resulting data set with PROC PRINT. Create variables ID, ADMIT, DISCH, and DOB from the given data, and also create the following new variables:

i. AGE: Age in years on the date of admission (as of the last birthday)
ii. DAY: Numeric day of the week of the admission date (1=Sun, 2=Mon, etc.)
iii. MONTH: Numeric month of year of the admission date (1=Jan, 2=Feb, etc.)
iv. NOWEEK: Number of weeks the patient stayed in the hospital

b) Set up the DATA step so that the variables print with the following formats:

i. ADMIT mm/dd/yy
ii. DISCH mm/dd/yy
iii. DOB ddMMMyyyy

Code

DATA DATES1;
   INPUT @1 ID $3. @4 ADMIT MMDDYY6. @10 DISCH MMDDYY6. @16 DOB MMDDYY8.;
   AGE = INT((ADMIT-DOB)/365.25);
   DAY = WEEKDAY(ADMIT);
   MONTH = MONTH(ADMIT);
   NOWEEK = INTCK('WEEK', ADMIT, DISCH);
   FORMAT ADMIT DISCH MMDDYY8. DOB DATE9.;
   DATALINES;
00101059201079210211946
00211129211159209011955
00305129206099212251899
00401019301079304051952
;
RUN;

PROC PRINT DATA=DATES1;
RUN;

Refer to file 13.4.sas for a soft copy of the program code.

How It Works

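Because SAS stores dates as numbers of days, subtracting DOB from ADMIT gives the number of days between the two dates. Dividing by 365.25 (the .25, that is ¼, accounts for leap years) and taking the integer part with INT gives the person's AGE. This is an approximation: it can return an age one year too low when the admission date falls exactly on a birthday.

The INTCK function returns the number of interval boundaries (WEEK in this case) crossed between the ADMIT date and the DISCH date. WEEK boundaries fall on Sundays, so NOWEEK counts the Sundays in the stay rather than complete seven-day periods.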
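The following sketch compares the 365.25 approximation with an exact "age at last birthday". The exact formula is a commonly used one and is not taken from the handout: it counts whole months with INTCK, subtracts one if the day of the month has not yet been reached, and divides by 12. The dates and names are hypothetical, chosen so that the admission falls on the first birthday.

```sas
data work.age_check;
   dob   = '01MAR2001'd;
   admit = '01MAR2002'd;   /* exactly one year later: the first birthday */
   age_approx = int((admit - dob) / 365.25);                  /* 0 (365 days) */
   age_exact  = floor((intck('month', dob, admit)
                       - (day(admit) < day(dob))) / 12);      /* 1            */
   format dob admit date9.;
run;
```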

Summary

ODS allows output from the DATA step and SAS procedures to be presented in a more "useful and colorful" way.

A SAS array is a temporary grouping of variables under a single name.

SAS has a number of built-in functions. Broadly, they can be classified as arithmetic functions, string functions, and date/time functions.

Test your Understanding

1. Name and describe three SAS functions that you have used, if any.
2. In ARRAY processing, what does the DIM function do?
3. What is the difference between x=a+b+c+d; and x=SUM(OF a, b, c, d);?
4. What do the SAS log messages "numeric values have been converted to character" mean? What are the implications?
5. Which date function advances a date, time or datetime value by a given interval?
6. What is the significance of the OF in X=SUM(OF a1-a4, a6, a9);?
7. What do the following do? INPUT, PUT, CATX, SCAN, SUBSTR, TRIM, MOD
8. Create a program for the following requirement. Following is the data in a file:

vinodM24
yahooF22
altavistaF18
googleF20

Read the data into a single variable and use functions to retrieve it into three variables: Name, Gender and Age.


Session 16: Built-in Functions in SAS / Merging and Combining SAS Data Sets

Learning Objectives

After completing this session, you will be able to:
o Work with date time functions
o Describe concatenation
o Perform one-to-one reading
o Perform one-to-one merging
o Perform match-merging
o Perform JOINS in the DATA step

Date Time Functions

A SAS date, time or datetime value is a numeric value. A date is stored as the number of days since January 1, 1960. A time is stored as the number of seconds since midnight. A datetime is stored as the number of seconds since midnight, January 1, 1960. Because of this, extracting information such as the month or the weekday by hand is difficult, so SAS provides a set of date and time functions for extracting the required information from a SAS date or time value.

DATE or TODAY

Returns the current date as a SAS date value, that is, the number of days between January 1, 1960 and the current date. Syntax: DATE() or TODAY()

Example: tday1 = DATE(); tday2 = TODAY();

Result: tday1 and tday2 both hold the number of days between January 1, 1960 and the date on which the statement is executed.

TIME

Returns the current time of the day as a SAS time value. Syntax: TIME( )


Example: TT = TIME();

Result: If the statement is executed at exactly 2:32 p.m., TT holds the SAS time value for 14:32:00, that is, 52320 seconds since midnight.

DATETIME

Returns the current date and time of day as a SAS datetime value, that is, the number of seconds between midnight, January 1, 1960 and the current date and time. Syntax: DATETIME()

Example: dttime = DATETIME();

Result: dttime holds the number of seconds between midnight, January 1, 1960 and the current date and time.

Extracting the “parts” of a SAS Date, Time or Datetime Variable:

Function   Usage   Description
DAY   DAY(date)   Returns the day of the month from a SAS date value.
MONTH   MONTH(date)   Returns the month (1 to 12) from a SAS date value.
YEAR   YEAR(date)   Returns the four-digit year from a SAS date value.
QTR   QTR(date)   Returns the quarter of the year from a SAS date value: JAN-MAR = 1, APR-JUN = 2, JUL-SEP = 3, OCT-DEC = 4.
HOUR   HOUR(<time | datetime>)   Returns the hour (0 to 23) from a SAS time or datetime value.
MINUTE   MINUTE(<time | datetime>)   Returns the minute from a SAS time or datetime value.
SECOND   SECOND(<time | datetime>)   Returns the second from a SAS time or datetime value.

DAY, MONTH, YEAR and QTR expect a SAS date value. To apply them to a datetime value, first extract the date with DATEPART, for example YEAR(DATEPART(dt)).

WEEKDAY

Returns a numeric value for the day of the week from a SAS date value. Syntax: Wkdy = WEEKDAY(date)

The values returned are:

1 - SUN 2 - MON 3 - TUE 4 - WED
5 - THU 6 - FRI 7 - SAT
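This sketch applies the extraction functions to a fixed date and datetime so that the results are predictable. The values and variable names are illustrative. October 21, 2004 was a Thursday, so WEEKDAY returns 5.

```sas
data work.parts;
   d  = '21OCT2004'd;
   dt = '21OCT2004:13:30:15'dt;
   day_n   = day(d);       /* 21             */
   month_n = month(d);     /* 10             */
   year_n  = year(d);      /* 2004           */
   qtr_n   = qtr(d);       /* 4              */
   wkdy    = weekday(d);   /* 5 = Thursday   */
   hour_n  = hour(dt);     /* 13             */
   min_n   = minute(dt);   /* 30             */
   sec_n   = second(dt);   /* 15             */
   /* Date functions need a date, so take the date part of a datetime first */
   year_dt = year(datepart(dt));   /* 2004   */
   format d date9. dt datetime20.;
run;
```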

MDY

Returns a SAS date value from month, day and year values. Syntax: MDY(month,day,year)

When the month, day and year are held in separate variables, the MDY function creates a SAS date value from them. Where,

month: Specifies a numeric expression representing an integer from 1 through 12.
day: Specifies a numeric expression representing an integer from 1 through 31.
year: Specifies a numeric expression representing a two-digit or four-digit year. Two-digit years are interpreted according to the YEARCUTOFF= system option.

Example: m = 8; d = 27; y = 90; date1 = MDY(m,d,y);

Result: date1 holds the value 11196, which is the number of days between January 1, 1960 and August 27, 1990.
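The sketch below checks the result with a four-digit year, which avoids depending on the YEARCUTOFF= system option. It also shows that MDY returns a missing value, with a note in the log, when the arguments do not form a valid date. WORK.MDY_DEMO, TEXT1 and BAD are illustrative names.

```sas
data work.mdy_demo;
   m = 8; d = 27; y = 1990;
   date1 = mdy(m, d, y);         /* 11196 = days since 01JAN1960 */
   text1 = put(date1, date9.);   /* '27AUG1990'                  */
   bad   = mdy(2, 30, 2008);     /* no such date: missing value  */
run;
```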

DATEPART / TIMEPART

A SAS datetime value contains information on both the date and the time, stored as the number of seconds since midnight, January 1, 1960. To extract the DATE or TIME "part" of a SAS datetime value, use the DATEPART function or the TIMEPART function.

Syntax: DATEPART(datetime) TIMEPART(datetime)

Example: Thursday, Oct. 21, 2004 at 1300 hrs is represented as the SAS datetime value 1413982800. DATEPART returns the SAS date value 16365 (21OCT2004), and TIMEPART returns the SAS time value 46800 (13:00:00).
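The example value can be checked with the DHMS function, which builds a datetime value from a date, an hour, a minute and a second. This is a minimal sketch; WORK.DTPARTS and the variable names are illustrative.

```sas
data work.dtparts;
   dt    = dhms('21OCT2004'd, 13, 0, 0);  /* 1413982800 seconds       */
   dpart = datepart(dt);                  /* 16365 = 21OCT2004         */
   tpart = timepart(dt);                  /* 46800 seconds = 13:00:00  */
   format dpart date9. tpart time8.;
run;
```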


Calculating Time Intervals: There are two ways to calculate the time interval between two dates:

1. Arithmetic operation on SAS date, time or datetime variables, or between a variable and a constant

YEARS = (date2-date1)/365.25; MONTHS = (date2-date1)/30.4;

2. Use of the INTCK function

INTCK

Determines the number of interval boundaries that are crossed between two SAS date, time or datetime values. Syntax: INTCK('interval', from, to)

Where 'interval' is a character constant (in quotation marks) or a character variable whose value names the time period of interest:

Date Intervals Datetime Intervals Time Intervals

DAY DTDAY HOUR

WEEK DTWEEK MINUTE

MONTH DTMONTH SECOND

QTR DTQTR

YEAR DTYEAR

From – SAS date, time or datetime variable identifying the START of the time interval. To – SAS date, time or datetime variable identifying the END of the time interval. INTCK function calculates only the number of interval boundaries crossed between two dates.

Example: qtr = INTCK('QTR','10OCT88'd,'01MAR89'd);
Result: qtr = 1 (the number of quarter boundaries crossed, i.e., the number of JAN 1, APR 1, JUL 1 and OCT 1 dates passed)

Example: date = INTCK('YEAR','31DEC89'd,'1JAN90'd);
Result: date = 1 (the number of year boundaries crossed, i.e., the number of JAN 1 dates passed)

Example: year = INTCK('YEAR','1JAN89'd,'31DEC89'd);
Result: year = 0 (no JAN 1 boundary is crossed)

Example: td = '1dec2008'd; month = INTCK('MONTH','10jan2008'd, td);
Result: month = 11 (the number of month boundaries crossed, i.e., first days of a month)
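The sketch below evaluates the examples above in one DATA step. It uses four-digit years so that the results do not depend on the YEARCUTOFF= option. WORK.INTCK_DEMO and YEARS_ARITH are illustrative names. The last line contrasts INTCK with the arithmetic method: two dates one day apart count as a full YEAR boundary for INTCK but only a tiny fraction of a year arithmetically.

```sas
data work.intck_demo;
   qtr   = intck('QTR',   '10OCT1988'd, '01MAR1989'd);  /* 1  */
   yr1   = intck('YEAR',  '31DEC1989'd, '01JAN1990'd);  /* 1  */
   yr0   = intck('YEAR',  '01JAN1989'd, '31DEC1989'd);  /* 0  */
   month = intck('MONTH', '10JAN2008'd, '01DEC2008'd);  /* 11 */
   /* Arithmetic on the same one-day span gives a small fraction instead */
   years_arith = ('01JAN1990'd - '31DEC1989'd) / 365.25;  /* about 0.0027 */
run;
```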


INTNX

Creates a SAS date, time or datetime value that is a given number of time intervals from a starting value. Syntax: INTNX('interval', from, n);

Where,

'interval': the time interval
from: the starting date
n: an integer giving the number of time intervals to advance (a negative value moves backward)

By default, the result is aligned to the beginning of the interval. For example, if the interval is 'MONTH', INTNX returns the first day of the resulting month as a SAS date.

Example: BDATE = '05mar2008'd; DT = INTNX('month',BDATE,3);

Result: DT is a SAS date value representing the first day of the month that is three months after BDATE, i.e., 01JUN2008.
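The optional fourth argument of INTNX is the alignment: 'B' for the beginning, 'M' for the middle and 'E' for the end of the interval. It controls where in the resulting interval the date falls. This sketch uses illustrative names; DT1 reproduces the handout's example.

```sas
data work.intnx_demo;
   bdate = '05MAR2008'd;
   dt1 = intnx('month', bdate, 3);       /* 01JUN2008: start of month, 3 ahead */
   dt2 = intnx('month', bdate, -1);      /* 01FEB2008: one month back          */
   dt3 = intnx('month', bdate, 0, 'E');  /* 31MAR2008: end of the same month   */
   format bdate dt1-dt3 date9.;
run;
```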

Merging and Combining SAS Data Sets

We can create a data set from two or more existing data sets by:
o Combining data vertically (appending the observations from one data set to another data set)
o Combining data horizontally (joining observations side by side)

Methods to combine SAS data sets:

Combining Vertically
o concatenating
o interleaving

Combining Horizontally
o one-to-one reading
o one-to-one merging
o match-merging
o updating

Combining Vertically

Combining data vertically appends the observations of one or more data sets, row by row, to create a new data set.


Concatenating

Concatenating Two Data Sets: Concatenating data sets appends the observations from one data set to another data set. The DATA step reads DATA1 sequentially until all of its observations have been processed, and then reads DATA2. Data set COMBINED contains the results of the concatenation. Note that the data sets are processed in the order in which they are listed in the SET statement.

Interleaving

Interleaving combines observations from two or more data sets in the order of one or more common variables. Because the SET statement is used with a BY statement, the input data sets DATA1 and DATA2 must be sorted by the variable YEAR, and the resulting data set COMBINED is also in YEAR order.
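A minimal sketch of both vertical methods. The contents of DATA1 and DATA2 (YEAR and SALES values) are hypothetical, and CONCAT and INTERLEAVE are illustrative output names. Both input data sets are already sorted by YEAR, as interleaving requires.

```sas
data data1;
   input year sales;
   datalines;
1991 100
1993 120
;
run;

data data2;
   input year sales;
   datalines;
1992 110
1994 130
;
run;

/* Concatenating: every row of DATA1, then every row of DATA2 */
data concat;
   set data1 data2;
run;   /* YEAR order: 1991 1993 1992 1994 */

/* Interleaving: the BY statement takes rows in YEAR order */
data interleave;
   set data1 data2;
   by year;
run;   /* YEAR order: 1991 1992 1993 1994 */
```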


Combining Horizontally

Combining data horizontally refers to the process of merging or joining multiple data sets into one data set.

One-to-one reading

In a one-to-one match, key values in both the base table and the lookup table are unique. Therefore, for each observation in the base table, no more than one observation in the lookup table has a matching key value.

One-to-one reading combines observations from two or more SAS data sets by creating observations that contain all of the variables from each contributing data set.

Observations are combined based on their relative position in each data set, that is, the first observation in one data set with the first in the other, and so on.


The DATA step stops after it has read the last observation from the data set with the fewest observations.

One-to-one merging

One-to-one merging is similar to one-to-one reading, with two exceptions: you use the MERGE statement instead of multiple SET statements, and the DATA step reads all observations from all data sets, so the output has as many observations as the largest input data set.

Match merging

In a one-to-many match, key values in the base table are unique, but key values in the lookup table are not unique.

Match-merging combines observations from two or more SAS data sets into a single observation in a new data set based on the values of one or more common variables.

The input data sets DATA1 and DATA2 must be sorted by YEAR before merging, and a BY YEAR statement follows the MERGE statement.
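The sketch below contrasts the three horizontal methods on the same input. The contents of DATA1 and DATA2 (YEAR, SALES and TARGET values) are hypothetical, and READ11, MERGE11 and MATCHED are illustrative names. Note how the positional methods pair rows regardless of YEAR, while match-merging pairs them by value.

```sas
data data1;
   input year sales;
   datalines;
1991 100
1992 110
1993 120
;
run;

data data2;
   input year target;
   datalines;
1991 105
1993 125
;
run;

/* One-to-one reading: stops after the last row of the smaller data set */
data read11;
   set data1;
   set data2;
run;   /* 2 rows; row 2 has SALES 110 but YEAR 1993 and TARGET 125 from DATA2 */

/* One-to-one merging: reads every row of both data sets */
data merge11;
   merge data1 data2;
run;   /* 3 rows; row 3 has TARGET missing */

/* Match-merging: rows are paired by the value of YEAR */
data matched;
   merge data1 data2;
   by year;
run;   /* 3 rows; YEAR 1992 has TARGET missing */
```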


Updating

Updating uses information from observations in a transaction data set to delete, add, or alter information in observations in a master data set.

You can update a master data set by using the UPDATE statement or the MODIFY statement.

If you use the UPDATE statement, your input data sets must be sorted by the values of the variables listed in the BY statement.

If you use the MODIFY statement, your input data does not need to be sorted. By default, UPDATE and MODIFY do not replace nonmissing values in a master data set with missing values from a transaction data set.


Performing JOINS in DATA Step

Identifying Data Set Contributors: When you read multiple SAS data sets in one DATA step, you can use the IN= data set option to detect which data set contributes to the current observation. General form of the IN= data set option: SAS-data-set (IN=variable)

Where, variable is any valid SAS variable name. It is a temporary numeric variable with a value of:

1 if the data set contributes to the observation
0 if the data set does not contribute to the observation

The variable is not written to the output data set. Example:

DATA three;
   MERGE one(in=a) two(in=b);
   BY id;
   in_x = a;
   in_y = b;
RUN;

For the above example, the contents of dataset ONE, TWO & THREE along with the values of the automatic variables are given below.

Performing JOINS in DATA Step: Using the IN= variables, we can perform different join operations:
o Equi-Join
o Left Outer Join
o Right Outer Join
o Full Outer Join

Example:

data three;
   merge one(in=x) two(in=y);
   by id;
   <sas join statement>;
run;

Join Operation   SAS Statement
Equi-Join   IF X AND Y;
Left Outer Join   IF X;
Right Outer Join   IF Y;
Full Outer Join   IF X OR Y; (or omit the subsetting IF, because every merged observation comes from at least one data set)
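The sketch below writes all four joins in one DATA step by pairing each IN= test with an OUTPUT statement. The data sets ONE and TWO are hypothetical (IDs 2 and 3 appear in both). INNER, LEFT, RIGHT and FULL are illustrative output names.

```sas
data one;
   input id a;
   datalines;
1 10
2 20
3 30
;
run;

data two;
   input id b;
   datalines;
2 200
3 300
4 400
;
run;

/* X and Y are the IN= flags for ONE and TWO */
data inner left right full;
   merge one(in=x) two(in=y);
   by id;
   if x and y then output inner;   /* IDs 2, 3       */
   if x then output left;          /* IDs 1, 2, 3    */
   if y then output right;         /* IDs 2, 3, 4    */
   output full;                    /* IDs 1, 2, 3, 4 */
run;
```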

Try It Out

Problem Statement 1

You have two SAS data sets. Data set DEMOG contains ID, DOB, and GENDER; data set SCORES contains SSN (which is equivalent to ID in data set DEMOG), IQ, and GPA (grade point average). Write a program to perform INNER Join on the two datasets and write the observations into a new dataset BOTH. Verify the contents of the merged dataset by printing its contents.


Note:

1. The data are not in ID order.
2. There are some IDs that are in one file only.

Code

PROC SORT DATA=DEMOG;
   BY ID;
RUN;

PROC SORT DATA=SCORES;
   BY SSN;
RUN;

DATA BOTH;
   MERGE DEMOG (IN=IN_DEMOG)
         SCORES (IN=IN_SCR RENAME=(SSN=ID));
   BY ID;
   IF IN_DEMOG AND IN_SCR;
RUN;

PROC PRINT DATA=BOTH;
RUN;

Refer to file 16.1.sas for a soft copy of the program code.

How It Works

Sort the two data sets separately. Because the BY variable must have the same name in both data sets to perform the merge, rename the variable SSN in data set SCORES to ID with the RENAME= data set option. For an INNER join, use the IN= data set option to identify the contributing data sets and keep only the observations to which both contribute.


Problem Statement 2

You have a MASTER file which contains PART (part number), NUMBER (number in stock), PRICE, and SIZE. The file is sorted by PART. You want to update this file as follows:

For PART 222, you now have 15 in stock. For PART 123, you have a new price of $1,500. For PART 333, you have a new price of $2,000 and 20 in stock.

Data set MASTER

PART NUMBER PRICE SIZE
111 34 8000 A
123 87 1200 B
124 45 800 A
222 19 1300 C
234 20 2000 A
333 30 1800 B

Code

DATA NEWDATA;
   INPUT PART NUMBER PRICE;
   DATALINES;
222 15 .
123 . 1500
333 20 2000
;
RUN;

PROC SORT DATA=NEWDATA;
   BY PART;
RUN;

DATA MASTER;
   UPDATE MASTER NEWDATA;
   BY PART;
RUN;

Refer to file 16.2.sas for a soft copy of the program code.

How It Works

Verify the contents of the data set MASTER after updating. You will find that the missing values in data set NEWDATA did not overwrite the existing values in MASTER: PART 222 keeps its price of 1300, and PART 123 keeps its stock of 87.
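To run the solution end to end, MASTER can first be built from the sample data shown in the problem statement. The sketch below does that and then repeats the transaction and UPDATE steps. Only the DATA step that creates MASTER is new.

```sas
/* MASTER from the sample data (already sorted by PART) */
data master;
   input part number price size $;
   datalines;
111 34 8000 A
123 87 1200 B
124 45  800 A
222 19 1300 C
234 20 2000 A
333 30 1800 B
;
run;

data newdata;
   input part number price;
   datalines;
222 15 .
123 . 1500
333 20 2000
;
run;

proc sort data=newdata;
   by part;
run;

/* Missing transaction values do not overwrite MASTER values */
data master;
   update master newdata;
   by part;
run;
```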


Summary

SAS provides a set of date and time functions for extracting the required information from a SAS date or time value.

We can create a data set from two or more existing data sets by combining data vertically or combining data horizontally.

We can use the IN= data set option to detect which data set contributed to an observation.

Test your Understanding

1. What do the following do? a) INTCK b) DATETIME c) MDY d) WEEKDAY

2. When would you choose to MERGE two data sets together, and when would you SET two data sets?

3. How would you code a merge that keeps only the observations that have matches in both data sets?

4. How would you code a merge that writes the matches from both data sets, plus the non-matches from the left-most data set, to one data set?

5. How do the IN= variables improve the capability of a MERGE?

6. Create a program for the following requirement. Consider the following data:

Name Month Year Day
A 10 1928 19
B 9 1981 10
C 12 1975 25
D 15 1990 18

Create a data set and add a variable named DOJ that contains the date built from the month, year and day.

7. Create a program for the following requirement. Read the following raw data into a SAS data set:

Birth date
23041979
21071985
10061976
13081952

Print the contents of this data set in the following format:

Birth date
23APR1979
21JUL1985
10JUN1976
13AUG1952


Session 18: Statistical Procedures

Learning Objectives

After completing this session, you will be able to work with the following statistical procedures:
o PROC FREQ
o PROC MEANS
o PROC SUMMARY
o PROC REPORT

PROC FREQ

The FREQ procedure is a descriptive procedure as well as a statistical procedure that produces one-way and n-way frequency tables. It concisely describes your data by reporting the distribution of variable values. PROC FREQ displays frequency counts of the data values in a SAS data set, and it can produce statistics to analyze relationships among variables. By default, PROC FREQ:
o Analyzes every variable in the SAS data set
o Displays each distinct data value
o Calculates the number of observations in which each data value appears and the corresponding percentage
o Indicates for each variable how many observations have missing values
o Creates a report on every variable of the data set
o Produces percent, cumulative frequency and cumulative percent

Syntax: PROC FREQ <DATA = dataset>; TABLES <variable list> / options; RUN;

Where,

TABLES <variable list>: Specifies the variables to analyze, similar to the VAR statement in the PRINT procedure. If it is not used, the FREQ procedure creates frequency tables for every variable in your data set.

Options:
o MISSING - includes missing values in the frequency report
o LIST - prints two-way to n-way tables in a list format rather than as cross-tabulation tables
o NOCOL - suppresses printing of column percentages of a crosstab
o NOROW - suppresses printing of row percentages of a crosstab
o NOPERCENT - suppresses printing of cell percentages of a crosstab

Sample Program and Output:

Example:

PROC FREQ DATA = EMP;
   TABLES DEPTID;
   TITLE3 'One way Freq of DEPTID';
RUN;

Creating Two-Way Tables: To produce a cross-tabulation report on two or more variables, put an asterisk (*) between the variables.

PROC FREQ DATA=SAS-data-set;
   TABLES variable1 * variable2;
RUN;

In the cross-tabular report, the values of the first variable in the TABLES statement form the rows of the frequency table, and the values of the second variable form the columns.

Sample Program & Output (List Frequency): The LIST option produces a list frequency report. Example:

PROC FREQ DATA = EMP;
   TABLES DEPTID * GENDER / LIST;
   TITLE3 'Two-way Freq of DEPTID Vs GENDER';
RUN;


Box Frequency: Without the LIST option, PROC FREQ produces the cross-tabulation (box) form of the table.
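The EMP data set used in the handout is not available here, so this sketch uses SASHELP.CLASS instead. SASHELP.CLASS is a sample data set supplied with SAS: 19 students, with variables NAME, SEX, AGE, HEIGHT and WEIGHT. The OUT= option on the TABLES statement, which is not covered above, saves the one-way counts in a data set; WORK.SEX_COUNTS is an illustrative name.

```sas
proc freq data=sashelp.class;
   tables sex / out=work.sex_counts;   /* one-way table, also saved as a data set */
   tables sex * age / list;            /* two-way table in list form              */
run;
```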

Multi-Threaded Processing

Multi-threaded processing is a type of parallel processing introduced in SAS 9. Parallel processing means that multiple units of work are available to be scheduled for concurrent execution by the operating system. This technology takes advantage of hardware that has multiple CPUs, such as symmetric multiprocessing (SMP) machines.


Processes suitable for threading are:
o sorting
o grouping
o summarizing

The multi-threading capability of SAS improves the processing time of the following procedures:
o SORT
o SQL
o MEANS
o SUMMARY
o REPORT

Threaded processing can be controlled via the SAS system option THREADS | NOTHREADS. General form: OPTIONS THREADS | NOTHREADS;

o THREADS - enables multi-threaded processing.
o NOTHREADS - disables multi-threaded processing. This is the default option.

The THREADS | NOTHREADS option can also be specified in the PROC statement, where it enables or disables multi-threaded processing of the input data set. When the option is specified in the PROC statement, it overrides the SAS system option THREADS | NOTHREADS.

Example: To enable multi-threading:

PROC SORT DATA = EMP THREADS;
PROC SQL THREADS;

To disable multi-threading:

PROC MEANS DATA = DEPT NOTHREADS;

PROC MEANS

Computing Statistics Using PROC MEANS: The MEANS procedure displays simple descriptive statistics, such as sum, mean, standard deviation, variance, minimum and maximum, for the numeric variables in a SAS data set. General form:

PROC MEANS <DATA=SAS-data-set> <statistic-keywords>;
   CLASS <variable list>;
   VAR <variable list>;
   OUTPUT OUT=SAS-data-set <statistic-keyword=variable-name(s)>;
RUN;


Example: PROC MEANS DATA = SAS-data-set; RUN;

If PROC MEANS is used without any other statements, by default it:
o analyzes every numeric variable in the SAS data set
o prints the statistics N, MEAN, STD, MIN and MAX
o excludes missing values before calculating statistics

CLASS <variable list>;

List the grouping variables that form the subgroups in this statement. The CLASS statement groups the observations of the SAS data set for analysis.

VAR <variable list>;

List the analysis variables here. Statistics are calculated for the numeric variables listed here.

OUTPUT OUT=SAS-data-set <statistic-keyword=variable-name(s)>;

Creates a SAS data set in which the computed summary statistics are stored. Where,
o SAS-data-set specifies the name of the output data set
o statistic-keyword= specifies the summary statistic to be written out
o variable-name(s) specifies the names of the variables that will be created to contain the values of the summary statistic. These variables correspond to the analysis variables that are listed in the VAR statement.

Example: VAR SAL; OUTPUT OUT = NEW MEAN = MEANSAL;

This computes the mean value of SAL, stores it in a new variable MEANSAL and writes it to the data set NEW. Some of the statistics that can be computed using PROC MEANS are:

Keyword   Description
MIN   Minimum value
MAX   Maximum value
MEAN   Average
SUM   Sum
N   Number of observations with nonmissing values
NMISS   Number of observations with missing values
STDDEV / STD   Standard deviation
VAR   Variance
RANGE   Range

Sample Program & Output 1:

PROC MEANS DATA = EMP;
   CLASS DEPTID;
   VAR SALARY;
RUN;

Sample Program 2:

PROC MEANS DATA = EMP SUM;
   CLASS DEPTID;
   VAR SALARY;
RUN;
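This sketch follows the same pattern against SASHELP.CLASS, writing the statistics to a data set instead of the listing; NOPRINT suppresses the report. WORK.HT_STATS, MEAN_HT and N_HT are illustrative names. With a CLASS statement, the output data set also contains the automatic variables _TYPE_ (0 for the overall row, 1 for the per-class rows) and _FREQ_ (the number of observations).

```sas
proc means data=sashelp.class noprint;
   class sex;
   var height;
   output out=work.ht_stats mean=mean_ht n=n_ht;
run;

proc print data=work.ht_stats;
run;
```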


PROC SUMMARY

You can create a summarized output data set by using the SUMMARY procedure. PROC SUMMARY is similar to PROC MEANS in syntax, and you can do all the analysis that can be done by PROC MEANS. The difference between the two procedures is that PROC MEANS produces a printed report by default, but PROC SUMMARY does not. Unless you specify the PRINT option, PROC SUMMARY only creates the output data set requested in its OUTPUT statement.
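A minimal PROC SUMMARY sketch on SASHELP.CLASS. The NWAY option, not discussed above, keeps only the rows for the full CLASS combination (one per value of SEX). PROC PRINT displays the data set, because PROC SUMMARY prints nothing itself here. WORK.WT_SUM, TOTAL_WT and MEAN_WT are illustrative names.

```sas
proc summary data=sashelp.class nway;
   class sex;
   var weight;
   output out=work.wt_sum sum=total_wt mean=mean_wt;
run;

proc print data=work.wt_sum;
run;
```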

PROC REPORT

PROC REPORT is another powerful display procedure that combines display and statistical analysis capabilities in one procedure. It produces a variety of reports using a single report-writing tool. It combines features of:
o PROC PRINT
o PROC MEANS
o PROC SUMMARY
o PROC SORT
o PROC TABULATE

Why PROC REPORT?
o PROC REPORT requires less code and time
o it is easy to learn and use
o it is easier to apply ODS style elements in PROC REPORT


Features of the REPORT procedure:
o create listing reports
o create summary reports
o enhance reports
o request separate subtotals and grand totals

General Form:

PROC REPORT DATA=SAS-data-set <options>;
   COLUMN column-specifications;
   DEFINE variable / <usage> <attribute-list>;
RUN;

Options:

WINDOWS | WD - invokes the procedure in an interactive REPORT window. This is the default option.

NOWINDOWS | NOWD - displays the report in the OUTPUT window.

COLUMN column-specifications;

Selects and orders the variables that appear in your report. This is similar to the VAR statement in PROC PRINT. If it is omitted, all the variables are used by default.

DEFINE variable / <usage> <attribute-list>;

The DEFINE statement is used to:
o define how each variable is used in the report
o assign formats and labels to variables
o change the order of the values in the report

Usage:
o DISPLAY: Displays values in a column without ordering or grouping (just like PROC PRINT).
o ORDER: Sorts the report in ascending order; a DESCENDING option is also available.
o GROUP: Groups observations into summarization lines.
o ANALYSIS: Returns the requested statistic.

Attribute-list:
o FORMAT = <format name> - assigns a format to a variable
o 'report-column-header' - defines the column header (label) for the column

GROUP variables produce summary reports; DISPLAY and ORDER variables produce listing reports.

Example:

   DEFINE idptno / DISPLAY "Patient";

Prints the values of idptno (like PROC PRINT). "Patient" is the column label for idptno.

   DEFINE tx / ORDER "Treatment Group";

Prints the values of tx in ascending order.

   DEFINE sal / ANALYSIS MEAN "Mean Salary" format=DOLLAR10.2;

Finds the average salary and prints it in DOLLAR10.2 format.

Sample Program & Output 1: (Similar to PRINT)

   PROC REPORT DATA = EMP;
      COLUMN EMPID NAME GENDER SALARY;
      DEFINE EMPID  / ORDER 'Employee ID';
      DEFINE NAME   / DISPLAY 'Name of Employee';
      DEFINE GENDER / DISPLAY;
      DEFINE SALARY / DISPLAY 'Salary of Employee';
   RUN;

Sample Program & Output 2: (Similar to MEANS)

   PROC REPORT DATA = EMP;
      COLUMN DEPTID GENDER SALARY;
      DEFINE DEPTID / GROUP 'Dept ID';
      DEFINE GENDER / GROUP 'Gender';
      DEFINE SALARY / SUM 'Salary of Employee';
   RUN;
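The feature list above mentions subtotals and grand totals. A sketch of how the summary report in Sample Program 2 could be extended with the BREAK and RBREAK statements, assuming the same EMP variables (DEPTID, GENDER, numeric SALARY); the column header and format are illustrative.

```sas
/* Sketch: department subtotals plus a grand total.                */
/* Assumes EMP has DEPTID, GENDER and numeric SALARY, as above.    */
proc report data=emp nowd;
   column deptid gender salary;
   define deptid / group 'Dept ID';
   define gender / group 'Gender';
   define salary / analysis sum format=dollar12.2 'Total Salary';
   break after deptid / summarize;   /* subtotal line after each dept */
   rbreak after / summarize;         /* grand total at end of report  */
run;
```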

Try It Out

Problem Statement 1

You have a SAS data set GRADES with three fields: CANDIDATE, EXAMINERA, and EXAMINERB. Print the unique values of EXAMINERA along with their counts, without including missing values. Then print the unique combinations of EXAMINERA and EXAMINERB along with their counts, both as a LIST frequency and as a BOX frequency. Include the missing values in the 2-way frequency reports.

Contents of the data set GRADES:

CANDIDATE  EXAMINERA  EXAMINERB
1          1          2
2          0          0
3          0          0
4          2          2
5          0          0
6          4          3
7          0          0
8          0          0
9          0          0
10         2          3
11         1          2
12         2          .
13         0          1
14         4          3
15         4          3
16         1          2
17         0          .
18         1          2
19         2          3
20         0          0

Code

   PROC FREQ DATA = GRADES;
      TABLES EXAMINERA;
      TITLE3 '1-WAY FREQ OF EXAMINERA';
   RUN;

   PROC FREQ DATA = GRADES;
      TABLES EXAMINERA * EXAMINERB / LIST MISSING;
      TITLE3 '2-WAY LIST FREQ OF EXAMINERA * EXAMINERB';
   RUN;

   PROC FREQ DATA = GRADES;
      TABLES EXAMINERA * EXAMINERB / MISSING;
      TITLE3 '2-WAY BOX FREQ OF EXAMINERA * EXAMINERB';
   RUN;

Refer File Name: 18.1.sas to obtain soft copy of the program code

How It Works

By default, PROC FREQ displays an n-way frequency as a BOX (cross-tabulation) table. Adding the LIST option prints it as a list instead. The MISSING option treats missing values as a valid level, so they are included in the frequencies shown in the report.
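To run the code above, GRADES must exist. A minimal sketch that builds it from the values listed in the problem statement, using list input with the double trailing @@ so that several observations can be read from one data line:

```sas
/* Create GRADES from the values listed in the problem statement.   */
/* A period (.) in the data is read as a missing numeric value.     */
data grades;
   input candidate examinera examinerb @@;   /* @@ holds the line   */
   datalines;
1 1 2   2 0 0   3 0 0   4 2 2   5 0 0
6 4 3   7 0 0   8 0 0   9 0 0   10 2 3
11 1 2  12 2 .  13 0 1  14 4 3  15 4 3
16 1 2  17 0 .  18 1 2  19 2 3  20 0 0
;
run;
```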

Problem Statement 2

Use the dataset BOTH created in Problem 1 and compute the mean IQ and GPA for each value of GENDER. Do this for all the data and then for employees born before January 1, 1972.

Code

   PROC MEANS N MEAN DATA=BOTH;
      CLASS GENDER;
      VAR IQ GPA;
   RUN;

   PROC MEANS N MEAN DATA=BOTH;
      WHERE DOB LT '01JAN72'D AND DOB IS NOT MISSING;
      CLASS GENDER;
      VAR IQ GPA;
   RUN;

Refer File Name: 18.2.sas to obtain soft copy of the program code

How It Works

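Here the grouping (CLASS) variable is GENDER and the analysis variables are IQ and GPA. The N option prints the number of non-missing values in each group, and the MEAN option prints the average value for each group. '01JAN72'D is a date constant. The WHERE statement selects only the observations whose DOB value is earlier than 01JAN72 and is not missing. The IS NOT MISSING condition is needed because SAS treats a missing numeric value as smaller than any number, so a missing DOB would otherwise satisfy DOB LT '01JAN72'D.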
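The role of the IS NOT MISSING condition can be checked with a short DATA step. This is a sketch: IS NOT MISSING is a WHERE-expression operator, so inside a DATA step IF statement the MISSING function is used instead.

```sas
/* Sketch: a missing numeric value is smaller than any number, so   */
/* a missing DOB passes the LT test unless it is excluded.          */
data _null_;
   dob = .;                                   /* missing date       */
   if dob lt '01JAN72'd then
      put 'Missing DOB passes the LT test';
   if dob lt '01JAN72'd and not missing(dob) then
      put 'This line is not written';
   cutoff = '01JAN72'd;          /* a date constant is a number     */
   put cutoff= cutoff= date9.;   /* raw value, then formatted       */
run;
```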

Problem Statement 3

Generate the report from problem 18.2 using the REPORT procedure.

Code

   PROC REPORT DATA=BOTH NOWD;
      COLUMN GENDER IQ GPA N;
      DEFINE GENDER / GROUP;
      DEFINE IQ     / ANALYSIS MEAN;
      DEFINE GPA    / ANALYSIS MEAN;
   RUN;

   PROC REPORT DATA=BOTH NOWD;
      COLUMN GENDER IQ GPA N;
      DEFINE GENDER / GROUP;
      DEFINE IQ     / ANALYSIS MEAN;
      DEFINE GPA    / ANALYSIS MEAN;
      WHERE DOB LT '01JAN72'D AND DOB IS NOT MISSING;
   RUN;

Refer File Name: 18.3.sas to obtain soft copy of the program code

How It Works

Here the grouping variable is GENDER and the analysis variables are IQ and GPA. The MEAN statistic prints the average value for each group. '01JAN72'D is a date constant. The WHERE statement selects only the observations whose DOB value is earlier than 01JAN72 and is not missing. The N in the COLUMN statement adds a column showing the number of observations in each group.

Problem Statement 4

Export the output of problem 18.3 to an RTF file.

Code

   ODS RTF FILE = 'C:\SASFILES\REPORT.RTF';

   PROC REPORT DATA=BOTH NOWD;
      COLUMN GENDER IQ GPA N;
      DEFINE GENDER / GROUP;
      DEFINE IQ     / ANALYSIS MEAN;
      DEFINE GPA    / ANALYSIS MEAN;
   RUN;

   PROC REPORT DATA=BOTH NOWD;
      COLUMN GENDER IQ GPA N;
      DEFINE GENDER / GROUP;
      DEFINE IQ     / ANALYSIS MEAN;
      DEFINE GPA    / ANALYSIS MEAN;
      WHERE DOB LT '01JAN72'D AND DOB IS NOT MISSING;
   RUN;

   ODS RTF CLOSE;

Refer File Name: 18.4.sas to obtain soft copy of the program code

Summary

The FREQ procedure is a descriptive as well as a statistical procedure that produces one-way and n-way frequency tables.

To produce a cross-tabulation report on two or more variables, use an asterisk (*) between the variables in the TABLES statement.

Multi-threaded processing is a type of parallel processing introduced in SAS System 9. Threaded processing can be controlled with the SAS system option THREADS | NOTHREADS.

The MEANS procedure displays simple descriptive statistics.

PROC SUMMARY is similar to PROC MEANS.

PROC REPORT is another very powerful display procedure that combines display and statistical analysis capabilities in one procedure. It produces a variety of reports using a single report-writing tool.

Test your Understanding

PROC FREQ:

1. Code the TABLES statement for a single-level (one-way) frequency.
2. Code the TABLES statement for a multi-level (n-way) frequency.
3. Name the option that produces frequency line items rather than a table.
4. Name the option that includes missing numeric values in the report.

PROC MEANS:

1. Code a PROC MEANS step that shows both summed and averaged output of the data.
2. What are the differences between PROC SUMMARY and PROC MEANS?

Session 20: PROC SQL

Learning Objectives

After completing this session, you will be able to work with the PROC SQL procedure, explain the SELECT statement and its clauses, create output tables, summarize data, group data, query multiple tables, limit the number of rows read and displayed, use operators in PROC SQL, calculate values, and enhance query output.

PROC SQL Basics

PROC SQL is a powerful SAS procedure that combines the functionality of DATA and PROC steps in a single tool. PROC SQL can sort, summarize, subset, join (merge), and concatenate data sets, create new variables, and print the results or create a new table or view, all in one step. The SQL procedure provides an easy, flexible way to query and combine your data. PROC SQL is SAS' implementation of Structured Query Language (SQL), which is similar to ANSI SQL: most of its statements and options have the same syntax as their ANSI SQL counterparts. PROC SQL can often be used as an alternative to other SAS procedures or the DATA step. You can use PROC SQL to retrieve data and manipulate SAS tables; add or modify data values in a table; add, modify, or drop columns in a table; create tables and views; join multiple tables; and generate reports.

Note: In this section, SAS data sets are referred to as tables.

The Difference: PROC SQL differs from most other SAS procedures in several ways. Unlike other PROC statements, many statements in PROC SQL are composed of clauses.

For example, the following PROC SQL step contains two statements: the PROC SQL statement and the SELECT statement. The SELECT statement contains several clauses: SELECT, FROM, and WHERE.

   proc sql;
      select empid, jobcode, salary,
             salary*.06 as bonus
         from sasuser.payrollmaster
         where salary<32000;

PROC SQL does not require a RUN statement: each query is executed as soon as it is submitted. The step ends with a QUIT statement. Within a query, items in a list (such as the column names in the SELECT clause or the table names in the FROM clause) are separated by commas, not by spaces as in other SAS statements.

The SELECT Statement and its Clauses

The SELECT statement, which follows the PROC SQL statement, retrieves and displays data. It is composed of clauses, each of which begins with a keyword and is followed by one or more components.

General Form:

   PROC SQL options;
      SELECT column(s)
         FROM table-name | view-name
         WHERE expression
         GROUP BY column(s)
         HAVING expression
         ORDER BY column(s);
   QUIT;

A SIMPLE PROC SQL:

   PROC SQL;
      SELECT * FROM USSALES;
   QUIT;

This prints the contents of the table USSALES, and the output is similar to that of PROC PRINT. An asterisk in the SELECT clause selects all columns from the table. To print only specific columns, list them in the SELECT clause, separated by commas.

Example:

   PROC SQL NUMBER;
      SELECT STATE, SALES FROM USSALES;
   QUIT;

The NUMBER option adds a column named Row that numbers the rows in the output.

To subset data based on a condition, use a WHERE clause in the SELECT statement. To sort rows by the values of specific columns, use the ORDER BY clause.

CREATING NEW VARIABLES: New columns can be created in PROC SQL using the keyword AS. Most DATA step functions can be used in an expression to create a new column.

Example:

   PROC SQL;
      SELECT SUBSTR(STORE,1,3) AS STORENO,
             SALES,
             (SALES * .05) AS TAX,
             (SALES * .05) * .01
         FROM USSALES;
   QUIT;

A single PROC SQL step can contain any number of SQL statements.
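The WHERE and ORDER BY clauses described above can be combined with a calculated column in one query. A sketch assuming USSALES has STATE, STORE, and a numeric SALES column as in the examples; the 10000 threshold and the TAX format are illustrative.

```sas
/* Sketch: WHERE subsets rows and ORDER BY sorts them.             */
/* Assumes USSALES has STATE, STORE and numeric SALES; the 10000   */
/* threshold is illustrative.                                      */
proc sql;
   select state, store, sales,
          sales * .05 as tax format=10.2
      from ussales
      where sales > 10000
      order by state, sales desc;   /* DESC sorts that key descending */
quit;
```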

Creating Output Tables

To create a new table from the results of a query, use a CREATE TABLE statement that includes the keyword AS followed by the clauses that are used in a PROC SQL query.

General Form:

   PROC SQL;
      CREATE TABLE table-name AS
         SELECT statement...;
   QUIT;

Example: The following query creates a table named NEW that is a copy of EMP. It does not print anything in the Output window.

   PROC SQL;
      CREATE TABLE NEW AS SELECT * FROM EMP;
   QUIT;
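Because a PROC SQL step can hold several statements, a table can be created and then queried in the same step. A sketch assuming EMP has EMPID and a numeric SALARY; the table name HIGH_PAID and the 30000 threshold are illustrative.

```sas
proc sql;
   /* Statement 1: create a subset table (nothing is printed).     */
   create table high_paid as
      select empid, salary
         from emp
         where salary > 30000;

   /* Statement 2: query the new table in the same step.           */
   select count(*) as n_rows
      from high_paid;
quit;
```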

Summarizing & Grouping Data

To group data for summarizing, use the GROUP BY clause. The GROUP BY clause is used in queries that include one or more summary functions. Summary functions produce a statistical summary for each group that is defined in the GROUP BY clause.

Example: Suppose you want to determine the total number of miles traveled by frequent-flyer program members in each of three membership classes (Gold, Silver, and Bronze).

   proc sql;
      select membertype,
             sum(milestraveled) as TotalMiles
         from sasuser.frequentflyers
         group by membertype;
   quit;

Here, the SUM function adds the values of the MilesTraveled column to create the TotalMiles column. The GROUP BY clause groups the data by the values of MemberType, so the results show the total miles for each membership class. You can use most SAS functions in SQL statements.
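The general form of the SELECT statement also lists a HAVING clause, which filters whole groups after GROUP BY has formed them. A sketch extending the frequent-flyer example with several summary functions; the 100000 threshold and the format are illustrative.

```sas
/* Sketch: several summary functions per group, filtered by HAVING. */
/* Columns follow the frequent-flyer example; the 100000 threshold  */
/* is illustrative.                                                 */
proc sql;
   select membertype,
          count(*)           as Members,
          sum(milestraveled) as TotalMiles,
          avg(milestraveled) as AvgMiles format=comma10.
      from sasuser.frequentflyers
      group by membertype
      having sum(milestraveled) > 100000   /* filters groups, not rows */
      order by TotalMiles desc;
quit;
```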

Querying Multiple Tables

A join is used to combine information from multiple tables. One advantage of using PROC SQL to join tables is that the input data sets do not need to be sorted first.

Example:

   PROC SQL;
      SELECT * FROM JANSALES, FEBSALES;
   QUIT;

Because there is no WHERE condition, this is a Cartesian join: every row of one table is combined with every row of the other.

INNER JOIN: An inner join returns only the rows whose key values match in both tables. This type of join is similar to a DATA step merge that uses the IN= data set option and IF logic to keep only the observations found in both data sets.

Example:

   PROC SQL;
      SELECT U.STORENO, U.STATE, F.SALES AS FEBSALES
         FROM USSALES U, FEBSALES F
         WHERE U.STORENO = F.STORENO;
   QUIT;
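The same inner join can also be written with explicit INNER JOIN ... ON syntax, and compared with the DATA step merge mentioned above. A sketch assuming USSALES (with STORENO and STATE) and FEBSALES (with STORENO and SALES) as in the example, and that each STORENO appears at most once per table; with duplicate keys the two approaches can return different row counts.

```sas
/* The inner join with explicit INNER JOIN ... ON syntax.           */
proc sql;
   select u.storeno, u.state, f.sales as febsales
      from ussales as u
           inner join febsales as f
           on u.storeno = f.storeno;
quit;

/* DATA step equivalent: both inputs must first be sorted by the    */
/* key (PROC SORT without OUT= sorts the data sets in place).       */
proc sort data=ussales;  by storeno; run;
proc sort data=febsales; by storeno; run;

data joined;
   merge ussales(in=inu keep=storeno state)
         febsales(in=inf keep=storeno sales rename=(sales=febsales));
   by storeno;
   if inu and inf;          /* keep only stores present in both     */
run;
```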

Limiting the Number of Rows Read and Displayed

OUTOBS= option

To indicate the maximum number of rows to be displayed, you can use the OUTOBS= option in the PROC SQL statement. OUTOBS= is similar to the OBS= data set option. General Form: PROC SQL OUTOBS=n;

The OUTOBS= option restricts the rows that are displayed, but not the rows that are read.

INOBS= option

The INOBS= option restricts the number of rows that PROC SQL takes as input from any single source. General Form: PROC SQL INOBS=n;

Example Program:

   proc sql inobs=5;
      select * from work.all;
   quit;

Because the input is limited to 5 rows, SAS writes a warning like the following to the log:

   WARNING: Only 5 records were read from WORK.ALL due to INOBS= option.
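The difference between the two options shows up clearly when the query has an ORDER BY clause. A sketch assuming EMP has EMPID and a numeric SALARY:

```sas
/* OUTOBS=: all rows are read and sorted, then only 3 are shown,    */
/* so the output lists the 3 highest salaries.                      */
proc sql outobs=3;
   select empid, salary
      from emp
      order by salary desc;
quit;

/* INOBS=: only the first 3 rows are read, and only those 3 rows    */
/* are sorted.                                                      */
proc sql inobs=3;
   select empid, salary
      from emp
      order by salary desc;
quit;
```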

Using Operators in PROC SQL

Comparison, logical, and concatenation operators are used in the WHERE clause of PROC SQL in the same way as in other SAS procedures.

For example, the following WHERE clause contains the logical operator AND, which joins two conditions that use comparison operators: an equal sign (=) and a greater-than symbol (>).

   proc sql;
      select ffid, name, state, pointsused
         from sasuser.frequentflyers
         where membertype = 'GOLD' and pointsused > 0
         order by pointsused;
   quit;

You can also use conditional operators such as BETWEEN-AND, IN, LIKE, and IS MISSING in the WHERE clause. All of these operators can also be used in WHERE statements in other SAS procedures.
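A sketch of these conditional operators in one WHERE clause. The column names follow the frequent-flyer example above; the values tested (the point range, the state codes, and the name pattern) are illustrative.

```sas
/* Sketch: common conditional operators in a WHERE clause.         */
/* Values tested are illustrative.                                  */
proc sql;
   select ffid, name, state, pointsused
      from sasuser.frequentflyers
      where pointsused between 1000 and 5000   /* inclusive range   */
        and state in ('NY', 'NJ', 'CT')        /* list of values    */
        and name like 'S%'                     /* % = any string    */
        and membertype is not missing;         /* excludes blanks   */
quit;
```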

Calculated Values

The following PROC SQL query creates the new column Total by adding the values of three existing columns: Boarded, Transferred, and Nonrevenue.

Example:

   select flightnumber, date, destination,
          boarded + transferred + nonrevenue as Total
      from sasuser.marchflights
      where total < 100;

If you use the newly created column Total in the WHERE clause, SAS writes an error message to the log:

   from sasuser.marchflights
   where total < 100;
   ERROR: The following columns were not found in the contributing tables: total

This error occurs because, in SQL queries, the WHERE clause is processed before the SELECT clause, so the alias Total does not yet exist when the WHERE condition is evaluated.

Using the Keyword CALCULATED: When you use a column alias in the WHERE clause to refer to a calculated value, you must precede the alias with the keyword CALCULATED. CALCULATED tells PROC SQL that the value is calculated within the query.

Example:

   select flightnumber, date, destination,
          boarded + transferred + nonrevenue as Total
      from sasuser.marchflights
      where calculated total < 100;

This query executes successfully and produces the following output.

Enhancing Query Output

By default, PROC SQL output is not specially formatted, but you can improve the appearance of your query output by using column labels and formats, titles and footnotes, and columns that contain a character constant.

To control the formatting of columns in output, you can specify the SAS column modifiers LABEL= and FORMAT= after any column name in the SELECT clause.

Note: LABEL= and FORMAT= are not part of the ANSI standard; they are SAS enhancements.

Example:

   proc sql outobs=5;
      title 'Current Bonus Information';
      title2 'Employees with Salaries > $75,000';
      select empid label='Employee ID',
             jobcode label='Job Code',
             salary,
             salary * .10 as Bonus format=dollar12.2
         from sasuser.payrollmaster
         where salary>75000
         order by salary desc;
   quit;

The first two columns have new labels, the Bonus values are consistently formatted, and two title lines are displayed at the top of the output.
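The third enhancement listed earlier, a column that contains a character constant, is not shown in that example. A sketch based on the same query; the constant text 'Bonus is:' is illustrative.

```sas
proc sql outobs=5;
   title 'Current Bonus Information';
   select empid label='Employee ID',
          salary format=dollar12.2,
          'Bonus is:',                            /* constant column */
          salary * .10 as Bonus format=dollar12.2
      from sasuser.payrollmaster
      where salary > 75000
      order by salary desc;
quit;
```

The constant text is repeated on every row of the output, next to the calculated bonus.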

CONCLUSION

PROC SQL is a powerful tool. It can make your life much easier. For beginner SQL users, remember the following points:

Be careful with many-to-many joins in SQL. When the joined tables have multiple rows per matching ID, the output contains every combination of those rows (a Cartesian product within each ID). For example, 3 rows joined with 5 rows that share the same ID value produce 15 rows, whereas a DATA step MERGE creates only 5 rows.
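The row counts in the point above can be reproduced with two tiny data sets. A sketch with hypothetical data sets A and B, in which 3 rows of A and 5 rows of B share ID=1:

```sas
/* Hypothetical data: 3 rows in A and 5 rows in B, all with ID=1.   */
data a;
   id = 1;
   do a_seq = 1 to 3; output; end;
run;

data b;
   id = 1;
   do b_seq = 1 to 5; output; end;
run;

/* SQL join: every A row pairs with every B row for the same ID.    */
proc sql;
   create table sql_join as
      select a.id, a_seq, b_seq
         from a inner join b
         on a.id = b.id;
quit;

/* MERGE: rows are matched one-to-one within the BY group.          */
data merged;
   merge a b;                 /* both data sets are in ID order     */
   by id;
run;
```

SQL_JOIN should contain 15 rows and MERGED 5 rows. The MERGE step also writes a NOTE to the log saying that more than one data set has repeats of BY values.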

PROC SQL is code-saving, but not always time-saving.

Try It Out

Problem Statement 1

Consider a data set EMP. Print its contents sorted by the value of JobCode, keeping only the observations with Salary < 32000. Create a new column BONUS whose value is 6% of the Salary value.

EmpID  JobCode  Salary
1970   FA1      $31,661
1422   FA1      $31,436
1658   SCP      $25,120
1113   FA1      $31,314
1094   FA1      $31,175
1789   SCP      $25,656
1422   FA1      $31,436
1564   SCP      $26,366
1354   SCP      $25,669
1094   FA1      $31,175
1101   SCP      $26,212

Code

   proc sql;
      select empid, jobcode, salary,
             salary*.06 as bonus
         from emp
         where salary<32000
         order by jobcode;
   quit;

Refer File Name: 20.1.sas to obtain soft copy of the program code

How It Works

The ORDER BY clause sorts the rows of the report, the WHERE clause filters the observations, and salary*.06 as bonus creates the new column BONUS.

Problem Statement 2

Using the data set EMP, determine the total salary for each job code. Apply a format to the new column and limit the output to 10 rows.

Code

   proc sql outobs = 10;
      select jobcode, sum(salary) as Totsal format = dollar13.2
         from emp
         group by jobcode;
   quit;

Refer File Name: 20.2.sas to obtain soft copy of the program code

How It Works

The GROUP BY clause summarizes the observations by jobcode, and the SUM function finds the sum of Salary for each group. OUTOBS= is similar to the OBS= option: it prints only the specified number of rows.

Summary

PROC SQL is a powerful SAS Procedure that combines the functionality of DATA and PROC steps into a single step

PROC SQL is SAS' implementation of Structured Query Language (SQL), which is similar to ANSI SQL.

Many PROC SQL statements are composed of clauses.

To group data for summarizing, you can use the GROUP BY clause.

A join is used to combine information from multiple tables.

To indicate the maximum number of rows to be displayed, you can use the OUTOBS= option in the PROC SQL statement.

Comparison, logical, and concatenation operators are used in the WHERE clause of PROC SQL as they are in other SAS procedures.

When you use a column alias in the WHERE clause to refer to a calculated value, you must use the keyword CALCULATED along with the alias.

You can improve the appearance of your query output by using column labels and formats, titles and footnotes, and columns that contain a character constant.

Test your Understanding

1. What is the use of PROC SQL?
2. What is the use of the keyword CALCULATED?
3. How will you limit the number of observations read and written in PROC SQL?

Session 22: Introduction to MACROS

Learning Objectives

After completing this session, you will be able to explain SAS macros, list the advantages of the SAS macro facility, work with macro variables, describe automatic and user-defined macro variables, explain macro triggers, explain the macro processor and the flow of execution, and create macro variables at run time.

SAS Macro

The macro facility is one of the most powerful features of Base SAS. SAS macros enable you to substitute text in your SAS programs: when you reference a macro variable, SAS replaces the reference with the text value that has been assigned to it. This makes your programs more reusable and dynamic. In simple terms, the SAS macro facility is a tool for text substitution. Macros allow users to write more flexible code, pass information between DATA and PROC steps, and generate SAS statements based on the data.

There are two main components of the SAS macro facility: macro variables and macro programs.

Macro variables are like parameters passed on to a SAS program. Macro programs use macro variables and macro programming statements to build SAS programs.

Advantages of the SAS Macro Facility

Macros can help in several ways:

With macros, you can make one small change in your program and have SAS echo that change throughout your program.

Macros allow you to write a piece of code once and use it over and over again in the same program or in different programs.

You can make your programs data driven, letting SAS decide what to do based on actual data values.

Using the SAS macro facility, SAS programs can become reusable, shorter, and easier to follow. With macros you can accomplish repetitive tasks quickly and efficiently; customize the results without changing the code, by passing parameters to the macro program; conditionally execute SAS code; debug more easily; automatically insert the date and other session information into your code; and write more flexible code that passes data between DATA and PROC steps during execution time.

Macro variables

Macro variables belong to the SAS macro language and are different from Data step variables. You can define and use macro variables anywhere in a SAS program, except in DATALINES or CARDS. The %LET statement enables you to define a macro variable and to assign a value to it. General form: %LET variable = value;

Where,

variable is any name that follows the SAS naming conventions.
value can be any string from 0 to 65,534 characters.
If either variable or value contains a reference to another macro variable (such as &macvar), the reference is resolved before the assignment is made.
If variable already exists, value replaces the current value.

Rules for creating macro variables:

All values are stored as character strings.
Mathematical expressions are not evaluated.
The case of the value is preserved.
Quotation marks that enclose literals are stored as part of the value.
Leading and trailing blanks are removed from the value before the assignment is made.

You can reference a macro variable by preceding its name with an ampersand (&).

Note: The macro processor resolves references in double quotes but not in single quotes.
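The note above can be checked with %PUT, which writes text to the SAS log, and a PUT statement in a DATA step. A minimal sketch with an illustrative macro variable DEPT:

```sas
%let dept = sales;

%put Department is &dept;           /* %PUT writes to the SAS log   */

data _null_;
   put "Double quotes: &dept";      /* resolved: Double quotes: sales */
   put 'Single quotes: &dept';      /* not resolved: prints &dept     */
run;
```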

%LET Statement                   Variable Name   Variable Value     Length
%let name= Ed Norton ;           name            Ed Norton          9
%let name2=' Ed Norton ';        name2           ' Ed Norton '      13
%let title="Joan's Report";      title           "Joan's Report"    15
%let start=;                     start                              0
%let sum=4+3;                    sum             4+3                3
%let total=0+&sum;               total           0+4+3              5
%let x=varlist;                  x               varlist            7
%let &x=name age height;         varlist         name age height    15
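Several rows of the table can be verified by writing the stored values to the log. A minimal sketch reusing the statements from the table:

```sas
%let sum   = 4+3;             /* stored as the text 4+3, not 7      */
%let total = 0+&sum;          /* &sum is resolved first: 0+4+3      */
%let x     = varlist;
%let &x    = name age height; /* &x resolves, creating VARLIST      */

%put sum=&sum total=&total;
%put varlist=&varlist;
```

The log should show sum=4+3 total=0+4+3 and varlist=name age height, matching the table.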

DATA Step Variable Vs Macro Variable: The following table illustrates the difference between a DATA step variable and a macro variable.

DATA Step Variable                                  Macro Variable
Belongs to the SAS language                         Belongs to the SAS macro language
Value depends on the observation being processed    Contains one value that remains constant until explicitly changed
Is part of the SAS data set                         Is independent of the SAS data set

Where we can use macro variables: In your SAS programs, you might find that you need to reference the same text string multiple times. Example:

   DATA sales;
      SET ALL_DEPT;
      WHERE Dept = "sales";
   RUN;

   PROC PRINT DATA = sales;
      TITLE "List of employees in sales department";
   RUN;

Later, you might need to change these references so that the program uses a different text string. If your programs are lengthy, updating them manually can take a lot of time and is prone to error. If you use a macro variable in your program, you only need to make the change in one place, and SAS substitutes the new value everywhere the macro variable is referenced. Example:

   %LET dept = sales;

   DATA &dept;
      SET ALL_DEPT;
      WHERE Dept = "&dept";
   RUN;

   PROC PRINT DATA = &dept;
      TITLE "List of employees in &dept department";
   RUN;

Automatic and User defined macro variables

There are two types of macro variables: automatic macro variables and user-defined macro variables.

Both types of macro variables are independent of the SAS data set.

Automatic macro variables: SAS creates and defines several automatic macro variables. They contain information about your computing environment and the date and time of the session. They are created when SAS is invoked, have global scope, and usually have their values assigned by SAS. Some of the automatic macro variables are given below.

Name Value

SYSDATE the date of the SAS invocation (DATE7.)

SYSDATE9 the date of the SAS invocation (DATE9.)

SYSDAY the day of the week of the SAS invocation

SYSTIME the time of the SAS invocation

SYSVER the release of SAS that is being used

SYSLAST the name of the most recently created SAS data set.

User-defined macro variables: Macro variables that you create are user-defined macro variables. For example, you can create a user-defined macro variable with the %LET statement or the CALL SYMPUT routine.
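The automatic macro variables listed above can be displayed with %PUT. A minimal sketch; the data set WORK.DEMO is a hypothetical one created only so that SYSLAST has a known value.

```sas
/* Automatic macro variables are available as soon as SAS starts.   */
%put Session started &sysday, &sysdate9 at &systime;
%put SAS release: &sysver;

data work.demo;                 /* hypothetical data set            */
   x = 1;
run;
%put Last data set created: &syslast;    /* WORK.DEMO               */

%put _automatic_;               /* lists every automatic variable   */
```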

Macro Processor and the flow of execution

SAS Program flow of execution (without Macros):

When you submit a program, it goes to an area of memory called the input stack.

The word scanner reads the program from the input stack and divides the program text into fundamental units called tokens.
o Tokens are passed on demand to the compiler.
o The compiler requests tokens until it receives a semicolon.
o SAS stops sending statements to the compiler when it reaches a step boundary, e.g. a RUN statement.

The compiler checks the syntax of the tokens received from the word scanner. After it completes checking the syntax, the code is sent for execution.

The executor executes the code and writes the results to the Log and the Output windows.

Terms used in Macro Processing:

Term             Description
input stack      Holds a SAS program after it is submitted.
word scanner     Scans the text it takes from the input stack, breaks the text into tokens, and determines the destination of each token: DATA step compiler, macro processor, etc.
token            Fundamental unit in the SAS language. Tokens are the actual keywords in SAS statements as well as literal strings, numbers, and symbols. Ex: DATA, 1234, +, -, =, variable names.
compiler         Checks the syntax of tokens received from the word scanner. After it completes checking the syntax, the code is sent for execution.
macro processor  Processes macro language references and statements.
macro trigger    The symbols & and % which, when followed by a letter or underscore, signal the word scanner to transfer the current statement to the macro processor.

Macro Facility: The macro facility includes a macro processor that is responsible for handling all macro language elements. When the word scanner detects a macro trigger, it passes it to the macro processor for evaluation. The compiler does not recognize macro statements.

Macro Trigger: The word scanner recognizes the following token sequences as macro triggers:

% followed immediately by a name token (such as %let)
& followed immediately by a name token (such as &dept)

SAS Program flow of execution with Macro statements:

When the Word Scanner encounters a macro trigger it sends the statement to the Macro Processor.

The macro processor processes macro language references and macro statements and returns SAS code (without macro statements).

The resolved SAS code is returned to the input stack, and execution continues.

Combining a macro variable reference with text (concatenation): When you place text immediately after a macro variable reference, SAS interprets the whole string as the name of a macro variable. Example:

   %let month = APR;
   PROC PRINT DATA = WORK.&MONTHDATA;
   RUN;

Here SAS interprets &MONTHDATA as a reference to a macro variable named MONTHDATA, which does not exist, and writes a warning such as WARNING: Apparent symbolic reference MONTHDATA not resolved. To avoid this, end the macro variable reference with a period (.):

   PROC PRINT DATA = WORK.&MONTH.DATA;

Now &MONTH. resolves to APR, and the data set name becomes WORK.APRDATA.
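The same delimiter rule matters when a macro variable holds a libref. A sketch with illustrative macro variables MONTH and LIB, showing why a libref reference needs two periods:

```sas
%let month = APR;
%let lib   = work;

/* &lib..  : first period ends the reference, second is the libref dot */
/* &month. : the period separates the reference from the text DATA     */
data &lib..&month.data;          /* resolves to work.APRdata            */
   x = 1;
run;

proc print data=&lib..&month.data;
run;
```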

Creating macro variables in run time

Consider the values of A and B in the following program:

   data _null_;
      A = 2000;
      B = 1000;
      if A > B then do;
         %let txt=A is greater than B;
      end;
      else do;
         %let txt= A is lesser than B;
      end;
   run;

Can you guess the value of the macro variable txt? 'A is greater than B'? No, the value is 'A is lesser than B'. This is because the macro facility does its work before the SAS program executes, while SAS assigns the values of A and B only at execution time. The IF condition therefore has no effect on the %LET statements: both %LET statements are sent to the macro processor as the code is scanned. The macro processor first executes %let txt=A is greater than B; and then executes %let txt= A is lesser than B;. The later value, 'A is lesser than B', is the one left in the macro variable txt. So %LET cannot be used to assign DATA step variable values, or values that depend on them, to macro variables.

The SYMPUT Routine: The DATA step provides functions and CALL routines that enable you to transfer information between an executing DATA step and the macro processor. The SYMPUT routine creates a macro variable during execution time and assigns a value to it.

General form:

   CALL SYMPUT('macro-variable', 'text');

If an argument is not enclosed in quotes, it is treated as a DATA step variable (or expression), and its value is used in its place:

   CALL SYMPUT('macro-variable', DATA-step-variable);

This form of the SYMPUT routine creates the macro variable named macro-variable and assigns to it the current value of DATA-step-variable. When you use a DATA step variable as the second argument, a maximum of 32767 characters can be assigned to the receiving macro variable. Any leading or trailing blanks that are part of the DATA step variable's value are stored in the macro variable.

Caution:

CALL SYMPUT assigns the value while the DATA step executes, but every &macro-variable reference in that step was already resolved when the step was compiled, before SYMPUT ran.

Therefore, you cannot use an & reference to retrieve the new value within the same DATA step where it is created; reference it in a later step, or read it with the SYMGET function.
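The A/B example from earlier can be rewritten so that the comparison happens during DATA step execution and CALL SYMPUT hands the result to the macro processor. This is a minimal sketch: the values 2000 and 1000 come from that example, and the DATA _NULL_ step (which creates no data set) is an illustrative choice.

```sas
/* Compare A and B while the DATA step executes and pass the   */
/* result to the macro processor with CALL SYMPUT.             */
data _null_;
   A = 2000;
   B = 1000;
   if A > B then call symput('txt', 'A is greater than B');
   else call symput('txt', 'A is lesser than B');
run;

/* TXT is referenced only after the step has finished, so the  */
/* reference resolves to the value assigned above.             */
%put txt = &txt;
```

Because the IF condition now runs at execution time, only one CALL SYMPUT executes, and the %PUT statement writes A is greater than B to the log. In SAS 9 and later, CALL SYMPUTX works the same way and also removes leading and trailing blanks from the value.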


The SYMGET Function:

To obtain a macro variable's value during DATA step execution, use the SYMGET function.

The SYMGET function returns the value of an existing macro variable. General form:

   SYMGET ('macro-variable')

where macro-variable is the name of an existing macro variable. If the argument is not enclosed in quotation marks, SAS treats it as a DATA step character variable (or expression) whose value is the name of the macro variable.
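The sketch below shows both forms of the SYMGET argument inside an executing DATA step. The macro variable RATE, the data set WORK.PRICES, and the variables MVNAME and RATE_TEXT are hypothetical names chosen for illustration.

```sas
%let rate = 0.08;   /* hypothetical macro variable */

data work.prices;
   input item $ price;
   /* Quoted argument: the name of the macro variable itself.   */
   /* SYMGET returns text, so INPUT converts it to a number.    */
   discount = price * input(symget('rate'), best12.);
   /* Unquoted argument: a character variable holding the name. */
   mvname = 'rate';
   rate_text = symget(mvname);
   datalines;
Sofa 500
Bed 800
;
run;
```

Because SYMGET runs at execution time, it can also read a macro variable that CALL SYMPUT created earlier in the same DATA step, which an & reference cannot do.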

Try It Out

Problem Statement 1

A company that manufactures bicycles maintains a dataset ‘Models’ listing all their models. For each model they record its name, class (Road, Track, or Mountain), list price, and frame material. Here is a subset of the data:

Create a macro variable bikeclass, assign a value to it, and print only those observations whose Class matches the value of the macro variable. Also use a TITLE statement to display the value of the macro variable.

Code

   %LET bikeclass = Mountain;
   * Use a macro variable to subset;
   PROC PRINT DATA = models;
      WHERE Class = "&bikeclass";
      TITLE "Current Models of &bikeclass Bicycles";
   RUN;


Refer File Name: 22.1.sas to obtain soft copy of the program code

How It Works

If you reference macro variables inside quoted strings, use only double quotation marks, not single quotation marks; the macro processor does not resolve references that appear inside single quotation marks.

Problem Statement 2

A company maintains a dataset with information about every order they receive. For each order, the data include the customer ID number, the date the order was placed, the model name, and the quantity ordered. Here is the data:

   287 15OCT03 Delta Breeze 15
   287 15OCT03 Santa Ana 15
   274 16OCT03 Jet Stream 1
   174 17OCT03 Santa Ana 20
   174 17OCT03 Nor'easter 5
   174 17OCT03 Scirocco 1
   347 18OCT03 Mistral 1
   287 21OCT03 Delta Breeze 30
   287 21OCT03 Santa Ana 25

Every Monday the president of the company wants a detail-level report showing all the current orders. On Friday the president wants a report summarized by customer. Write a SAS program for the above requirement.

Code

   %MACRO reports;
      %IF &SYSDAY = Monday %THEN %DO;
         PROC PRINT DATA = orders;
            FORMAT OrderDate DATE7.;
            TITLE "&SYSDAY Report: Current Orders";
      %END;
      %ELSE %IF &SYSDAY = Friday %THEN %DO;
         PROC MEANS DATA = orders;
            CLASS CustomerID;
            VAR Quantity;
            TITLE "&SYSDAY Report: Summary of Orders";
      %END;
      RUN;
   %MEND reports;

   /* Invoke the macro: the report produced depends on the day */
   %reports

Refer File Name: 22.2.sas to obtain soft copy of the program code


How It Works

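The automatic macro variable SYSDAY contains the day of the week on which the SAS session started. If the day is Monday, the macro generates the PROC PRINT step; if it is Friday, it generates the PROC MEANS step. On any other day, only the RUN statement is generated.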

Summary

The macro facility is one of the most powerful features of Base SAS. The SAS macro facility has two main components:
o Macro variables
o Macro programs

The %LET statement enables you to define a macro variable and to assign a value to it.

There are two types of macro variables:
o automatic macro variables
o user-defined macro variables

% and & are the macro triggers.

The SYMPUT routine creates a macro variable during execution time and assigns a value to it.

The SYMGET function returns the value of an existing macro variable.

Test your Understanding

1. What are macro triggers?
2. What is the scope of the macro variables A and B in the following code?

   %let A = 10;
   %macro abc;
      %let B = 20;
      %put _user_;
   %mend abc;

3. What are the values of the macro variables Total and Sum?

   %let Total = "3+6";
   %let Sum = 3+6;

4. What are the debugging options for macros?
5. Which of the following TITLE statements correctly references the macro variable month?
   a) title "Total Sales for '&month' ";
   b) title 'Total Sales for &month';
   c) title "Total Sales for &month";
   d) title Total Sales for "&month";
6. How would you include common or reusable code to be processed along with your statements?
7. How do you identify a macro variable?
8. For what purposes do you use SAS macros?
9. Describe CALL SYMPUT.
10. What are SYMGET and SYMPUT?
11. Describe how you would create a macro variable at compile time and at run time.


Session 23: Introduction to MACROS

Learning Objectives

After completing this session, you will be able to work with:
o Macro programs
o Parameters to macro programs
o Scope of macro variables
o Macro system options
o Conditional execution in macros
o Iterative processing in macros
o Built-in macro functions

Macro Programs

A macro is a group of SAS statements that is identified by a name. It is a larger piece of a program that can contain complex logic, including complete DATA and PROC steps, macro statements, and macro variables. General form of a macro:

   %MACRO macro-name;
      macro-text
   %MEND macro-name;

A macro definition starts with a %MACRO statement followed by the macro name and ends with a %MEND statement. The macro name can also appear after %MEND for clarity, but it is optional. macro-text represents the SAS statements that you include in your macro. To invoke a macro, place a % in front of its name:

%macro-name

Example:

   %MACRO printit;
      PROC PRINT DATA = EMP (OBS = 10);
         TITLE 'CONTENTS OF DATASET EMP';
      RUN;
   %MEND printit;

   %printit
   PROC SORT DATA = EMP;
      BY EMPID;
   RUN;
   %printit


The program calls the macro twice: first without sorting the data, and then after executing a PROC SORT by EMPID. Each time, the SAS statements inside the macro are substituted in place of %printit.

Macro Variables vs. Macros:

Macro Variables | Macros
---|---
Starts with an ampersand (&) | Starts with a percent sign (%)
Defined using the %LET statement | Defined using the %MACRO and %MEND statements
Is like a standard data variable, except that it does not belong to a data set and has only a single value, which is always character | Is a larger piece of a program that can contain complex logic, including complete DATA and PROC steps, macro statements, and macro variables

Using Macro Parameters

The next step is to add parameters to macro programs, which makes them more flexible and lets you write data-driven programs. Parameters are values that are passed to the macro at the time of invocation. They are defined in a set of parentheses following the macro name. Parameters are macro variables local to the macro, so references to them inside the definition need a preceding ampersand. There are two styles for coding the parameters:
o Positional
o Keyword

Positional Parameter

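The following example shows the positional style. The TITLE statement uses double quotation marks; inside single quotation marks, &dsname would not be resolved.

Example:

   %MACRO printit(dsname, noobs);
      PROC PRINT DATA = &dsname (OBS = &noobs);
         TITLE "CONTENTS OF DATASET &dsname";
      RUN;
   %MEND printit;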

To invoke the macro, supply the parameter values in the same order as in the macro definition:

   %printit(emp, 100)

Keyword Parameter

The following example shows the keyword style:

Example:

   %MACRO printit(dsname = &syslast, noobs = 100);
      PROC PRINT DATA = &dsname (OBS = &noobs);
         TITLE "CONTENTS OF DATASET &dsname";
      RUN;
   %MEND printit;


Here, &syslast and 100 are the default values for dsname and noobs, respectively. The automatic macro variable SYSLAST refers to the most recently created data set.

The above macro can be invoked in different ways:

   %printit(dsname = dept, noobs = 50)

takes the parameter values dept for dsname and 50 for noobs.

   %printit(noobs = 50)

takes the default value &syslast for dsname, because no value is provided for it.

   %printit()

takes the default values for both parameters.

In positional style, the parameter values must be given in the same order as in the macro definition. In keyword style, they can be given in any order.

Scope of Macro variables

Macro variables come in two varieties:
o LOCAL
o GLOBAL

LOCAL Macro Variable

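A macro variable's scope is LOCAL if it is defined inside a macro. Example:

   %MACRO TEST;
      %LET A = HAI;
      <Macro statements>;
   %MEND;

When a %LET statement is found within a %MACRO definition, the variable is LOCAL to that macro and is not available outside of it. The exception: if a macro variable with the same name already exists in an enclosing scope, such as the global symbol table, %LET updates that existing variable instead of creating a local one.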

GLOBAL Macro Variable

A macro variable’s scope is GLOBAL, if it is defined in “open code” which is everything outside a macro. You can reference a global macro variable anywhere in your program. Example: %LET A = HAI;


When %LET statements are placed in open code (outside any %MACRO definition), the variables they define have GLOBAL scope. To create a macro variable with GLOBAL scope inside a macro, use the %GLOBAL statement. Example:

   %GLOBAL A;

Place this statement before the %LET statement that assigns the value.
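The sketch below puts the scope rules together: A is global because it is created in open code, B is local to the macro, and C is created inside the macro but declared with %GLOBAL. The macro name scope_demo is illustrative, the comments assume no global B exists beforehand, and %SYMEXIST assumes a recent SAS 9 release.

```sas
%let A = 10;            /* open code: A is GLOBAL */

%macro scope_demo;
   %global C;           /* C is created in the global symbol table */
   %let B = 20;         /* no global B exists, so B is LOCAL */
   %let C = 30;         /* assigns the global C declared above */
   %put _user_;         /* lists A and C as GLOBAL, B as SCOPE_DEMO */
%mend scope_demo;

%scope_demo

/* The local symbol table is deleted when the macro ends. */
%put A=&A C=&C;
%put Does B still exist? %symexist(B);
```

After the macro finishes, A and C are still available in open code, while B has disappeared along with the macro's local symbol table.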

System Options

The SYMBOLGEN Option

When a macro variable is referenced, the macro processor resolves the reference and passes the value directly back to the input stack. Therefore, you normally cannot see the values that the macro processor returns. When debugging a program it is useful to view these values: the SYMBOLGEN system option writes the resolved value of each macro variable reference to the SAS log. General form:

   OPTIONS NOSYMBOLGEN | SYMBOLGEN;

Where,

NOSYMBOLGEN specifies that log messages about macro variable references will not be displayed. This is the default.

SYMBOLGEN specifies that log messages about macro variable references will be displayed in the log file

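Example program (statements from a DATA step that reads sasuser.all, submitted with SYMBOLGEN in effect after AMOUNT and CITY were assigned with %LET):

   set sasuser.all;
   where fee > &amount;
   A = "&city";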

SAS Log

   110  where fee>&amount;
   SYMBOLGEN:  Macro variable AMOUNT resolves to 975
   111  A = "&city";
   SYMBOLGEN:  Macro variable CITY resolves to Dallas


MPRINT

When the MPRINT option is specified, the text that is sent to the SAS compiler as a result of macro program execution is printed in the SAS log. General form:

   OPTIONS MPRINT | NOMPRINT;

where
NOMPRINT turns off the option. This is the default.
MPRINT turns on the option.

Example: Suppose you want to call the macro printit with the MPRINT system option turned on. Macro definition:

   %MACRO printit();
      PROC PRINT DATA = DEPT (OBS = 75);
         TITLE "CONTENTS OF DATASET DEPT";
      RUN;
   %MEND printit;

   OPTIONS MPRINT;
   %printit()

Log file:

   101  %printit
   MPRINT(PRINTIT):   proc print data= DEPT (obs=75);
   MPRINT(PRINTIT):   title "CONTENTS OF DATASET DEPT";
   MPRINT(PRINTIT):   run;

MLOGIC

The MLOGIC option prints messages that indicate the macro actions that were taken during macro execution. General form:

   OPTIONS MLOGIC | NOMLOGIC;

where

MLOGIC specifies that messages about macro actions are printed to the log during macro execution.

NOMLOGIC is the default setting, and specifies that messages are not printed to the SAS log


Example:

   options mlogic;
   %printit()

Log file:

   107  %printit
   MLOGIC(PRINTIT):  Beginning execution.
   NOTE: There were 1 observations read from the dataset WORK.EMP.
   NOTE: PROCEDURE PRINT used:
         real time           0.02 seconds
         cpu time            0.02 seconds
   MLOGIC(PRINTIT):  Ending execution

The SYMBOLGEN, MPRINT, and MLOGIC options are typically turned on for development and debugging, and turned off when the application is in production.

%PUT statement

The %PUT statement is another way of verifying the values of macro variables. It writes text and the values of macro variables to the SAS log. General form:

   %PUT text;

where text is any text string or macro variable reference. %PUT can be used virtually anywhere in a program, and it writes the values of user-defined or automatic macro variables to the SAS log. To print groups of macro variables with the %PUT statement, use the following arguments:

Argument | Result in SAS Log
---|---
_ALL_ | Lists the values of all macro variables
_AUTOMATIC_ | Lists the values of all automatic macro variables
_USER_ | Lists the values of all user-defined macro variables
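A short sketch of %PUT in each of these forms, reusing the CITY and AMOUNT values from the SYMBOLGEN example above (the %LET statements are assumed here so the block runs on its own):

```sas
%let city = Dallas;    /* same values as in the SYMBOLGEN example */
%let amount = 975;

/* Text plus resolved references */
%put City is &city and amount is &amount;

/* Every user-defined macro variable, with its scope */
%put _user_;

/* Automatic macro variables such as SYSDAY and SYSLAST */
%put _automatic_;

/* All macro variables, user-defined and automatic */
%put _all_;
```

Unlike SYMBOLGEN, which reports every resolution automatically, %PUT writes only what you ask for, which keeps the log short when you need to check just a few values.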

Option | Description
---|---
SYMBOLGEN | Writes a message for the resolution of each macro variable
MPRINT | Displays the SAS statements returned by the macro processor
MLOGIC | Traces the beginning and end of macro execution and any parameter values assigned
%PUT | Prints the values of the macro variables and text specified


Conditional execution in Macro

You can use macros to control conditional execution of statements. Here are the general forms of the statements used for conditional logic in macros:

   %IF condition %THEN action;
   %ELSE %IF condition %THEN action;
   %ELSE action;

   %IF condition %THEN %DO;
      action;
   %END;

These statements are similar to the standard SAS IF statement. Each keyword starts with a % sign to differentiate it from the standard IF statement. These statements can only be used inside a macro. The conditions and actions can include other macro statements or even complete DATA and PROC steps. If an action contains multiple statements, enclose them in a %DO-%END block.

%IF vs. IF statement: The following table lists the differences between the macro %IF statement and the standard IF statement.

Macro %IF-%THEN-%ELSE statement | Standard IF-THEN-ELSE statement
---|---
is used only in a macro program | is used only in a DATA step program
executes during macro execution | executes during DATA step execution
uses only macro variables in logical expressions and cannot refer to DATA step variables | uses DATA step variables and macro variables in logical expressions
determines the text/SAS statements to be copied to the input stack | determines the DATA step statement(s) to be executed

In the example below, if the parameter value is PRINT, the PROC PRINT step is substituted at the place of the macro invocation (%reportit); otherwise, the PROC CONTENTS step is substituted.

Example:

   %MACRO reportit(request);
      %IF &request = PRINT %THEN %DO;
         PROC PRINT DATA = EMP;
         RUN;
      %END;
      %ELSE %DO;
         PROC CONTENTS DATA = EMP;
         RUN;
      %END;
   %MEND reportit;

   %reportit(PRINT)      /* generates the PROC PRINT step    */
   %reportit(CONTENTS)   /* generates the PROC CONTENTS step */

Iterative processing in Macro

You can also perform iterative processing in macros using %DO statements. These statements are similar to the DO statements of the DATA step and are used to repeat a section of macro text a specific number of times, or while (or until) a condition is true. The %DO statement has the following forms:
o %DO %WHILE
o %DO %UNTIL
o Iterative %DO

Example:

   %MACRO arrayme;
      %DO i = 1 %TO 5;
         file&i
      %END;
   %MEND arrayme;

   DATA one;
      SET %arrayme;
   RUN;

During execution, the macro invocation resolves to the following:

   DATA one;
      SET file1 file2 file3 file4 file5;
   RUN;

This macro generates a list of five data set names: the values 1 to 5 are substituted in the expression file&i to produce file1 file2 file3 file4 file5. The resulting DATA step concatenates the five data sets into data set one.
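The same list can be produced with the %DO %WHILE form. This is a sketch: arrayme_while is a hypothetical variant of %arrayme that takes the number of names as a parameter and manages the counter itself with %EVAL.

```sas
%macro arrayme_while(n);
   %local i;                      /* keep the counter out of the global table */
   %let i = 1;
   %do %while (&i <= &n);         /* condition is checked before each pass */
      file&i
      %let i = %eval(&i + 1);     /* macro arithmetic needs %EVAL */
   %end;
%mend arrayme_while;

/* Write the generated list to the log */
%put %arrayme_while(5);
```

%DO %UNTIL is written the same way, except that its condition is checked at the bottom of the loop, so the body always runs at least once.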

Built-in Macro Functions

Macro character functions have the same basic syntax as the corresponding DATA step functions, and they yield similar results. Although they are similar, macro character functions are different from DATA step functions: macro functions work with macro variables and text, whereas DATA step functions work with DATA step variables. Let us discuss some of the basic macro functions.


The %UPCASE Function

The %UPCASE function enables you to change the value of a macro variable from lowercase to uppercase. General form:

   %UPCASE (argument)

where argument is a character string. Example:

   %LET NAME = raju;
   %LET NAME = %UPCASE(&NAME);

Now the value of the macro variable NAME is ‘RAJU’

The %SUBSTR Function

The %SUBSTR function enables you to extract part of a character string from the value of a macro variable. General form: %SUBSTR ( argument, position <,n> )

where
argument is a character string or a text expression.
position specifies the position of the first character in the substring.
n specifies the number of characters in the substring. If n is omitted, the substring extends to the end of argument.

Example: %let date = 05JAN2002;
%substr(&date,3,7) returns the value JAN2002.
%substr(&date,3,3) returns the value JAN.

The %LENGTH Function

Returns the length of the string. Example: %LENGTH(&date) returns 9
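The three functions above can be tried together in open code. This sketch reuses the NAME and DATE values from the examples; the macro variable names MONYR, MON, and LEN are illustrative.

```sas
%let name = raju;
%let date = 05JAN2002;

%let name  = %upcase(&name);        /* RAJU */
%let monyr = %substr(&date, 3);     /* n omitted: to the end, JAN2002 */
%let mon   = %substr(&date, 3, 3);  /* JAN */
%let len   = %length(&date);        /* 9 */

%put NAME=&name MONYR=&monyr MON=&mon LEN=&len;
```

Because these functions operate on text before any step runs, no DATA step is needed: the results are available to any later statement.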

The %SYSFUNC Function

You can use the %SYSFUNC function to execute other DATA step functions as part of the macro facility.


General form: %SYSFUNC ( function ( argument(s) ) <,format> )

Where,

function is the name of the SAS function to execute.
argument(s) is one or more arguments that are used by function.
format is an optional format to apply to the result of function.

Example: Suppose the following code was submitted on Friday, June 7, 2002:

   title "%sysfunc(today(),weekdate.) - SALES REPORT";

%SYSFUNC executes the DATA step function TODAY() and formats the result using the format WEEKDATE. The title on the next report would be: Friday, June 7, 2002 - SALES REPORT.
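To see the same formatting without waiting for a particular day, the sketch below passes a fixed date built with the MDY function to %SYSFUNC. The macro variable names RPTDATE and RPTDATE9 are illustrative; %LET removes any leading blanks that the format may produce.

```sas
/* A fixed date, so the result does not depend on when the code runs */
%let rptdate = %sysfunc(mdy(6, 7, 2002), weekdate.);
%put &rptdate;                        /* Friday, June 7, 2002 */

/* The same date with a different format */
%let rptdate9 = %sysfunc(mdy(6, 7, 2002), date9.);
%put &rptdate9;                       /* 07JUN2002 */

/* Today's date at run time, as in the TITLE example above */
title "%sysfunc(today(), date9.) - SALES REPORT";
```

Each %SYSFUNC call executes exactly one DATA step function; nesting MDY inside %SYSFUNC works because MDY is itself an argument of the single outer call.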

Try It Out

Problem Statement

Write a macro program containing a PRINT procedure. The macro has three parameters:
o DATA, naming the data set
o OBS, specifying how many records to print
o TL, specifying the title line for the print

Assign default values to all the parameters. Also use the macro system options to see the information returned by the macro processor.

Code

   OPTIONS SYMBOLGEN MLOGIC MPRINT;

   %macro testprnt(data = &syslast, obs = 90, tl = 3);
      proc print data = &data (obs = &obs);
         title&tl "Contents of Dataset &data with &obs observations";
      run;
      title&tl;
   %mend testprnt;

   %testprnt(data = all, obs = 100, tl = 5);   /* macro reference */

Refer File Name: 23.1.sas to obtain soft copy of the program code


How It Works

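&syslast resolves to the name of the most recently created data set, so by default the macro prints that data set. The title&tl; statement after RUN (a TITLEn statement with no text) clears title line tl and all higher-numbered title lines, so the title does not carry over to later reports.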

Summary

A macro is a group of SAS statements that is identified by a name.

Parameters are values that are passed to the macro at the time of invocation.

Macro variables come in two varieties: local and global.

SYMBOLGEN, MPRINT, MLOGIC, and %PUT are used for debugging macro code.

You can use macros to control conditional execution of statements.

You can also perform iterative processing in macros.

Macro character functions have the same basic syntax as the corresponding DATA step functions, and they yield similar results.

Test your Understanding

1. How would you invoke a macro?
2. How do you define the end of a macro?
3. What is the difference between %PUT and SYMBOLGEN?
4. What is the difference between %LOCAL and %GLOBAL?
5. How are parameters passed to a macro?
6. How would you code a macro statement to produce information on the SAS log?
7. What does %PUT do?


Session 25: Help on SAS

Learning Objectives

After completing this session, you will be able to:
o Debug SAS programs
o Create efficient SAS code

Debugging SAS Programs

Error Handling: Errors are classified into:
o Syntax errors
o Data errors
o Logic errors

Syntax Error:

Syntax errors occur when program statements do not conform to the rules of the SAS language. Examples of syntax errors include:
o misspelling a SAS keyword
o uninitialized variable
o variable not found
o using unmatched quotation marks
o forgetting a semicolon
o specifying an invalid statement option
o specifying an invalid data set option

Example: In the program below, the DATA keyword is misspelled, and SAS prints a warning message to the log. Program:

   date temp;
      x=1;
   run;

SAS Log: Syntax Error (misspelled keyword)

   date temp;
   WARNING 14-169: Assuming the symbol DATA was misspelled as date.
   x=1;
   run;
   NOTE: The data set WORK.TEMP has 1 observations and 1 variables.

Because SAS could interpret the misspelled word, the program runs successfully and produces the output. SAS interprets the misspelled keywords only in some cases.

Data errors:

When a data error occurs, missing values are generated. Data errors occur in the following scenarios:
o numeric to character conversion
o invalid data
o character field truncated

Data errors occur when some data values are not appropriate for the SAS statements that you have specified in the program. For example, if you define a variable as numeric and assign a character value to it, SAS generates a data error. SAS detects data errors during program execution. When a data error is encountered, SAS does the following and continues to execute the program:

o Writes an invalid data note to the SAS log.
o Prints the input line and the column numbers that contain the invalid value in the SAS log, with a rule line above the input line.
o Sets the automatic variable _ERROR_ to 1 for the current observation and continues the execution.

Example Program:

   DATA EMP;
      INPUT EMPID NAME $ SALARY;
      DATALINES;
   1000 RAJU 1000
   1001 KUMAR $2,561.00
   1002 ABISHEK 4586
   ;
   RUN;


Log file:

Logic Errors

A logic error produces a wrong result but no error message.

Determining Logic Errors: Use the DEBUG option in the DATA statement to help identify logic problems. The DEBUG option is an interactive interface to the DATA step during DATA step execution. This option is useful to determine:
o which piece of code is executing
o which piece of code is not executing
o the current value of a particular variable
o when the value of a variable changes

General form of the DEBUG option: DATA data-set-name / DEBUG;

Common commands used with the DEBUG option.

Command | Abbreviation | Action
---|---|---
STEP | ENTER key | Steps through a program one statement at a time.
EXAMINE | E variable(s) | Displays the value of the variable.
WATCH | W variable(s) | Suspends execution when the value of the variable changes.
LIST WATCH | L W | Lists variables that are watched.
QUIT | Q | Halts execution of the DATA step.


The DATA step is the most problematic part of SAS debugging. The first rules of debugging are:
o Always check the SAS log.
o Always start at the beginning.

For DATA step debugging you can use:
o PUT statements
o Special names such as _ALL_ and _INFILE_ with PUT (for example, PUT _ALL_;)
o The IN= data set option
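A small sketch that combines these tools in a match-merge. The data sets STAFF, DEPTS, and STAFF_DEPT and the flag names IN_STAFF and IN_DEPT are hypothetical; both inputs are already in DEPTID order, as MERGE with BY requires.

```sas
data staff;
   input empid name $ deptid;
   datalines;
1000 RAJU 10
1001 KUMAR 20
1002 ABISHEK 30
;
run;

data depts;
   input deptid dname $;
   datalines;
10 SALES
20 HR
;
run;

data staff_dept;
   merge staff (in = in_staff) depts (in = in_dept);
   by deptid;
   /* Show the whole PDV for the first iteration */
   if _n_ = 1 then put 'DEBUG: first PDV ' _all_;
   /* Flag employees whose department has no match */
   if in_staff and not in_dept then
      put 'DEBUG: no matching department for ' empid= deptid=;
   /* Keep every employee, matched or not */
   if in_staff;
run;
```

The log then shows which row (ABISHEK, department 30) failed to match, which is the kind of logic problem that produces no error message on its own.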

Don't limit DATA step debugging strictly to DATA step tools. Procedures such as the following also help debug DATA steps:
o FREQ
o MEANS
o PRINT
o REPORT
o CONTENTS
o DATASETS

If your program is well documented and aligned neatly, debugging is very easy.

Creating Efficient SAS Codes

What is Efficiency? Programming efficiency is generally characterized by minimizing the use of the following resources:
o CPU time (the time your computer takes to perform calculations)
o I/O time (the time your computer takes to read data into memory and write data from memory to your hard drive)
o Memory
o Data storage
o Programming time

Avoid Unnecessary Data Steps:

Inefficient:

   data new;
      set old;
      where x > 10;
   run;
   proc means data=new;
      var x y z;
   run;

Efficient:

   proc means data=old;
      where x > 10;
      var x y z;
   run;

Here, the inefficient version creates a new data set for the sole purpose of running a procedure on a subset of the data. Instead, use a WHERE statement in the procedure itself. WHERE statements can be used with most SAS procedures.


Sub-setting data from one dataset into multiple datasets can be achieved in one data step instead of many.
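A minimal sketch of this idea: one DATA step reads the input once and routes each observation to one of several output data sets with OUTPUT statements. The data set SALES and its REGION and AMOUNT variables are hypothetical.

```sas
/* Hypothetical input data set */
data sales;
   input region $ amount;
   datalines;
East 100
West 250
East 75
North 40
;
run;

/* One DATA step, one pass through SALES, three output data sets */
data east west other;
   set sales;
   if region = 'East' then output east;
   else if region = 'West' then output west;
   else output other;
run;
```

Three separate DATA steps with subsetting IF or WHERE statements would read SALES three times; here it is read once.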

The DATASETS procedure can perform many housekeeping operations on a data set, including copying, deleting, and renaming data sets, renaming variables, adding labels, and changing formats. Operations such as renaming variables or changing labels and formats are much more efficient in PROC DATASETS than in a DATA step, because PROC DATASETS modifies only the descriptor portion of the data set, whereas the DATA step reads and rewrites all the data.

Store Data in SAS Datasets:

Instead of storing data in a raw data file and reading it again and again, read the raw data once and store it in a permanent SAS data set for later use.

SAS reads a SAS data set faster than an external file.

Keeping only the required variables:

When reading a flat file, input only the variables needed. When reading a SAS data set, use the KEEP= data set option to read only the variables needed. (Note: DROP= also works, but KEEP= documents which variables are used.) DROP intermediate variables used only for calculations.

Example:

   DATA X;
      DO I = 1 TO 3;
         DO J = 1 TO 5;
            TEMPVAR = I;
            NEWVAL = TEMPVAR * J;
         END;
      END;
      DROP I J TEMPVAR;
   RUN;

When outputting a dataset, KEEP only the variables needed. Example: DATA X(KEEP=A1 A2 A3 NEWVAR1 NEWVAR2);

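Specify minimum length for variables: While creating a data set, define the smallest suitable length for each variable by using the LENGTH, INFORMAT, or ATTRIB statements. Shorter character variables carry fewer padding blanks, which reduces disk space usage. Place the LENGTH statement before the SET statement so that it takes effect. Numeric lengths are given in bytes and cannot include decimal places; lengths below 8 reduce the precision available for large or fractional values. Example:

   data work.dsn1;
      length var1 $4 var2 $5 var3 6;
      set work.dsn2;
   run;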

Use WHERE statement for conditional processing: Use the WHERE statement instead of the sub-setting IF statement to filter data, if the dataset is large. The WHERE statement filters the data before it gets loaded into the PDV whereas, the IF statement filters the data only after the data is loaded into the PDV.

Inefficient method:

   data work.dsn1;
      set work.dsn2;
      if Product = 'Sofa';
   run;

Efficient method:

   data work.dsn1;
      set work.dsn2;
      where Product = 'Sofa';
   run;

Use IF-THEN/ELSE instead of multiple IF statements: Use IF-THEN/ELSE statements instead of a series of separate IF-THEN statements. An IF-THEN/ELSE chain skips the remaining conditions once a condition is met, whereas separate IF-THEN statements check every condition for every observation.


Inefficient method:

   data work.dsn1;
      set work.dsn2;
      if Product = 'Sofa' then Discount = 0.08;
      if Product = 'Bed' then Discount = 0.10;
      if Product = 'Chair' then Discount = 0.12;
   run;

Efficient method:

   data work.dsn1;
      set work.dsn2;
      if Product = 'Sofa' then Discount = 0.08;
      else if Product = 'Bed' then Discount = 0.10;
      else if Product = 'Chair' then Discount = 0.12;
   run;

IF/THEN/ELSE

When using a series of IF-THEN-ELSE statements, list the conditions in descending order of probability, so that the most common case is tested first. This saves CPU time. Example:

   IF YEAR LT THISYR THEN OUTPUT OUTOLD;
   ELSE IF YEAR EQ THISYR THEN OUTPUT OUTCUR;
   ELSE OUTPUT OUTBAD;

SORT

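Sort only the variables needed; this is faster because less data is read and written. Example:

   PROC SORT DATA=X (KEEP=A B C);

Without an OUT= data set, the sorted output replaces X and contains only the kept variables.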

When sorting a permanent dataset or a large file, sort it into another dataset. Sorting into a permanent dataset takes more I/O. Sorting a large file requires more space. Example: PROC SORT DATA=PERMLIB.X (KEEP= A B) OUT=XSORT;

Use the right operator to select records: Use the IN operator rather than OR operator to select a list of values.

Inefficient method:

   data work.dsn1;
      set work.dsn2;
      if Product = 'Sofa' or Product = 'Bed' or Product = 'Chair' then Type = 'Furniture';
   run;

Efficient method:

   data work.dsn1;
      set work.dsn2;
      if Product in ('Sofa', 'Bed', 'Chair') then Type = 'Furniture';
   run;


Place the selection criteria in the right position: Apply the selection criteria first, so that unwanted observations are deleted before the rest of the fields are processed.

Inefficient method:

   data work.dsn1;
      set work.dsn2;
      Discount = Price * 0.04;
      Profit = Price * 0.10;
      if Product = 'Computer';
   run;

Efficient method:

   data work.dsn1;
      set work.dsn2;
      if Product = 'Computer';
      Discount = Price * 0.04;
      Profit = Price * 0.10;
   run;

Use a subset of data for testing code: When testing a piece of SAS code on a large data set, process only part of the data set rather than the whole of it, using the OBS= data set option in a DATA step or the INOBS= option in PROC SQL. (INOBS= limits the rows read from each source; OUTOBS= limits only the rows in the result.)

   data work.dsn1;
      set work.dsn2 (obs=1000);
      A = mean(salary);
   run;

   proc sql inobs=1000;
      create table work.dsn1 as
         select mean(salary) as A
         from work.dsn2;
   quit;

Compressing large Datasets: Use the COMPRESS= data set option when creating large data sets to store them in compressed form. To compress every data set a program creates, place the statement OPTIONS COMPRESS=YES; at the beginning of the program.

Index the variables used for conditional processing: Create indexes on key columns and on columns used for conditional processing, that is, columns used in WHERE statements. Searching is faster when the column is indexed. A subsetting IF statement does not use indexes.

Delete Unneeded Datasets: At the end of the program or at strategic points, it is good practice to use PROC DATASETS to delete unneeded data sets from the WORK or permanent library. This makes room for new data sets, can improve performance, and, more importantly, shows your intention to the reader.

   PROC DATASETS LIBRARY = WORK;
      DELETE TEMP EMP;
   QUIT;
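The compression, housekeeping, and indexing advice above can be combined as sketched below. WORK.EMP here is a small hypothetical data set and the new variable name EMP_NAME is illustrative. With only three rows SAS may not use the index, and compressing such a tiny data set can even increase its size; both techniques pay off on large data.

```sas
/* Small hypothetical data set, stored compressed */
data work.emp (compress = yes);
   input empid name $ salary;
   datalines;
1000 RAJU 1000
1001 KUMAR 2561
1002 ABISHEK 4586
;
run;

proc datasets library = work nolist;
   modify emp;
      /* Descriptor-only changes: data values are not rewritten */
      rename name = emp_name;
      format salary dollar10.2;
      label empid = 'Employee ID';
      /* INDEX CREATE reads the data once to build the index */
      index create empid;
quit;

/* WHERE processing can use the EMPID index */
proc print data = work.emp;
   where empid = 1001;
run;
```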


Consolidate program steps using Proc SQL: Consolidate programming steps using the SQL procedure in order to save process time and resources.

Inefficient:

   data work.dsn1;
      set work.dsn2;
   run;
   proc sort data=work.dsn1;
      by products;
   run;

Efficient:

   proc sql;
      create table work.dsn1 as
         select * from work.dsn2
         order by products;
   quit;

Summary

Errors are classified into:
o Syntax errors
o Data errors
o Logic errors

First rules of debugging:
o Always check the SAS log.
o Always start at the beginning.

Programming efficiency is generally characterized by minimizing the use of the following resources:
o CPU time
o I/O time
o Memory
o Data storage
o Programming time

Test your Understanding

1. How do you debug and test your SAS programs?
2. What can you learn from the SAS log when debugging?
3. What system options would you use to help debug a macro?


