
Posters

MANIPULATE AND CREATE WITH PROC TABULATE

Kimberly A. Mitchell and Joseph A. Kufera

National Study Center for Trauma and EMS, University of Maryland at Baltimore

PROC TABULATE is a powerful tool for summarizing categorical and numeric data in tabular formats. It computes many of the same descriptive statistics as PROC MEANS, PROC FREQ, and PROC SUMMARY. However, PROC TABULATE offers much more versatility in displaying data, classifying the values of variables, and establishing hierarchical relationships between variables. Its features and options give the user almost total control over the appearance of tables.

Many beginning SAS users avoid TABULATE, not only because they are unaware of its capabilities but also because it appears difficult to use. Admittedly, TABULATE does look complicated in the user manuals. But many of the features of MEANS, FREQ, and SUMMARY are available in TABULATE, so why use three PROCs when the task can be accomplished with one?

Applying PROC TABULATE involves two steps: first, placing the data in the proper rows and columns of the table; second, refining the appearance of the data in the table. The key to using TABULATE successfully is to start with small tables and expand them as needed. The authors assume that the reader has little or no experience with TABULATE and provide a step-by-step approach to the procedure.

SPECIFICATIONS

The PROC TABULATE statement invokes the procedure. The following statements are required: (1) a CLASS statement and/or (2) a VAR statement, and (3) one or more TABLE statements. Additional optional statements include BY, FORMAT, KEYLABEL, and WEIGHT.

Basic Syntax for PROC TABULATE

PROC TABULATE DATA=SAS-data-set <additional options>;
   CLASS class-variables;
   VAR analysis-variables;
   TABLE table-specification </ options>;
   additional option statements;
RUN;

Brief descriptions of the table options, statements, and statement options used in PROC TABULATE are given below. For a more detailed description of the options and statements, refer to the SAS Guide to TABULATE Processing. Statements and options featured in the examples are marked with an asterisk (*).

NESUG '96 Proceedings 566

PROC TABULATE statement options

*DATA= specifies the SAS data set to be used by TABULATE.

*FORMAT= specifies the format used to display the data in each cell of the table. The default format is BEST12.2. This format is overridden by formats supplied in the TABLE statement. Because each cell is as wide as its format, FORMAT= is also used to change the column width. If a column is not wide enough, any label associated with the variable wraps to a second line.

*FORMCHAR= defines the characters used for the outline of the table and its separator lines. The default is '|----|+|---'. You may use any characters or a hexadecimal string to change the table's appearance. Eleven blank spaces between the quotes make the table appear to have no grid lines, when in fact the grid lines have been replaced by blank spaces.

MISSING indicates that missing values in the class variables are to be displayed in the table.

*NOSEPS removes the horizontal separator lines from the body of the table and from the row title space. It also removes the blank lines between rows, so the data appear single-spaced.

ORDER= specifies the order in which the levels of the class variables are displayed as headings in the table.

ORDER=INTERNAL (the default) displays the headings in the same order that PROC SORT would produce. ORDER=FORMATTED displays the headings in order of their formatted values. Formatted values that begin with special characters, such as a less-than sign (<), will be displayed at the bottom of the list.

ORDER=FREQ displays the headings in descending order of frequency count.

VARDEF= specifies the divisor used to calculate variances (for example, VARDEF=DF, the default, divides by n - 1, and VARDEF=N divides by n).

DEPTH= specifies the maximum depth of crossing allowed in any dimension. The default depth is 10.
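VARDEF= changes only the divisor applied to the corrected sum of squares. A minimal Python sketch of that arithmetic follows (an illustration, not SAS code); the function name and sample values are hypothetical, and only the DF and N settings are modelled.

```python
def variance(values, vardef="DF"):
    """Variance with the divisor chosen by vardef.

    "DF" divides the corrected sum of squares by n - 1 (the SAS default);
    "N" divides it by n. Other VARDEF= settings are not modelled here.
    """
    if vardef not in ("DF", "N"):
        raise ValueError("only 'DF' and 'N' are modelled in this sketch")
    n = len(values)
    mean = sum(values) / n
    css = sum((x - mean) ** 2 for x in values)  # corrected sum of squares
    return css / (n - 1 if vardef == "DF" else n)


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(variance(sample, "N"))   # 4.0
print(variance(sample, "DF"))  # 32 / 7, about 4.571
```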

*TABLE statement: TABLE <<page-expression,> row-expression,> column-expression </ table-option-list>;

Every PROC TABULATE step requires at least one TABLE statement. Each TABLE statement defines one table, and a procedure may contain many TABLE statements. The TABLE statement defines the structure of the table. Every variable used in a TABLE statement must first be declared, before the TABLE statement, in either the CLASS statement or the VAR statement, but not both. The TABLE statement has two important parts: (1) the table dimensions and (2) the variable groupings.

Table dimensions describe the layout of the table. One to three dimension expressions can be used in the TABLE statement, separated by commas; each comma starts a new dimension. Table 1 gives examples of the three table dimensions. The page dimension and BY-group processing are similar, except that the data do not have to be sorted when the page dimension is used. Using the page dimension is the preferred method of grouping data.
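The sorting requirement is what separates the two approaches. The Python sketch below (a model of the idea, not SAS code; the function names and values are hypothetical) contrasts grouping consecutive equal values, which is how BY-group processing walks through a data set, with collecting every matching value regardless of input order, which is roughly how a page dimension behaves. In SAS itself, unsorted BY data stop the step with an error rather than splitting the groups.

```python
from itertools import groupby


def by_groups(values):
    """BY-style: consecutive equal values form one group (needs sorted input)."""
    return [(key, len(list(group))) for key, group in groupby(values)]


def page_groups(values):
    """Page-dimension-style: all equal values form one group, in any input order."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sorted(counts.items())


disp = ["L", "D", "L", "L", "D"]  # hypothetical, unsorted
print(by_groups(disp))            # [('L', 1), ('D', 1), ('L', 2), ('D', 1)]
print(by_groups(sorted(disp)))    # [('D', 2), ('L', 3)]
print(page_groups(disp))          # [('D', 2), ('L', 3)]
```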

Table 1 Dimension Statements

Variables Specified    Table Dimension
c                      column dimension
b,c                    row and column dimensions
a,b,c                  page, row, and column dimensions

Variables may be combined in three ways in TABULATE. The first method is crossing: the variables are nested within each other and separated by asterisks (for example, DISP*SEX). The second method is concatenation: the variables are listed in a series, independent of each other, and separated by blanks (for example, SEX AGE). The third method is grouping variables inside parentheses; it is used with an operator, which is applied to every variable within the parentheses (for example, AGE*(MEAN MIN MAX)).
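To make the three methods concrete, here is a small Python sketch that expands a dimension into its list of heading paths (a model of how headings multiply, not SAS code or SAS's internal algorithm). Crossing forms every combination of levels, concatenation places the pieces side by side, and a parenthesized group distributes the operator over each member. The helper names and levels are hypothetical; the expressions mirror those used in the examples.

```python
from itertools import product


def levels(name, values):
    """Headings for one class variable: one single-element path per level."""
    return [(f"{name}={v}",) for v in values]


def cross(*parts):
    """Crossing (a*b): every combination of headings, nested left to right."""
    return [sum(combo, ()) for combo in product(*parts)]


def concat(*parts):
    """Concatenation (a b): the parts placed one after another."""
    return [path for part in parts for path in part]


disp = levels("DISP", "DL")
sex = levels("SEX", [1, 2])
age = [("AGE",)]
mean, minimum, maximum = [("MEAN",)], [("MIN",)], [("MAX",)]

# DISP*SEX: four columns, D/1, D/2, L/1, L/2
columns = cross(disp, sex)

# SEX AGE*(MEAN MIN MAX): the group distributes AGE over each statistic
rows = concat(sex, cross(age, concat(mean, minimum, maximum)))

for path in rows:
    print(" / ".join(path))
```

Grouping is shorthand: AGE*(MEAN MIN MAX) yields the same headings as writing AGE*MEAN AGE*MIN AGE*MAX.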

PROC TABULATE has a universal class variable called ALL. ALL is used to create summary information for a group of categories alongside the detailed information on each category. If ALL is listed first (for example, ALL SEX), it is printed before the variable with which it is associated.
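A minimal Python sketch of what ALL adds (an illustration, not SAS code; the function name is hypothetical): one summary count across all levels, whose position in the output depends only on whether it is listed first or last. The counts come from the N cells of Example 1 (194 + 2909 males, 51 + 1100 females).

```python
from collections import Counter


def with_all(values, all_first=False, label="All"):
    """(heading, N) pairs for each class level plus the ALL summary.

    ALL counts every observation across the levels; listing it first
    (ALL SEX) or last (SEX ALL) changes only where it is printed.
    """
    n_by_level = Counter(values)
    detail = [(level, n_by_level[level]) for level in sorted(n_by_level)]
    total = (label, len(values))
    return [total] + detail if all_first else detail + [total]


sex = [1] * 3103 + [2] * 1151
print(with_all(sex))                  # [(1, 3103), (2, 1151), ('All', 4254)]
print(with_all(sex, all_first=True))  # [('All', 4254), (1, 3103), (2, 1151)]
```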

TABLE statement options

BOX= specifies the text to be placed in the empty box in the upper left corner of the table.

BOX=_PAGE_ places the page dimension text in the box. BOX='string' places the quoted text in the box. BOX=variable places the variable's name or label in the box.

CONDENSE allows the printing of multiple tables on a single page.

MISSTEXT='string' supplies the text to be printed in cells that contain missing values (maximum of 20 characters).

PRINTMISS indicates that all levels of the classification variable(s) be presented even where data are missing.

ROW=spacing indicates whether elements in a row crossing are allotted space even when they are blank.

The default is CONSTANT, which allots space for blank row crossings. Use FLOAT to divide the space evenly among the nonblank titles.

*RTS=number supplies an integer that specifies the number of characters printed for the titles in the row dimension. The default is (linesize/4) - 2.

FUZZ=number specifies that any cell percentage or analysis-variable value whose absolute value is less than number is treated as zero.

*CLASS statement: CLASS variables; The CLASS statement identifies the variables in the input data set that are used to classify, or group, the data. The variables may be numeric or character and should have a limited number of levels. For numeric variables with continuous values, formats may be used to group the values. Remember that the CLASS statement is in effect for the entire TABULATE procedure, so class variables affect which observations are used even when a variable is not listed in a TABLE statement: an observation with a missing value for any class variable is excluded from every table unless the MISSING option is specified.

*VAR statement: VAR variables; The VAR statement identifies the analysis variables used to compute statistics. Variables in the VAR statement must be numeric. Missing values of an analysis variable are excluded from the computation of every statistic (e.g., they are not counted when calculating the MEAN) except N and NMISS: N is the number of observations with nonmissing values, and NMISS is the number of observations with missing values.
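The two missing-value rules above work at different levels: a missing class value removes the whole observation, while a missing analysis value only drops out of that variable's statistics. The Python sketch below models both rules (not SAS code); the records, field names, and function names are hypothetical.

```python
def usable_rows(rows, class_vars, missing=False):
    """Rows that reach the table under the CLASS statement.

    Without the MISSING option, a row with a missing (None) value for ANY
    class variable is dropped from every table, even one that never uses
    that variable.
    """
    if missing:
        return list(rows)
    return [r for r in rows if all(r[v] is not None for v in class_vars)]


def describe(values):
    """N, NMISS, and MEAN for one analysis variable; None marks a missing value."""
    present = [v for v in values if v is not None]
    n = len(present)
    return {"N": n, "NMISS": len(values) - n, "MEAN": sum(present) / n if n else None}


# Hypothetical patients: DISP is unknown for the second, AGE for the third.
patients = [
    {"sex": 1, "disp": "L", "age": 40},
    {"sex": 2, "disp": None, "age": 55},
    {"sex": 2, "disp": "D", "age": None},
    {"sex": 1, "disp": "D", "age": 60},
]

kept = usable_rows(patients, ["sex", "disp"])
print(len(kept))                           # 3: the row with missing DISP is gone
print(describe([r["age"] for r in kept]))  # {'N': 2, 'NMISS': 1, 'MEAN': 50.0}
```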

BY statement: BY variables; The BY statement is used to obtain separate analyses of the groups of observations identified by the BY variables. The BY statement works similarly to the page dimension, except that the data must be sorted by the BY variables. The preferred method of obtaining separate analyses is the page dimension.

*FORMAT statement: FORMAT variable-1 format-1. ; The FORMAT statement applies formats to class variables, which are used as headings in the page, row, and column dimensions. The FORMAT statement has no effect on analysis variables or on the contents of table cells.

FREQ statement: FREQ variable; The FREQ statement names a numeric variable whose value represents the frequency of the observation. Only one variable can be used in the FREQ statement. The FREQ statement may be used together with the WEIGHT statement.

*KEYLABEL statement: KEYLABEL keyword-1='label-1'; The KEYLABEL statement labels the statistics keywords (e.g., MEAN) used in the TABLE statement, as well as the universal class variable ALL, unless another label is assigned in the TABLE statement.

*LABEL statement: LABEL variable-1='label-1'; The LABEL statement specifies a label to replace the variable name in the page, row, or column dimension. The label may have as many as 40 characters, including blank spaces, and must be enclosed in single or double quotes. The LABEL statement is in effect for all tables in the procedure unless another label is assigned in the TABLE statement.

WEIGHT statement: WEIGHT variable; The WEIGHT statement specifies a numeric variable whose value is used to weight each observation in the analysis. When the WEIGHT statement is specified, the table displays weighted statistics.
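FREQ and WEIGHT both name a numeric variable, but they act differently: FREQ makes each observation count as if it appeared that many times, so N grows, while WEIGHT leaves N as the number of observations and weights the statistics. The Python sketch below illustrates that difference for N and the mean only; the function names and numbers are hypothetical, and it does not model how SAS handles variances or percentages under either statement.

```python
def freq_stats(values, freqs):
    """FREQ-style: observation i counts as freqs[i] identical observations."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    return {"N": n, "MEAN": mean}


def weight_stats(values, weights):
    """WEIGHT-style: N is still the number of observations; the mean is weighted."""
    mean = sum(v * w for v, w in zip(values, weights)) / sum(weights)
    return {"N": len(values), "MEAN": mean}


values, factor = [10, 20], [1, 3]
print(freq_stats(values, factor))    # {'N': 4, 'MEAN': 17.5}
print(weight_stats(values, factor))  # {'N': 2, 'MEAN': 17.5}
```

The means agree here; only N differs, so any statistic that depends on the count would differ as well.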

ADDITIONAL INFORMATION

OPERATORS Table 2 lists the operators that may also be used in the dimension expressions.

STATISTICS For each table specified with a TABLE statement, the number of nonmissing observations (N) is printed by default for class variables, and the sum of nonmissing values (SUM) is printed by default for analysis variables. Additional statistics such as percent (PCTN), mean (MEAN), and range (RANGE) may be requested with the keyword shown in parentheses. One of three percentage types may be requested in each table: (1) the overall percent, (2) the column percent, or (3) the row percent. To calculate a column or row percent, the user supplies a denominator definition in angle brackets. The column percent is calculated by summing over the rows, so the row variable is placed in the brackets (e.g., PCTN<AGE> in Example 2). The column variable is placed in the brackets when calculating row percents (e.g., PCTN<DISP> in Example 1). To calculate a statistic, its keyword must be crossed with the variable or group of variables. All statistics must be requested within the same dimension. For a comprehensive list of the statistics available with PROC TABULATE, see the SAS Guide to TABULATE Processing.
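The three percentage types differ only in the denominator. A minimal Python sketch of that arithmetic (not SAS code; the function name is hypothetical), using the N cells of Example 1 with SEX in the rows and DISP in the columns; the row percentages it produces match those printed in Table 4.3.

```python
def pctn(counts, denominator="overall"):
    """Percentages for a two-way table of counts, rounded to one decimal (like F=5.1).

    counts maps (row_level, column_level) -> N.
    denominator:
      "overall" - percent of all observations    (PCTN)
      "row"     - percent of the cell's row      (PCTN<column-variable>)
      "column"  - percent of the cell's column   (PCTN<row-variable>)
    """
    grand = sum(counts.values())
    row_tot, col_tot = {}, {}
    for (r, c), n in counts.items():
        row_tot[r] = row_tot.get(r, 0) + n
        col_tot[c] = col_tot.get(c, 0) + n

    result = {}
    for (r, c), n in counts.items():
        if denominator == "row":
            base = row_tot[r]
        elif denominator == "column":
            base = col_tot[c]
        else:
            base = grand
        result[(r, c)] = round(100 * n / base, 1)
    return result


counts = {("Male", "Died"): 194, ("Male", "Lived"): 2909,
          ("Female", "Died"): 51, ("Female", "Lived"): 1100}

print(pctn(counts, "row"))     # 6.3 / 93.7 for males, 4.4 / 95.6 for females
print(pctn(counts, "column"))  # share of each disposition that is male or female
```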

Table 2 Operators

Operator              Function
asterisk  *           crosses variables, formats, and mathematical operators within a dimension
comma  ,              separates dimensions of a table and creates crossings of variables across dimensions
blank space           concatenates variables within a dimension
parentheses  ( )      groups variables and/or mathematical operators within a dimension
angle brackets  < >   specifies denominator definitions when calculating percentages
equal sign  =         assigns a label to a variable or statistic, or completes a format modifier

CODING USING PROC TABULATE

The examples below demonstrate a step-by-step approach to using PROC TABULATE. The result will be two tables: one ready for presentation and one ready for submission to a publisher. Although the guidelines suggest adding one feature or option at a time, on several occasions the authors have added two or more at once because of space limitations. The spacing of several tables has been modified to fit the space allowed.

The data used in the examples are described as follows:

data set:   info
variables:  age  - age on admission to hospital
            sex  - sex (1=male, 2=female)
            disp - disposition on discharge (L=lived, D=died)
formats:    agef - categorizes patient age
            sexf - labels male and female
            $ldf - labels lived and died
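The paper does not list the PROC FORMAT code for these formats. As an illustration only, here is a Python sketch of what they do: the labels are taken from Tables 4.4 and 5.2, but the age cut points are an assumption read off the Table 5.2 headings (integer ages assumed), and the function names simply mirror the format names.

```python
def sexf(code):
    """sexf.: 1 -> Male, 2 -> Female; other codes print unformatted."""
    return {1: "Male", 2: "Female"}.get(code, str(code))


def ldf(code):
    """$ldf.: D -> Died, L -> Lived; other codes print unformatted."""
    return {"D": "Died", "L": "Lived"}.get(code, code)


def agef(age):
    """agef.: age groups; cut points assumed from Table 5.2, integer ages assumed."""
    if age < 18:
        return "<18"
    if age <= 34:
        return "18-34"
    if age <= 64:
        return "35-64"
    return ">=65"


print([agef(a) for a in (3, 17, 18, 34, 35, 64, 65, 95)])
```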

EXAMPLE 1 displays patient gender and age by disposition. For each gender, the table will contain the percentage of patients in each disposition category, as well as the mean age and age range for each disposition category. This involves concatenating variables to display percentages and means in the same table.

1. Decide which variables are class variables and which ones are analysis variables.

class variables - disp and sex; analysis variable(s) - age

2. Identify which variables will be used in the page, row, and column dimensions.

page dimension - none; row dimension - sex and age; column dimension - disp

3. Design how the table should look on paper (Table 3). Include as many details as possible (headings, counts, percentages, etc.):

Table 3 Layout for Example 1

                 DIED    LIVED
MALE      n        10       90
          %      10.0     90.0
FEMALE    n        20       80
          %      20.0     80.0
MEAN AGE           55       37
RANGE           17-80    13-60

4. Write SAS code using the class, var, and table statements without options.

/* Default Options */
proc tabulate data=info;
   class sex disp;   *classification var. from step 1;
   var age;          *analysis var. from step 1 (sum is printed by default);
   table sex age,
         disp;       *sex & age are concatenated in the row dimension and disp is in the column dimension;
run;

The results are displayed in Table 4.1.
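The default cells in Table 4.1 follow directly from the variable roles: SEX is a class variable, so its cells hold counts, while AGE is an analysis variable, so its cells hold sums. A Python sketch of that rule follows (an illustration, not SAS code); the records are hypothetical, not the info data set.

```python
def default_cells(records):
    """Cells of TABLE SEX AGE, DISP with no statistics requested.

    SEX is a class variable, so its cells hold N (a count) for each SEX*DISP
    combination; AGE is an analysis variable, so its cells hold the SUM of
    nonmissing ages for each DISP column.
    """
    n_cells, age_sums = {}, {}
    for rec in records:
        key = (rec["sex"], rec["disp"])
        n_cells[key] = n_cells.get(key, 0) + 1
        if rec["age"] is not None:
            age_sums[rec["disp"]] = age_sums.get(rec["disp"], 0) + rec["age"]
    return n_cells, age_sums


records = [
    {"sex": 1, "disp": "D", "age": 60},
    {"sex": 1, "disp": "L", "age": 30},
    {"sex": 2, "disp": "L", "age": 25},
    {"sex": 1, "disp": "L", "age": 50},
]
print(default_cells(records))  # ({(1, 'D'): 1, (1, 'L'): 2, (2, 'L'): 1}, {'D': 60, 'L': 105})
```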


5. Make a list of changes and enhancements to the table.

change format to no decimals; change sum to mean and add the range; calculate percentages for males and females; add labels to variables; add formats to variables; add labels for mean and range; remove lines from the middle of the table

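Table 4.1 Default Format for Example 1

--------------------------------------------------
|                    |           DISP            |
|                    |---------------------------|
|                    |      D      |      L      |
|--------------------+-------------+-------------|
|SEX      |          |             |             |
|---------|          |             |             |
|1        |N         |       194.00|      2909.00|
|---------+----------+-------------+-------------|
|2        |N         |        51.00|      1100.00|
|---------+----------+-------------+-------------|
|AGE      |SUM       |     10182.00|    138505.00|
--------------------------------------------------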

6. Add features/options, one at a time, that will make the necessary changes to the table. Code new features/options in lowercase, then change them to uppercase after they have executed successfully. The results are displayed in Tables 4.2, 4.3, 4.4, and 4.5.

/*FORMAT CHANGES & ADDING STATISTICS*/
PROC TABULATE DATA=INFO f=7.0;   *changing format of output;
   CLASS SEX DISP;
   VAR AGE;
   TABLE SEX AGE*(mean min max),
         DISP;                   *changing sum to mean & adding min. and max. for age;
RUN;

The output is in Table 4.2.

/*CALCULATING ROW PERCENTAGES*/
PROC TABULATE DATA=INFO F=7.0;
   CLASS SEX DISP;
   VAR AGE;
   TABLE SEX*(n pctn<disp>*f=5.1) AGE*(MEAN MIN MAX),
         DISP;
   *calculating row percents with a format of 5.1;
RUN;

The output is in Table 4.3.


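Table 4.2 Format and Statistics

----------------------------------
|                |      DISP     |
|                |---------------|
|                |   D   |   L   |
|----------------+-------+-------|
|SEX    |        |       |       |
|-------|        |       |       |
|1      |N       |    194|   2909|
|-------+--------+-------+-------|
|2      |N       |     51|   1100|
|-------+--------+-------+-------|
|AGE    |MEAN    |     42|     35|
|       |--------+-------+-------|
|       |MIN     |      3|      1|
|       |--------+-------+-------|
|       |MAX     |     92|     95|
----------------------------------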

Table 4.3 Row Percentages

----------------------------------
|                |      DISP     |
|                |---------------|
|                |   D   |   L   |
|----------------+-------+-------|
|SEX    |        |       |       |
|-------|        |       |       |
|1      |N       |    194|   2909|
|       |--------+-------+-------|
|       |PCTN    |    6.3|   93.7|
|-------+--------+-------+-------|
|2      |N       |     51|   1100|
|       |--------+-------+-------|
|       |PCTN    |    4.4|   95.6|
|-------+--------+-------+-------|
|AGE    |MEAN    |     42|     35|
|       |--------+-------+-------|
|       |MIN     |      3|      1|
|       |--------+-------+-------|
|       |MAX     |     92|     95|
----------------------------------

/*LABELS, FORMATS, & ROW TITLE SPACE*/
PROC TABULATE DATA=INFO F=7.0;
   CLASS SEX DISP;
   VAR AGE;
   TABLE SEX='Pt. Gender'*(N PCTN<DISP>*F=5.1) AGE='Pt. Age'*(MEAN MIN MAX),
         DISP='Pt. Disposition' / rts=25;
   *adding labels to variable names and changing RTS to fit labels;
   format sex sexf. disp $ldf.;   *adding user-defined formats;
RUN;

The output is displayed in Table 4.4.

Table 4.4 Labels, Formats and Row Title Space

-----------------------------------------
|                       |Pt. Disposition|
|                       |---------------|
|                       | Died  | Lived |
|-----------------------+-------+-------|
|Pt. Gender  |          |       |       |
|------------|          |       |       |
|Male        |N         |    194|   2909|
|            |----------+-------+-------|
|            |PCTN      |    6.3|   93.7|
|------------+----------+-------+-------|
|Female      |N         |     51|   1100|
|            |----------+-------+-------|
|            |PCTN      |    4.4|   95.6|
|------------+----------+-------+-------|
|Pt. Age     |MEAN      |     42|     35|
|            |----------+-------+-------|
|            |MIN       |      3|      1|
|            |----------+-------+-------|
|            |MAX       |     92|     95|
-----------------------------------------

/*REMOVING LINES & ADDING LABELS TO STATISTICS*/
PROC TABULATE DATA=INFO F=7.0 noseps;   *removing lines and blank spaces in the table;
   CLASS SEX DISP;
   VAR AGE;
   TABLE SEX='Pt. Gender'*(N PCTN<DISP>*F=5.1) AGE='Pt. Age'*(MEAN MIN MAX),
         DISP='Pt. Disposition' / RTS=25;
   FORMAT SEX SEXF. DISP $LDF.;
   keylabel n='n' pctn='%' mean='Mean' min='Minimum' max='Maximum';
   *adding labels to requested statistics;
RUN;

The results are in Table 4.5.

Table 4.5 Final Table for Example 1: Removing Lines and Adding Labels to Statistics

-----------------------------------------
|                       |Pt. Disposition|
|                       |---------------|
|                       | Died  | Lived |
|-----------------------+-------+-------|
|Pt. Gender             |       |       |
|Male        n          |    194|   2909|
|            %          |    6.3|   93.7|
|Female      n          |     51|   1100|
|            %          |    4.4|   95.6|
|Pt. Age     Mean       |     42|     35|
|            Minimum    |      3|      1|
|            Maximum    |     92|     95|
-----------------------------------------

Example 1 is now complete. Two things to remember when using TABULATE: (1) when a new feature or option works, save a file of the SAS code and output for future use, and (2) use existing TABULATE code whenever possible to create new tables.

EXAMPLE 2 displays patient age groups by disposition and gender. For each combination of disposition and gender, the table will contain the percentage of patients in each age group, as well as a grand total of patients in each age category. This involves crossing variables to display a hierarchical relationship between disposition and sex.

Features and options already used in Example 1 are not shown in lowercase when they first appear in Example 2. Because of space limitations, only partial data are displayed in Tables 5.1 and 5.2.

1. Decide which variables are class variables and which ones are analysis variables.

class variables - disp, sex, and age; analysis variable(s) - none

2. Identify which variables will be used in the page, row, and column dimensions.

page dimension - none; row dimension - age; column dimension - disp and sex

3. Design how the table should look on paper (Table 4). Include as many details as possible (headings, counts, percentages, etc.):

Table 4 Layout for Example 2

                          Patient Disposition
                    Died                       Lived
             Male         Female         Male         Female
              n      %      n      %      n      %      n      %  Total
Age
< 18          6    1.1      3    5.2      7    2.4      4    3.0     20
>= 65         1    0.2      6   10.4      2    1.0      1    0.7     10

4. Write SAS code using the class, var, and table statements without options.

/*DEFAULT OPTIONS*/
proc tabulate data=info;
   class sex disp age;   *classification var. from step 1;
   table age,
         disp*sex;       *age in the row dimension and disp & sex crossed in the column dimension;
run;

The output is in Table 5.1.

Table 5.1 Default Options

--------------------------------------------
|      |               DISP                |
|      |-----------------------------------|
|      |        D        |        L        |
|      |-----------------+-----------------|
|      |       SEX       |       SEX       |
|      |-----------------+-----------------|
|      |   1    |   2    |   1    |   2    |
|      |--------+--------+--------+--------|
|      |   N    |   N    |   N    |   N    |
|------+--------+--------+--------+--------|
|AGE   |        |        |        |        |
|------|        |        |        |        |
|11    |       .|       .|       .|    2.00|
|------+--------+--------+--------+--------|
|13    |    1.00|       .|       .|    1.00|
|------+--------+--------+--------+--------|
|14    |       .|       .|    1.00|    2.00|
|------+--------+--------+--------+--------|

/*COLUMN PERCENTAGES, LABELS, & 'ALL'*/
PROC TABULATE DATA=INFO F=6.0;
   CLASS SEX DISP AGE;
   TABLE AGE='Age Groups',
         DISP*SEX*(n pctn<age>*f=5.1) all;
   *calculating column percents and adding the ALL variable;
   FORMAT AGE AGEF. SEX SEXF. DISP $LDF.;
   KEYLABEL N='n' PCTN='%' ALL='Total';
   label disp='Pt. Disposition' sex='Gender';
   *using label statement to add labels to variables;
RUN;

The output is in Table 5.2.

/*'REMOVING' GRID LINES*/
PROC TABULATE DATA=INFO F=6.0 NOSEPS
     formchar='           ';   *removing grid lines;
   CLASS SEX DISP AGE;
   TABLE AGE='Pt. Age Groups',
         DISP*SEX*(N PCTN<AGE>*F=5.1) all*F=5.0 / RTS=15;
   FORMAT AGE AGEF. SEX SEXF. DISP $LDF.;
   KEYLABEL N='n' PCTN='%' ALL='Total';
   LABEL DISP='Patient Disposition' SEX='Gender';
RUN;

The output is in Table 5.3.

Example 2 is now finished. With a few minor modifications, Table 5.3 is ready to submit to a publisher.

Table 5.2 Column Percentages, Label Statement and 'ALL' Variable

---------------------------------------------------------------
|                      |        Pt. Disposition        |      |
|                      |-------------------------------|      |
|                      |     Died      |     Lived     |      |
|                      |---------------+---------------|      |
|                      |    Gender     |    Gender     |      |
|                      |---------------+---------------|      |
|                      | Male  |Female | Male  |Female |Total |
|                      |-------+-------+-------+-------+------|
|                      |   %   |   %   |   %   |   %   |  N   |
|----------------------+-------+-------+-------+-------+------|
|Age Groups            |       |       |       |       |      |
|----------------------|       |       |       |       |      |
|<18                   |    5.7|    9.8|    9.5|   10.9|   412|
|----------------------+-------+-------+-------+-------+------|
|18-34                 |   40.2|   27.5|   50.7|   43.6|  2046|
|----------------------+-------+-------+-------+-------+------|
|35-64                 |   39.2|   33.3|   34.8|   32.5|  1463|
|----------------------+-------+-------+-------+-------+------|
|>=65                  |   14.9|   29.4|    5.0|   13.0|   333|
---------------------------------------------------------------
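The column percentages in Table 5.2 can be checked by hand: with PCTN<AGE>, each DISP*SEX column sums to 100 across the age groups, and ALL adds the counts across all four columns. A Python sketch of that check (an illustration, not SAS code; the function names are hypothetical), using the counts printed in Table 5.3:

```python
# Counts per age group for each DISP*SEX column, as printed in Table 5.3.
counts = {
    ("Died", "Male"):    {"<18": 11,  "18-34": 78,   "35-64": 76,   ">=65": 29},
    ("Died", "Female"):  {"<18": 5,   "18-34": 14,   "35-64": 17,   ">=65": 15},
    ("Lived", "Male"):   {"<18": 276, "18-34": 1474, "35-64": 1013, ">=65": 146},
    ("Lived", "Female"): {"<18": 120, "18-34": 480,  "35-64": 357,  ">=65": 143},
}


def column_percents(column):
    """PCTN<AGE>: each age group's share of its own DISP*SEX column."""
    total = sum(column.values())
    return {group: round(100 * n / total, 1) for group, n in column.items()}


def all_column(counts):
    """ALL: N for each age group, summed over every DISP*SEX column."""
    totals = {}
    for column in counts.values():
        for group, n in column.items():
            totals[group] = totals.get(group, 0) + n
    return totals


for key, column in counts.items():
    print(key, column_percents(column))
print("Total", all_column(counts))
```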


Table 5.3 Final Table for Example 2: Removing 'Grid Lines'

                              Patient Disposition
                        Died                        Lived
                       Gender                      Gender
                 Male         Female         Male         Female      Total
                  n      %      n      %      n      %      n      %      n
Age Groups
<18              11    5.7      5    9.8    276    9.5    120   10.9    412
18-34            78   40.2     14   27.5   1474   50.7    480   43.6   2046
35-64            76   39.2     17   33.3   1013   34.8    357   32.5   1463
>=65             29   14.9     15   29.4    146    5.0    143   13.0    333

CONCLUSION

PROC TABULATE is a powerful tool for generating reports for presentations or for submission to a publisher. Although it may appear complicated at first, PROC TABULATE becomes easier to use with time and practice. Follow these simple guidelines for successful programming with PROC TABULATE:

(1) Start with small tables and expand them as needed.
(2) Place the data in the proper rows and columns first, before adding features or options.
(3) Change the appearance of the table by adding one feature or option at a time.
(4) Code new features and options in lowercase.
(5) Save SAS code and output for future use.
(6) Use existing TABULATE code whenever possible to create new tables.

REFERENCES

McLeod, K. Tables for TWO: An Introduction to the TABULATE Procedure. In SAS Institute Inc., Proceedings of the Nineteenth Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc., 1994. 1695 pp.

SAS Institute Inc. SAS Guide to TABULATE Processing, Second Edition. Cary, NC: SAS Institute Inc., 1990. 208 pp.

Toppe, C. An Introduction to PROC TABULATE: A Hands-On Workshop. In SAS Institute Inc., Proceedings of the Nineteenth Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc., 1994. 1695 pp.

Treff, A. V. PROC TABULATE: Controlling Table Appearance. In SAS Institute Inc., Proceedings of the Eighth Annual NorthEast SAS Users Group Conference. Cary, NC: SAS Institute Inc., 1995. 890 pp.

SAS is a registered trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

ACKNOWLEDGEMENTS:

The authors wish to thank Linda Kesselring, Nancy Timmerman, Sandy Teitelbaum, and Tim Kerns for their assistance with this manuscript.

Author Contact: National Study Center for Trauma and EMS 701 W. Pratt Street - 001 Baltimore, MD 21201-1023



EVALUATING RIDERSHIP ON THE MUNICIPAL TRANSIT SYSTEM

Montelpare, W., Frost, J., Scott, D., Burgess, S.
Health Studies Program, Brock University, St. Catharines, Ontario, Canada

Introduction

This project assessed ridership on the local municipal transit system. The assessment was divided into three areas: i) data collection, ii) analyses, and iii) reporting. Its purpose was to provide a total count of "ons and offs" at each bus stop, for each route (inbound and outbound), within predesignated times of day, for the entire day and, where appropriate, for one night, on a weekday and on a weekend day. These counts were used to determine bus stop and loading activity for planning purposes, and to determine maximum load points, revenues, boarding passenger totals, and passengers by fare category.

The more versatile but less often used features of SAS, including DATA step input techniques (e.g., RETAIN, dynamic variable summing, and looping strategies), PROC PRINT (variable sums and averages), and PROC REPORT, provided tabular results for each of the measures of interest. In addition to system totals, results were subdivided into time periods by day and across routes. The results of the assessment provided valuable financial information to help the system administrators plan route changes, increase the number of buses on selected routes during peak ridership, determine locations for bus shelters, and plan changes in the use of passes, tickets, and transfers.
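The paper does not show how maximum load points were derived from the "ons and offs". One straightforward way is a running sum of boardings minus departures; the Python sketch below illustrates it under the assumption that the bus starts empty, and the stop numbers, counts, and function name are hypothetical.

```python
def max_load_point(stops, ons, offs):
    """Return (stop, load) where the on-board load peaks.

    The load leaving a stop is the load arriving plus boardings ("ons")
    minus departures ("offs"); the bus is assumed empty before the first stop.
    """
    load = 0
    best_stop, best_load = None, -1
    for stop, boarded, alighted in zip(stops, ons, offs):
        load += boarded - alighted
        if load > best_load:
            best_stop, best_load = stop, load
    return best_stop, best_load


print(max_load_point([101, 102, 103, 104], ons=[5, 3, 0, 2], offs=[0, 1, 4, 3]))  # (102, 7)
```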

Method of Data Collection

A strategy was developed to collect the following measures: number of riders, fare type, rider classification, and schedule adherence. The data were collected by university students trained as observers (N=112). Data collection was organized using a team captain approach: the team captains recruited and trained observers to cover the 18 regularly scheduled day routes and selected night routes on each measurement day. The data were collected over a two-day period in April (a Wednesday and a Saturday).

Data were initially collected on standardized recording sheets. The data consisted of:
• the number of all boarding passengers by fare category (cash, ticket, pass, transfer);
• the type of passengers boarding (adult, senior, student, child);
• the number of departing passengers and their location of departure;
• the passenger load on the bus between stops (and, where available, street names);
• the arrival times at selected timing points throughout the route.

The data were counts or tallies for each of the variables, except the variable "time", which was measured in minutes from the start of the route to the timing point. Observers were required to record the bus route number and the direction of the bus relative to the terminal (inbound or outbound). The data were transcribed with a text editor on a UNIX-based operating system.
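For readers who want to experiment with this kind of record layout, below is a minimal SAS sketch of one common way to read a route header line followed by stop-level tally lines: read the record type with a trailing @, RETAIN the header fields, and write one observation per stop. The layout (one 'g' header line, then one stop per line), the variable rectype and the data set name tallies are assumptions for illustration; the program in this paper uses its own COLUMN= pointer logic and reads several stops per line.

```sas
/* Sketch only: assumes one 'g' header line per trip, followed by
   one line per stop (stop, passengers, fare code, offs).          */
data tallies;
  length rectype $ 1 day $ 1 trans $ 2;
  retain route direct busnum day depart;  /* header values carry down to each stop */
  format depart time5.;
  input rectype $ @;                      /* read the first field, hold the line */
  if rectype = 'g' then
    input route direct busnum day $ depart :time5.;  /* header: nothing written */
  else do;
    input @1 stop numb trans $ offs;      /* detail: reread from column 1 */
    output;                               /* explicit OUTPUT: only stop lines are written */
  end;
  datalines;
g 002 2 002 w 06:05
06 02 c1 .
10 01 c1 .
28 01 t1 .
;
run;
```

Because the step contains an explicit OUTPUT statement, header lines update the retained values but never produce an observation of their own.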

Report Plan
Using the Statistical Analysis System (SAS), the data from the ridership survey were processed to produce the following performance reports for each route:
1. Financial information report (based on total hours).
2. Financial information report (based on revenue kilometers).
3. Productivity report.
4. Revenue report.
5. Ridership report by stop.
6. Schedule adherence.

Although each performance report was different, all of the data were processed by the same generic SAS program. The following is an example of the main program used to read the data set:

options linesize=80 pagesize=90 nodate nocenter;
proc format;
   value dirfmt 1='inbound' 2='outbound';

data bus2; drop check m;

Output and page set-up; PROC FORMAT options.

Initializes the working data set. Drops the variables "check" and "m" after they have been used to track the data input position.


retain route direct busnum day depart tplabel1 tym1 tplabel2 tym2;

infile cards column=column;

length check $ 1;

m=column;

input check $ @@;

if m>column then m=1;

if check='g' then input @m check $ route direct busnum day $
   depart time5. tplabel1 $ tym1 time5. tplabel2 $ tym2 time5.;

else input @m stop numb trans $ offs @@;

if stop >.;

** transforming information about the type of transaction into text;
length transkind $ 18;

if trans='t' then transkind='transfer';
if trans='c1' then transkind='adult cash';
if trans='t1' then transkind='adult ticket';
if trans='p1' then transkind='adult pass';

** grouping transactions;
length trangrp $ 10;

if trans='p1' then trangrp='pass';
if trans='c1' then trangrp='cash';
if trans='t1' then trangrp='ticket';

** grouping fares;
money=0;
if trans='c1' then money=1.50;
if trans='t4' then money=0.95;
if trans='p4' then money=0.35;

** computing expenditures per stop (numb = number of passengers);
revenue=money*numb;


Retains this list of variables for later processing.


The data are read from CARDS; the COLUMN= option stores the current position of the input pointer in a new variable, "column".

Declares a character variable, "check", which holds a single alphanumeric character.

Assigns the value of the variable "column" to the variable "m" and continues.

Moves the pointer to position #1 and inputs the value at position #1.

Evaluates the value of "m": if it is greater than the current pointer column, it is reset to 1.

Reads the header line if the variable "check" equals "g".

If the value of the variable "check" is not "g", reads the line from position m as stop-level data (stop, number of passengers, fare type, and departures).

Keeps the observation only if the variable "stop" is not missing.

The variable "transkind" is a character variable of up to 18 characters.

Sets the text of the variable "transkind" based on the variable "trans".

The variable "trangrp" is a character variable of up to 10 characters.

Sets the value of the variable "trangrp".

Initializes the variable "money" to 0, then assigns the fare that corresponds to each transaction type ("trans").
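The fare assignments above can also be written as a SELECT group, which is a swapped-in alternative to the chain of IF-THEN statements rather than the authors' code. This sketch uses the fares from the program (c1 = $1.50, t4 = $0.95, p4 = $0.35, every other code 0) and computes revenue per record as fare times number of passengers. The data set name fares and the three records are hypothetical.

```sas
/* Sketch: SELECT-based fare lookup; fares from the program above,
   records hypothetical.                                           */
data fares;
  input stop numb trans $;
  select (trans);
    when ('c1') money = 1.50;   /* adult cash */
    when ('t4') money = 0.95;
    when ('p4') money = 0.35;
    otherwise   money = 0;      /* any other code, as with money=0 above */
  end;
  revenue = money*numb;         /* fare x number of passengers */
  datalines;
6 2 c1
10 1 t4
28 1 t
;
run;
```

The OTHERWISE clause matters: without it, SAS stops the step with an error when a code matches no WHEN clause.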


** computing the total hours per route based on sample data;
totalhrs=0;

if route=001 and direct=2 and depart<'07:00't then totalhrs=0.77;

** creating time points;
length timeP $ 20;

if depart<'07:00't then timeP='1. before 7am';
if depart>='07:00't and depart<'09:30't then timeP='2. 7am to 9:30am';
if depart>='09:30't and depart<'14:30't then timeP='3. 9:30am to 2:30pm';
if depart>='14:30't and depart<'18:30't then timeP='4. 2:30pm to 6:30pm';
if depart>='18:30't then timeP='5. after 6:30pm';

numTime=0;
if timeP='1. before 7am' then numTime=2;
if route=002 and timeP='5. after 6:30pm' then numTime=5;
if route=017 and direct=1 and timeP='5. after 6:30pm' then numTime=5;

distance=0;
if route=001 and busnum=001 then distance=10.05;

busTime=0;
if route=001 and timeP='1. before 7am' and direct=2 then busTime=46;

** calculations;
revKm=(distance*numTime);

opCost3=(revKm*2.48);

The variable "totalhrs" refers to hours of service, not including rest times. This statement initializes it to 0.

If the route equals 1, the direction of the bus is outbound, and the departure time is before 7:00 am, then totalhrs (running time) is 0.77 hours.

The variable "timeP" (time point) is a character variable of up to 20 characters.

These statements group the departure times into the selected time points.
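One alternative to the chain of IF-THEN statements for the time points is a user-defined format with time ranges. This is a swapped-in technique, not the authors' code. The -< operator excludes the upper end of each range, which matches the < comparisons in the program. The format name timept and the sample departure times are assumptions for illustration.

```sas
/* Sketch: a time-range format reproducing the five time points. */
proc format;
  value timept
    low      -< '07:00't = '1. before 7am'
    '07:00't -< '09:30't = '2. 7am to 9:30am'
    '09:30't -< '14:30't = '3. 9:30am to 2:30pm'
    '14:30't -< '18:30't = '4. 2:30pm to 6:30pm'
    '18:30't -  high     = '5. after 6:30pm';
run;

data timepts;
  length timeP $ 20;
  input depart :time5.;
  timeP = strip(put(depart, timept.));  /* look up the label for each departure */
  datalines;
06:05
07:00
09:29
14:30
18:45
;
run;
```

Keeping the boundaries in one PROC FORMAT step means a schedule change only has to be made in one place.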

Initializes the variable "numTime", which refers to the number of trips during the selected time points.

Initializes "distance". Distance refers to the distance travelled within each trip, inbound and outbound. Distances were provided by the transit company.

Initializes "busTime".

Revenue kilometers (revKm) were used to compute the distance travelled in service.

Operating cost (opCost3) was used to compute the cost of operating the buses. The transit company provided a base rate of $2.48 per kilometer.

** The following paragraph describes the formatting of names associated with each stop;
length stopname $ 30;
if busnum=001 and stop=00 then stopname='bus terminal';
if busnum=002 and stop=00 then stopname='bus terminal';
if busnum=001 and direct=1 and stop=01 then stopname='niagara@lakeshore';
if busnum=122 and direct=2 and stop=10 then stopname='BrockUniversity';


** using SAS label statements;
label route='bus route'
      direct='route direction'
      depart='departure time'
      tplabel1='time point label 1'
      tym1='time at first time check';

** The following lines are examples of the data analysed in this program;
cards;
g 002 2 002 w 06:05 tp00 99:99 tp06 06:05 tp25 06:14 tp32 06:21 tp44 06:27
06 02 c1 . 10 01 c1 . 28 01 t1 .
g 002 2 002 w 06:30 tp00 99:99 tp06 99:99 tp25 06:45 tp32 06:55 tp44 07:00
00 02 t . 00 10 c1 . 00 . . 06 05 01 c1 . 14 . . 02 28 01 t .
;

** The following part of the program arranges the original data set;
proc sort data=bus2 noequals;
   by route day direct timeP;


data bus6; set bus2;

by route day direct timeP; retain numbSum 0;

SET reads the working data set "bus2" into the new data set "bus6"; BY organizes "bus6" into groups for the computations; RETAIN initializes the variables.

retain xprod2 0;
if first.timeP then numbSum=0;
if first.timeP then xprod2=0;

At the first observation of each time point (first.timeP), the running totals numbSum and xprod2 are reset to 0.

** In the following line a running total is computed by initializing the revenue variable (xprod2) and continuously adding the values of the product of revenues (xprod1);
xprod2+xprod1;

** The running total of passengers;
numbSum+numb;
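The running-total pattern used in data set bus6 (reset at FIRST.timeP, then accumulate with a sum statement) can be checked on a few records. The data set names stops and runtot and the values below are hypothetical, and the time-point labels are shortened to '1' and '2'.

```sas
/* Hypothetical records, already sorted by the BY variables */
data stops;
  input route day $ direct timeP $ numb;
  datalines;
1 w 2 1 2
1 w 2 1 1
1 w 2 2 3
1 w 2 2 4
;
run;

data runtot;
  set stops;
  by route day direct timeP;
  if first.timeP then numbSum = 0;  /* new time point: restart the total */
  numbSum + numb;                   /* sum statement: retained, starts at 0 */
run;
```

Here numbSum runs 2, 3 within the first time point and restarts to give 3, 7 within the second. FIRST.timeP is also set when route, day or direct changes, so totals never leak across higher-level groups.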

** operating cost is based on the following computation using individual total hrs values;
opCost2=(0.91*(totalHrs*60));

** The revenue-cost ratio is computed from the division of estimated revenue by operating cost;
rate=xprod2/opCost2;

** gross cost is based on the relationship between the operating cost for total hours and the running total for passengers;
grCost1=opCost2/numbSum;

** average revenue is computed from the sum of estimated revenue divided by the running total of passengers;
averev=xprod2/numbSum;

** net cost is computed from the difference between gross cost and average revenue;
netCost1=grCost1-averev;

** revKm is the product of distance travelled and number of trips completed;
revKm=distance*numTime;


** a second operating cost is based on the revKm and a constant (2.48);
opCost3=(revKm*2.48);

** a second revenue to cost ratio is computed from the division of estimated revenue by the revKm based operating cost;
rate2=xprod2/opCost3;

** a second gross cost is computed from the division of the revKm based operating cost by the running total for passengers;
grCost2=opCost3/numbSum;

** a second net cost is computed from the difference between the revKm based gross cost and average revenue;
netCost2=grCost2-aveRev;
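As a hand check of the two cost bases, the sketch below plugs in values that appear in the program (totalHrs = 0.77 for route 001 outbound before 7am; distance = 10.05 km for route 001, bus 001; numTime = 2 before 7am), together with a hypothetical revenue total (xprod2) and passenger total (numbSum). The data set name costs is also hypothetical.

```sas
/* Hand check of both cost bases. totalHrs, distance and numTime are
   values from the program; xprod2 and numbSum are hypothetical.    */
data costs;
  totalHrs = 0.77;                  /* route 001, outbound, before 7am */
  distance = 10.05;  numTime = 2;   /* route 001, bus 001; trips before 7am */
  xprod2   = 15.00;                 /* hypothetical revenue total ($) */
  numbSum  = 10;                    /* hypothetical passenger total */
  aveRev   = xprod2/numbSum;        /* average revenue per passenger: 1.50 */

  /* basis 1: total hours (minutes of service x 0.91) */
  opCost2  = 0.91*(totalHrs*60);    /* 42.042 */
  rate     = xprod2/opCost2;        /* revenue-cost ratio, about 0.357 */
  grCost1  = opCost2/numbSum;       /* gross cost per passenger: 4.2042 */
  netCost1 = grCost1 - aveRev;      /* net cost per passenger: 2.7042 */

  /* basis 2: revenue kilometers at $2.48 per km */
  revKm    = distance*numTime;      /* 20.1 km */
  opCost3  = revKm*2.48;            /* 49.848 */
  rate2    = xprod2/opCost3;        /* about 0.301 */
  grCost2  = opCost3/numbSum;       /* 4.9848 */
  netCost2 = grCost2 - aveRev;      /* 3.4848 */
run;
```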

** number of passengers per rev hour is computed from the division between the running total of passengers and the total hours;
pasRevhr=numbSum/totalHrs;

** number of passengers per rev km is computed from the division between the running total of passengers and the revenue kilometers;
numbRev=(numbSum/revKm);

** estimated revenue per hour is computed from the division between the estimated revenue and the total hours;
revHr=(xprod2/totalHrs);

** amount of time per trip is computed from the amount of transit time and number of trips;
timeTrip=(busTime/numTime);
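The program initializes totalhrs, numTime and distance to 0 for any route it does not list. For such routes the divisions above hit a zero divisor, and SAS then sets the result to missing and writes a note to the log. The following minimal sketch computes the same productivity measures with explicit guards. busTime = 46 and numTime = 2 come from the program; numbSum, xprod2 and the data set name productivity are hypothetical.

```sas
/* Productivity measures with guarded divisors. */
data productivity;
  numbSum  = 10;     xprod2  = 15.00;   /* hypothetical totals */
  totalHrs = 0.77;   revKm   = 20.1;    /* 10.05 km x 2 trips */
  busTime  = 46;     numTime = 2;       /* route 001, outbound, before 7am */

  /* the program initializes these divisors to 0, so test before dividing */
  if totalHrs > 0 then do;
    pasRevhr = numbSum/totalHrs;        /* passengers per revenue hour, about 12.99 */
    revHr    = xprod2/totalHrs;         /* revenue per hour, about 19.48 */
  end;
  if revKm   > 0 then numbRev  = numbSum/revKm;    /* passengers per rev km, about 0.50 */
  if numTime > 0 then timeTrip = busTime/numTime;  /* minutes per trip: 23 */
run;
```

The guarded variables stay missing when the divisor is 0, which matches what unguarded division produces. The difference is that the guards avoid the log notes and make the intent explicit.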

** data were printed according to route and/or day and/or direction and/or time point;
proc sort data=bus6 noequals;
   by route day;
proc print data=bus6;
   id route;
   var xprod2 numbSum totalHrs opCost2 rate grCost1 averev netCost1;
   by route day;
   sum xprod2 numbSum totalHrs opCost2 rate grCost1 averev netCost1;
   title 'Bus study output for Financial info by total hrs';
run;

proc print data=bus6;
   id route;
   var xprod2 numbSum revKm opCost3 rate2 grCost2 averev netCost2;
   by route day;
   sum xprod2 numbSum revKm opCost3 rate2 grCost2 averev netCost2;
   title 'Bus study output for Financial info by rev km';
run;

The results of this SAS program are presented in the following sample tables. Although the actual data presented are meaningless to individuals not associated with the transit service, these reports illustrate many critical features of the transit service, as requested by the City of St. Catharines Transit Company.


SAMPLE TABLES

[Scanned sample tables, with all values shown as placeholders ($0.00, 00): "Summary of ridership by route by day" (revenue, number of passengers, passengers per km); "Route No. of Passengers using Cash"; "Route Summary"; "Route Summary Revenue".]