Using Google to Solve SAS® Problems
Michael Todd, Nth Analytics, Flemington, NJ

ABSTRACT
SAS® has a long history as the language for statistical programming in the pharmaceutical industry. In the past 30+ years, most, if not all, of the problems statistical programmers face on a day-to-day basis have been solved by somebody. Usually there is a paper somewhere on the Internet about it. Using Google queries, statistical programmers can leverage the vast SAS knowledge base on the Internet, and get quick answers to a variety of problems.

INTRODUCTION
SAS has been around longer than 30 years. Most of the SAS programming problems have been worked out somewhere, by somebody. To solve a problem, it is often a matter of finding the solution online, rather than working through the problem by yourself. You can greatly enhance your programming skills by leveraging the knowledge of others.

We all get used to doing things a certain way. Many problems can be solved using code already available at our companies. However, if new situations come up, there may not be anything to copy from. If new technologies get implemented, we probably will need some help just to get them working. Training does not cover all the situations. We need an ongoing source of knowledge. With Google, there is always someone to ask for help.

In this paper, I approach the programming issues from the viewpoint of a consultant who works for many organizations in the pharmaceutical industry, including Contract Research Organizations (CROs). These companies have many different ways of doing things. Usually, they are not anxious to change. As a ‘guest in the house’, you need to do things their way, and quickly. This can require quick mastery of unfamiliar techniques.

I explore using Google to solve five very different programming problems, and explain my reasons for choosing the particular solution I did.

• Simulating the lag function in PROC SQL

• Methods of putting ‘Page x of y’ on RTF output

• Age in months calculation

• Confidence intervals for the median

• International encoding difficulties opening a dataset

All of the solutions returned by the searches are valid. It is a question of which solution is most easily implemented, cleanest, and best suited to a particular organization.

GOOGLING SAS PROBLEMS: OVERVIEW
There are five main sources of information available online for solving SAS problems:

• SAS documentation: a great source of answers to SAS problems. Documentation for Base SAS, all procedures, ODS, and most other topics is available at http://support.sas.com

• SAS books are available through Google’s controversial projects to make all books available online in a searchable format. This is handy when you are looking for a solution to a single problem, and don’t need the whole book.

• Social media: SAS Institute has a strong commitment to social media. Up-to-date resources, including Twitter, Facebook, and YouTube feeds are available at http://support.sas.com/community/socialmedia/index.html.

• SAS-L. One of the original social media sites, founded in 1985, and still active. It is a listserv, which is an email server. If someone posts an email, it goes to everyone on the list, and people can respond in real time. Many expert programmers have answered questions over the years. Emails from the archives are available from 1996 to the present, and often contain targeted solutions to SAS problems. The web address is http://listserv.uga.edu/archives/sas-l.html.

• Conference papers are a major source of SAS programming solutions. These papers are well-reasoned, peer-reviewed, and tend to be focused on a single topic. They often have simple, step-by-step solutions where one can lift the code straight out and use it.

• http://www.lexjansen.com often comes up on Google searches. This site has a collection of over 10,000 SAS conference papers.

GOOGLE STORES YOUR SEARCHES
Each time you search, Google stores what you searched and the result. It also stores the number of times you searched it. You will need an account, and to be logged in on your Google account when searching. This tends to happen automatically. If you forget how you solved a problem, you can go to https://www.google.com/history, log in, and search your searches. As shown in Table 1, I often search PROC SQL. I have difficulty grasping some of the syntax. In the age of Google, this is not a major problem, because I can easily find the answer.

Table 1 Most Frequent PROC SQL Searches
PROC SQL noprint
SAS PROC SQL correlated subquery
PROC SQL into
SAS SQL coalesce
PROC SQL insert
SAS SQL between operator
SAS PROC SQL union syntax
PROC SQL except operator
PROC SQL into separated by
SAS PROC SQL intersect
SAS PROC SQL begins with operator
PROC SQL like operator
SAS PROC SQL retain
SAS PROC SQL row difference

SQL LAG FUNCTION PROBLEM
In this section I show how to use PROC SQL to simulate the lag function. Why would I want to do this? I could easily use the DATA STEP. This type of problem comes up more than one would expect in on-site consulting. With the advent of SAS Enterprise Guide, some companies are phasing out the DATA STEP. It is possible to consult in a place where the DATA STEP is discouraged.

GOOGLE SEARCH: SAS PROC SQL LAG
In this case, the specific task was to use PROC SQL to simulate the lag function to find the cutoff for the 25th percentile. Figure 1 shows the results of the first search. In 2005, someone asked this question on SAS-L: “How do you do something similar to the lag function in a data step?”

Figure 1


Opening the link, as seen in Figure 2, it looks like a potentially good solution. The code is clean, and looks reasonably simple and straightforward to implement. There is a subquery that I don’t totally understand, but it looks like it will probably work.

Figure 2 (Schreier, 2005)

It worked. As shown in Figure 3, I changed the variable names to match those in my programs, and was able to get the answer I wanted, the value of the 25th percentile. This is an example of a successful search.

Figure 3
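For readers without access to the posting, the general idea can be sketched as follows. This is not Schreier's code; it is a minimal illustration that assumes a hypothetical dataset SCORES with a consecutive ordering variable SEQ. A correlated subquery looks up the value on the preceding row, which is roughly what LAG does in the DATA step.

```sas
/* Hypothetical input: SEQ is assumed to be consecutive (1, 2, 3, ...) */
data scores;
  input seq value;
  datalines;
1 10
2 12
3 15
4 19
;
run;

proc sql;
  create table with_lag as
  select a.seq,
         a.value,
         /* correlated subquery: value from the previous row;
            missing for the first row, where no previous row exists */
         (select b.value
            from scores as b
            where b.seq = a.seq - 1) as lag_value
    from scores as a
    order by a.seq;
quit;
```

Unlike the DATA step LAG function, this depends on an explicit ordering variable, because SQL tables have no inherent row order.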

RTF SEARCHES
RTF is another thing I have difficulty remembering. With Google, this is not an issue. I can always search and find the answer. Over time, I have learned to do more targeted searches, particularly when I have searched the problem before. Table 2 shows the RTF topics I have searched:


Table 2 Most Frequent RTF Searches
SAS RTF startpage
SAS ODS PROC REPORT RTF image
SAS RTF hidden text
SAS RTF margins
SAS ODS RTF text=
SAS RTF tags
SAS RTF different style header and body
RTF underline
SAS ODS RTF TOC
RTF font code
SAS RTF page x of y
SAS ODS RTF page 1 of
RTF escape character indent
RTF \nofpages
SAS ODS RTF ^page
SAS ODS RTF title J=L
ODS RTF J=L J=R title on same line
SAS ODS RTF escapechar ^left

The purpose of these searches is always to get the output to look exactly the way I want it. The need to do this tends to come up in the CRO industry. Clients may insist that output looks a certain way. However, I have also seen FDA reviewers make similar sorts of requests.

GOOGLE SEARCH: SAS RTF PAGE X OF Y
One thing I have searched successfully is how to put ‘Page x of y’ (for example, Page 1 of 20, Page 2 of 20, etc.) on each page of the table or listing. I searched on SAS RTF PAGE X OF Y. The results are in Figure 4.

Figure 4

There has been a lot of work done on this issue. I got five very relevant results. In Google searches, usually the first result is the best, because it has the most links to it. In this case, it was a matter of deciding which of the five solutions was best.


SOLUTION 1: MACRO-BASED
The first result was not suitable for my purposes. It is macro-based. Usually macro-based solutions are not good, because they present too many validation issues. I always look for a PROC, a function, or some really simple code.

For this problem, all I want to do is put Page x of y in my title. In Figure 5, the macro is not shown. Figure 5 just shows how to call the macro. This is complicated enough! To call the macro, you have to put a CALL EXECUTE inside a COMPUTE block and then pass the whole PROC REPORT in. This is clearly too complicated for what should be a fairly easy solution.

Figure 5 (Chung and Dunn, 2005)

SOLUTION 2: INTERESTING, BUT NOT PRACTICAL
The solution shown in Figure 6 uses PROC TEMPLATE in a rather elegant manner. PROC TEMPLATE can apply styles to any report. Used as a global template, this method could put Page x of y on every report in the same place. The problem was that no one at this particular organization used PROC TEMPLATE. Although it was the best solution, it was too much change for the organization, just to do Page x of y.

Figure 6 (SAS Institute, 2005)

SOLUTION 3: OVERLY COMPLEX FOR THE PROBLEM
The approach shown in Figure 7 is clearly way too complicated for this problem. It requires a lot of complex parameter checking before even starting to implement ‘Page x of y’. Again, I am looking for ideally one line of code to solve this problem.

Figure 7 (SAS Institute, undated)


SOLUTION 4: TOO OLD
In Figure 8, the solution is perfectly valid. It is clean, easy to read, and requires only one line of code to implement. It is a way of embedding RTF field codes inside of SAS code. However, it mentions Word 97. That was a long time ago! I was hoping to find something a little newer.

Figure 8 (Tong, 2003)

SOLUTION 5: WINNER
Figure 9 shows the best solution, given the particular organization that will use it. Although technically this requires an extra line of code to define the ODS ESCAPECHAR, this is as easy as it gets. Placing the ODS ESCAPECHAR in front of the automatic PAGEOF variable (available causes Page x of y to print on every page of the report. It is up-to-date, simple to read, simple to use, and very reliable.

Figure 9 (Mason, 2007)
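A minimal sketch of this kind of solution is shown below. It is not the code from Figure 9; the caret as escape character, the output file name, and the use of SASHELP.CLASS as stand-in data are all assumptions made for illustration.

```sas
ods escapechar='^';                 /* the one extra line: define the escape character */
ods rtf file='listing.rtf';         /* hypothetical output file name */

title j=r '^{pageof}';              /* escape character in front of PAGEOF */

proc print data=sashelp.class;      /* stand-in for the real table or listing */
run;

ods rtf close;
```

The page numbers are written as RTF field codes, so they are typically filled in by the word processor when the document is opened or printed rather than by SAS itself.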

AGE CALCULATION SEARCH
Most pharmaceutical companies and CROs have a standard macro to compute age. This is the standard formula:

ageyear=floor((vdate - birthdt + 1) /365.25 );

Age in months is required for pediatric studies (infants < 2 years of age). We can easily extend the standard formula to return age in months by multiplying by 12.

agemos=floor((vdate - birthdt +1)*12 /365.25);

Examining this code, it is easy to see that it is an approximation. The formula for years adjusts for leap years by dividing by 365.25 instead of 365. The formula for months further approximates that by multiplying by 12, assuming all months have an equal number of days. For validation, it would be interesting to try a different method.

GOOGLE SEARCH: SAS AGE IN MONTHS
This is another case where a lot of work has been done. As shown in Figure 10, there are several solutions to choose from.


Figure 10

In this case, the most authoritative answer would be from the SAS knowledge base on http://support.sas.com: CALCULATING AGE WITH ONLY ONE LINE OF CODE. That’s what I want. This article presents the standard solution under ‘what doesn’t work’. What they mean is this is an approximation, and does not take into account calendar months. For this, you either need some complicated programming, or the INTCK function. INTCK gives you the interval between any two dates, in days, weeks, months, etc. Here we use it for months.

Figure 11 (SAS Institute, 2004)

Substituting the variable names for my particular dataset, I used the following methods on a dataset with 3077 records from an integrated database of pediatric studies.

Production Method: floor((VDATE - BIRTHDT +1)*12 /365.25)

Validation Method: intck('month',BIRTHDT, VDATE) - (day(VDATE) < day(BIRTHDT))


Comparing the results, 68 of 3077 age calculations (2%) did not match. If we look at the results in detail in Figure 12, subject 10194, born on December 29, would reach his 3rd month birthday on March 29. The standard method computes the age as 2 months old. For such young infants, this is a big discrepancy.

Figure 12
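The discrepancy can be reproduced with a short DATA step. The dates below are hypothetical, chosen only to mirror the December 29 to March 29 example in a non-leap year; the actual dates for subject 10194 are not given in the text.

```sas
/* Hypothetical dates; 2009 is not a leap year */
data age_check;
  birthdt = '29DEC2008'd;
  vdate   = '29MAR2009'd;

  /* production method: day count scaled to months */
  agemos_prod  = floor((vdate - birthdt + 1) * 12 / 365.25);

  /* validation method: calendar months crossed, minus 1 if the
     day of the month of birth has not yet been reached */
  agemos_valid = intck('month', birthdt, vdate) - (day(vdate) < day(birthdt));

  format birthdt vdate date9.;
  put birthdt= vdate= agemos_prod= agemos_valid=;  /* agemos_prod=2 agemos_valid=3 */
run;
```

Here the two dates are 90 days apart, so the production formula evaluates floor(91 * 12 / 365.25) = floor(2.99) = 2, while the calendar-based method returns 3.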

CONFIDENCE INTERVAL FOR A MEDIAN
Another unusual request I received as a consultant was to compute the confidence interval for the median, as part of the descriptive statistics summary. I have been a statistician or programmer in the pharmaceutical industry since 1981, and I was asked to provide this exactly once. Fortunately, I had Google to rely on. Otherwise, I would have had no idea how to do this.

GOOGLE SEARCH 1: SAS CONFIDENCE INTERVAL MEDIAN
Once again, a surprising amount had been published on this topic. Although I was completely unfamiliar with the problem, several people had worked on it. The search results are shown in Figure 13:

Figure 13


FIRST RESULT
The title of the first result looked interesting: CONFIDENCE INTERVALS IN THE ANALYSIS AND REPORTING OF CLINICAL TRIALS. Additionally, it was a NESUG paper, and therefore likely to contain SAS code I could use. Based on a quick read, however, the article did not look promising. Figure 14 summarizes my initial reaction:

Figure 14 (Peng, 2003)

I should have read a little further. Two pages later, the code for the solution was there:

ODS OUTPUT QUANTILES; PROC UNIVARIATE CIPCTLDF …

Figure 15 (Peng, 2003)
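The snippet above is abbreviated. As a hedged illustration of how such a call can be written out in full, the sketch below requests distribution-free confidence limits for the quantiles and captures them in a dataset. It is not the code from the cited paper; SASHELP.CLASS, the variable HEIGHT, and the output dataset name are stand-ins chosen for illustration.

```sas
/* SASHELP.CLASS and HEIGHT are stand-ins for the real analysis data */
ods output quantiles=work.q;        /* capture the Quantiles table as a dataset */

proc univariate data=sashelp.class cipctldf;   /* distribution-free limits */
  var height;
run;

proc print data=work.q;             /* the 50% (median) row carries the limits */
run;
```

The median is simply the 50th percentile, so its confidence limits appear in the same table as those for the other quantiles.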

FOLLOW-UP: MORE TARGETED SEARCH
Instead of reading on, however, I tried another search. It turns out that I needed to include the words PROC UNIVARIATE. In other words, I had to guess that if PROC UNIVARIATE computed the confidence interval for the mean, perhaps it also had an option to do so for the median. These results are shown in Figure 16:


Figure 16

The first result, a link to the PROC UNIVARIATE Version 9.2 documentation, is particularly useful. It has all of the options for computing both normal-approximation and distribution-free confidence intervals for the median, as well as a sample program. There is a wealth of easy-to-use information here, a portion of which is shown in Figure 17:

Figure 17 (SAS Institute, 2010 [2])

INTERNATIONAL ENCODING METHODS
Companies often receive datasets created in different countries using different encoding methods due to language differences. Encoding establishes the environment to process SAS syntax and to read and write SAS data sets. Encoding issues present difficult problems. First, without the right encoding, you cannot open the dataset. Secondly, these problems tend to be random. Therefore, the likelihood that someone has solved your particular problem is unfortunately low.

In this next example, I describe using a Google search to eventually open a dataset from a Chinese affiliate. Figure 18 shows the error message we got in trying to read the dataset:


Figure 18

The Chinese-speaking programmers contacted the group that created the dataset, but still could not resolve the issue. I then tried to solve the problem. Unfortunately I know very little about this area of SAS.

GOOGLE THE ERROR MESSAGE
When you know nothing about the problem, sometimes Googling the error message itself works. I tried searching SAS SOME CHARACTER DATA WAS LOST DURING TRANSCODING. Unfortunately, the search returned a lot of technical manuals. These were aimed at IT administrators dealing with global deployments, and not at all helpful to me.

BROADER FOLLOW-UP SEARCH
By this point, I was running out of options. I next tried searching SAS CHINESE ENCODING. This returned some useful information; however, none of it worked.

Figure 19 (SAS Institute, 2003)


Suffice it to say that, after a lot of trial and error, I stumbled upon the solution. For some reason, using ASCIIANY as the encoding option opened the dataset.

Figure 20 (SAS Institute, 2010)
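In code, the fix amounts to one data set option. The sketch below is illustrative only: the library path and dataset names are hypothetical, and it assumes the affiliate's file can be reached through an ordinary LIBNAME.

```sas
/* Hypothetical library path and dataset names */
libname src 'C:\data\affiliate';

data work.affiliate_copy;
  /* ENCODING=ASCIIANY: per the SAS NLS documentation, no transcoding
     occurs when the data are in ASCII-based encodings */
  set src.labdata (encoding=asciiany);
run;
```

Because the option suppresses transcoding rather than converting the characters, any non-ASCII text in the copy may still display incorrectly in a session that uses a different encoding.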

Why did ASCIIANY work, and the other options did not? The documentation in Figure 21 provides some of the answer, but honestly, I don’t know, and I really don’t care. I am glad to be done with this problem. It is unlikely to come up again in this particular form. Other encoding problems may well require different solutions. But this example is instructive in that it shows how to use Google to solve weird SAS problems, even if you know very little about the issue.

Figure 21 (SAS Institute, 2010)


CONCLUSIONS
In summary, you can leverage the experience of others by using Google searches to solve SAS problems. As the examples in this paper show, you can solve new problems and really expand your skills. Using Google searches effectively requires some knowledge of SAS, and the ability to recognize the solution once you have found it. In general, I look for solutions from a reliable source (particularly support.sas.com), that are clear, concise, easy to implement, and do not require a lot of follow-up validation. These tend to be solutions based on PROCs or functions.

REFERENCES
Chung, Chang Y and Dunn, Toby (2005), “Page X of Y with Proc Report”, Paper CC31, Proceedings of the Pharmaceutical Industry SAS® Users Group Conference 2005, http://www.lexjansen.com/pharmasug/2005/CodersCorner/.%5CCC31.pdf

Mason, Phil (2007), “SAS 9 Tips · Part II”, MeasureIT, Issue 5.11, Computer Measurement Group, November, 2007, http://www.cmg.org/measureit/issues/mit46/m_46_1.html

Peng, Guangbin (2003), “Confidence Intervals in Analysis and Reporting of Clinical Trials”, Proceedings of the Pharmaceutical Industry SAS® Users Group Conference 2003, http://www.lexjansen.com/pharmasug/2003/statisticspharmacokinetics/sp050.pdf

SAS Institute Inc. (2003), “TS-691: SAS® Encoding Values, IANA Preferred MIME Charset, Java™ and Oracle® Encoding Names”, Knowledge Base, Papers, SAS Technical Papers, http://support.sas.com/techsup/technote/ts691.pdf

SAS Institute Inc. (2004), “Sample 24808: Calculating Age with Only One Line of Code”, Knowledge Base, Samples & SAS Notes, http://support.sas.com/kb/24/808.html

SAS Institute Inc. (2006), “Usage Note 15727: Writing PAGE X OF Y in RTF does not work with BODYTITLE”, Knowledge Base, Samples & SAS Notes, http://support.sas.com/kb/15/727.html

SAS Institute Inc. (2010), “ENCODING= Data Set Option”, SAS(R) 9.2 National Language Support (NLS): Reference Guide, http://support.sas.com/documentation/cdl/en/nlsref/61893/HTML/default/viewer.htm#/documentation/cdl/en/nlsref/61893/HTML/default/a002601944.htm

SAS Institute Inc. (2010)[2], “PROC UNIVARIATE: SAS: Example 4.10 Computing Confidence Limits for Quantiles and Percentiles”, Base SAS(R) 9.2 Procedures Guide: Statistical Procedures, Third Edition, http://support.sas.com/documentation/cdl/en/procstat/63104/HTML/default/viewer.htm#/documentation/cdl/en/procstat/63104/HTML/default/procstat_univariate_sect065.htm

SAS Institute Inc. (undated), “Pageof macros (.sas)”, Knowledge Base, Focus Areas, Base SAS, ODS PDF, Archive: SAS 8.2 ODS PRINTER Family, http://support.sas.com/rnd/base/ods/odsprinter/pageofpp_public.sas

Schreier, Howard (2005), “Re: In proc SQL, how to do something similar to lag function in data step?”, SAS-L posting, 30 August 2005, http://listserv.uga.edu/cgi-bin/wa?A2=ind0508e&L=sas-l&D=0&P=18631

Tong, Cindy (2003), “ODS RTF: Practical Tips”, Proceedings of the Northeast SAS® Users Group Conference, 2003, http://www.nesug.org/proceedings/nesug03/at/at007.pdf

ACKNOWLEDGMENTS
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the author at:

Michael Todd
Nth Analytics
12 Crimson King Trail
Flemington, NJ 08822
Work Phone: 908.672.5649
Fax: 253.595.7413
Email: [email protected]
Web: www.nthanalytics.com

Programming Beyond the Basics NESUG 2010