
Regular Expressions -- SAS and Perl

Jun 12, 2015


Mark Tabladillo

The SAS System provides two declarative syntax languages for regular expressions: SAS and Perl. This presentation compares and contrasts these two complementary choices for SAS application developers.
Transcript
Page 1: Regular Expressions -- SAS and Perl

Regular Expressions – SAS® (RX) vs. Perl (PRX)

Mark Tabladillo Ph.D.
April 10, 2005

© 2005, markTab Consulting, All Rights Reserved

Page 2: Regular Expressions -- SAS and Perl

Motivation

The SAS System Version 9 introduces Perl regular expressions (PRX)
Earlier software versions already had SAS regular expressions (RX)

Page 3: Regular Expressions -- SAS and Perl

Purpose

This presentation will compare and contrast the two types of regular expressions (RX and PRX) from both the functionality and performance viewpoints
The goal: Offer recommendations on when to use the two types
Application: Two generic examples will illustrate the recommended strategy

Page 4: Regular Expressions -- SAS and Perl

Outline

Background
Similarities between SAS (RX) and Perl Regular Expressions (PRX)
Unique Perl Regular Expression (PRX) Capabilities
Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX)
Two Examples of Recommended Strategy

Page 5: Regular Expressions -- SAS and Perl

Outline

Background
Similarities between SAS (RX) and Perl Regular Expressions (PRX)
Unique Perl Regular Expression (PRX) Capabilities
Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX)
Two Examples of Recommended Strategy

Page 6: Regular Expressions -- SAS and Perl

Vocabulary

Pattern matching enables you to search for and extract multiple matching patterns from a character string in one step, as well as to make several substitutions in a string in one step.
Regular expressions are a pattern language which provides fast tools for parsing large amounts of text.
Metacharacters are special combinations of alphanumeric and/or symbolic characters which have specific meaning in defining a regular expression.
Character classes are single or combinations of alphanumeric and/or symbolic characters which represent themselves.

Page 7: Regular Expressions -- SAS and Perl

Is "One Step" Realistic?

Practical uses of regular expressions take more than one step
Regular expressions provide a powerful, parsimonious syntax for string manipulation

Page 8: Regular Expressions -- SAS and Perl

When to Use Regular Expressions

Anything done in regular expressions could be coded another way
Many people do not use metacharacters in (for example) Google® searches
High-volume or complex string processing (such as in a data step) provides excellent potential

Page 9: Regular Expressions -- SAS and Perl

Why Regular Expressions Can Be Confusing

Regular expressions are a combination of:
  – Alphanumeric and/or symbolic characters representing themselves (character classes)
  – Special combinations of alphanumeric and/or symbolic characters (metacharacters) representing zero or more combinations of alphanumeric and/or symbolic characters
  – Specially flagged combinations of alphanumeric and/or symbolic characters which would normally be interpreted as metacharacters, but instead represent themselves (character classes)

Page 10: Regular Expressions -- SAS and Perl

Outline

Background
Similarities between SAS (RX) and Perl Regular Expressions (PRX)
Unique Perl Regular Expression (PRX) Capabilities
Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX)
Two Examples of Recommended Strategy

Page 11: Regular Expressions -- SAS and Perl

Similarity One: Parse Function

PARSE is the core function: it creates a regular expression in memory from metacharacters and assigns that regular expression to a numeric SAS variable, called the regular expression ID.
The term ID refers to identification. SAS assigns every PARSE call a different, unique numeric value and tracks those values automatically.

Page 12: Regular Expressions -- SAS and Perl

Similarity One: Parse Function

The programming challenge is to create a regular expression which generically describes a character string pattern
Metacharacters for SAS (RX) and Perl (PRX) regular expressions are usually different, but either method can be used to create a similar if not identical result
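As a minimal sketch of the PARSE step (not taken from the paper), the DATA step below compiles the same three-digit pattern once with PRXPARSE and once with RXPARSE, then prints the two IDs. The variable names prx_id and rx_id are illustrative assumptions; the \d and $d digit metacharacters are the ones that appear in the phone number slides.

```sas
data _null_;
  /* Perl syntax: the pattern sits between / delimiters */
  prx_id = prxparse('/\d\d\d/');

  /* SAS syntax: $d is the digit character class */
  rx_id = rxparse('$d$d$d');

  /* Each ID is a numeric value that SAS assigns and tracks */
  put prx_id= rx_id=;

  /* Release both compiled expressions */
  call prxfree(prx_id);
  call rxfree(rx_id);
run;
```

The actual ID values depend on the session, so the sketch does not assume any particular numbers.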

Page 13: Regular Expressions -- SAS and Perl

Similarity One: Example

In this first example (SAS Institute, 2003), the goal is to find a pattern that matches (XXX) XXX-XXXX or XXX-XXX-XXXX for phone numbers in the United States.
  – The first three digits are the area code, and by standardized rules, the area code cannot start with a zero or a one.
  – The fourth through sixth digits are the prefix, and again by standard rules, the prefix also cannot start with a zero or one.
  – The suffix may have any digit, including zero or one, in any of the four places.

Page 14: Regular Expressions -- SAS and Perl

Phone Number: Perl (PRX)

paren = "\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d";
dash = "[2-9]\d\d-[2-9]\d\d-\d\d\d\d";
regexp = "/(" || paren || ")|(" || dash || ")/";

See the paper for the full code and explanation
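The slide shows only the pattern assignments. A minimal sketch of how they might be used in a DATA step follows; the surrounding step, the test values, and the variable names re and phone are assumptions for illustration, not the paper's full code.

```sas
data _null_;
  length phone $ 16;

  /* Pattern pieces as shown on the slide */
  paren  = "\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d";
  dash   = "[2-9]\d\d-[2-9]\d\d-\d\d\d\d";
  regexp = "/(" || paren || ")|(" || dash || ")/";

  /* Compile once; a missing ID means the pattern did not parse */
  re = prxparse(regexp);
  if missing(re) then do;
    putlog "ERROR: Invalid regexp " regexp;
    stop;
  end;

  /* Two well-formed and two malformed test values */
  do phone = "(919)319-1677", "800-899-2164", "7042982145", "209/963/2764";
    if prxmatch(re, phone) then putlog "NOTE: Valid phone number " phone;
    else putlog "NOTE: Invalid phone number " phone;
  end;

  call prxfree(re);
run;
```

Because the area code and prefix use [2-9] for their first digit, a value such as 123-456-7890 would also be reported as invalid.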

Page 15: Regular Expressions -- SAS and Perl

Phone Number: SAS (RX)

paren = "'('$'2-9'$d$d')'[' ']$'2-9'$d$d'-'$d$d$d$d";
dash = "$'2-9'$d$d'-'$'2-9'$d$d'-'$d$d$d$d";
regexp = paren || "|" || dash;

See the paper for the full code and explanation
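The RX pattern can be exercised the same way. The sketch below is an assumption about how the slide's assignments might sit inside a DATA step: RXPARSE, RXMATCH, and CALL RXFREE are the RX functions, while the test values and the variable names rx and phone are illustrative.

```sas
data _null_;
  length phone $ 16;

  /* Pattern pieces as shown on the slide (SAS RX syntax) */
  paren  = "'('$'2-9'$d$d')'[' ']$'2-9'$d$d'-'$d$d$d$d";
  dash   = "$'2-9'$d$d'-'$'2-9'$d$d'-'$d$d$d$d";
  regexp = paren || "|" || dash;

  /* Compile the RX pattern and keep its ID */
  rx = rxparse(regexp);

  /* RXMATCH returns the match position, or 0 when nothing matches */
  do phone = "(919)319-1677", "800-899-2164", "7042982145", "209/963/2764";
    if rxmatch(rx, phone) > 0 then putlog "NOTE: Valid phone number " phone;
    else putlog "NOTE: Invalid phone number " phone;
  end;

  call rxfree(rx);
run;
```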

Page 16: Regular Expressions -- SAS and Perl

Comparing the Methods

A SAS macro was created to compare the methods
One iteration did not show a difference, so the iterations were increased to 500
SAS (RX) wins at 3.69 seconds compared to Perl (PRX) at 3.80 seconds
Point: If speed is an issue, you may try the two methods to see which wins

Page 17: Regular Expressions -- SAS and Perl

Similarity Two: Matching

The matching function uses the regular expression to determine a specific numeric position in a string
The return from a match function is a number representing a character position
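A minimal sketch of the matching step, assuming a made-up test string: both PRXMATCH and RXMATCH take the regular expression ID and the string, and return the position of the first match (0 when there is none).

```sas
data _null_;
  text = 'Room 404 is closed';

  prx_id = prxparse('/\d\d\d/');
  rx_id  = rxparse('$d$d$d');

  /* PRXMATCH returns 6 here: the first digit of 404 is character 6 */
  prx_pos = prxmatch(prx_id, text);

  /* RXMATCH is the RX counterpart and also returns a position */
  rx_pos = rxmatch(rx_id, text);

  put prx_pos= rx_pos=;

  call prxfree(prx_id);
  call rxfree(rx_id);
run;
```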

Page 18: Regular Expressions -- SAS and Perl

Similarity Three: Substring

The substring routine allows for inputting a regular expression and string, and outputting a position and length
Routines (unlike functions) can have variable numbers of inputs and outputs, as in the substring routine
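The substring routines can be sketched as follows. CALL PRXSUBSTR and CALL RXSUBSTR both write the position and length into variables supplied by the caller; the test string and variable names are assumptions for illustration.

```sas
data _null_;
  text = 'Room 404 is closed';

  prx_id = prxparse('/\d\d\d/');
  rx_id  = rxparse('$d$d$d');

  /* Each routine fills in the position and length arguments */
  call prxsubstr(prx_id, text, prx_pos, prx_len);
  call rxsubstr(rx_id, text, rx_pos, rx_len);

  /* A position of 0 means no match, so guard before calling SUBSTR */
  if prx_pos > 0 then prx_found = substr(text, prx_pos, prx_len);

  put prx_pos= prx_len= prx_found=;
  put rx_pos= rx_len=;

  call prxfree(prx_id);
  call rxfree(rx_id);
run;
```

The position and length pair is what distinguishes these routines from the match functions, which return only a position.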

Page 19: Regular Expressions -- SAS and Perl

Similarity Four: Change

The change routine takes a regular expression, a maximum number of times to replace, and an old string, and outputs a new string
Both SAS (RX) and Perl (PRX) allow for changing a string in place
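A minimal sketch of an in-place change, shown for PRX only (CALL RXCHANGE is the RX counterpart). The masking pattern and the test string are assumptions for illustration; the regular expression uses the Perl substitution form s/pattern/replacement/.

```sas
data _null_;
  length text $ 40;
  text = 'Call 555-1234 or 555-9876';

  /* Substitution form: s/pattern/replacement/ */
  re = prxparse('s/\d\d\d-\d\d\d\d/XXX-XXXX/');

  /* -1 replaces every match; a positive value caps the number of replacements */
  /* With no new-string argument, the change is made in place                  */
  call prxchange(re, -1, text);

  put text=;

  call prxfree(re);
run;
```

When changing in place, the variable's declared length limits the result, which is why the sketch declares text with room to spare.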

Page 20: Regular Expressions -- SAS and Perl

Similarity Five: Free

The free routine releases the memory allocation for the regular expression
Always include a FREE routine to prevent problems

Page 21: Regular Expressions -- SAS and Perl

Outline

Background
Similarities between SAS (RX) and Perl Regular Expressions (PRX)
Unique Perl Regular Expression (PRX) Capabilities
Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX)
Two Examples of Recommended Strategy

Page 22: Regular Expressions -- SAS and Perl

Capture Buffers

Perl (PRX) regular expressions can use capture buffers, defined as part of a match explicitly specified in the Perl regular expression
The capture buffers are collectively a one-dimensional numbered array of results (starting at one, not zero)
Example: Parts of a phone number
More than one step is required

Page 23: Regular Expressions -- SAS and Perl

Unique Feature One: PRXPOSN Routine

The PRXPOSN routine finds the start position and length of a numbered capture buffer
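A minimal sketch of CALL PRXPOSN, assuming a simplified phone pattern with three capture buffers (area code, prefix, suffix). The pattern, test value, and variable names are illustrative and not from the paper. The routine is called after a successful match.

```sas
data _null_;
  phone = '(919)319-1677';

  /* Three capture buffers: area code, prefix, suffix */
  re = prxparse('/\((\d\d\d)\) ?(\d\d\d)-(\d\d\d\d)/');

  if prxmatch(re, phone) then do;
    /* Buffer 1 is the area code; start and len are filled in by the routine */
    call prxposn(re, 1, start, len);
    area_code = substr(phone, start, len);
    put start= len= area_code=;   /* start=2 len=3 area_code=919 */
  end;

  call prxfree(re);
run;
```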

Page 24: Regular Expressions -- SAS and Perl

Unique Feature Two: PRXPOSN Function

The PRXPOSN function uses the positional capture buffer number to return the actual string in the capture buffer
This function is probably more useful than the PRXPOSN routine
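A minimal sketch of the PRXPOSN function, again assuming a simplified three-buffer phone pattern that is not taken from the paper. The function returns the buffer text directly, so no SUBSTR call is needed.

```sas
data _null_;
  length area_code prefix suffix $ 4;
  phone = '(919)319-1677';

  /* Three capture buffers: area code, prefix, suffix */
  re = prxparse('/\((\d\d\d)\) ?(\d\d\d)-(\d\d\d\d)/');

  if prxmatch(re, phone) then do;
    /* Arguments: regular expression ID, buffer number, source string */
    area_code = prxposn(re, 1, phone);
    prefix    = prxposn(re, 2, phone);
    suffix    = prxposn(re, 3, phone);
    put area_code= prefix= suffix=;   /* area_code=919 prefix=319 suffix=1677 */
  end;

  call prxfree(re);
run;
```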

Page 25: Regular Expressions -- SAS and Perl

Unique Feature Three: PRXPAREN

The PRXPAREN function assumes that the capture buffer was an ordered hierarchical array, and will return the highest non-missing capture buffer number
See the paper for an example
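A minimal sketch of PRXPAREN (not the paper's example): the two phone number formats from the earlier slides are placed in separate capture buffers, and PRXPAREN reports which buffer took part in the match. The test values and the variable name buffer are assumptions.

```sas
data _null_;
  length phone $ 16;

  /* Buffer 1: (XXX) XXX-XXXX format; buffer 2: XXX-XXX-XXXX format */
  re = prxparse('/(\([2-9]\d\d\) ?[2-9]\d\d-\d\d\d\d)|([2-9]\d\d-[2-9]\d\d-\d\d\d\d)/');

  do phone = '(919)319-1677', '800-899-2164';
    if prxmatch(re, phone) then do;
      /* Highest-numbered capture buffer that matched: 1, then 2 */
      buffer = prxparen(re);
      put phone= buffer=;
    end;
  end;

  call prxfree(re);
run;
```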

Page 26: Regular Expressions -- SAS and Perl

Unique Feature Four: PRXNEXT

Similar to PRXMATCH, the PRXNEXT routine will iteratively search a string for matches
Not based on the capture buffer
Useful when a string can have multiple, even overlapping, matches
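The iterative search can be sketched with the usual CALL PRXNEXT loop. The test string and variable names are assumptions. The routine advances start after each match and sets pos to 0 when no further match is found.

```sas
data _null_;
  text = 'Office 404-555-1234, home 770-555-9876';
  re = prxparse('/[2-9]\d\d-[2-9]\d\d-\d\d\d\d/');

  /* Search window: the whole string */
  start = 1;
  stop  = length(text);

  call prxnext(re, start, stop, text, pos, len);
  do while (pos > 0);
    found = substr(text, pos, len);
    put found= pos= len=;
    /* start was moved past the previous match, so this finds the next one */
    call prxnext(re, start, stop, text, pos, len);
  end;

  call prxfree(re);
run;
```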

Page 27: Regular Expressions -- SAS and Perl

Unique Feature Five: PRXDEBUG

The PRXDEBUG routine writes debugging messages to the log
Provides insight into how regular expression functions and routines search through specific strings
Debugging works best when smaller pieces are checked first, building toward the whole regular expression
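A minimal sketch of turning debugging on around a small piece of a pattern. The pattern and test string are assumptions; the log text itself depends on the SAS release, so none is shown here.

```sas
data _null_;
  /* 1 turns debugging output on */
  call prxdebug(1);

  /* Check a small piece first: the prefix and suffix of a phone number */
  re  = prxparse('/[2-9]\d\d-\d\d\d\d/');
  pos = prxmatch(re, 'Call 555-1234');

  /* 0 turns debugging output off again */
  call prxdebug(0);

  call prxfree(re);
run;
```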

Page 28: Regular Expressions -- SAS and Perl

Outline

Background
Similarities between SAS (RX) and Perl Regular Expressions (PRX)
Unique Perl Regular Expression (PRX) Capabilities
Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX)
Two Examples of Recommended Strategy

Page 29: Regular Expressions -- SAS and Perl

Recommended Strategy

Use the type which has the desired functionality
If you don't know either, start with Perl regular expressions (PRX)
If you are looking at performance or speed issues, test both ways (RX and PRX)

Page 30: Regular Expressions -- SAS and Perl

Outline

Background
Similarities between SAS (RX) and Perl Regular Expressions (PRX)
Unique Perl Regular Expression (PRX) Capabilities
Recommended Strategy for SAS (RX) and Perl Regular Expressions (PRX)
Two Examples of Recommended Strategy

Page 31: Regular Expressions -- SAS and Perl

Example One: Printer Names

The Universal Naming Convention describes printers as:
\\computer_name\printer_shared_name
The SYSPRINT option returns or sets the UNC printer name

Page 32: Regular Expressions -- SAS and Perl

Example One: Printer Name

Problem: A variety of legal UNC formats:
  – \\computer_name\printer_shared_name
  – (\\computer_name\printer_shared_name)
  – ("\\computer_name\printer_shared_name")
12 printers * 3 formats = 36 combinations
SAS (RX) could be used with 3 separate regular expressions
Perl (PRX) capture buffer used

Page 33: Regular Expressions -- SAS and Perl

Example One: PRX

'/(\\\\[-\\\w]+|[-\w]+)/'

The regular expression will extract the printer name, without the braces, brackets, or quotation marks
See the paper for explanation
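A minimal sketch of applying this regular expression to the three UNC formats from the previous slide. The surrounding DATA step, the test strings, and the variable names raw and printer are assumptions; the paper's own code may differ.

```sas
data _null_;
  length raw printer $ 80;

  /* Regular expression as shown on the slide; buffer 1 holds the printer name */
  re = prxparse('/(\\\\[-\\\w]+|[-\w]+)/');

  do raw = '\\computer_name\printer_shared_name',
           '(\\computer_name\printer_shared_name)',
           '("\\computer_name\printer_shared_name")';
    if prxmatch(re, raw) then printer = prxposn(re, 1, raw);
    else printer = ' ';
    put raw= printer=;
  end;

  call prxfree(re);
run;
```

In each case the first capture buffer is expected to hold \\computer_name\printer_shared_name, since the surrounding parentheses and quotation marks are not part of either alternative in the pattern.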

Page 34: Regular Expressions -- SAS and Perl

Example Two: Windows Subdirectory

Get the subdirectory from the longer string which started with the drive name and ended with a specific filename:
  – X:\\Sub_Directory_1\Sub_Directory_2\...\Sub_Directory_N\Filename.Extension
As in the previous example, the original string includes the backslash, which is a Perl metacharacter (the escape character) and so must itself be escaped in the regular expression

Page 35: Regular Expressions -- SAS and Perl

Example Two: Regular Expression

'/([A-Za-z]:[.-\\\w]+)\\([.-\w]+)\\([.-\w]+)/'

The regular expression creates three capture buffers, with the second capture buffer containing the string of interest
See the paper for a full explanation
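A minimal sketch of pulling the second capture buffer out of a path. The sample path and variable names are assumptions. One deliberate variation from the slide: the hyphen is moved to the front of each character class ([-.\w] rather than [.-\w]) so that it is unambiguously a literal hyphen and not a range operator.

```sas
data _null_;
  length path subdir $ 200;
  path = 'X:\Sub_Directory_1\Sub_Directory_2\Filename.Extension';

  /* Buffer 1: drive and leading directories          */
  /* Buffer 2: last subdirectory (string of interest) */
  /* Buffer 3: file name with extension               */
  re = prxparse('/([A-Za-z]:[-.\\\w]+)\\([-.\w]+)\\([-.\w]+)/');

  if prxmatch(re, path) then subdir = prxposn(re, 2, path);
  put subdir=;   /* subdir=Sub_Directory_2 */

  call prxfree(re);
run;
```

Because the first buffer is greedy and the last two buffers cannot contain a backslash, the second buffer ends up holding the subdirectory immediately above the file.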

Page 36: Regular Expressions -- SAS and Perl

Conclusion

With version 9, SAS programmers have two regular expression choices: SAS (RX) and Perl (PRX)
The presentation described similarities and differences, and offered a recommended strategy
The paper contains three detailed examples and an annotated bibliography