Programming in Perl regular expressions and m,s operators Peter Verhás January 2002.
Pattern Matching Operator
expression =~ m/regexp/options;
$a = "apple";
print "yes!" if $a =~ m/pp/;
The result is TRUE (1) or FALSE (the empty string, which counts as 0 in numeric context).
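A minimal sketch of what the operator gives back in scalar context. The variable names are made up for illustration.

```perl
use strict;
use warnings;

my $fruit = "apple";

# Scalar context: 1 on success, the empty string on failure.
my $hit  = ( $fruit =~ m/pp/ );
my $miss = ( $fruit =~ m/xx/ );

print "hit:  '$hit'\n";
print "miss: '$miss'\n";
print "the failed match is false\n" unless $miss;
```

Because the false value is an empty string, printing it shows nothing at all, which is what the later slide on option i relies on.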
M operator options
• g global search
• i case insensitive search
• m multi-line string
• s single line string
• o evaluate once only
• x extended regular expression
Now let's see what a regular expression is; then we will return to the fine points of the m operator.
Regular Expressions
• A regular expression is a string with joker characters and joker expressions.
• We will look at examples to explain it.
Regular Expression to Verify Email (1)
@mail = ( '[email protected]', 'hab.akukk%mikkamakka@jeno', );
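for( @mail ){
  if( /^.*\@\w+\..+$/ ){
    print "$_ seems to be a good eMail\n";
  }else{
    print "$_ bad address\n";
  }
}
OUTPUT:
[email protected] seems to be a good eMail
hab.akukk%mikkamakka@jeno bad address
NOTES:
$_ is used as the default string to match against.
m/ is the default operator when only / is used, so the test above is short for
$_ =~ m/^.*@\w+\..+$/
A bare @ would also work here instead of \@, but \@ is the safe form because it can never be taken for an array interpolation.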
Regular Expression to Verify Email (2)
/^.*\@\w+\..+$/
• ^ at the start of the string
• .* zero or more any-character
– * means zero or more of what stands before
• \@ a single @ character
• \w+ one or more word characters (letters, digits or underscore)
– + means one or more of what stands before
• \. one . (dot) character
– a special regexp character is escaped with \
• .+ one or more any-character
• $ until the end of the string
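The pattern explained above can be wrapped in a small helper. This is only a sketch: the sub name looks_like_email and the first sample address are hypothetical, and the pattern is the loose check from the slide, not a complete address validator.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string passes the loose check above.
sub looks_like_email {
    my ($address) = @_;
    return $address =~ /^.*\@\w+\..+$/ ? 1 : 0;
}

# Single quotes keep @ from being interpolated in the sample strings.
for my $address ( 'joe@example.com', 'hab.akukk%mikkamakka@jeno' ) {
    my $verdict = looks_like_email($address) ? 'seems good' : 'bad address';
    print "$address: $verdict\n";
}
```

The second address fails because nothing after the @ contains a dot.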
Search and Replace Example of Regular Expressions
$text = 'JavaScript is not used on island Java.';
$text =~ s/Java(?!Script)/Borneo/;
print $text;
OUTPUT:
JavaScript is not used on island Borneo.
NOTES:
Operator s will be discussed later in detail.
(?! ) is a zero-length negative look forward, detailed later.
Meta (joker) Character
• . any character but new line
• ^ start of string
• $ end of string
• \ escaping the next character
• \w any word character (letter, digit or underscore)
• \W any non-word character
• \s any white space
• \S any non-white space
These are only examples; there are other meta characters, see the Perl manual.
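A short sketch of a few of these meta characters at work. The sample string is made up, and the g option in list context (covered later) is used here only to collect every match.

```perl
use strict;
use warnings;

my $line = "Perl 5\tis fun";

my @words   = $line =~ /(\w+)/g;    # runs of word characters
my @spaces  = $line =~ /(\s)/g;     # single white space characters
my ($first) = $line =~ /^(\S+)/;    # leading run of non-white space

print scalar(@words), " words, ", scalar(@spaces), " white space characters\n";
print "first token: $first\n";
```

Note that the tab counts as white space for \s, and the digit 5 counts as a word character for \w.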
Parentheses (1)
$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";
#
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";
OUTPUT:
Hook ok is la l a
Hook ok i sl s l
NOTES:
Numbering is in the order of the opening parentheses.
\3 matches again the text captured by the third pair of parentheses.
Parentheses without $n
$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";
OUTPUT:
Hook ok is la a ..
Hook ok i sl l ..
NOTES:
(?: ) groups a sub-expression without creating a reference.
$6 is not set here, so it prints as an empty string.
Character classes
• List of characters between [ and ]
• Interval, e.g. [a-f]
• Negative character set [^a-f]
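A sketch of an interval and its negation. The helper name first_outside_a_to_f and the sample words are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first character that is not in a..f,
# or undef when every character is inside the interval.
sub first_outside_a_to_f {
    my ($s) = @_;
    return $s =~ /([^a-f])/ ? $1 : undef;
}

for my $s ( 'cafe', 'face9', 'bead' ) {
    my $odd = first_outside_a_to_f($s);
    if ( defined $odd ) {
        print "$s: '$odd' is outside a-f\n";
    } else {
        print "$s: only a-f\n";
    }
}
```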
Repetitions
• * zero or more times
• + one or more times
• ? zero or one time
• {n} exactly n times
• {n,} at least n times
• {n,m} at least n times, at most m times
NOTES:
There is {n,} but there is not {,m} (Perl 5.34 and later do accept {,m}).
Why? (hint: {0,m} works, but what could stand in for ??? in {n,???}?)
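A sketch of the counted repetitions. The format being checked (2 to 4 digits, a dash, exactly 3 lower case letters) and the sub name is_code are invented for the example.

```perl
use strict;
use warnings;

# Hypothetical check: 2 to 4 digits, a dash, then exactly 3 letters.
sub is_code {
    my ($s) = @_;
    return $s =~ /^[0-9]{2,4}-[a-z]{3}$/ ? 1 : 0;
}

for my $s ( '12-abc', '12345-abc', '1-abc', '123-ab' ) {
    print "$s: ", ( is_code($s) ? 'ok' : 'no' ), "\n";
}
```

Only the first sample passes: the others have too many digits, too few digits, or too few letters.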
Greedy repetition
• Repetitions are greedy: they eat as many characters as possible. A ? after the repetition makes it non-greedy.
$text = 'Hook is not used on island Java.';
$text =~ /(.*)is/; #1
print "$1.\n";
$text =~ /(.*?)is/; #2
print "$1.\n";
$text =~ /(.*?)is.*n/; #3
print "$1.\n";
OUTPUT:
Hook is not used on .
Hook .
Hook .
Other extensions
• Other UNIX tools also use simpler, similar regular expressions
• Perl regular expressions are more powerful
List of some extensions on the next slides
Regular expression comment
(?# comment comes here)
• Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments! Use comments!
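A sketch of inline comments in a pattern. The date string is a made-up sample.

```perl
use strict;
use warnings;

my $stamp = '2002-01-15';

# Each (?# ... ) group is ignored by the regular expression engine.
my ( $year, $month ) =
    $stamp =~ /^(\d{4})(?# four-digit year)-(\d\d)(?# two-digit month)/;

print "year=$year month=$month\n";
```

The comment text itself must not contain a closing parenthesis or the pattern delimiter, since either would end it early.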
Regular Expression Parentheses
• (?: sub expression w/o $n)
(We have already met (?: ) in an earlier example, but this is the proper place to discuss the construct.)
Positive look forward
(?= subregexp)
$t = 'jamaica rum rum kingston rum';
$t =~ s/([aeoui])(?=\w)/uc($1)/ge;
print $t;
OUTPUT:
jAmAIca rUm rUm kIngstOn rUm
Example: convert to upper case every vowel that is followed by another word character, that is, every vowel not at the end of a word.
Negative look forward
(?! subregexp)
$t = 'jamaica rum rum kingston rum';
$t =~ s/([aeoui])(?!\w)/uc($1)/ge;
print $t;
OUTPUT:
jamaicA rum rum kingston rum
Example: convert to upper case every vowel standing at the end of a word.
Option change inside the regular expression
(?imsx)
• This can be used inside the m/ or s/ operator.
• Only i, m, s and x can be set this way; options such as g and o cannot.
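A sketch of switching on case insensitivity from inside the pattern. The sample text is made up; the scoped form (?i: ) is shown as well, as an addition to what the slide lists.

```perl
use strict;
use warnings;

my $text = 'Island JAVA';

my $plain  = $text =~ /java/             ? 1 : 0;  # case sensitive: no match
my $inline = $text =~ /(?i)java/         ? 1 : 0;  # (?i) switches on option i
my $scoped = $text =~ /(?i:island) JAVA/ ? 1 : 0;  # (?i: ) applies to the group only

print "plain=$plain inline=$inline scoped=$scoped\n";
```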
Now we go back to operator m/ and discuss some details.
M operator array result
@k = "abbabaa" =~ m/(bb).+(a.)/;
print $#k; print ' ',$k[0],' ',$k[1],"\n";
OUTPUT:1 bb aa
NOTES:
The parts of the expression to be returned are enclosed in ( ).
$1, $2 ... are the default variables where the matched substrings are put; in list context the match also returns them as a list.
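A sketch of using the list result directly. The key=value input is an invented sample.

```perl
use strict;
use warnings;

my $entry = 'width=640';    # made-up sample input

# In list context the match returns the captured substrings.
my ( $key, $value ) = $entry =~ /^(\w+)=(\w+)$/;
print "key=$key value=$value\n";

# A failed match returns an empty list.
my @none = 'no equals sign' =~ /^(\w+)=(\w+)$/;
print "failed match gives ", scalar(@none), " elements\n";
```

Assigning to named variables this way avoids relying on $1 and $2, which keep their old values when a later match fails.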
M operator option g (1)
@k = "abbabaa" =~ m/(b)(a)/g;
print $#k,' ',$k[0],' ',$k[1],' ',$k[2],' ',$k[3],"\n";
OUTPUT:3 b a b a
M operator option g (2)
$t = "abbabaa";
while( $t =~ m/(ab)(b|a)/g ){
print pos($t)," $1 $2\n";
}
OUTPUT:
3 ab b
6 ab a
M operator option i
• Case insensitive match
print '.',"apple" =~ /AppLe/,".\n";
print '.',"apple" =~ /AppLe/i,".\n";
• prints
..
.1.
M operator options m and s
$t = "mah\na\nb";
while( $t =~ /(.?.)$/mg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/sg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/g ){ print '.',$1; }
print ".\n";
• OUTPUT:
.ah.a.b.
.
b.
.b.
(The second loop captures "\nb", so the captured new line is printed between . and b.)
m lets $ match before every \n in the string
s lets . match \n as well (otherwise . is any character but \n)
M operator option o
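• Evaluate the regular expression only once to save processor time
$t = "al brab";
$a = 'al'; $b = 'rab';
&q;&p;
$b = 'fe';
&q;&p;
sub q { print ' q',$t =~ /$a\sb$b/o }
sub p { print ' p',$t =~ /$a\sb$b/ }
• prints
q1 p1 q1 p
q keeps using the pattern built from the first value of $b; p sees the new value and no longer matches.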
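In current Perl a pattern precompiled with qr// is a common alternative to option o. This goes slightly beyond the slide, so take it as a sketch; the variable names are illustrative.

```perl
use strict;
use warnings;

my $t = "al brab";
my ( $head, $tail ) = ( 'al', 'rab' );

# Compile once; later changes to $tail do not affect $frozen.
my $frozen = qr/$head\sb$tail/;

$tail = 'fe';

my $old   = $t =~ $frozen         ? 1 : 0;   # still the 'rab' pattern
my $fresh = $t =~ /$head\sb$tail/ ? 1 : 0;   # interpolated again with 'fe'

print "frozen=$old fresh=$fresh\n";
```

Unlike option o, the frozen pattern is an ordinary value, so it is clear at the point of use which version of the pattern is being applied.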
M operator option x
@k = "abbabaa" =~ m/(bb) #exactly two 'b' characters go into $1
.+ #one or more any-character
(a.) #a letter 'a' and exactly one any-character
/x; #space and comment allowed
print $#k;
print ' ',$k[0],' ',$k[1],"\n";
OUTPUT:1 bb aa
This option allows white space and comments inside the pattern to ease readability. To match a literal space, escape it (\ followed by a space).
Operator s
$text =~ s/regexp/replace/egimosx
• Options:
– e replace is interpreted as expression
– g global search and replace
– i case insensitive search
– m string is treated as multi-line
– o regular expression is evaluated only once
– s string is treated as single-line
– x extended syntax for the regexp
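A sketch of what option e changes. The sample text is made up.

```perl
use strict;
use warnings;

my $price = 'tea 3, rum 12';

# Without e the replacement is an interpolated string.
( my $literal = $price ) =~ s/(\d+)/$1 * 2/g;

# With e the replacement is evaluated as Perl code.
( my $computed = $price ) =~ s/(\d+)/$1 * 2/ge;

print "$literal\n";     # tea 3 * 2, rum 12 * 2
print "$computed\n";    # tea 6, rum 24
```

The parenthesised assignment copies the string first, so $price itself stays unchanged.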
Global Search and Replace
$t = "abbab" ;
$t =~ s/ab/aa/g;
print $t;
OUTPUT:
aabaa
Option g replaces all occurrences of the search regular expression with the replacement string.
m and s operators with different delimiters
• / is the default, but you can use
• ' to have a non-interpolated string
• other non-alphanumeric characters
• () {} [] with matching character pairs
– in this case write s{search}{replace}
m and s operators with different delimiters example
$text = 'a@bba@bbabb';
@b = ('bba');
$text =~ s{@b}{q}g;
print "$text\n";
$text = 'a@bba@bbabb';
$text =~ s'@b'q'g;
print "$text\n";
OUTPUT:
a@q@qbb
aqbaqbabb
@b is interpolated in the first search but not in the second.
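Another common reason to change the delimiter is a pattern that itself contains /. A sketch with a made-up path:

```perl
use strict;
use warnings;

my $path = '/usr/local/bin/perl';    # made-up sample path

# With / as delimiter every slash in the pattern must be escaped.
( my $escaped = $path ) =~ s/^\/usr\/local/\/opt/;

# With {} as delimiters the slashes can stay as they are.
( my $braced = $path ) =~ s{^/usr/local}{/opt};

print "$escaped\n";
print "$braced\n";
```

Both substitutions give the same result; the second is simply easier to read.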