Pattern Matching: Simple Patterns
Post on 31-Dec-2015
Introduction
• Programmers often need to scan a file, directory, etc. for a specific substring.
– Find all files that begin with “A”.
– Find all files that end in “txt”.
• This capability is provided by a variety of tools.
– e.g. egrep, grep, awk
• Useful to include this functionality in a programming language.
Perl’s Pattern Matcher
• Perl has a built-in pattern matcher.
– Motivation: system administrators frequently use regular expressions. They also use Perl.
• Syntax is borrowed from the grep utility in Unix.
• Based on regular expressions from computer science.
Perl’s Pattern Matcher (cont.)
• Operates over a single string.
• Contexts:
– Scalar: Returns true or false.
– List: Captured substrings (or, with the g modifier, all matching substrings) are returned in a list.
• The syntax is: m dl pattern dl [modifiers], where dl is a delimiter character.
• (/) is the most common delimiter.
– With /, the m operator may be omitted.
• Other delimiters can be used: m~pattern~
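A minimal sketch of the two contexts and of an alternative delimiter. The sample string and variable names are made up for illustration, and the g modifier is used so that the list-context match returns every matching substring.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the slides.
my $line = "alpha=1, beta=22";

# Scalar context: the match yields true or false.
my $found = ($line =~ /beta/) ? 1 : 0;

# List context with the g modifier: every matching substring is returned.
my @numbers = ($line =~ /\d+/g);

# With m written explicitly, another delimiter such as ~ can replace /.
my $same = ($line =~ m~alpha~) ? 1 : 0;

print "found=$found same=$same numbers=@numbers\n";
```

Here the list-context match collects the two digit runs, while both scalar-context matches succeed.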
Simple Patterns
• Simple patterns – match individual characters or character classes.
• A pattern is an abstract representation of a set of strings.
• A pattern “matches” when the string it’s compared with is in the set.
• Matching is done from left to right.
Three Categories of Characters
• Normal characters:
– Match themselves.
– Include escape sequences, e.g. \t, \cC
• Metacharacters:
– Have special meanings in patterns.
– \ | ( ) [ ] { } ^ $ * + ?
• Period:
– Matches any character except newline.
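The three categories can be sketched as follows. The strings are invented for illustration; the only technique used beyond the list above is a backslash placed before a metacharacter to make it match literally.

```perl
use strict;
use warnings;

# Hypothetical sample string: cost: $5.00
my $price = "cost: \$5.00";

# Normal characters match themselves.
my $normal = ($price =~ /cost/) ? 1 : 0;

# A backslash turns the metacharacter $ and the period into literals.
my $literal = ($price =~ /\$5\.00/) ? 1 : 0;

# An unescaped period matches any single character except newline.
my $any     = ("5x00"  =~ /5.00/) ? 1 : 0;
my $newline = ("5\n00" =~ /5.00/) ? 1 : 0;

print "normal=$normal literal=$literal any=$any newline=$newline\n";
```

The last match fails because the character between 5 and 00 is a newline.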
An Example
$_ = "It's snowing today.";
if (/snow/) {
    print "There was snow somewhere in $_\n";
} else {
    print "$_ was snowless\n";
}
Character Classes
• Character classes specify collections of characters in patterns.
• Defined by placing the set in [ ]
– e.g. /[<>=]/
• Dashes are used to specify ranges of characters:
– /[A-Za-z]/
– /[0-7]/
– /[0-3-]/ (a dash placed last is a literal dash)
Exclusion From a Class
• Characters can be excluded from a class with a caret (^) placed first inside the brackets.
• The class then matches any character except those specified.
• For example:
– /[^A-Za-z]/
– /[^01]/
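A small sketch of plain and negated character classes. The helper matching_chars and the sample string are hypothetical; qr// (a standard Perl operator for precompiled patterns) is used only so that a pattern can be passed as an argument.

```perl
use strict;
use warnings;

# Hypothetical helper: return the characters of $text that match $re, in order.
sub matching_chars {
    my ($text, $re) = @_;
    return join '', $text =~ /$re/g;
}

my $sample = "x<=7-A";

my $ops     = matching_chars($sample, qr/[<>=]/);      # "<="   comparison characters
my $low     = matching_chars($sample, qr/[0-3-]/);     # "-"    digits 0 to 3, or a dash
my $nonalph = matching_chars($sample, qr/[^A-Za-z]/);  # "<=7-" anything but a letter

print "$ops $low $nonalph\n";
```

Note that 7 is outside the range 0-3, so only the dash is picked up by the second class.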
Useful Abbreviations
Abbreviation   Pattern         Matches
\d             [0-9]           A digit
\D             [^0-9]          A nondigit
\w             [A-Za-z0-9_]    A word char
\W             [^A-Za-z0-9_]   A nonword char
\s             [ \r\t\n\f]     A white-space char
\S             [^ \r\t\n\f]    A non-white-space char
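The abbreviations can be tried out on a short string. This is only a sketch: the text is invented, and the g modifier and the + quantifier are used to collect every match and to gather runs of word characters.

```perl
use strict;
use warnings;

# Hypothetical sample: words separated by a space and a tab.
my $text = "Room 42\tis_free";

my @digits = $text =~ /\d/g;    # each digit
my @words  = $text =~ /\w+/g;   # runs of word characters (letters, digits, underscore)
my @gaps   = $text =~ /\s/g;    # each white-space character

print scalar(@digits), " digits, ", scalar(@gaps), " gaps, words: @words\n";
```

Because \w includes digits and the underscore, 42 and is_free each come back as a single word.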
Variables in Patterns
• A variable in a pattern is interpolated.
• For example,
$hexpat = "\\s[\\dA-Fa-f]\\s";
if (/$hexpat/) {
    print "$_ has a hex digit.\n";
}
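A variation on the example above, offered as a sketch: when the pattern is stored in a single-quoted string, backslashes are kept as written and need no doubling. The test strings are made up for illustration.

```perl
use strict;
use warnings;

# Single quotes keep \s and \d intact, so the pattern reads as it would inline.
my $hexpat = '\s[\dA-Fa-f]\s';

# Hypothetical inputs: a lone hex digit between spaces, and a lone non-hex letter.
my $hit  = ("offset F here" =~ /$hexpat/) ? 1 : 0;
my $miss = ("offset G here" =~ /$hexpat/) ? 1 : 0;

print "hit=$hit miss=$miss\n";
```

In a double-quoted string the same pattern must be written with each backslash doubled, otherwise the backslash is consumed by string interpolation before the matcher sees it.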
Quantifiers
• Quantifiers can make a pattern more powerful.
• They allow a pattern to be repeated a specified number of times.
• Perl has four kinds of quantifier:
– *, +, ?, {m,n}
• A quantifier immediately follows the pattern it quantifies.
{m, n}
• {n} – exactly n repetitions.
• {m,} – at least m repetitions.
• {m,n} – at least m, but not more than n repetitions.
{m,n} Examples
• /a{1,3}b/ - ab, aab, aaab
• /ab{3}c/ - abbbc
• /ab{2,}c/ - abbc, abbbc, abbbbc, …
• /c{3} z{5}/ - ccc zzzzz
• /[abc]{1,2}/ - a, b, c, aa, ab, ac, ba, bb, bc, ca, cb, cc
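The counts above can be checked with a short sketch. The candidate strings are invented, and the anchors ^ and $ are an addition here: they force the whole string to match, so a longer run of letters cannot slip through as a partial match.

```perl
use strict;
use warnings;

# Hypothetical candidates for /a{1,3}b/: between one and three a's, then b.
my @tried_a   = ("b", "ab", "aab", "aaab", "aaaab");
my @matched_a = grep { /^a{1,3}b$/ } @tried_a;

# Hypothetical candidates for /ab{2,}c/: at least two b's between a and c.
my @tried_b   = ("abc", "abbc", "abbbbc");
my @matched_b = grep { /^ab{2,}c$/ } @tried_b;

print "a{1,3}b: @matched_a\n";
print "ab{2,}c: @matched_b\n";
```

Without the anchors, aaaab would also match /a{1,3}b/, because the pattern can start at the second a.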
Asterisk (*)
• (*) means zero or more repetitions.
• Equivalent to {0,}
• For example,
– /0\d\d*/
– /\w\w*/
– /bob.*cat/
Plus (+)
• (+) means one or more repetitions.
• Equivalent to {1,}
• For example,
– /\w+/
– /[A-Za-z][A-Za-z\d_]+/
– /\d+\.\d+/
Question Mark (?)
• (?) means either zero or one repetition.
• Equivalent to {0,1}.
• For example,
– /\d+\.?/
– /\$?\d+\.\d\d/
– /"?\w+"?/
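A sketch comparing the three shorthand quantifiers on numeric strings. The inputs are made up, the third pattern is an illustrative variant not taken from the slides, and the anchors ^ and $ are added so that each pattern must account for the whole string.

```perl
use strict;
use warnings;

# Hypothetical inputs.
my @inputs = ("7", "7.", "7.5", "12.25", ".5");

# + : one or more digits, a literal period, one or more digits
my @decimals = grep { /^\d+\.\d+$/ } @inputs;

# ? : digits followed by an optional period
my @integers = grep { /^\d+\.?$/ } @inputs;

# * : the leading digits may be absent altogether
my @loose = grep { /^\d*\.\d+$/ } @inputs;

print "decimals: @decimals\n";
print "integers: @integers\n";
print "loose:    @loose\n";
```

Only the * version accepts .5, since + insists on at least one digit before the period.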
Subpatterns
• A quantifier modifies only the item immediately before it, by default a single character.
– e.g. /ball*/ matches bal, ball, balll, …
• () can be used to group parts of patterns.
• The quantifier then modifies the whole group.
• For example,
– /(ball)*/
– /(boo! ){3}/
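The difference between quantifying a character and quantifying a group can be sketched as below. The test strings are invented, and the anchors ^ and $ are added so that the entire string has to match.

```perl
use strict;
use warnings;

# Without parentheses the * applies to the final l only.
my $last_char = ("balll"    =~ /^ball*$/) ? 1 : 0;   # bal followed by two more l's
my $no_group  = ("ballball" =~ /^ball*$/) ? 1 : 0;   # the second "ball" is not covered

# With parentheses the * applies to the whole group.
my $grouped = ("ballball" =~ /^(ball)*$/) ? 1 : 0;

# {3} repeats the group "boo! " (including its trailing space) three times.
my $boo = ("boo! boo! boo! " =~ /^(boo! ){3}$/) ? 1 : 0;

print "last_char=$last_char no_group=$no_group grouped=$grouped boo=$boo\n";
```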
Alternation
• (|) is the logical OR operator in a pattern.
• /a|e|i|o|u/ is equivalent to /[aeiou]/
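• For example,
– /(Bob|Tom|Pussy|Scaredy)cat/
– /t(oo?|wo)/
• Be careful!
– /Tom|Tommie/ never matches Tommie in full: alternatives are tried left to right, so Tom always succeeds first. Write /Tommie|Tom/ instead.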
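The ordering pitfall can be seen in a short sketch. The variable names and inputs are made up; parentheses are added around each alternation so that a list-context match returns the text that was actually matched.

```perl
use strict;
use warnings;

my $name = "Tommie";

# Alternatives are tried left to right, so the shorter one wins here.
my ($short_first) = $name =~ /(Tom|Tommie)/;

# Putting the longer alternative first lets it match in full.
my ($long_first) = $name =~ /(Tommie|Tom)/;

# /t(oo?|wo)/ anchored to the whole string: to, too, or two.
my @spellings = grep { /^t(oo?|wo)$/ } ("to", "too", "two", "tw");

print "short_first=$short_first long_first=$long_first spellings=@spellings\n";
```

Listing the longer alternative first is the usual remedy when one alternative is a prefix of another.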