Regular expressions (contd.) -- remembering subpattern matches
• When a <pattern> is being matched with a target string, substrings that match sub-patterns can be remembered and re-used later in the same pattern
• Sub-patterns whose matching substrings are to be remembered are enclosed in parentheses
• The sub-patterns are implicitly numbered, starting from 1 and their matching substrings can then be re-used later in the pattern by using back-references like \1 or \2 or \3
• However, inside a double-quoted PHP string a backslash must itself be escaped to get a literal backslash, so we must type \\1 or \\2 or \\3 in our regular expressions
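As a minimal sketch of the points above, the fragment below uses a back-reference to find doubled capital letters. The pattern and the name $result are assumptions chosen for illustration.

```php
<?php
// Illustrative target string.
$myString = "klmAklmAAklmABklmBklmBBklm";

// ([A-Z]) remembers one capital letter as sub-pattern 1.
// \\1 (two backslashes inside a double-quoted PHP string) then
// requires that same letter to appear again immediately.
$result = preg_replace("/([A-Z])\\1/", "_", $myString);

echo "myString is $myString" . PHP_EOL;
echo "myString is now $result" . PHP_EOL;
// prints: myString is now klmAklm_klmABklmBklm_klm
```

Only AA and BB are replaced; AB is left alone because the second letter differs from the remembered one.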
Using back-references (contd.)
• PHP code
<?php
$myString = "klmAklmAAklmABklmBklmBBklm";
echo "myString is $myString <br>";
Regular expressions (contd.) -- using subpattern matches in replacements
• We saw that, within a regular expression, substrings that matched sub-patterns can be re-used later in the pattern by preceding the appropriate integer with a pair of backslashes, \\
• Within a <replacement>, substrings that matched sub-patterns in the regular expression can be used by preceding the appropriate integer with a dollar sign, $
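A minimal sketch of a replacement that re-uses a remembered sub-pattern match. The pattern, the choice of turning paragraphs into list items, and the name $result are assumptions for illustration.

```php
<?php
$myString = "<p>This is paragraph 1.</p><p>This is paragraph 2.</p>";

// (.*?) remembers the text between <p> and </p> as sub-pattern 1.
// In the replacement, $1 stands for that remembered text.
// Single-quoted strings keep PHP from treating \ or $ specially.
$result = preg_replace('/<p>(.*?)<\/p>/', '<li>$1</li>', $myString);

echo $result . PHP_EOL;
// prints: <li>This is paragraph 1.</li><li>This is paragraph 2.</li>
```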
Using sub-pattern matches in replacements (contd.)
• PHP code
<?php
$myString = "<p>This is paragraph 1.</p><p>This is paragraph 2.</p>";
echo "myString is ".str_replace("<","&lt;",$myString)." <br>";
echo "myString is now ".str_replace("<","&lt;",$myString);
?>
• Resultant output is
myString is <ol><li>fred</li><li>tom</li></ol>
myString is now <li>fred</li><li>tom</li>
• Suppose we wanted to remove all pairs of HTML tags. That is, suppose we wanted
myString is <ol><li>fred</li><li>tom</li></ol>
myString is now fredtom
• How would we achieve that?
Using regexps to process nested HTML (contd.)
• Would a frugal quantifier do the trick? PHP code:
<?php
$myString = "<ol><li>fred</li><li>tom</li></ol>";
echo "myString is ".str_replace("<","&lt;",$myString)." <br>";
echo "myString is now ".str_replace("<","&lt;",$myString);
?>
• No. The resultant output is still
myString is <ol><li>fred</li><li>tom</li></ol>
myString is now <li>fred</li><li>tom</li>
• The reason is that, while preg_replace does replace all matching substrings in the target string, it does not re-scan the text produced by a replacement, so it performs no further replacement operations on the replacement string
• The value <li>fred</li><li>tom</li> above is the result of a replacement operation, so it is not modified
• However, suppose we wanted to remove all pairs, no matter how deep the nesting. How would we do that?
Using regexps to process nested HTML (contd.)
• We must use repetition to attack the nested instances
<?php
$myString = "<ol><li>fred</li><li>tom</li></ol>";
echo "myString is ".str_replace("<","&lt;",$myString)." <br>";
• The resultant output is now
myString is <ol><li>fred</li><li>tom</li></ol>
myString is now <li>fred</li><li>tom</li>
myString is now fred tom
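One way the repetition might be written is sketched below. The pattern and the names $pattern, $onePass, $result and $count are assumptions for illustration. The sketch relies on the optional fifth argument of preg_replace, which receives the number of replacements made.

```php
<?php
$myString = "<ol><li>fred</li><li>tom</li></ol>";

// <(\w+)> remembers the tag name as sub-pattern 1, (.*?) frugally
// remembers the enclosed text as sub-pattern 2, and \1 demands the
// closing tag with the same name.
$pattern = '/<(\w+)>(.*?)<\/\1>/';

// A single call strips only the outermost pair of tags.
$onePass = preg_replace($pattern, '$2', $myString);

// Repeating until no replacement is made strips every nesting level.
$result = $myString;
do {
    $result = preg_replace($pattern, '$2', $result, -1, $count);
} while ($count > 0);

echo "after one pass: $onePass" . PHP_EOL;
echo "after repetition: $result" . PHP_EOL;
```

With this pattern the single pass leaves <li>fred</li><li>tom</li>, and the loop ends with fredtom (no space is inserted between the two items).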
More on regular expressions – checking for context
• All the preg_replace operations we have written so far have consumed all the characters that matched the regular expression
• There was no notion of examining the context surrounding the consumed characters; any characters that were matched were consumed
• We often need some way of matching characters without removing them from the target string
• There are four meta-expressions for doing this, two for forward context and two for backward context
Look-ahead context checks
(?=regexp)
This is a positive lookahead context check
It matches characters in the target string against the pattern specified by the embedded regular expression regexp without consuming them from the target string
• Example
preg_replace("/\w+(?= cat)/","_",$myString)
This replaces with an underscore any word that is followed by a space and the word cat, without removing the space or the word cat from the target string
• An example application is on the next slide
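A minimal runnable sketch of the positive look-ahead described above. The target string and the name $result are assumptions for illustration.

```php
<?php
$myString = "tabby is a big cat. fido is a fat dog.";

// \w+ matches a word; (?= cat) checks that a space and "cat" follow,
// but those characters are not consumed, so they survive the replacement.
$result = preg_replace('/\w+(?= cat)/', '_', $myString);

echo "myString is $myString" . PHP_EOL;
echo "myString is now $result" . PHP_EOL;
// prints: myString is now tabby is a _ cat. fido is a fat dog.
```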
Look-ahead checks (contd.)
• Program fragment:
$myString = "tabby is a big cat. fido is a fat dog.";
• Output produced
myString is Fred is a cowboy. Dolly is a cow.
myString is now Fred is a cowboy. Dolly is a _.
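The output above is what a negative look-ahead, written (?!regexp), would produce. The pattern that generated it is not shown, so the one in this sketch is an assumption, as is the name $result.

```php
<?php
$myString = "Fred is a cowboy. Dolly is a cow.";

// cow(?!boy) matches "cow" only where it is NOT followed by "boy".
// The look-ahead inspects the following characters without consuming them.
$result = preg_replace('/cow(?!boy)/', '_', $myString);

echo "myString is $myString" . PHP_EOL;
echo "myString is now $result" . PHP_EOL;
// prints: myString is now Fred is a cowboy. Dolly is a _.
```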
Look-behind context checks
(?<=regexp)
This is a positive look-behind context check
It ensures that preceding characters in the target string match the pattern specified by the embedded regular expression regexp
• Example
preg_replace("/(?<= cow)boy/","girl",$myString)
This replaces every sub-string "boy" with "girl", provided it is immediately preceded by " cow" (a space followed by cow); the preceding characters are checked but not consumed.
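A minimal runnable sketch of this positive look-behind. The target string and the name $result are assumptions for illustration.

```php
<?php
$myString = "Fred is a cowboy. Tom is a boy.";

// (?<= cow) checks that a space and "cow" come immediately before "boy".
// Only "boy" is consumed and replaced; " cow" stays in place.
$result = preg_replace('/(?<= cow)boy/', 'girl', $myString);

echo "myString is $myString" . PHP_EOL;
echo "myString is now $result" . PHP_EOL;
// prints: myString is now Fred is a cowgirl. Tom is a boy.
```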
Look-behind checks (contd.)
• Program fragment:
$myString = "Fred is a cowboy. Tom is a boy.";
• Output produced
myString is Fred is a cowboy. Tom is a boy.
myString is now Fred is a cowboy. Tom is a girl.
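The output above is what a negative look-behind, written (?<!regexp), would produce. The pattern that generated it is not shown, so the one in this sketch is an assumption, as is the name $result.

```php
<?php
$myString = "Fred is a cowboy. Tom is a boy.";

// (?<!cow)boy matches "boy" only where it is NOT preceded by "cow".
// The look-behind inspects the preceding characters without consuming them.
$result = preg_replace('/(?<!cow)boy/', 'girl', $myString);

echo "myString is $myString" . PHP_EOL;
echo "myString is now $result" . PHP_EOL;
// prints: myString is now Fred is a cowboy. Tom is a girl.
```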
Regexp pattern modifiers
• We have seen that a regexp is of the form /…../ where the slash characters are delimiters (and could be replaced by another non-alphanumeric, non-backslash printable character)
• The terminating character can be followed by a sequence of modifiers which affect the meaning of the regexp between the delimiting characters
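A short sketch of both points: the same regexp written with two different delimiters, and a modifier placed after the closing delimiter. The target string and the names $withSlash, $withHash and $withModifier are assumptions for illustration.

```php
<?php
$myString = "<p>one</p>";

// With / as the delimiter, a slash inside the pattern must be escaped.
$withSlash = preg_replace('/<\/p>/', '', $myString);

// With # as the delimiter, the slash needs no escaping.
$withHash = preg_replace('#</p>#', '', $myString);

// Modifiers follow the closing delimiter; here i makes the match caseless,
// so the upper-case P in the pattern still matches the lower-case tag.
$withModifier = preg_replace('#</P>#i', '', $myString);

echo $withSlash . PHP_EOL;     // prints: <p>one
echo $withHash . PHP_EOL;      // prints: <p>one
echo $withModifier . PHP_EOL;  // prints: <p>one
```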
Example pattern modifier: the caseless match modifier
• Program fragment:
$myString = "Fred is a boy. Tom is a BOY.";
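A minimal sketch of how the caseless modifier i changes the result for this string. The pattern, the replacement and the names $caseSensitive and $caseless are assumptions for illustration.

```php
<?php
$myString = "Fred is a boy. Tom is a BOY.";

// Without a modifier, only the lower-case "boy" matches.
$caseSensitive = preg_replace('/boy/', 'girl', $myString);

// With the i modifier after the closing delimiter, "BOY" matches too.
$caseless = preg_replace('/boy/i', 'girl', $myString);

echo "without i: $caseSensitive" . PHP_EOL;
// prints: without i: Fred is a girl. Tom is a BOY.
echo "with i: $caseless" . PHP_EOL;
// prints: with i: Fred is a girl. Tom is a girl.
```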