More PHP functions for using regular expressions
• Up to now, we have seen just one of the library of functions which PHP provides for using regular expressions
• The full library is described in Chapter CVIII of the PHP Manual
• We will consider just two more of them
– preg_match and
– preg_match_all
preg_match
• The format of a call to this function is
int preg_match ( string pattern, string subject [, array &matches [, int flags [, int offset]]] )
• As can be seen, only the first two arguments are required, so a minimum-argument call is of the form
int preg_match ( string regexp, string subject)
which returns
0 if the regexp is not matched inside the subject string, or
1 if the regexp is matched inside the string
• PHP code
<?php
$document = "<h1>France</h1>
<p>Foods of France:
<ol><li>wine</li><li>bread</li></ol></p>";
$regexp = "%<p>.+</p>%";
if ( preg_match($regexp,$document) )
{ echo "Yes"; }
else { echo "No"; }
?>
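• Output
No
• Why no match?
• Answer: on next slide
• By default the dot does not match a newline, and the paragraph in $document spans several lines
• We need to make the dot match newlines, which the s modifier after the closing delimiter does
• Revised PHP code
<?php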
$document = "<h1>France</h1>
<p>Foods of France:
<ol><li>wine</li><li>bread</li></ol></p>";
$regexp = "%<p>.+</p>%s";
if ( preg_match($regexp,$document) )
{ echo "Yes"; }
else { echo "No"; }
?>
• Output
Yes
preg_match (contd.)
• Frequently, it is useful to use the third, optional, argument
int preg_match ( string pattern, string subject, array &matches )
• As before, this returns 0 or 1 depending on whether a match was found in the subject string
• However, in addition, elements of the array in the third argument are set to parts of the matching substring of the subject string
– matches[0] is set to contain the whole matching substring
– matches[1] is set to contain the substring matched by the first () group
– matches[2] is set to contain the substring matched by the second () group
– … etc
• PHP code
<?php
$document = "<h1>France</h1>
<p>Foods of France:
<ol><li>wine</li><li>bread</li></ol></p>";
// Replace < by &lt; so that the browser shows the tags instead of rendering them
echo "document is ".str_replace("<","&lt;",$document)." <br>";
$regexp = "%<p>(.+)</p>%s";
if ( preg_match($regexp,$document,$matches) )
{ echo "Yes <br>";
  echo "matches[0] is ".str_replace("<","&lt;",$matches[0])."<br>";
  echo "matches[1] is ".str_replace("<","&lt;",$matches[1])."<br>";
}
else { echo "No"; }
?>
• Output
document is <h1>France</h1> <p>Foods of France: <ol><li>wine</li><li>bread</li></ol></p>
Yes
matches[0] is <p>Foods of France: <ol><li>wine</li><li>bread</li></ol></p>
matches[1] is Foods of France: <ol><li>wine</li><li>bread</li></ol>
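The slide above uses only one parenthesised group, so matches[2] is never filled. As a minimal sketch of the same idea with two groups, the code below reuses the sample document; the regexp and the variable names are illustrative assumptions, not taken from the original slides.

```php
<?php
// Same sample document as in the slides above.
$document = "<h1>France</h1>
<p>Foods of France:
<ol><li>wine</li><li>bread</li></ol></p>";

// Two parenthesised sub-expressions: the heading text and the paragraph body.
// The trailing s modifier lets the dot match newlines.
$regexp = "%<h1>(.+?)</h1>.*?<p>(.+)</p>%s";

if ( preg_match($regexp, $document, $matches) ) {
    // matches[0] is the whole match, matches[1] and matches[2] are the two groups.
    echo "matches[1] is ".str_replace("<", "&lt;", $matches[1])."<br>";
    echo "matches[2] is ".str_replace("<", "&lt;", $matches[2])."<br>";
} else {
    echo "No";
}
```

Here matches[1] holds the heading text and matches[2] holds everything between <p> and </p>.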
preg_match_all
• So why would we need a function called preg_match_all?
• See next slide
• PHP code
<?php
$document = "<p>This is paragraph 1.</p> <p>And this is paragraph 2.</p>";
echo "document is ".str_replace("<","&lt;",$document)." <br>";
// .+? is the non-greedy form of .+ so the match stops at the first </p>
$regexp = "%<p>(.+?)</p>%s";
if ( preg_match($regexp,$document,$matches) )
{ echo "Yes <br>";
  echo "matches[0] is ".str_replace("<","&lt;",$matches[0])."<br>";
  echo "matches[1] is ".str_replace("<","&lt;",$matches[1])."<br>";
}
else { echo "No"; }
?>
• Output
document is <p>This is paragraph 1.</p> <p>And this is paragraph 2.</p>
Yes
matches[0] is <p>This is paragraph 1.</p>
matches[1] is This is paragraph 1.
• That is, preg_match only finds the first match
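The regexp in this example uses .+? rather than the .+ of the earlier slides. A small sketch of the difference, run on the same two-paragraph document (the variable names $greedy and $lazy are illustrative assumptions):

```php
<?php
$document = "<p>This is paragraph 1.</p> <p>And this is paragraph 2.</p>";

// Greedy: .+ takes as much as possible, so the match runs to the LAST </p>.
preg_match("%<p>(.+)</p>%s", $document, $greedy);

// Non-greedy: .+? takes as little as possible, so the match stops at the FIRST </p>.
preg_match("%<p>(.+?)</p>%s", $document, $lazy);

echo "greedy: ".str_replace("<", "&lt;", $greedy[1])."<br>";
echo "lazy: ".str_replace("<", "&lt;", $lazy[1])."<br>";
```

In both cases preg_match stops after one match; the quantifier only changes how long that single match is.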
preg_match_all
• preg_match_all is like preg_match except
– that it finds all matches and
– thus, the value returned in $matches is actually an array of arrays
• $matches[0] is an array of all the substrings which match the overall regular expression
• $matches[1] is an array of all the substrings which match the first parenthesised sub-expression
• $matches[2] is an array of all the substrings which match the second parenthesised sub-expression
• and so on
• PHP code
<?php
$document = "<p>This is paragraph 1.</p> <p>And this is paragraph 2.</p><p>Paragraph 3.</p>";
echo str_replace("<","&lt;",$document)." <br>";
$regexp = "%<p>(.+?)</p>%s";
if ( preg_match_all($regexp,$document,$matches) )
{ $numMatches = count($matches[0]);
  // First the whole matches, then the contents of the first () group
  for ($i=0;$i < $numMatches; $i++)
  { echo "matches[0][$i] is ".str_replace("<","&lt;",$matches[0][$i])."<br>"; }
  for ($i=0;$i < $numMatches; $i++)
  { echo "matches[1][$i] is ".str_replace("<","&lt;",$matches[1][$i])."<br>"; }
}
else { echo "No"; }
?>
• Output
<p>This is paragraph 1.</p> <p>And this is paragraph 2.</p><p>Paragraph 3.</p>
matches[0][0] is <p>This is paragraph 1.</p>
matches[0][1] is <p>And this is paragraph 2.</p>
matches[0][2] is <p>Paragraph 3.</p>
matches[1][0] is This is paragraph 1.
matches[1][1] is And this is paragraph 2.
matches[1][2] is Paragraph 3.
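preg_match_all also returns the number of complete matches it found, so the count need not be recomputed from the array. A short sketch using the same document and regexp as above (the loop variable names are illustrative assumptions):

```php
<?php
$document = "<p>This is paragraph 1.</p> <p>And this is paragraph 2.</p><p>Paragraph 3.</p>";
$regexp = "%<p>(.+?)</p>%s";

// The return value is the number of full matches found in the subject string.
$numMatches = preg_match_all($regexp, $document, $matches);
echo "found $numMatches paragraphs<br>";

// $matches[1] holds the text captured by the first () group in each match.
foreach ($matches[1] as $i => $paragraphText) {
    echo "paragraph $i: $paragraphText<br>";
}
```

Using foreach over $matches[1] avoids the explicit index arithmetic when only the captured text is needed.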
PHP Filesystem Functions
• Chapter XXXVIII of the PHP manual
• We will consider just four
• resource fopen ( string filename, string mode [, bool use_include_path [, resource zcontext]] )
Usually used as
resource fopen ( string filename, string mode)
• bool fclose ( resource handle )
• string fread ( resource handle, int length )
• int fwrite ( resource handle, string someString [, int length] )
Usually used as
int fwrite (resource handle, string someString )
fopen
• Typical call format:
resource fopen ( string filename, string mode)
• Example calls
$fileHandle1 = fopen("names.txt","w");
opens, for writing, a file called "names.txt" in the same directory
as the PHP program
$fileHandle1 = fopen("names.txt","r");
opens, for reading, a file called "names.txt" in the same directory
as the PHP program
fwrite
• A call that passes $fileHandle1 and the string <h1>Blah blah</h1>
writes that string into the file with handle $fileHandle1 and
returns the number of bytes written into the file, or
returns FALSE (which compares equal to 0) if there was an error
fclose
• Typical call format:
bool fclose ( resource handle)
• Example call
fclose($fileHandle1);
closes the file with handle $fileHandle1
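The four functions are usually used together. The following is a minimal sketch of a write-then-read round trip; the file name names_example.txt, its location in the system temporary directory, and the variable names are assumptions made for this illustration.

```php
<?php
// Hypothetical file name, placed in the system temp directory for this sketch.
$fileName = sys_get_temp_dir()."/names_example.txt";

// Open for writing, write one string, and close again.
$fileHandle1 = fopen($fileName, "w");
$bytesWritten = fwrite($fileHandle1, "<h1>Blah blah</h1>");
fclose($fileHandle1);

// Open the same file for reading and read as many bytes as the file contains.
$fileHandle2 = fopen($fileName, "r");
$contents = fread($fileHandle2, filesize($fileName));
fclose($fileHandle2);

echo "wrote $bytesWritten bytes, read back: ".str_replace("<", "&lt;", $contents);

// Remove the temporary file so the sketch leaves nothing behind.
unlink($fileName);
```

A real program would also check whether fopen returned FALSE before using the handle.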
Example usage
• PHP code:
<?php
$rte = fopen("http://www.rte.ie/","r");
$contents = fread($rte,100000000);
fclose($rte);
echo str_replace("<","&lt;",$contents);
?>
• Output:
<html> <head> <META http-equiv="Content-Type" content="text/html; charset=iso-8859-15"> <title>RTE.ie - Irish Public Service TV and radio stations online</title> <META name="Description" content="RTE.ie - Irish public service television and radio broadcaster on the World Wide Web - bringing you Irish news, sports, business, entertainment, weather, television and radio, programmes, current affairs, health, motors, travel, video and audio."> <META name="Keywords" content="rte, rte.ie, irish, television, radio, ireland, Irish, news, business, sport, results, news, Ireland, video, audio, broadcaster, irish"> <STYLE TYPE="text/css"><!-- A {text-decoration: none; color: #000000} A:hover {text-decoration: none; color: #660000} --></STYLE> <STYLE TYPE="text/css"> <!-- FORM {display:inline;} --></STYLE> <script language="JavaScript"> <!-- function AertelPage() { var
Compare output of program with page seen in browser
<html> <head> <META http-equiv="Content-Type" content="text/html; charset=iso-8859-15"> <title>RTE.ie - Irish Public Service TV and radio stations online</title> <META name="Description" content="RTE.ie - Irish public service television and radio broadcaster on the World Wide Web - bringing you Irish news, sports, business, entertainment
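In current PHP versions a single fread call on a network stream may return fewer bytes than requested, so a whole page is usually read in a loop. The sketch below shows such a loop; readAll is a hypothetical helper name (not a PHP built-in), and the demonstration reads a local temporary file rather than a URL so that it runs without network access.

```php
<?php
// Hypothetical helper: reads everything that is left in an open file handle.
function readAll($handle)
{
    $contents = "";
    // Keep reading chunks until the end of the stream is reached.
    while (!feof($handle)) {
        $chunk = fread($handle, 8192);
        if ($chunk === false) {
            break; // read error: stop and return what we have so far
        }
        $contents .= $chunk;
    }
    return $contents;
}

// Demonstration on a local temporary file (an assumption for this sketch).
$fileName = tempnam(sys_get_temp_dir(), "demo");
file_put_contents($fileName, str_repeat("<p>line</p>\n", 2000));

$handle = fopen($fileName, "r");
$contents = readAll($handle);
fclose($handle);
unlink($fileName);

echo strlen($contents)." bytes read";
```

The same helper could be given a handle obtained from fopen on a URL, provided the PHP installation allows URL opening.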
Example application
• Extracting output from the website of The Guardian:
– The Guardian maintains a page, updated almost daily, of recent stories on Israel and the Middle East at http://www.guardian.co.uk/israel
– Its appearance on 25 October 2005 is on the next slide
– The first part of the source code for the page, obtained from a browser, is on the slide after that
– The complete source code for the 25 October 2005 version of the page is in the file http://www.cs.ucc.ie/j.bowen/cs4408/resources/GuardianIsrael25October2005.txt
(Note that this is an exact copy of the page from the Guardian site, so the src and href attribute values in the file assume that the page is stored at the Guardian URL above.)
• We want to extract the text stories, as shown in the third-next slide
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<!-- artifact_id=377264, built 2005-10-25 11:00 -->
<p><span class="mainlink"><a HREF="/israel/Story/0,2763,1599840,00.html">Israel still in control of Gaza, says envoy</a></span><br /><b>October 25: </b>The international Middle East envoy, James Wolfensohn, has accused Israel of behaving as if it has not withdrawn from the Gaza Strip, by blocking its borders and failing to fulfil commitments to allow the movement of Palestinians and goods.
</p>
<b>Qur'an test</b><hr size="1">
Here is the source code around the headline Audio reports
<p><span class="mainlink"><a HREF="/israel/Story/0,2763,1596052,00.html">Israel's closed zone</a></span><br /><b>October 20, letters: </b>You graphically highlight the continuing expansionism of the Israeli government (Report, October 18).
</p>
<b>Audio reports</b><hr size="1">
<p><span class="mainlink"><a HREF="http://stream.guardian.co.uk:7080/ramgen/sys-audio/Guardian/audio/2005/09/12/120905McGreal.ra">Palestinians rush to Gaza</a></span><br /><b>September 12:</b> Many descended on the former Jewish Gaza settlements intent on causing chaos, but others came simply to see the beach for the first time, reports <b>Chris McGreal</b> from Khan Yunis. (2min 33s)
This PHP program will extract all the source code between the two headlines
But we also want to remove all the intermediate headlines and rulings between the stories
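The two steps described here can be sketched as follows. This is not the original program from the slides: the abbreviated $page string, the starting headline "Special report", and the variable names are assumptions for illustration, and preg_replace is used for the clean-up step.

```php
<?php
// Abbreviated stand-in for the page source (an assumption for this sketch);
// the real program would read the page with fopen and fread.
$page = "<b>Special report</b><hr size=\"1\">
<p>Story one.</p>
<b>Qur'an test</b><hr size=\"1\">
<p>Story two.</p>
<b>Audio reports</b><hr size=\"1\">
<p>Audio item.</p>";

// Step 1: keep only what lies between the two chosen headlines.
$regexp = '%<b>Special report</b><hr size="1">(.+)<b>Audio reports</b>%s';
if ( preg_match($regexp, $page, $matches) ) {
    $between = $matches[1];

    // Step 2: remove every intermediate headline together with its ruling.
    // Bold text inside a story is kept because it is not followed by <hr size="1">.
    $stories = preg_replace('%<b>[^<]*</b><hr size="1">%', '', $between);

    echo str_replace("<", "&lt;", $stories);
} else {
    echo "No";
}
```

Requiring the <hr size="1"> directly after the closing </b> is what distinguishes a section headline from bold dates such as <b>October 25: </b> inside a story.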
Wolfensohn, has accused Israel of behaving as if it has not withdrawn from the Gaza Strip, by blocking its borders and failing to fulfil commitments to allow the movement of Palestinians and goods.
</p>
<b>Qur'an test</b><hr size="1">
<p><span class="mainlink"><a HREF="/israel/Story/0,2763,1599227,00.html">Qur'an competition tests participants' memories</a></span><br /><b>October 24: </b>With senior militant leaders looking on, Palestinian officials opened an international competition yesterday testing participants' knowledge of the Qur'an.
</p>
<b>Comment and analysis</b><hr size="1">
<p><span class="mainlink"><a HREF="/israel/Story/0,2763,1596291,00.html">Christian leanings at the Jerusalem Post</a></span><br /><b>October 20, Chris McGreal:</b> The strange and uneasy embrace between the Jewish state and America's evangelical right is being tightened.
<br />
<a HREF="/israel/comment/0,10551,1590082,00.html">12.10.05, Jonathan Freedland: One and three-quarter state solution</a><br />
<a HREF="/israel/Story/0,2763,1584308,00.html">04.10.05, Chris McGreal: House that became a war zone</a></p>
<b>West Bank</b><hr size="1">
<p><span class="mainlink"><a HREF="/israel/Story/0,2763,1596168,00.html">Israel accused of 'road apartheid' in West Bank</a></span><br /><b>October 20: </b>Army seals off main route to Palestinian vehicles <br><b>· </b>Opponents say plan is to carve out new borders.
<br />
This program will extract stories and remove all intermediate headlines and rulings between stories
<?php