
1 Perl Tutorial Practical extraction and report language .

Dec 20, 2015

Page 1: 1 Perl Tutorial Practical extraction and report language .

Perl Tutorial
Practical extraction and report language

http://www.comp.leeds.ac.uk/Perl/start.html

Page 2: 1 Perl Tutorial Practical extraction and report language .

Why Perl?

- Perl is built around regular expressions (REs)
- REs are good for string processing
- Therefore Perl is a good scripting language
- Perl is especially popular for CGI scripts
- Perl makes full use of the power of UNIX
- Short Perl programs can be very short

“Perl is designed to make the easy jobs easy, without making the difficult jobs impossible.” -- Larry Wall, Programming Perl

Page 3: 1 Perl Tutorial Practical extraction and report language .

Why not Perl?

- Perl is very UNIX-oriented
  - Perl is available on other platforms...
  - ...but isn’t always fully implemented there
  - However, Perl is often the best way to get some UNIX capabilities on less capable platforms
- Perl does not scale well to large programs
  - Weak subroutines, heavy use of global variables
- Perl’s syntax is not particularly appealing

Page 4: 1 Perl Tutorial Practical extraction and report language .

What is a scripting language?

- Operating systems can do many things
  - copy, move, create, delete, compare files
  - execute programs, including compilers
  - schedule activities, monitor processes, etc.
- A command-line interface gives you access to these functions, but only one at a time
- A scripting language is a “wrapper” language that integrates OS functions

Page 5: 1 Perl Tutorial Practical extraction and report language .

Major scripting languages

- UNIX has sh, Perl
- Macintosh has AppleScript, Frontier
- Windows has no major scripting languages
  - probably due to the weaknesses of DOS
- Generic scripting languages include:
  - Perl (most popular)
  - Tcl (easiest for beginners)
  - Python (new, Java-like, best for large programs)

Page 6: 1 Perl Tutorial Practical extraction and report language .

Perl Example 1

#!/usr/local/bin/perl
#
# Program to do the obvious
#
print 'Hello world.';   # Print a message

Page 7: 1 Perl Tutorial Practical extraction and report language .

Comments on “Hello, World”

- Comments run from # to the end of the line
  - But the first line, #!/usr/local/bin/perl, tells where to find the Perl interpreter on your system
- Perl statements end with semicolons
- Perl is case-sensitive
- Perl is compiled and run in a single operation

Page 8: 1 Perl Tutorial Practical extraction and report language .

Variables

A variable is a name of a place where some information is stored. For example:

    $yearOfBirth = 1976;
    $currentYear = 2000;
    $age = $currentYear - $yearOfBirth;
    print $age;

The same name can store strings:

    $yearOfBirth = 'None of your business';

The variables in the example program can be identified as such because their names start with a dollar ($). Perl uses different prefix characters for structure names in programs. Here is an overview:

    $: variable containing a scalar value such as a number or a string
    @: variable containing a list with numeric keys
    %: variable containing a list with strings as keys
    &: subroutine
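The prefix characters above can be tried out with a short sketch like the following. The variable names and values are hypothetical, chosen only for illustration. Note that a single element of an array or hash is itself a scalar, so it is written with $.

```perl
use strict;
use warnings;

# Hypothetical sample data, for illustration only.
my $name   = "Dirk";                     # scalar: one string or number
my @scores = (7, 9, 4);                  # array: ordered list, numeric keys
my %age    = (Jan => 31, Marie => 28);   # hash: values looked up by string key

# One element is a scalar, so it takes the $ prefix.
my $first_score = $scores[0];
my $marie_age   = $age{Marie};

print "$name: first score $first_score, Marie is $marie_age\n";
```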

Page 9: 1 Perl Tutorial Practical extraction and report language .

Operations on numbers

Perl contains the following arithmetic operators:

    +: sum
    -: subtraction
    *: product
    /: division
    %: modulo division
    **: exponent

Apart from these operators, Perl contains some built-in arithmetic functions. Some of these are mentioned in the following list:

    abs($x): absolute value
    int($x): integer part
    rand(): random number between 0 and 1
    sqrt($x): square root

Page 10: 1 Perl Tutorial Practical extraction and report language .

Test your understanding

$text =~ s/bug/feature/;

$text =~ s/bug/feature/g;

$text =~ tr/A-Z/a-z/;

$text =~ tr/AEIOUaeiou//d;

$text =~ tr/0-9/x/cs;

$text =~ s/[A-Z]/CAPS/g;

Page 11: 1 Perl Tutorial Practical extraction and report language .

Examples

# replace first occurrence of "bug"
$text =~ s/bug/feature/;

# replace all occurrences of "bug"
$text =~ s/bug/feature/g;

# convert to lower case (tr takes plain ranges; square brackets would be literal characters)
$text =~ tr/A-Z/a-z/;

# delete vowels
$text =~ tr/AEIOUaeiou//d;

# replace each sequence of non-digits with a single x
$text =~ tr/0-9/x/cs;

# replace each capital character by CAPS
$text =~ s/[A-Z]/CAPS/g;
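To see what these operators do, it can help to run them against one sample string. The sketch below assumes a made-up input sentence and copies it before each operation so that the results can be compared side by side.

```perl
use strict;
use warnings;

my $text = "A bug is a bug, 42 times";   # hypothetical sample input

(my $first = $text) =~ s/bug/feature/;     # first occurrence only
(my $all   = $text) =~ s/bug/feature/g;    # every occurrence
(my $lower = $text) =~ tr/A-Z/a-z/;        # upper case to lower case
(my $novow = $text) =~ tr/AEIOUaeiou//d;   # delete vowels
(my $mask  = $text) =~ tr/0-9/x/cs;        # each run of non-digits becomes one x

print "$_\n" for $first, $all, $lower, $novow, $mask;
```

The last line printed is x42x: everything before and after the digits collapses to a single x.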

Page 12: 1 Perl Tutorial Practical extraction and report language .

Regular expressions

    \b: word boundaries
    \d: digits
    \n: newline
    \r: carriage return
    \s: white space characters
    \t: tab
    \w: alphanumeric characters
    ^: beginning of string
    $: end of string
    .: any character
    [bdkp]: characters b, d, k and p
    [a-f]: characters a to f
    [^a-f]: all characters except a to f
    abc|def: string abc or string def

    *: zero or more times
    +: one or more times
    ?: zero or one time
    {p,q}: at least p times and at most q times
    {p,}: at least p times
    {p}: exactly p times

Examples:
1. Clean an HTML formatted text
2. Grab URLs from a Web page
3. Transform all lines from a file into lower case
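One possible way to approach the three exercises is sketched below. It is a rough illustration, not a robust solution: the input line is invented, a real script would read lines from a file, and regular expressions like these only handle simple, well-formed HTML.

```perl
use strict;
use warnings;

# Hypothetical input line, for illustration only.
my $html = '<p>Visit <a href="http://www.comp.leeds.ac.uk/Perl/start.html">the TUTORIAL</a></p>';

# 2. Grab URLs: capture whatever sits between href=" and the next quote.
my @urls = $html =~ /href="([^"]+)"/g;

# 1. Clean HTML: delete anything that looks like a tag (naive, not a parser).
(my $plain = $html) =~ s/<[^>]+>//g;

# 3. Lower-case the remaining text.
my $lower = lc $plain;

print "$urls[0]\n$plain\n$lower\n";
```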

Page 13: 1 Perl Tutorial Practical extraction and report language .

Lists and arrays

    @a = ();                              # empty list
    @b = (1,2,3);                         # three numbers
    @c = ("Jan","Piet","Marie");          # three strings
    @d = ("Dirk",1.92,46,"20-03-1977");   # a mixed list

Variables and sublists are interpolated in a list

    @b = ($a,$a+1,$a+2);                     # variable interpolation
    @c = ("Jan",("Piet","Marie"));           # list interpolation
    @d = ("Dirk",1.92,46,(),"20-03-1977");   # empty list
    # you don’t get lists containing lists – just a simple list
    @e = ( @b, @c );                         # same as (1,2,3,"Jan","Piet","Marie")

Page 14: 1 Perl Tutorial Practical extraction and report language .

Lists and arrays

Practical construction operators

($x..$y)

    @x = (1..6);             # same as (1, 2, 3, 4, 5, 6)
    @z = (2..5,8,11..13);    # same as (2,3,4,5,8,11,12,13)

qw() "quote word" function

    qw(Jan Piet Marie) is a shorter notation for ("Jan","Piet","Marie").
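As a quick check of list flattening, the range operator and qw(), the following sketch builds a few lists and prints their sizes. The variable names are illustrative only.

```perl
use strict;
use warnings;

my @b    = (1..3);                     # range operator: (1, 2, 3)
my @c    = qw(Jan Piet Marie);         # same as ("Jan", "Piet", "Marie")
my @e    = (@b, @c);                   # flattened into one list of six items
my @flat = (1, (2, (3, 4)), (), 5);    # nesting and empty lists disappear

print scalar(@e), " items: @e\n";          # prints "6 items: 1 2 3 Jan Piet Marie"
print scalar(@flat), " items: @flat\n";    # prints "5 items: 1 2 3 4 5"
```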

Page 15: 1 Perl Tutorial Practical extraction and report language .

Split

It takes a regular expression and a string, and splits the string into a list, breaking it into pieces at places where the regular expression matches.

    $string = "Jan Piet\nMarie \tDirk";
    @list = split /\s+/, $string;    # yields ( "Jan","Piet","Marie","Dirk" )
                                     # remember \s is a white space

    $string = " Jan Piet\nMarie \tDirk\n";   # white space at begin and end!!!
    @list = split /\s+/, $string;    # yields ( "", "Jan","Piet","Marie","Dirk" )
                                     # the leading empty string is kept,
                                     # trailing empty strings are dropped

    $string = "Jan:Piet;Marie---Dirk";       # use any regular expression...
    @list = split /[:;]|---/, $string;       # yields ( "Jan","Piet","Marie","Dirk" )

    $string = "Jan Piet";            # use an empty regular expression to split into characters
    @letters = split //, $string;    # yields ( "J","a","n"," ","P","i","e","t" )
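The handling of empty fields at the ends of the string is easy to get wrong, so here is a small sketch that counts the fields produced by three variants of split on the same input. The third argument (a negative limit) and the special ' ' pattern are standard split features that the slide does not mention.

```perl
use strict;
use warnings;

my $string = " Jan Piet\nMarie \tDirk\n";

my @default = split /\s+/, $string;       # leading "" kept, trailing "" dropped
my @all     = split /\s+/, $string, -1;   # negative limit keeps trailing ""
my @words   = split ' ',   $string;       # special case: leading white space is skipped

printf "%d %d %d\n", scalar(@default), scalar(@all), scalar(@words);   # prints "5 6 4"
```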

Page 16: 1 Perl Tutorial Practical extraction and report language .

More about arrays

    @array = ("an","bert","cindy","dirk");
    $length = @array;          # $length now has the value 4
    print $length;             # prints 4
    print $#array;             # prints 3, last valid subscript
    print $array[$#array];     # prints "dirk"
    print scalar(@array);      # prints 4

Page 17: 1 Perl Tutorial Practical extraction and report language .

Working with lists

Subscripts convert lists to strings

    @array = ("an","bert","cindy","dirk");
    print "The array contains $array[0] $array[1] $array[2] $array[3]";
    # interpolate
    print "The array contains @array";

function join STRING LIST

    $string = join ":", @array;   # $string now has the value "an:bert:cindy:dirk"

Iteration over lists

    for ($i = 0; $i <= $#array; $i++) {
        $item = $array[$i];
        $item =~ tr/a-z/A-Z/;
        print "$item ";
    }

    foreach $item (@array) {
        $item =~ tr/a-z/A-Z/;     # $item is an alias, so @array itself is changed
        print "$item ";           # prints an upper-case version of each item
    }
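The two loops look equivalent, but they differ in one respect: in the index loop the element is copied into a separate scalar, while in foreach the loop variable is an alias for the element. A minimal sketch of the difference, using illustrative variable names:

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

# Index loop: $copy is a separate scalar, so @array is untouched.
my @upper;
for my $i (0 .. $#array) {
    my $copy = $array[$i];
    $copy =~ tr/a-z/A-Z/;
    push @upper, $copy;
}
print "@upper | @array\n";        # prints "AN BERT CINDY DIRK | an bert cindy dirk"

# foreach: $item aliases each element, so the array itself changes.
foreach my $item (@array) {
    $item =~ tr/a-z/A-Z/;
}
print join(":", @array), "\n";    # prints "AN:BERT:CINDY:DIRK"
```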

Page 18: 1 Perl Tutorial Practical extraction and report language .

More about arrays – multiple value assignments

    ($a, $b) = ("one","two");
    ($onething, @manythings) = (1,2,3,4,5,6);
    # now $onething equals 1
    # and @manythings = (2,3,4,5,6)
    ($array[0],$array[1]) = ($array[1],$array[0]);   # swap the first two

Note that an assignment first evaluates the right-hand side of the expression and then stores a copy of the result.

    @array = ("an","bert","cindy","dirk");
    @copyarray = @array;        # copies every element
    $copyarray[2] = "XXXXX";    # @array still contains "cindy"
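The copy semantics can be verified with a short sketch. It assumes an array of plain strings; if the elements were references, the copy would still share the referenced data.

```perl
use strict;
use warnings;

my @array     = ("an", "bert", "cindy", "dirk");
my @copyarray = @array;              # element-by-element copy
$copyarray[2] = "XXXXX";
print "$array[2] $copyarray[2]\n";   # prints "cindy XXXXX"

my ($first, @rest) = @array;         # $first gets "an", @rest gets the remainder
($array[0], $array[1]) = ($array[1], $array[0]);   # right-hand side is evaluated first
print "$first | @rest | @array\n";   # prints "an | bert cindy dirk | bert an cindy dirk"
```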

Page 19: 1 Perl Tutorial Practical extraction and report language .

Manipulating lists and their elements

PUSH

push ARRAY LIST

appends the list to the end of the array. If the second argument is a scalar rather than a list, it appends it as the last item of the array.

    @array = ("an","bert","cindy","dirk");
    @brray = ("eve","frank");

    push @array, @brray;      # @array is ("an","bert","cindy","dirk","eve","frank")
    push @brray, "gerben";    # @brray is ("eve","frank","gerben")

Page 20: 1 Perl Tutorial Practical extraction and report language .

Manipulating lists and their elements

POP

pop ARRAY does the opposite of push: it removes the last item of its argument list and returns it. If the list is empty it returns undef.

    @array = ("an","bert","cindy","dirk");
    $item = pop @array;    # $item is "dirk" and @array is ( "an","bert","cindy")

shift @array removes the first element. It works on the left end of the list, but is otherwise the same as pop.

unshift (@array, @newStuff) puts stuff on the left side of the list, just as push does for the right side.
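The four functions together let an array act as a stack or a queue. A minimal sketch, with an illustrative array name:

```perl
use strict;
use warnings;

my @queue = ("bert", "cindy");

push    @queue, "dirk";      # add on the right: (bert, cindy, dirk)
unshift @queue, "an";        # add on the left:  (an, bert, cindy, dirk)

my $last  = pop   @queue;    # remove from the right
my $first = shift @queue;    # remove from the left

print "$first $last | @queue\n";   # prints "an dirk | bert cindy"
```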

Page 21: 1 Perl Tutorial Practical extraction and report language .

Grep

grep CONDITION LIST

returns a list of all items from list that satisfy some condition.

For example:

@large = grep $_ > 10, (1,2,4,8,16,25); # returns (16,25)

@i_names = grep /i/, @array; # returns ("cindy","dirk")

Page 22: 1 Perl Tutorial Practical extraction and report language .

map

map OPERATION LIST is an extension of grep, and performs an arbitrary operation on each element of a list.

For example:

    @array = ("an","bert","cindy","dirk");
    @more = map $_ + 3, (1,2,4,8,16,25);        # returns (4,5,7,11,19,28)
    @initials = map substr($_,0,1), @array;     # returns ("a","b","c","d")
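grep and map also accept a block in braces instead of a single expression, and the two can be chained. The sketch below uses the block form, which is not shown on the slides, on the same sample array.

```perl
use strict;
use warnings;

my @array = ("an", "bert", "cindy", "dirk");

my @i_names  = grep { /i/ } @array;                    # ("cindy", "dirk")
my @initials = map  { uc substr($_, 0, 1) } @array;    # ("A", "B", "C", "D")

# Chain them: keep names longer than two characters, then upper-case them.
my @long_upper = map { uc $_ } grep { length($_) > 2 } @array;

print "@i_names | @initials | @long_upper\n";
```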

Page 23: 1 Perl Tutorial Practical extraction and report language .

Hashes (Associative Arrays)

- associate keys with values
- named with %
- allows for almost instantaneous lookup of a value that is associated with some particular key

Examples

if %wordfrequency is the hash table,

    $wordfrequency{"the"} = 12731;    # creates key "the", value 12731
    $phonenumber{"An De Wilde"} = "+31-20-6777871";
    $index{$word} = $nwords;
    $occurrences{$a}++;               # if this is the first reference,
                                      # the value associated with $a will
                                      # be increased from 0 to 1
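The ++ idiom is the usual way to count things with a hash. As a sketch, assuming a made-up sentence as input, a word-frequency table can be built like this (keys and sort are standard functions not covered on the slide):

```perl
use strict;
use warnings;

my $text = "the cat saw the dog and the bird";   # hypothetical input

my %wordfrequency;
$wordfrequency{$_}++ for split ' ', $text;       # a missing key starts out as 0

for my $word (sort keys %wordfrequency) {
    printf "%-5s %d\n", $word, $wordfrequency{$word};
}
```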

Page 24: 1 Perl Tutorial Practical extraction and report language .

Hash Operations

    %birthdays = ("An","25-02-1975","Bert","12-10-1953",
                  "Cindy","23-05-1969","Dirk","01-04-1961");
    # fill the hash

    %birthdays = (An => "25-02-1975", Bert => "12-10-1953",
                  Cindy => "23-05-1969", Dirk => "01-04-1961");
    # fill the hash; the same as above, but more explicit

    @list = %birthdays;             # make a list of the key/value pairs

    %copy_of_bdays = %birthdays;    # copy a hash

Page 25: 1 Perl Tutorial Practical extraction and report language .

Hashes (What if not there?)

- Existing, defined and true.
- If a key does not exist in the hash, accessing it returns the undef value.
- The special test function exists(HASHENTRY) returns true if the hash key exists in the hash.
- if($hash{$key}){...} and if(defined($hash{$key})){...} are both false if the key $key has no associated value. The first is also false when the stored value is 0, "0" or the empty string; the second is also false when the stored value is undef.
- print "Exists\n" if exists $hash{$key};
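The difference between existing, defined and true shows up most clearly in a small table. The sketch below uses a hypothetical hash with one entry of each kind and tests each key three ways.

```perl
use strict;
use warnings;

# Hypothetical hash with a false value, an empty string, undef and a true value.
my %hash = (zero => 0, empty => "", none => undef, one => 1);

for my $key (qw(zero empty none one missing)) {
    printf "%-7s exists=%d defined=%d true=%d\n",
        $key,
        exists($hash{$key})  ? 1 : 0,
        defined($hash{$key}) ? 1 : 0,
        $hash{$key}          ? 1 : 0;
}
```

Only the key one passes all three tests; zero and empty exist and are defined but false; none exists but is undefined; missing fails all three.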

Page 26: 1 Perl Tutorial Practical extraction and report language .

Perl Example 2

#!/usr/bin/perl
# Remove blank lines from a file
# Usage: singlespace < oldfile > newfile

while ($line = <STDIN>) {
    if ($line eq "\n") { next; }
    print "$line";
}

Page 27: 1 Perl Tutorial Practical extraction and report language .

More Perl notes

- On the UNIX command line:
  - < filename means to get input from this file
  - > filename means to send output to this file
- In Perl, STDIN is the standard input filehandle (read a line with <STDIN>) and STDOUT is the standard output filehandle
- Scalar variables start with $
- Scalar variables hold strings or numbers, and they are interchangeable
- Examples:
    $priority = 9;
    $priority = '9';
- Array variables start with @

Page 28: 1 Perl Tutorial Practical extraction and report language .

Perl Example 3

#!/usr/local/bin/perl
# Usage: fixm <filenames>
# Replace \r with \n -- replaces input files

foreach $file (@ARGV) {
    print "Processing $file\n";
    if (-e "fixm_temp") { die "*** File fixm_temp already exists!\n"; }
    if (! -e $file) { die "*** No such file: $file!\n"; }
    open DOIT, "| tr \'\\015' \'\\012' < $file > fixm_temp"
        or die "*** Can't: tr '\\015' '\\012' < $file > fixm_temp\n";
    close DOIT;
    open DOIT, "| mv -f fixm_temp $file"
        or die "*** Can't: mv -f fixm_temp $file\n";
    close DOIT;
}

Page 29: 1 Perl Tutorial Practical extraction and report language .

Comments on example 3

- In # Usage: fixm <filenames>, the angle brackets just mean to supply a list of file names here
- In UNIX text editors, the \r (carriage return) character usually shows up as ^M (hence the name fixm_temp)
- The UNIX command tr '\015' '\012' replaces all \015 characters (\r) with \012 (\n) characters
- The format of the open and close commands is:
    open fileHandle, fileName
    close fileHandle
- "| tr \'\\015' \'\\012' < $file > fixm_temp" says: start the tr command through a pipe, take its input from $file, and put the output on fixm_temp

Page 30: 1 Perl Tutorial Practical extraction and report language .

Arithmetic in Perl

$a = 1 + 2;     # Add 1 and 2 and store in $a
$a = 3 - 4;     # Subtract 4 from 3 and store in $a
$a = 5 * 6;     # Multiply 5 and 6
$a = 7 / 8;     # Divide 7 by 8 to give 0.875
$a = 9 ** 10;   # Nine to the power of 10, that is, 9^10
$a = 5 % 2;     # Remainder of 5 divided by 2
++$a;           # Increment $a and then return it
$a++;           # Return $a and then increment it
--$a;           # Decrement $a and then return it
$a--;           # Return $a and then decrement it

Page 31: 1 Perl Tutorial Practical extraction and report language .

String and assignment operators

$a = $b . $c;   # Concatenate $b and $c
$a = $b x $c;   # $b repeated $c times

$a = $b;        # Assign $b to $a
$a += $b;       # Add $b to $a
$a -= $b;       # Subtract $b from $a
$a .= $b;       # Append $b onto $a

Page 32: 1 Perl Tutorial Practical extraction and report language .

Single and double quotes

$a = 'apples';
$b = 'bananas';

print $a . ' and ' . $b;   # prints: apples and bananas
print '$a and $b';         # prints: $a and $b
print "$a and $b";         # prints: apples and bananas

Page 33: 1 Perl Tutorial Practical extraction and report language .

Arrays

    @food = ("apples", "bananas", "cherries");

But a single element is a scalar, so it is written with $:

    print $food[1];    # prints "bananas"

    @morefood = ("meat", @food);
    # @morefood is now ("meat", "apples", "bananas", "cherries")

    ($a, $b, $c) = (5, 10, 20);

Page 34: 1 Perl Tutorial Practical extraction and report language .

push and pop

- push adds one or more things to the end of a list
    push (@food, "eggs", "bread");
  - push returns the new length of the list
- pop removes and returns the last element
    $sandwich = pop(@food);
- $len = @food;   # $len gets length of @food
- $#food          # returns index of last element

Page 35: 1 Perl Tutorial Practical extraction and report language .

foreach

# Visit each item in turn and call it $morsel

foreach $morsel (@food) {
    print "$morsel\n";
    print "Yum yum\n";
}

Page 36: 1 Perl Tutorial Practical extraction and report language .

Tests

- “Zero” is false. This includes: 0, '0', "0", '', ""
- Anything not false is true
- Use == and != for numbers, eq and ne for strings
- &&, ||, and ! are and, or, and not, respectively.
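The truth rules have a few surprising corners, so a small sketch may help. The list of test values is illustrative; note that the strings '0.0' and '00' count as true even though they are numerically zero.

```perl
use strict;
use warnings;

my @values = (0, '0', '', '0.0', '00', ' ', 'a');
my @truthy = grep { $_ } @values;    # keeps '0.0', '00', ' ' and 'a'

for my $v (@values) {
    printf "%-5s is %s\n", "'$v'", $v ? "true" : "false";
}

# Numeric and string comparison can disagree on the same operands.
my $num_equal = ("1.0" == 1)   ? "yes" : "no";    # numerically equal
my $str_equal = ("1.0" eq "1") ? "yes" : "no";    # different strings
print "== says $num_equal, eq says $str_equal\n"; # prints "== says yes, eq says no"
```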

Page 37: 1 Perl Tutorial Practical extraction and report language .

for loops

for loops are just as in C or Java

for ($i = 0; $i < 10; ++$i) {
    print "$i\n";
}

Page 38: 1 Perl Tutorial Practical extraction and report language .

while loops

#!/usr/local/bin/perl
print "Password? ";
$a = <STDIN>;
chomp $a;                # Remove the newline at end
while ($a ne "fred") {
    print "sorry. Again? ";
    $a = <STDIN>;
    chomp $a;
}

Page 39: 1 Perl Tutorial Practical extraction and report language .

do..while and do..until loops

#!/usr/local/bin/perl
do {
    print "Password? ";
    $a = <STDIN>;
    chomp $a;            # Remove the newline at end
} while ($a ne "fred");

Page 40: 1 Perl Tutorial Practical extraction and report language .

if statements

if ($a) {
    print "The string is not empty\n";
}
else {
    print "The string is empty\n";
}

Page 41: 1 Perl Tutorial Practical extraction and report language .

if - elsif statements

if (!$a) {
    print "The string is empty\n";
}
elsif (length($a) == 1) {
    print "The string has one character\n";
}
elsif (length($a) == 2) {
    print "The string has two characters\n";
}
else {
    print "The string has many characters\n";
}

Page 42: 1 Perl Tutorial Practical extraction and report language .

Why Perl?

- Two factors make Perl important:
  - Pattern matching/string manipulation
    - Based on regular expressions (REs)
    - REs are similar in power to those in Formal Languages…
    - …but have many convenience features
  - Ability to execute UNIX commands
    - Less useful outside a UNIX environment

Page 43: 1 Perl Tutorial Practical extraction and report language .

Basic pattern matching

- $sentence =~ /the/
  - True if $sentence contains "the"
- $sentence = "The dog bites.";
  if ($sentence =~ /the/)   # is false
  - …because Perl is case-sensitive
- !~ is "does not contain"

Page 44: 1 Perl Tutorial Practical extraction and report language .

RE special characters

. # Any single character except a newline

^ # The beginning of the line or string

$ # The end of the line or string

* # Zero or more of the last character

+ # One or more of the last character

? # Zero or one of the last character

Page 45: 1 Perl Tutorial Practical extraction and report language .

RE examples

^.*$ # matches the entire string

hi.*bye # matches from "hi" to "bye" inclusive

x +y # matches x, one or more blanks, and y

^Dear # matches "Dear" only at beginning

bags? # matches "bag" or "bags"

hiss+ # matches "hiss", "hisss", "hissss", etc.

Page 46: 1 Perl Tutorial Practical extraction and report language .

Square brackets

[qjk] # Either q or j or k

[^qjk] # Neither q nor j nor k

[a-z] # Anything from a to z inclusive

[^a-z] # No lower case letters

[a-zA-Z] # Any letter

[a-z]+ # Any non-zero sequence of lower case letters

Page 47: 1 Perl Tutorial Practical extraction and report language .

More examples

[aeiou]+ # matches one or more vowels

[^aeiou]+ # matches one or more nonvowels

[0-9]+ # matches an unsigned integer

[0-9A-F] # matches a single hex digit

[a-zA-Z] # matches any letter

[a-zA-Z0-9_]+ # matches identifiers

Page 48: 1 Perl Tutorial Practical extraction and report language .

More special characters

\n # A newline
\t # A tab
\w # Any alphanumeric; same as [a-zA-Z0-9_]
\W # Any non-word char; same as [^a-zA-Z0-9_]
\d # Any digit. The same as [0-9]
\D # Any non-digit. The same as [^0-9]
\s # Any whitespace character
\S # Any non-whitespace character
\b # A word boundary, outside [] only
\B # No word boundary

Page 49: 1 Perl Tutorial Practical extraction and report language .

Quoting special characters

\| # Vertical bar
\[ # An open square bracket
\) # A closing parenthesis
\* # An asterisk
\^ # A caret symbol
\/ # A slash
\\ # A backslash

Page 50: 1 Perl Tutorial Practical extraction and report language .

Alternatives and parentheses

jelly|cream # Either jelly or cream

(eg|le)gs # Either eggs or legs

(da)+ # Either da or dada or dadada or...

Page 51: 1 Perl Tutorial Practical extraction and report language .

The $_ variable

- Often we want to process one string repeatedly
- The $_ variable holds the current string
- If a subject is omitted, $_ is assumed
- Hence, the following are equivalent:
    if ($sentence =~ /under/) …
    $_ = $sentence; if (/under/) ...

Page 52: 1 Perl Tutorial Practical extraction and report language .

Case-insensitive substitutions

- s/london/London/i
  - case-insensitive substitution; will replace london, LONDON, London, LoNDoN, etc.
- You can combine global substitution with case-insensitive substitution
  - s/london/London/gi

Page 53: 1 Perl Tutorial Practical extraction and report language .

Remembering patterns

- Any part of the pattern enclosed in parentheses is assigned to the special variables $1, $2, $3, …, $9
- Numbers are assigned according to the left (opening) parentheses
- "The moon is high" =~ /The (.*) is (.*)/
  - Afterwards, $1 = "moon" and $2 = "high"

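A match in list context returns the captured groups directly, which avoids reading $1 and $2 by hand. The sketch below shows this alternative on the same sentence and on a date string in the day-month-year format used earlier in the slides; the variable names are illustrative.

```perl
use strict;
use warnings;

# In list context a successful match returns ($1, $2, ...).
my ($subject, $state) = "The moon is high" =~ /The (.*) is (.*)/;
print "$subject / $state\n";          # prints "moon / high"

# Groups are numbered by their opening parenthesis, left to right.
my ($day, $month, $year) = "20-03-1977" =~ /^(\d+)-(\d+)-(\d+)$/;
print "$year-$month-$day\n";          # prints "1977-03-20"
```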
Page 54: 1 Perl Tutorial Practical extraction and report language .

Dynamic matching

- During the match, an early part of the match that is tentatively assigned to $1, $2, etc. can be referred to by \1, \2, etc.
- Example:
  - \b.+\b matches a stretch of text that starts and ends at a word boundary (use \b\w+\b to restrict it to a single word)
  - /(\b.+\b) \1/ matches repeated words
  - "Now is the the time" =~ /(\b.+\b) \1/
  - Afterwards, $1 = "the"
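A common use of backreferences is finding and removing doubled words. The following is a minimal sketch using the stricter \w+ form of the pattern; it only handles repeats separated by a single space.

```perl
use strict;
use warnings;

my $sentence = "Now is the the time";

# \1 must match the same text that the first group just captured.
my ($repeated) = $sentence =~ /\b(\w+) \1\b/;

# Replace each doubled word with a single copy.
(my $fixed = $sentence) =~ s/\b(\w+) \1\b/$1/g;

print "$repeated\n$fixed\n";    # prints "the" and then "Now is the time"
```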