Perl

Perl. Perl notes 2 Perl Perl - Practical extraction report language –for text files –system management –combines C, SED, AWK, SH –interpreted –dynamic.

Jan 12, 2016


Aubrie Merritt

Perl


Perl notes 2

Perl

• Perl: Practical Extraction and Report Language
  – for text files
  – system management
  – combines C, SED, AWK, SH
  – interpreted
  – dynamic


Perl notes 3

Data Structures

• scalars $num
• arrays @num
• associative arrays (hashes) %num

• $num[50] – the element at index 50 of the array @num, i.e. its 51st element (indices start at 0)

• $#num – last index of @num
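A minimal sketch of the three data types and the indexing notation above; the variable names and values are illustrative only. It shows that the sigil selects which variable a name refers to, and that indices start at 0.

```perl
use strict;
use warnings;

# One name, three different variables: the sigil picks which one you mean.
my $num = 42;                    # scalar
my @num = (10, 20, 30);          # array
my %num = (one => 1, two => 2);  # associative array (hash)

print "scalar:        $num\n";
print "second elem:   $num[1]\n";            # index 1 is the 2nd element
print "last index:    ", $#num, "\n";        # 2
print "element count: ", scalar(@num), "\n"; # 3
print "hash lookup:   $num{two}\n";          # 2
```

Note that `$#num` is always one less than the element count.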


Perl notes 4

Examples

#! /usr/local/bin/perl -w

# find the sum of a list of numbers from STDIN

# one number per line

$sum = 0;

while( <STDIN> ) {

$sum += int $_;

}

print "the sum is $sum\n";


Perl notes 5

Examples

#!/usr/bin/perl -w

# find the sum of a list of numbers from STDIN

# several numbers per line

$sum = 0;

while( <STDIN> ) {

@nums = split;

foreach (@nums) {

$sum += int $_;

}

}

print "the sum is $sum\n";


Perl notes 6

Average

#!/usr/bin/perl -w

# find the average of a list of

# numbers from STDIN

# several numbers per line

$sum = 0;

$count = 0;

while( <STDIN> ) {

@nums = split;

foreach (@nums) {

$sum += int $_;

$count++;

}

}

die "no numbers read\n" unless $count;   # avoid dividing by zero on empty input

print "the average is ", $sum/$count, "\n";


Perl notes 7

median

#!/usr/bin/perl -w

# find the median of a list of numbers

# from STDIN

# several numbers per line

@nums = ();

while( <STDIN> ) {

@nums = (@nums, split );

}

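die "no numbers read\n" unless @nums;

@nums = sort { $a <=> $b } @nums;   # numeric order; a plain sort compares as strings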

if($#nums % 2) {

$median = ($nums[($#nums - 1)/2] + $nums[($#nums + 1)/2])/2;

}

else {

$median = $nums[$#nums/2];

}

print "the median is $median\n";
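The median only works with a numeric sort. A plain `sort` compares strings, so `(10, 9, 2)` becomes `("10", "2", "9")` and the middle element would be 2 instead of 9. The sketch below packages the same index arithmetic as the script into a sub; the name `median` is illustrative.

```perl
use strict;
use warnings;

# Median of a list of numbers, sorting numerically with <=>.
sub median {
    my @n = sort { $a <=> $b } @_;
    die "median of empty list\n" unless @n;
    my $mid = int($#n / 2);
    return $#n % 2 ? ($n[$mid] + $n[$mid + 1]) / 2   # even count: mean of the two middle values
                   : $n[$mid];                        # odd count: the middle value
}

print median(10, 9, 2), "\n";     # 9
print median(4, 1, 3, 2), "\n";   # 2.5
```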


Perl notes 8

Output?

#!/usr/bin/perl -w

@stuff = ("one", "two", "three");

print @stuff, "\n";

$stuff = ("one", "two", "three");

print $stuff, "\n";

$stuff = @stuff;

print $stuff, "\n";

onetwothree

three

3
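These answers come from context: an array in scalar context gives its length, while a parenthesized list of constants in scalar context is the comma operator, which yields its last value. A short sketch of related context idioms (the variable names are illustrative):

```perl
use strict;
use warnings;

my @stuff = ("one", "two", "three");

my $count   = @stuff;        # array in scalar context: number of elements
my ($first) = @stuff;        # list assignment: takes the first element
my $last    = $stuff[-1];    # negative index counts from the end
my $joined  = "@stuff";      # an interpolated array is joined with spaces

print "$count $first $last [$joined]\n";   # 3 one three [one two three]
```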


Perl notes 9

Pattern Matching

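m//   match

s///   substitute

Modifiers
• i case-insensitive
• m multiple lines: ^ and $ also match at each embedded newline
• s single line: . also matches a newline
• x extended: whitespace and # comments inside the pattern are ignored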
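A small sketch of each modifier on made-up strings (the sample text is an assumption for the demo):

```perl
use strict;
use warnings;

my $text = "Hello World\nhello perl\n";

# m//: /i ignores case, /m lets ^ match after every newline, /g finds all matches
my @greetings = $text =~ /^hello/gim;       # ("Hello", "hello")

# /s lets . match a newline as well
my $plain  = ("a\nb" =~ /a.b/)  ? 1 : 0;    # 0
my $single = ("a\nb" =~ /a.b/s) ? 1 : 0;    # 1

# /x ignores whitespace in the pattern, so it can be laid out readably
my ($from, $to) = "10 - 20" =~ / (\d+) \s* - \s* (\d+) /x;

# s///: substitute on a copy; /i makes it case-insensitive
(my $copy = $text) =~ s/world/Perl/i;       # "Hello Perl\nhello perl\n"

print scalar(@greetings), " $plain $single $from $to\n";   # 2 0 1 10 20
```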


Perl notes 10

Regular Expressions

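Code Meaning

\w Word character (letter, digit, or underscore)

\W Non-word character

\s White space

\S Non-white space

\d Digit

\D Non-digit

\b Word boundary

\B Non-word boundary

\A, ^ At the beginning of a string (^ also after each newline under /m)

\Z, $ At the end of a string, or just before a final newline ($ also before each newline under /m)

. Any single character except newline (any character under /s)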
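A sketch exercising several codes from the table on a sample string (the strings are illustrative). It also contrasts `\Z` with `\z`, which matches only at the very end of the string.

```perl
use strict;
use warnings;

my $s = "order_42 costs 7 dollars\n";

my @words  = $s =~ /(\w+)/g;                      # \w includes digits and underscore
my @digits = $s =~ /(\d+)/g;                      # ("42", "7")
my $cat    = ("concatenate" =~ /\bcat/) ? 1 : 0;  # 0: "cat" is not at a word start
my $dog    = ("hot dog"     =~ /\bdog/) ? 1 : 0;  # 1

# \Z also matches just before a final newline; \z only at the very end
my $upper_z = ($s =~ /dollars\Z/) ? 1 : 0;        # 1
my $lower_z = ($s =~ /dollars\z/) ? 1 : 0;        # 0

print "@words | @digits | $cat $dog $upper_z $lower_z\n";
```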


Perl notes 11

Regular Expressions

* Zero or more occurrences

? Zero or one occurrence

+ One or more occurrences

{N} Exactly N occurrences

{N,M} Between N and M occurrences

.*thingy Greedy match, up to the last thingy

.*?thingy Non-greedy match, up to the first thingy

[set_of_things] Match any item in the set

[^set_of_things] Match any item not in the set

(some_expression) Capture (tag) an expression

$1..$N Captured expressions, usable in substitutions
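A sketch of the table in action, including the greedy/non-greedy difference and `$1`/`$2` in a substitution. The HTML fragment and names are made-up inputs.

```perl
use strict;
use warnings;

my $html = "<b>bold</b>";

my ($greedy) = $html =~ /<(.*)>/;    # "b>bold</b": up to the LAST ">"
my ($lazy)   = $html =~ /<(.*?)>/;   # "b": up to the FIRST ">"

my $zip_ok    = ("12345"  =~ /^\d{5}$/)          ? 1 : 0;  # exactly 5 digits
my $id_ok     = ("ab12"   =~ /^[a-z]{2,4}\d+$/)  ? 1 : 0;  # 2 to 4 letters, then digits
my $no_vowels = ("rhythm" =~ /^[^aeiou]+$/)      ? 1 : 0;  # negated set

# Captured groups are available as $1, $2, ... in the replacement
(my $name = "John Smith") =~ s/(\w+) (\w+)/$2, $1/;        # "Smith, John"

print "$greedy | $lazy | $zip_ok $id_ok $no_vowels | $name\n";
```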


Perl notes 12

Rules

• Rule 1 – The engine tries to match as far left in the string as it can

• Rule 2 – The regular expression is regarded as a set of alternatives, which are tried from left to right (see page 61)

• Rule 3 – Items that have choices match from left to right, e.g.

/x*y*/

• Rule 4 – Assertions
  – ^ $ \b \B \A \Z \G (?=…) (?!…)
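A sketch of Rules 1, 2 and 4 on sample strings. The `/x*y*/` case shows Rule 1 overriding greed: the pattern can match the empty string, so it succeeds at position 0 before ever reaching "xy". `@-` and `@+` hold the start and end offsets of the last match.

```perl
use strict;
use warnings;

# Rule 1: the leftmost match wins, even over an earlier alternative
my ($first) = "cat and dog" =~ /(dog|cat)/;       # "cat"

# Rule 2: at the same position, alternatives are tried left to right
my ($alt) = "catalog" =~ /(cat|catalog)/;          # "cat", not "catalog"

# /x*y*/ can match the empty string, so it succeeds at offset 0
my $matched = "aaxy" =~ /x*y*/;
my ($start, $len) = ($-[0], $+[0] - $-[0]);        # 0 and 0

# Rule 4: a lookahead assertion matches a position, not characters
my ($amount) = "total: 100USD" =~ /(\d+)(?=USD)/;  # "100"

print "$first $alt $start $len $amount\n";         # cat cat 0 0 100
```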


Perl notes 13

Rules

• Rule 5 – A quantified atom matches only if the atom itself matches some number of times allowed by the quantifier

Maximal Minimal Number of matches allowed

{n,m} {n,m}? At least n, at most m

{n,} {n,}? At least n

{n} {n}? Exactly n

* *? 0 or more

+ +? 1 or more

? ?? 0 or 1
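A sketch comparing maximal and minimal quantifiers on sample strings. The last line shows that a minimal quantifier still takes as many repetitions as the rest of the pattern needs.

```perl
use strict;
use warnings;

my ($max_range) = "aaaa" =~ /(a{2,3})/;    # "aaa": as many as allowed
my ($min_range) = "aaaa" =~ /(a{2,3}?)/;   # "aa":  as few as allowed
my ($max_plus)  = "aaaa" =~ /(a+)/;        # "aaaa"
my ($min_plus)  = "aaaa" =~ /(a+?)/;       # "a"
my ($min_star)  = "aaab" =~ /(a*?)b/;      # "aaa": minimal, but must still reach "b"

print "$max_range $min_range $max_plus $min_plus $min_star\n";
```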


Perl notes 14

Rules

• Rule 6 – Each atom matches according to its type
  – (…) grouping, plus storage in $1, $2, …
  – . matches any char except \n
  – […] character class
  – special characters \a \n \r …
  – \1 \2 … backreference to an earlier (…)
  – \033 octal char
  – \xf7 hex char
  – \cD control char
  – a backslash before any other non-alphanumeric character matches that character itself
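A sketch of several atom types from Rule 6, using illustrative strings:

```perl
use strict;
use warnings;

my ($double) = "balloon" =~ /(\w)\1/;           # "l": \1 repeats what (\w) caught
my $hex   = ("A"    =~ /^\x41$/)  ? 1 : 0;      # \x41 is "A"
my $oct   = ("\e"   =~ /^\033$/)  ? 1 : 0;      # \033 is the ESC character
my $ctrl  = (chr(4) =~ /^\cD$/)   ? 1 : 0;      # \cD is control-D (chr 4)
my $dot   = ("a.b"  =~ /^a\.b$/)  ? 1 : 0;      # \. is a literal dot ...
my $nodot = ("axb"  =~ /^a\.b$/)  ? 1 : 0;      # ... so "axb" does not match

print "$double $hex $oct $ctrl $dot $nodot\n";  # l 1 1 1 1 0
```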


Perl notes 15

precedence

• () (?: ) grouping (highest)
• repetition
• sequence
• | alternation (lowest)

Pattern        Strings

/ab*c/         abc, ac, ababd, abbbc

/abc*/         a, ab, abc, abccc, abcabc

/(abc)*/       abc, abcc, empty string, abcabc

/ed|jo/        ed, jo, edo, ejo

/(ed)|(jo)/    ed, jo, edo, ejo

/ed|jo{1,3}/   ed, jo, edo, ejo, joo, jooooo

/ed|jo{1,3}?/  ed, jo, edo, ejo, joo, jooooo

/^ed|jo$/      fred and joe, ed jo, fred jo, jo

/^(ed|jo)$/    fred and joe, ed jo, fred jo, jo

$pat = 'bob'; /$pat{3}/     pat, bob, bobbobbob, bobbb, patt

$pat = 'bob'; /($pat){3}/   pat, bob, bobbobbob, bobbb, patt
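One way to check answers to the table is to let Perl filter the strings. A sketch for some of the rows; `matching` is a hypothetical helper name. Because `|` has the lowest precedence, `^` and `$` in `/^ed|jo$/` each bind to only one alternative, and `/(abc)*/` matches every string because it can match the empty string.

```perl
use strict;
use warnings;

# Return the strings that a pattern matches somewhere.
sub matching {
    my ($re, @strings) = @_;
    return grep { $_ =~ $re } @strings;
}

my @ab   = matching(qr/ab*c/,      qw(abc ac ababd abbbc));
my @star = matching(qr/(abc)*/,    "abc", "abcc", "", "abcabc");
my @edjo = matching(qr/ed|jo/,     qw(ed jo edo ejo));
my @anch = matching(qr/^ed|jo$/,   "fred and joe", "ed jo", "fred jo", "jo");
my @grp  = matching(qr/^(ed|jo)$/, "fred and joe", "ed jo", "fred jo", "jo");

print join(", ", @ab),   "\n";   # abc, ac, abbbc
print scalar(@star),     "\n";   # 4
print join(", ", @anch), "\n";   # ed jo, fred jo, jo
print join(", ", @grp),  "\n";   # jo
```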


Perl notes 16

Pattern      String

/\w+/        Greetings, planet earth!

/\w*/        Greetings, planet earth!

/n[et]*/     Greetings, planet earth!

/n[et]+/     Greetings, planet earth!

/G.*t/       Greetings, planet earth!

/('.*')/     this 'test' isn't good

• How do you fix it?

/('[^']*')/
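A sketch that prints what each pattern above actually matches; `first_match` is a hypothetical helper that wraps the pattern in a capture. The greedy `'.*'` runs to the last quote (the apostrophe in "isn't"); excluding quotes with `[^']*` stops it at the first closing quote, as would the non-greedy `'.*?'`.

```perl
use strict;
use warnings;

# Return the text matched by $re in $str (undef if no match).
sub first_match {
    my ($re, $str) = @_;
    my ($m) = $str =~ /($re)/;
    return $m;
}

my $g = "Greetings, planet earth!";
print first_match(qr/\w+/,    $g), "\n";   # Greetings
print first_match(qr/\w*/,    $g), "\n";   # Greetings
print first_match(qr/n[et]*/, $g), "\n";   # n   (from "Greetings")
print first_match(qr/n[et]+/, $g), "\n";   # net (from "planet")
print first_match(qr/G.*t/,   $g), "\n";   # Greetings, planet eart

my $q = "this 'test' isn't good";
print first_match(qr/'.*'/,    $q), "\n";  # 'test' isn'
print first_match(qr/'[^']*'/, $q), "\n";  # 'test'
```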


Perl notes 17

Examples

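s/^([^ ]+) +([^ ]+)/$2 $1/   # swap the first two words

/(\w+)\s*=\s*\1/   # a word, "=", then the same word again

/.{40,}/   # a line at least 40 characters long

/^(\d+\.?\d*|\.\d+)$/   # a valid unsigned decimal number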

if (/Time: (..):(..):(..)/){

$hours = $1;

$minutes = $2;

$seconds = $3;

}
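A sketch running each example above on sample input (all input strings are made up for the demo):

```perl
use strict;
use warnings;

# Swap the first two words
(my $swapped = "hello big world") =~ s/^([^ ]+) +([^ ]+)/$2 $1/;   # "big hello world"

# A word, "=", and the same word again (\1 is a backreference)
my $same = ("x = x" =~ /(\w+)\s*=\s*\1/) ? 1 : 0;                 # 1

# Lines of at least 40 characters
my $long = (("-" x 40) =~ /.{40,}/) ? 1 : 0;                       # 1

# Keep only strings that look like unsigned decimal numbers
my @nums = grep { /^(\d+\.?\d*|\.\d+)$/ } qw(42 3.14 .5 7. 1.2.3 .);

# Pull the fields out of a time stamp
my ($h, $m, $s) = "Time: 12:34:56" =~ /Time: (..):(..):(..)/;

print "$swapped | $same $long | @nums | $h $m $s\n";
```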


Perl notes 18

Default arguments

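• $_, @_, @ARGV, STDIN

sub foo {

my $x = shift;   # inside a sub, shift defaults to @_

# ...

}

• in the main program, shift defaults to @ARGV

while (defined($_ = shift)) {   # defined(): an argument "0" would otherwise end the loop

if (/^-(.*)/) {

process_option($1);

} else {

process_file($_);

}

}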
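A runnable sketch of both defaults. `first_arg` is a hypothetical name, and `@ARGV` is filled in by hand here instead of coming from the command line.

```perl
use strict;
use warnings;

# Inside a sub, a bare shift takes its argument from @_ ...
sub first_arg { my $x = shift; return $x }

# ... while outside any sub it takes from @ARGV (set by hand for this demo).
@ARGV = ('-v', 'notes.txt', '0');
my (@options, @files);
while (defined(my $arg = shift)) {     # defined() keeps the "0" argument
    if ($arg =~ /^-(.*)/) { push @options, $1 }
    else                  { push @files,   $arg }
}

print first_arg('a', 'b'), " | @options | @files\n";   # a | v | notes.txt 0
```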


Perl notes 19

Reading a stream

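open FIN, '<', 'myfile' or die "can't open myfile: $!";

# three alternatives; each one reads FIN to end of file

while (<FIN>){   # one line at a time

# do something with $_

}

foreach (<FIN>){   # reads the whole file into a list first

# do something with $_

}

print sort <FIN>;   # prints all lines, sorted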
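A sketch of the first and last forms, assuming an in-memory filehandle (opening a reference to a string) in place of `myfile`. Once a loop has read to end of file the handle is exhausted, so the file is reopened before the second read.

```perl
use strict;
use warnings;

# An in-memory "file" stands in for myfile (an assumption for the demo).
my $data = "pear\napple\nfig\n";

open my $fin, '<', \$data or die "can't open in-memory file: $!";
my $count = 0;
while (<$fin>) {          # one line at a time, into $_
    $count++;
}
close $fin;

open $fin, '<', \$data or die "can't reopen: $!";
my @lines  = <$fin>;      # whole file as a list of lines
my @sorted = sort @lines;
close $fin;

print "$count lines\n", @sorted;   # 3 lines, then apple, fig, pear
```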


Perl notes 20

Reading a stream

# print a window: each matching line with the lines around it

@f = <FIN>;

foreach ( 0..$#f ) {

if ($f[$_] =~ /\bShazam\b/) {

$lo = ($_ > 0) ? $_ - 1 : $_;

$hi = ($_ < $#f) ? $_ + 1 : $_;

print map { "$_: $f[$_]" } $lo .. $hi;

}

}
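The same window logic as a testable sub that returns the lines instead of printing them. `window` and the sample lines are illustrative; the pattern is passed in with `qr//`.

```perl
use strict;
use warnings;

# Return each line matching $re together with its neighbours,
# each prefixed by its index, as in the slide's "print a window".
sub window {
    my ($re, @f) = @_;
    my @out;
    foreach my $i (0 .. $#f) {
        next unless $f[$i] =~ $re;
        my $lo = $i > 0   ? $i - 1 : $i;   # don't run off the start
        my $hi = $i < $#f ? $i + 1 : $i;   # don't run off the end
        push @out, map { "$_: $f[$_]" } $lo .. $hi;
    }
    return @out;
}

my @lines = ("alpha\n", "Shazam!\n", "gamma\n", "delta\n");
print window(qr/\bShazam\b/, @lines);   # lines 0, 1 and 2, numbered
```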


Perl notes 21

Sorting

• sort numerically

sub numerically { $a <=> $b }

@list = sort numerically

(16, 1, 8, 2, 4, 32);

or

@list = sort { $a <=> $b }

(16, 1, 8, 2, 4, 32);

@list = sort { uc($a) cmp uc($b) }   # case-insensitive

qw(this is a test);

# reverse (descending numeric order)

@list = sort { $b <=> $a }

(16, 1, 8, 2, 4, 32);
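A sketch contrasting the default string sort with the comparators above. Without a block, `sort` compares as strings, so 16 comes before 2, and uppercase letters sort before lowercase ones. The word list is illustrative.

```perl
use strict;
use warnings;

my @nums = (16, 1, 8, 2, 4, 32);
my @as_strings = sort @nums;                   # 1 16 2 32 4 8
my @numeric    = sort { $a <=> $b } @nums;     # 1 2 4 8 16 32
my @descending = sort { $b <=> $a } @nums;     # 32 16 8 4 2 1

my @words  = qw(apple Banana cherry);
my @ascii  = sort @words;                          # Banana apple cherry
my @nocase = sort { uc($a) cmp uc($b) } @words;    # apple Banana cherry

print "@as_strings\n@numeric\n@descending\n@ascii\n@nocase\n";
```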


Perl notes 22

example

#! /usr/bin/perl -w

# This script will count the frequency of distinct words

# in the file that is given as an argument.

# Warning: Error checking is minimal!

die "usage: $0 file\n" unless @ARGV;

while(<>){

tr/A-Z/a-z/; # translate to lowercase

@w = grep { length } split(/\W+/, $_); # split into words, dropping the empty field a leading separator produces

foreach (@w){

$list{$_}++; # increment the counter

}

}

foreach $key (sort {$list{$b} <=> $list{$a}} keys %list) {

print $key, ' = ', $list{$key}, "\n";

}
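The counting loop as a sub that works on a string instead of files; `word_freq` is a hypothetical name. The extra `|| $a cmp $b` in the sort is an addition that breaks ties alphabetically so the output order is stable.

```perl
use strict;
use warnings;

# Count lowercased words in a string, as the script does per line.
sub word_freq {
    my ($text) = @_;
    my %list;
    foreach my $w (split /\W+/, lc $text) {
        next if $w eq '';          # a leading separator yields an empty field
        $list{$w}++;
    }
    return %list;
}

my %freq = word_freq("The cat and the hat. The end!");
foreach my $key (sort { $freq{$b} <=> $freq{$a} || $a cmp $b } keys %freq) {
    print "$key = $freq{$key}\n";   # the = 3 first, then the others alphabetically
}
```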


Perl notes 23

Tokenizing

# tokenize an arithmetic expression

while (length $_) {   # not while($_): a remaining "0" is false

if (/^(\d+)/) {

push @tok, 'num', $1;

} elsif (/^([+\-\/*()])/) {

push @tok, 'punct', $1;

} elsif (/^([\d\D])/) {

die "invalid char $1 in input";

}

$_ = substr($_, length $1);

}

• substr slows things down
  – each pass copies the rest of the string just to cut off its start


Perl notes 24

Tokenizing 2

while(/

(\d+) |

([+\-\/*()]) |

([\d\D])/gx) {

if (defined $1) {   # only the alternative that matched sets its group

push @tok, 'num', $1;

} elsif (defined $2) {

push @tok, 'punct', $2;

} else {

die "invalid char $3 in input";

}

}


Perl notes 25

Tokenizing 3

{

if(/\G(\d+)/gc) {

push @tok, 'num', $1;

} elsif(/\G([+\-\/*()])/gc) {

push @tok, 'punct', $1;

} elsif (/\G([\d\D])/gc) {

die "invalid char $1 in input";

}else{

last;

}

redo;

}
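The `\G`/`/gc` version wrapped in a sub so it can be run directly; `tokenize` is a hypothetical name. `/gc` keeps `pos()` in place when a match fails, so each branch continues from where the previous token ended. As on the slide, whitespace is not accepted and counts as an invalid character.

```perl
use strict;
use warnings;

# Tokenize an arithmetic expression into (type, value) pairs.
sub tokenize {
    local $_ = shift;
    my @tok;
    {
        if (/\G(\d+)/gc) {
            push @tok, 'num', $1;
        } elsif (/\G([+\-\/*()])/gc) {
            push @tok, 'punct', $1;
        } elsif (/\G([\d\D])/gc) {
            die "invalid char $1 in input\n";
        } else {
            last;                # pos() has reached the end of the string
        }
        redo;                    # restart the bare block for the next token
    }
    return @tok;
}

print join(' ', tokenize("12+(3*45)")), "\n";
```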


Perl notes 26

Use split for clarity

($a, $b, $c) =

/^(\S+)\s+(\S+)\s+(\S+)/;

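($a, $b, $c) = split /\s+/, $_;   # leading whitespace gives an empty first field

($a, $b, $c) = split;   # same as split ' ', $_: leading whitespace is skipped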

Get the fifth field:

($a) =

/[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;

or

($a) = /(?:[^:]*:){4}([^:]*)/;

or

($a) = (split /:/)[4];
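A sketch checking that the three "fifth field" forms agree, on a made-up passwd-style line. It uses other variable names because `$a` and `$b` are the variables `sort` uses. It also shows the leading-whitespace difference between `split ' '` and `split /\s+/`.

```perl
use strict;
use warnings;

my $line = "alice:x:1001:1001:Alice Smith:/home/alice:/bin/sh";

my ($f1) = $line =~ /[^:]*:[^:]*:[^:]*:[^:]*:([^:]*)/;
my ($f2) = $line =~ /(?:[^:]*:){4}([^:]*)/;
my $f3   = (split /:/, $line)[4];

print "$f1 | $f2 | $f3\n";          # Alice Smith, three times

# split ' ' (also the default) skips leading whitespace; split /\s+/ does not
my @w1 = split ' ',   "  one two";  # ("one", "two")
my @w2 = split /\s+/, "  one two";  # ("", "one", "two")
```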


Perl notes 27

unpack

ps l

F UID PID PPID PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND

100 1216 30562 30561 7 0 2804 1768 rt_sig S pts/2 0:00 -tcsh

000 1216 30658 30562 10 0 2780 1080 - R pts/2 0:00 ps l

chomp (@ps = `ps l`);

shift @ps;

for(@ps){

($uid, $pid, $sz, $tt) =

unpack '@3 A6 @9 A7 @30 A5 @52 A7', $_;

print "$uid, $pid, $sz, $tt\n";

}
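Since `ps` output differs between systems, here is a sketch of the same `unpack` technique on a fixed-width record built with `sprintf` (the record layout is an assumption for the demo). `@N` moves to absolute column N, and `A<len>` reads that many characters and strips trailing spaces.

```perl
use strict;
use warnings;

# Build a fixed-width record with columns at offsets 0, 5 and 13 ...
my $rec = sprintf "%-5s%-8s%-5s", "id1", "alice", "42";

# ... and take it apart again by column position.
my ($id, $name, $size) = unpack '@0 A5 @5 A8 @13 A5', $rec;
my ($only_name)        = unpack '@5 A8', $rec;   # columns can be skipped

print "$id|$name|$size|$only_name\n";   # id1|alice|42|alice
```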


Perl notes 28

Avoid regex for simple strings

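do_it() if $answer eq 'yes';   # plain string comparison

do_it() if $answer =~ /^yes$/;   # regex version; $ also allows a trailing newline

do_it() if $answer =~ /yes/;   # not equivalent: also matches "yesterday"

do_it() if lc($answer) eq 'yes';   # case-insensitive without a regex

do_it() if $answer =~ /^yes$/i;   # regex version of the previous line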


Perl notes 29

#!/usr/bin/perl

# remove the comments from a C program

$filename = shift or die "usage $0 filename\n";

open FIN, '<', $filename or die "can't open $filename: $!\n";

while (<FIN>){

chomp;   # drop the newline; the loop prints "\n" itself at the end

for(split m!("(?:\\\W|.)*?"|/\*|\*/)!){

if($in_comment){

$in_comment = 0 if $_ eq "*/";

} else {

if ($_ eq "/*") {

$in_comment = 1;

print " ";

} else {

print;

}

}

}

print "\n";

}
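The same split-based state machine as a sub over a string, so it can be tried without a file; `strip_c_comments` is a hypothetical name. String literals are returned as separate fields by `split`, so a `/*` inside quotes is left alone, and `$in_comment` carries over between lines for multi-line comments.

```perl
use strict;
use warnings;

# Remove C comments from source text, replacing each comment with one blank.
sub strip_c_comments {
    my ($src) = @_;
    my ($out, $in_comment) = ('', 0);
    for my $line (split /\n/, $src) {
        # string literals, "/*" and "*/" come back as separate fields
        for my $tok (split m!("(?:\\\W|.)*?"|/\*|\*/)!, $line) {
            if ($in_comment) {
                $in_comment = 0 if $tok eq '*/';
            } elsif ($tok eq '/*') {
                $in_comment = 1;
                $out .= ' ';
            } else {
                $out .= $tok;
            }
        }
        $out .= "\n";
    }
    return $out;
}

print strip_c_comments(qq{int x = 1; /* one */\nchar *s = "/* not */";\n});
```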


Perl notes 30

References

$a = 3.1416;

$scalar_ref = \$a;

$array_ref = \@a;

$hash_ref = \%a;

$array_el_ref = \$a[3];

$hash_el_ref = \$a{'John'};
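A sketch of using each kind of reference once it has been taken. The variable names are illustrative and avoid `$a`, which `sort` uses. Writing through an element reference changes the original array.

```perl
use strict;
use warnings;

my $pi   = 3.1416;
my @list = (10, 20, 30, 40);
my %age  = (John => 42);

my $scalar_ref = \$pi;
my $array_ref  = \@list;
my $hash_ref   = \%age;
my $elem_ref   = \$list[3];

print $$scalar_ref, "\n";        # 3.1416
print "@$array_ref\n";           # 10 20 30 40
print $array_ref->[1], "\n";     # 20
print $hash_ref->{John}, "\n";   # 42

$$elem_ref = 99;                 # write through the element reference
print "$list[3]\n";              # 99: the original array changed
```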


Perl notes 31

Lists of Lists

@LoL = (

["fred", "barney" ],

["george", "jane", "elroy" ],

["homer", "marge", "bart" ],

);

print $LoL[2][2]; # prints "bart"

$ref_to_LoL = [

["fred", "barney" ],

["george", "jane", "elroy" ],

["homer", "marge", "bart" ],

];

print $ref_to_LoL->[2][2];

• Note: $LoL[2][2] is shorthand for $LoL[2]->[2]; the arrow may be omitted between subscripts
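A sketch of walking a list of lists. Each row is an array reference, so `@$row` or `@{ $LoL[$i] }` gets the whole row and `$#{ ... }` its last index.

```perl
use strict;
use warnings;

my @LoL = (
    ["fred",   "barney"],
    ["george", "jane",  "elroy"],
    ["homer",  "marge", "bart"],
);

for my $i (0 .. $#LoL) {
    my $row = $LoL[$i];                       # an array reference
    print "row $i has ", scalar(@$row), " names: @$row\n";
}

my $last = $#{ $LoL[1] };                     # last index of the second row
print $LoL[2][2], " ", $LoL[1]->[0], " $last\n";   # bart george 2
```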


Perl notes 32

Grow your own

while(<>){

@tmp = split;

push @LoL, [ @tmp ];

}
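The `[ @tmp ]` above matters. It builds a new anonymous array with a copy of the current row. Pushing `\@tmp` instead would store the same reference every time if `@tmp` lives outside the loop. A sketch with in-memory lines standing in for `<>`:

```perl
use strict;
use warnings;

my @input = ("fred barney", "george jane elroy");
my (@copies, @aliases);
my @tmp;                              # ONE array, reused on every pass

for (@input) {
    @tmp = split;                     # split $_ on whitespace
    push @copies,  [ @tmp ];          # new anonymous array: a snapshot
    push @aliases, \@tmp;             # the same @tmp every time
}

print "$copies[0][0] $aliases[0][0]\n";   # fred george
```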


Perl notes 33

Hashes of Arrays

%HoL = (

flinstones => ["fred", "barney" ],

jetsons => ["george", "jane", "elroy" ],

simpsons => ["homer", "marge", "bart" ],

);

• generation

# reading from a file with format:

# flinstones: fred barney ...

while(<>){

next unless s/^(.*?):\s*//;

$HoL{$1} = [ split ];

}

• or

while($line = <>){

($who, $rest) = split /:\s*/, $line, 2;

@fields = split ' ', $rest;

$HoL{$who} = [ @fields ];

}


Perl notes 34

Hashes of Arrays

# calling a function

for $group ('flinstones', 'jetsons', 'simpsons') {

$HoL{$group} = [ get_family($group) ];

}

# append member to existing family

push @{ $HoL{flinstones} }, "wilma", "betty";

• access

$HoL{flinstones}[0] = "fred";
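A sketch combining generation, appending and access. The input lines are hard-coded in place of `<>`, and the captures are copied into named variables before the next match can overwrite them.

```perl
use strict;
use warnings;

my %HoL;
my @input = ("flinstones: fred barney", "simpsons: homer marge bart");

for my $line (@input) {
    next unless $line =~ /^(.*?):\s*(.*)/;    # "family: member member ..."
    my ($family, $rest) = ($1, $2);
    $HoL{$family} = [ split ' ', $rest ];
}

push @{ $HoL{flinstones} }, "wilma", "betty";   # append to one family
$HoL{jetsons} = [ "george", "jane" ];           # add a whole new family

for my $family (sort keys %HoL) {
    print "$family: @{ $HoL{$family} }\n";
}
print $HoL{flinstones}[0], "\n";                # fred
```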


Perl notes 35

Packages, Modules, and Object Classes