Perl Scripting for Biologists
Perl ( www.perl.org )
• A preferred programming language in bioinformatics
• Easy to learn and write
• Simple programs are quick to write, yet the language is very powerful.
• File and text manipulation, database access, graphical and web programming.
• Derives from
– C language
– Unix shell
– sed
– awk
www.cpan.org
www.bioperl.org
Where can I get Perl?
• Unix/Linux
– Installed as standard, or get the package.
• Apple
– Included from OSX 10.3
• Microsoft
– www.perl.com: compiling from source
– Executable distributions
• Strawberryperl.com
• www.activestate.com/activeperl
Editing programs
• Why not use Microsoft Word?
– Embedded control characters in file formats
– No syntax highlighting / auto indentation
– No integration with other development tools
• Some tools:
– Emacs
– Vi, vim, gvim
– Eclipse
– Xcode (Apple)
My first program
• Editing the program
– Open emacs (or your favorite editor)
– First line should be #!/usr/bin/perl
– Enter program code ( print "Hello World!\n"; )
– Save (helloworld.pl)
• Execute the program in the terminal
$ perl helloworld.pl
$ perl -c helloworld.pl # compile only
Adding execute permissions
$ chmod +x helloworld.pl
• Check with ls -l
• If we added execute permissions:
$ ./helloworld.pl
Adding comments
• Any line starting with a # sign is a comment
• Everything after a # sign on a line is a comment
• For example
# this is a comment
print "Hello World\n"; # another comment
Hello World program
#!/usr/bin/perl
# my first Perl program
print "Hello World\n";
Development Cycle
Edit → Compile → Run / Test → (back to Edit)
Program compilation/interpretation
• At runtime: program compiled, then executed
• Syntax errors: program will not run
# this is a syntax error (the opening quote is missing)
print Hello World\n";
Execute the program in the terminal
$ perl helloworld.pl
$ perl -c helloworld.pl # compile only
Program format
• First line is #!/usr/bin/perl
• File .pl extension is optional
• Statements end with a semicolon ;
• Comments: lines beginning with #
• Syntax errors: program will not run
• Variables: scalars ($), arrays (@), hashes (%)
Scalar variables
#!/usr/bin/perl
# my first Perl program
$message = "Hello World!";
print "$message\n";
Scalar variables
#!/usr/bin/perl
$a = 2;
$b = 5;
#sum
$result = $a + $b;
# print it
print "Result is: $result\n";
Scalar Variables
• Scalars are prefixed with a $ sign
• Valid variable names begin with a letter or underscore, followed by any number of letters, digits and underscores (_).
$foo
$chromosome_number
$block13
$a123b
$test1A2
$test1_a_2
● Capital letters are legal, but not often used in Perl variable names.
• Most programmers don't use camelCase for Perl variables (such as $chromosomeNumber)
Variable assignments
• Assignment operator is “=”
• Examples
$r = 4; # assigning an integer
$pi = 3.14159; # assigning a real
$foo = "hello"; # assigning a string (note the quotes)
$bar = 'Ciao!'; # alternate set of quotes
$area = $pi * ($r ** 2); # do some math
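The assignments above can be combined into one small script. This is only a sketch: the variable names ($unit, $area) are chosen for illustration, and the my keyword used here is explained in the "Writing safe code" slides.

```perl
#!/usr/bin/perl
# Sketch: assign an integer, a real number and a string, then do some math.
my $r    = 4;          # integer
my $pi   = 3.14159;    # real number
my $unit = 'cm';       # string in single quotes

my $area = $pi * ($r ** 2);    # ** is the power operator

print "Area of a circle with radius $r $unit: $area $unit^2\n";
```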
Variables
• Numeric values can be written as integers, decimals, or in scientific notation
$a = 134 ;
$a = -2004 ;
$a = 56.79 ;
$a = -56.7913 ;
$a = 7.25e24 ;
$a = -12E-29 ;
Variables
• Write strings with double or single quotes
$a = ""; # empty string
$a = "BTI's bioinformatics course";
$a = 'BTI bioinformatics course';
Variables
• Use double quotes for escape sequences and variable interpolation
$a = "BTI perl course\n"; # print new line
$a = "TAB1\tTAB2\tTAB3\n"; # print tabs
$a = "$n students in the $course"; # print variable values
● A literal double quote or backslash is escaped with a backslash (\" \\)
$a = "$n students in the \"BTI perl course\"";
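A minimal sketch of the difference between the two kinds of quotes. The values assigned to $n and $course are made up for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $n      = 12;                  # hypothetical number of students
my $course = 'BTI perl course';

my $double = "$n students in the $course";       # variables are interpolated
my $single = '$n students in the $course';       # text is kept literally
my $quoted = "$n students in the \"$course\"";   # escaped double quotes

print "$double\n";   # 12 students in the BTI perl course
print "$single\n";   # $n students in the $course
print "$quoted\n";   # 12 students in the "BTI perl course"
```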
Writing safe code
• By default, no variable declaration is necessary in Perl
• BAD!!!!!!!!!
• Make variable declaration mandatory:
– use strict;
– At the beginning of the program (after #!/usr/bin/perl)
• Another good directive is
– use warnings;
– Turns on warnings during compile / execution.
Declaring variables
• Several ways
– Global variables.
• Variables are “global” in scope, valid anywhere in your program.
• Declaration: using keyword our
• our $foo;
– Local variables.
• Variables that are “lexically scoped”, i.e., valid only in the current block and the enclosed blocks
• Declaration using the keyword my
• my $foo;
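The scoping rules can be illustrated with a short sketch. The names $lab and $count are hypothetical; the bare { ... } is just a block used to create an inner scope.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $lab  = 'BTI';    # global variable, visible everywhere in the program
my $count = 1;        # lexical variable, visible to the end of this file

{
    my $count = 10;   # a different variable, private to this block
    $count++;
    print "inside the block:  $count ($lab)\n";    # 11 (BTI)
}

print "outside the block: $count ($lab)\n";        # 1 (BTI)
```

The inner $count disappears at the closing brace, so the outer $count is untouched.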
Writing safe code
• Declare your variables with my
my $a = 1;
# declare variables without a value (they start as undef):
my $b;
my ($c, $d, $e);
The my operator declares a variable or a list of variables to be local (private) to the enclosing block, subroutine or file (the “scope”).
● Variables from the same program will not “step on each other”
● Important if your code will be used with other programs with variable names unknown to you
Writing safe code - summary
• Declare your variables with my
#!/usr/bin/perl
use strict;
use warnings;
my $a = 1;
my $b;
my $sum = $a + $b; # warning: $b is undef ("Use of uninitialized value")
• Use the strict module to enforce variable declaration
• Use the warnings module to help with debugging
Writing safe code - summary
• Declare your variables with my
#!/usr/bin/perl
use strict;
use warnings;
my $a = 1;
my $b = "this is a string";
my $sum = $a + $b; # warning: the string in $b "isn't numeric"
• Use the strict module to enforce variable declaration
• Use the warnings module to help with debugging
Writing safe code - summary
• Declare your variables with my
#!/usr/bin/perl
use strict;
use warnings;
my $a = 1;
my $b = 2;
my $c = 3;
my $sum = $a + $b + $d; # error under strict: $d was never declared
• Use the strict module to enforce variable declaration
• Use the warnings module to help with debugging
Operations with scalars
• Assigning to other variables
– $a = $b;
• Math for number containing variables
– my $foo = 2 * $bar;
– my $c = sqrt($a ** 2 + $b ** 2);
• Printing
– print "The total is $total\n";
– Note double quotes (single quotes print the text literally, without interpolating variables)
• String operations
– Concatenation: $z = $x . $y . 'blabla';
• Reading from the keyboard
– my $name = <STDIN>; OR my $name = <>;
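A line read from the keyboard still carries its trailing newline, which is usually removed with chomp before the value is used. The sketch below assumes a made-up input string and reads it through an in-memory filehandle ($in) so that it runs without anyone typing; in a real script the read would be my $name = <STDIN>;.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for keyboard input (assumption, so the sketch runs unattended).
my $typed = "Solanum lycopersicum\n";
open(my $in, '<', \$typed) or die "Cannot open in-memory input: $!";

my $name = <$in>;    # like <STDIN>: reads one line, newline included
chomp($name);        # remove the trailing newline
close($in);

my $greeting = 'Hello, ' . $name . '!';    # concatenation with "."
print "$greeting\n";
```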
Perl function calls
my $result = foo($x, $y);
Where
– foo() function name
– $x, $y function parameters
– $result is the return value (can also be @array or %hash)
• More information about a specific function:
perldoc -f <function> (for example: perldoc -f print)
Math functions
• abs, atan2, cos, exp, hex, int, log, oct, rand, sin, sqrt
• Math operators
+ - / * ** (power) % (modulus)
Built-in functions
my $a = 10;
my $b = 15;
my $result = sqrt($a + $b);
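A few of the math functions and operators listed above, as a sketch with made-up numbers (the variable names are illustrative only):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $reads    = 17;    # hypothetical number of samples
my $per_lane = 5;     # hypothetical capacity of one lane

my $full_lanes = int($reads / $per_lane);    # 3: int() drops the fraction
my $left_over  = $reads % $per_lane;         # 2: modulus gives the remainder
my $power      = 2 ** 10;                    # 1024
my $distance   = abs(3 - 10);                # 7
my $root       = sqrt(144);                  # 12

print "$full_lanes full lanes, $left_over samples left over\n";
print "2**10 = $power, |3-10| = $distance, sqrt(144) = $root\n";
```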
Some functions operating on strings
• length() – length of a string
• uc() and lc() – convert to upper (lower) case
• Concatenation: “.”
Built-in functions
my $a = "Hello";
my $b = " world";
my $string = $a . $b;
print "$string\n";
# function calls are not interpolated inside quotes, so concatenate them
print "string " . lc($string) . " has " . length($string) . " characters\n";
Arrays @
• Ordered list of scalar values
• The variable prefix '@' defines a list
• Each element can be accessed using a numeric index
• Declaration & notation
my @list; # the empty list
my @list = (1, 2, 3, 4, 5, 6); # a list of six integer elements
my @list = ("foo", "bar", "batz"); # a list of string values
Traversing a list
Using the foreach construct
my @countries = ("USA", "France", "England");
foreach my $country (@countries) {
    print "$country\n";
}
Arrays
Adding/removing list elements
my @list = ('a', 'b', 'c');
my $value = 'd';
#add an element to the end of the array:
push @list, $value;
#remove the last element of the array:
my $last = pop @list;
#add an element to the beginning of the array:
unshift @list, $value;
#remove an element from the beginning of the array:
shift @list;
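To see what each of the four functions does, it helps to track the array contents after every call. A minimal sketch (the element 'z' is just an example value):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = ('a', 'b', 'c');

push @list, 'd';            # list is now: a b c d
my $last = pop @list;       # $last is 'd', list is: a b c
unshift @list, 'z';         # list is now: z a b c
my $first = shift @list;    # $first is 'z', list is: a b c

print "last=$last first=$first list=@list\n";   # last=d first=z list=a b c
```

pop and shift return the element they remove, so the value can be stored or ignored as needed.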
Accessing individual list elements
• List elements are accessed using the index
• The index is zero-based !!!!
• When accessing an element, use a $ (scalar)
my @countries = ('France', 'China', 'Peru');
First element: $countries[0] (has the value of 'France')
• Assigning to a list element
$countries[3] = 'Morocco';
Other list operations
• my @foo = (1, 2, 3, 4);
• my @bar = (5, 6, 7, 8);
Combine 2 lists:
• my @combined = (@foo, @bar);
Assign list into another list:
• my @x = (@foo, 12, 13, @bar, 45);
Extract elements from a list:
• my ($x, $y, @rest) = @foo;
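The list operations above, put together as a runnable sketch. It only uses the variables from the slide and prints the results so they can be checked.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @foo = (1, 2, 3, 4);
my @bar = (5, 6, 7, 8);

my @combined = (@foo, @bar);    # lists are flattened: 8 elements, no nesting
my ($x, $y, @rest) = @foo;      # $x = 1, $y = 2, @rest = (3, 4)

print "combined: @combined\n";           # combined: 1 2 3 4 5 6 7 8
print "x=$x y=$y rest=@rest\n";          # x=1 y=2 rest=3 4
```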
“List” vs “scalar” context
• “List context”
– In list context, the list is treated as a list
• “Scalar context”
– The list is treated as a scalar.
– As a scalar, the value is the number of list elements.
– Sometimes subtle changes can change the context
• Try: print @list;
• Versus: print @list . "\n";
• The function scalar forces the list into scalar context: my $count = scalar(@list);
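A short sketch of how the same array behaves in different contexts. The array contents and the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @list = ('A', 'C', 'G', 'T');

my $count   = @list;            # scalar context: number of elements (4)
my ($first) = @list;            # list context: first element ('A')
my $forced  = scalar(@list);    # scalar() forces scalar context (4)
my $concat  = @list . "\n";     # "." needs a scalar, so this is "4\n"
my $joined  = "@list";          # inside double quotes: 'A C G T'

print @list;                    # list context: prints ACGT
print "\n";
print $concat;                  # prints 4
print "$joined\n";              # prints A C G T
```

The parentheses in my ($first) are what make the assignment a list assignment; without them $first would receive the element count.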
More functions on lists
• Sort a list
my @sorted = sort ('Zimbabwe', 'Japan', 'Spain');
• Join: convert a list to a string
my $list = join(", ", @countries);
• Split: convert a string to a list
my @list = split(" ", "hello world");
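One point worth knowing about sort: by default it compares elements as strings, which gives surprising results for numbers. The sketch below shows the default behaviour next to a numeric sort; the comparison block { $a <=> $b } goes beyond what the slide shows, and the tab-separated record is made-up example data.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @nums = (10, 9, 100, 1);

my @as_text    = sort @nums;                  # string order:  1 10 100 9
my @as_numbers = sort { $a <=> $b } @nums;    # numeric order: 1 9 10 100

# join and split are inverses when the separator is the same
my $line   = join("\t", 'chr1', 100, 200);    # one tab-separated record
my @fields = split("\t", $line);              # back to three elements

print "as text:    @as_text\n";
print "as numbers: @as_numbers\n";
print "fields:     ", scalar(@fields), "\n";
```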
Transforming lists using map
my @numbers = (1, 2, 3, 4, 5);
my @squares = map { $_ ** 2 } @numbers;
print join(", ", @squares);
Prints
1, 4, 9, 16, 25
Transforming lists using map
my @numbers = (1, 2, 3, 4, 5);
my @squares = map { $_ ** 2 } @numbers;
The same as
my @squares;
foreach my $n (@numbers) {
    push @squares, $n ** 2;
}
Arrays: summary
➢ Definition: my @numbers = (1, 7, 3, 9, 5);
➢ foreach my $i (@numbers) { ... }
➢ print "@numbers\n"; print join(',', @numbers);
➢ @sorted = sort(@numbers);
➢ my $length = scalar(@numbers);
➢ Add/extract array elements with shift, unshift, pop, push functions
➢ Access individual elements: $numbers[$index] ($index starts with 0)
➢ More array functions: join, split, map
(perldoc -f <built_in_function_name>)
Hashes %
• Collection of key/value pairs
• Order of the key/value pairs in the hash is not important
• Declaration
– Use % as prefix
my %capitals; # the empty hash
my %capitals = ('Spain' => 'Madrid',
'Japan' => 'Tokyo',
'Peru' => 'Lima' );
• Access hash element, assign, etc.
$capitals{'Spain'} = "Madrid";
Traversing hashes
• By key
foreach my $k (keys(%capitals)) {
    print "$k: $capitals{$k}\n";
}
• By value
foreach my $v (values(%capitals)) {
    print "$v\n";
}
• Hashes have no defined order of elements
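A typical use of a hash in bioinformatics is counting. The sketch below counts the bases in a made-up DNA string ($dna and %count are illustrative names) and sorts the keys before printing, since the hash itself has no defined order.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $dna = 'GATTACA';    # hypothetical example sequence
my %count;              # base => number of occurrences

# split with an empty separator yields one element per character
foreach my $base (split('', $dna)) {
    $count{$base}++;    # a missing key starts empty; ++ makes it 1
}

# sort the keys to get a reproducible order
foreach my $base (sort(keys(%count))) {
    print "$base: $count{$base}\n";
}
```

For this input the script prints A: 3, C: 1, G: 1 and T: 2, one per line.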
Perl variables -reminder
Scalar: $var
Array: @var; list of indexed scalars, order matters.
$var[0]; $var[$index];
foreach my $element (@var) {
    print $element . "\n";
}
Hash: %var; elements ('values') and indexes ('keys') are scalars. Indexes are not ordered, but must be unique!
$var{'name'} ; $var{$key};
foreach my $key ( keys (%var) ) {
    print "key = $key , value = $var{$key} \n";
}