96-Summer Bioinformatics Programming Practicum (II): Bioinformatics with Perl. 8/13~8/22 蘇中才, 8/24~8/29 張天豪, 8/31 曾宇鳯.

Jan 20, 2016


Schedule

Date        Time          Subject                                            Speaker
8/13 (Mon)  13:30~17:30   Perl Basics                                        蘇中才
8/15 (Wed)  13:30~17:30   Programming Basics                                 蘇中才
8/17 (Fri)  13:30~17:30   Regular expression                                 蘇中才
8/20 (Mon)  13:30~17:30   Retrieving Data from Protein Sequence Database     蘇中才
8/22 (Wed)  13:30~17:30   Perl combines with Genbank, BLAST                  蘇中才
8/24 (Fri)  13:30~17:30   PDB database and structure files                   張天豪
8/27 (Mon)  8:30~12:30    Extracting ATOM information                        張天豪
8/27 (Mon)  13:30~17:30   Mapping of Protein Sequence IDs and Structure IDs  張天豪
8/31 (Fri)  13:30~17:30   Final and Examination                              曾宇鳳

Reference Books

- Learning Perl (Perl 學習手冊)
- Beginning Perl for Bioinformatics
- Bioinformatics, Biocomputing and Perl: An Introduction to Bioinformatics Computing Skills and Practice


Learning Perl

Perl

- Practical Extraction and Report Language
- Created by Larry Wall in the mid-1980s
- Suitable for "quick-and-dirty" programs
- Suitable for string handling
- Powerful regular expressions

Preparation

- Download putty.exe / pietty.exe
- Get the materials for this course:
    http://gene.csie.ntu.edu.tw/~sbb/summer-course/
- Server: ssh 140.112.28.186
- Id: course1 ~ course20
- Password:

Installing Perl on Windows

- Download the package from http://www.activestate.com/
    http://downloads.activestate.com/ActivePerl/Windows/5.8/ActivePerl-5.8.8.820-MSWin32-x86-274739.msi
- Versions of Perl: Unix, Linux, Windows (ActivePerl), Mac (MacPerl)
    http://www.perl.com/

Text Editors

A convenient (text) editor for programming

- Ultraedit: good for me
- Notepad: just an editor
- Vim: UNIX/Linux lover
    http://lpi.indicator-online.net/vim.html
    http://homepage.ttu.edu.tw/u9106240/page_main/vim_menu.html
- Joe: easy to use for Unix beginners

Finding Help

- Best resource-finding tool: on-line resources. Use
    http://www.perl.com/
    http://www.perl.org/
    http://www.cpan.org/
- HTML Help in ActivePerl
- Command line (highly recommended)
    perldoc -f <function>      # search function
    perldoc -q <faqkeyword>    # search FAQ
    perldoc <module>           # search module
    perldoc perldoc


Perl Basic

Starting


$ vi welcome

#! /usr/bin/perl -w

print "Hello, world\n";

$ chmod +x welcome

$ ./welcome

Hello, world

$ perl welcome

Hello, world

Program: run thyself!

[sbb@gene perl]$ ls -al
-rw-rw-r-- 1 sbb sbb 20 Jul 2 15:27 welcome
[sbb@gene perl]$ chmod +x welcome
[sbb@gene perl]$ ls -al
-rwxrwxr-x 1 sbb sbb 20 Jul 2 15:27 welcome


#! /usr/bin/perl -w

# The 'forever' program - a (Perl) program,

# which does not stop until someone presses Ctrl-C.

use constant TRUE => 1;

use constant FALSE => 0;

while ( TRUE )

{

print "Welcome to the Wonderful World of Bioinformatics!\n";

sleep 1;

}

Using the Perl while construct


$ chmod +x forever

$ ./forever

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

.

.

Running forever ...


Perl Basic

Variables

Variables

- Scalar ($)
    - Number: 1; 1.23; 12e34
    - String: "abc"; 'ABC'; "Hello, world!";
- Array / List (@)
- Hash (%)

Introducing variable containers

The simplest type of variable container is the scalar.

In Perl, scalars can hold, for example, a number, a word, a sentence or a disk-file.

$name
$_address
$programming_101
$z
$abc
$swissprot_to_interpro_mapping
$SwissProt2InterProMapping

Variable naming is an ART!

scalar

#!/usr/bin/perl -w

# lower case for user defined; upper case for system default
my $ARGV   = "example.pl";
my $number = 1.2;
my $string = "Hello, world!";
# my $123  = 123;    # error: a user-defined name cannot start with a digit
my $abc    = "123";
my $_123   = '123';
my $O000OoO00 = 1;
my $OO00Oo000 = 2;
my $OO00OoOOO = 3;

$abc = $O000OoO00 * $OO00Oo000 - $OO00OoOOO;

print $abc x 4 . "\n";    # -1-1-1-1
print 5 x 4 . "\n";       # 5555
print 5 * 4 . "\n";       # 20

Number

Format (range: 1e-100 ~ 1e100 ?)
    2000
    1.25
    -6.5e45          (-6.5*10^45)
    123456789
    123_456_789

Other formats
    0377             # octal (decimal 255)
    0xFF             # hexadecimal
    0b11111111       # binary

number

$integer = 12;

$real = 12.34;

$oct = 0377;

$bin = 0b11111111;

$hex = 0xff;

$long = 123456789;

$long_ = 123_456_789;

$large = 1E100; #1E200

$small = 1E-100; #1E-200

print "integer : $integer\n";

print "real : $real\n";

print "oct=$oct bin=$bin hex=$hex\n";

#printf("oct=0%o bin=0b%b hex=0x%x\n",$oct,$bin,$hex);

parameters of printf (ref : number)

specifier  Output                                                      Example
c          Character                                                   a
d or i     Signed decimal integer                                      392
e          Scientific notation (mantissa/exponent) using e character   3.9265e+2
E          Scientific notation (mantissa/exponent) using E character   3.9265E+2
f          Decimal floating point                                      392.65
g          Use the shorter of %e or %f                                 392.65
G          Use the shorter of %E or %f                                 392.65
o          Unsigned octal                                              610
s          String of characters                                        sample
u          Unsigned decimal integer                                    7235
x          Unsigned hexadecimal integer                                7fa
X          Unsigned hexadecimal integer (capital letters)              7FA
p          Pointer address                                             B800:0000
n          Nothing printed. The argument must be a pointer to a signed int, where the number of characters written so far is stored.
%          A % followed by another % character will write % to stdout.
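
As a small sketch of how a few of these specifiers behave in Perl, the block below formats the same values several ways with sprintf, which takes the same format strings as printf but returns the result. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One value written in three bases; all three literals are decimal 255.
my $oct = 0377;
my $bin = 0b11111111;
my $hex = 0xff;

# %o, %b and %x print the value back in octal, binary and hexadecimal.
my $bases = sprintf( "oct=0%o bin=0b%b hex=0x%x", $oct, $bin, $hex );
print "$bases\n";

# %d keeps the integer part, %.2f keeps two decimals,
# %e switches to scientific notation.
my $real  = 392.65;
my $fixed = sprintf( "%d %.2f %e", $real, $real, $real );
print "$fixed\n";
```

Plain interpolation such as "oct=$oct" always shows the decimal value, so a format specifier is needed whenever another base is wanted.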

operator

2 + 3         # 5
5.1 - 2.4     # 2.7
3 * 12        # 36
14 / 2        # 7
10.2 / 0.3    # 34
10 / 3        # 3.333...
10 % 3        # 1

Operator

Operator   Function
+          Addition
-          Subtraction, Negative Numbers, Unary Negation
*          Multiplication
/          Division
%          Modulus
**         Exponent

Operator   Function
=          Normal Assignment
+=         Add and Assign
-=         Subtract and Assign
*=         Multiply and Assign
/=         Divide and Assign
%=         Modulus and Assign
**=        Exponent and Assign

$number = $number + 100;
$number += 100;    # same effect as the line above

Take a break …

modulus
    10.5 % 3.2 = ?

exponentiation
    2^3 = ?
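
Both questions have slightly surprising answers in Perl, so a short sketch may help. It assumes the standard POSIX module for a floating-point remainder; the variable names are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use POSIX qw(fmod);

# The % operator works on integers: both operands are truncated first,
# so 10.5 % 3.2 is evaluated as 10 % 3.
my $int_mod = 10.5 % 3.2;

# For a floating-point remainder, POSIX::fmod can be used instead.
my $float_mod = fmod( 10.5, 3.2 );

# ^ is bitwise XOR in Perl, not exponentiation; ** is the exponent operator.
my $xor   = 2 ^ 3;
my $power = 2 ** 3;

print  "10.5 % 3.2      = $int_mod\n";
printf "fmod(10.5, 3.2) = %.1f\n", $float_mod;
print  "2 ^ 3           = $xor\n";
print  "2 ** 3          = $power\n";
```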

string

Format
- Single quotes: 'hello'  'hello\nhello'  'hello,$name'
- Double quotes: "hello"  "hello\nhello"  "hello,$name"
- Exceptions: '\'\\'  "\"\\"

#!/usr/bin/perl -w

print 'hello';

print "hello";

Backslash escapes

Escape Sequence   Description or Character
\b                Backspace
\e                Escape
\f                Form Feed
\n                New line
\r                Carriage Return
\t                Tab
\v                Vertical Tab (not recognized in Perl strings; \x0b can be used instead)
\$                Dollar Sign
\@                At sign
\0nnn             Any Octal byte
\xnn              Any Hexadecimal byte
\cn               Any Control character
\l                Change the next character to lowercase
\u                Change the next character to uppercase
\\                Backslash

conversion between String and number

$answer = "Hello " . " " . " world\n";
$answer = "12" . "3";             # "123"
$answer = "12" * "3";             # 36
$answer = "12Hello34" * "3";      # 36, with a warning !!!
$answer = "A" . 3*5;              # "A15"
$answer = "A" x (3*5);            # "A" repeated 15 times
$answer = "12"x"3";               # "121212"


#! /usr/bin/perl -w

# The 'tentimes' program - a (Perl) program,

# which stops after ten iterations.

use constant HOWMANY => 10;

$count = 0;

while ( $count < HOWMANY )

{

print "Welcome to the Wonderful World of Bioinformatics!\n";

$count++;

}

Variable containers and loops


$ chmod +x tentimes

$ ./tentimes

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Welcome to the Wonderful World of Bioinformatics!

Running tentimes ...

#! /usr/bin/perl -w

# The 'fivetimes' program - a (Perl) program,
# which stops after five iterations.

use constant TRUE    => 1;
use constant FALSE   => 0;
use constant HOWMANY => 5;

$count = 0;
while ( TRUE )
{
    $count++;
    print "Welcome to the Wonderful World of Bioinformatics!\n";
    if ( $count == HOWMANY )
    {
        last;
    }
}

Using the Perl if construct

#! /usr/bin/perl -w
# The 'oddeven' program.

use constant HOWMANY => 4;

$count = 0;

while ( $count < HOWMANY )
{
    $count++;
    if ( $count % 2 == 0 )
    {
        print "$count : even\n";
    }
    else    # $count % 2 is not zero.
    {
        print "$count : odd\n";
    }
}

The oddeven program


Comparison operator

Comparison Number String

Equal == eq

Not equal != ne

Less than < lt

Greater than > gt

Less than or equal <= le

Greater than or equal >= ge

Comparison <=> cmp
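
The two operator columns can give different answers for the same operands. The following sketch, with illustrative variable names, compares 10 and 9 both as numbers and as strings.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compared as numbers, 10 is not less than 9; compared as strings,
# "10" comes first because the character "1" sorts before "9".
my $numeric_less = ( 10 < 9 )      ? 1 : 0;
my $string_less  = ( "10" lt "9" ) ? 1 : 0;

# <=> and cmp return -1, 0 or 1, which is the form sort expects.
my $numeric_order = 10 <=> 9;
my $string_order  = "10" cmp "9";

# == compares numeric values, eq compares the characters themselves.
my $same_number = ( "1.0" == 1 )   ? 1 : 0;
my $same_string = ( "1.0" eq "1" ) ? 1 : 0;

print "10 < 9: $numeric_less, '10' lt '9': $string_less\n";
print "10 <=> 9: $numeric_order, '10' cmp '9': $string_order\n";
print "'1.0' == 1: $same_number, '1.0' eq '1': $same_string\n";
```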

Variable Interpolation

#! /usr/bin/perl -w

# The 'interpolation' program, which interpolates variables into strings.

$language = "Perl";

$string = "I love $language";   print $string."\n";    # I love Perl
$string = 'I love $language';   print $string."\n";    # I love $language
$string = 'I love '.$language;  print $string."\n";    # I love Perl

$string = "I love \$language";  print $string."\n";    # I love $language

$string = "I love $languages";  print $string."\n";    # looks up $languages; write ${language}s instead
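
A compact sketch of the same rules under use strict, where the misspelled $languages would be rejected at compile time; the variable names other than $language are illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $language = "Perl";

my $interpolated = "I love $language";       # double quotes interpolate
my $literal      = 'I love $language';       # single quotes do not
my $escaped      = "I love \$language";      # a backslash protects the $
my $plural       = "I love ${language}s";    # braces mark where the name ends

print "$interpolated\n$literal\n$escaped\n$plural\n";
```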


@list_of_sequences

@totals

@protein_structures

( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' )

@list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

Arrays: Associating Data With Numbers


The @list_of_sequences Array


print "$list_of_sequences[1]\n";

GCTCAGTTCT

$list_of_sequences[1] = 'CTATGCGGTA';

$list_of_sequences[3] = 'GGTCCATGAA';

Working with array elements


The Grown @list_of_sequences Array


print "The array size is: ", $#list_of_sequences+1, ".\n";

print "The array size is: ", scalar @list_of_sequences, ".\n";

The array size is: 4.

How big is the array?


@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

@sequences = ( @sequences, 'CTATGCGGTA' );

print "@sequences\n";

TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA

@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

@sequences = ( 'CTATGCGGTA' );

print "@sequences\n";

CTATGCGGTA

Adding elements to an array


@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

@sequences = ( @sequences, ( 'CTATGCGGTA', 'CTATTATGTC' ) );

print "@sequences\n";

TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA CTATTATGTC

@sequence_1 = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

@sequence_2 = ( 'GCTCAGTTCT', 'GACCTCTTAA' );

@combined_sequences = ( @sequence_1, @sequence_2 );

print "@combined_sequences\n";

TTATTATGTT GCTCAGTTCT GACCTCTTAA GCTCAGTTCT GACCTCTTAA

Adding more elements to an array


@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',

'TTATTATGTT' );

@removed_elements = splice @sequences, 1, 2;

print "@removed_elements\n";

print "@sequences\n";

GCTCAGTTCT GACCTCTTAA

TTATTATGTT TTATTATGTT

#clean all elements of an array

@sequences = ();

Removing elements from an array


#! /usr/bin/perl -w

# The 'slices' program - slicing arrays.

@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',

'CTATGCGGTA', 'ATCTGACCTC' );

print "@sequences\n\n";

@seq_slice = @sequences[ 1 .. 3 ];

print "@seq_slice\n";

print "@sequences\n\n";

@removed = splice @sequences, 1, 3;

print "@sequences\n";

print "@removed\n";

The slices program


TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC

GCTCAGTTCT GACCTCTTAA CTATGCGGTA

TTATTATGTT GCTCAGTTCT GACCTCTTAA CTATGCGGTA ATCTGACCTC

TTATTATGTT ATCTGACCTC

GCTCAGTTCT GACCTCTTAA CTATGCGGTA

Results from slices ...

#! /usr/bin/perl -w

# The 'iterateW' program - iterate over an entire array
# with 'while'.

@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',
               'CTATGCGGTA', 'ATCTGACCTC' );

$index = 0;
$last_index = $#sequences;

while ( $index <= $last_index )
{
    print "$sequences[ $index ]\n";
    ++$index;
}

Processing every element in an array


TTATTATGTT

GCTCAGTTCT

GACCTCTTAA

CTATGCGGTA

ATCTGACCTC

Results from iterateW ...


#! /usr/bin/perl -w

# The 'iterateF' program - iterate over an entire array

# with 'foreach'.

@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',

'CTATGCGGTA', 'ATCTGACCTC' );

foreach $value ( @sequences )

{

print "$value\n";

}

The iterateF program


@sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA',

'CTATGCGGTA', 'ATCTGACCTC' );

@sequences = ( TTATTATGTT, GCTCAGTTCT, GACCTCTTAA,

CTATGCGGTA, ATCTGACCTC );

@sequences = qw( TTATTATGTT GCTCAGTTCT GACCTCTTAA

CTATGCGGTA ATCTGACCTC );

Making lists easier to work with

Quoted words

#!/usr/bin/perl -w

# The 'quoted_words' program

@list_of_sequences = ( 'TTATTATGTT', 'GCTCAGTTCT', 'GACCTCTTAA' );

@list_of_sequences = qw/TTATTATGTT GCTCAGTTCT GACCTCTTAA/;

@list_of_sequences = qw{TTATTATGTT GCTCAGTTCT GACCTCTTAA};

@list_of_sequences = qw!TTATTATGTT GCTCAGTTCT GACCTCTTAA!;

@list_of_sequences = qw[TTATTATGTT GCTCAGTTCT GACCTCTTAA];

@list_of_sequences = qw<TTATTATGTT GCTCAGTTCT GACCTCTTAA>;

@list_of_sequences = qw#TTATTATGTT GCTCAGTTCT GACCTCTTAA#;

print "@list_of_sequences\n";

print "The array size is: ", $#list_of_sequences+1, ".\n";

pop/push/shift/unshift

#!/usr/bin/perl -w
# The "array_operator" program

@array = 5..9;
print "array = [@array]\n";

$item = pop @array;
print "item = [$item]\n";
print "array = [@array]\n";

push @array, 9;
print "array = [@array]\n";

$item = shift @array;
print "item = [$item]\n";
print "array = [@array]\n";

unshift @array, 1..5;
print "array = [@array]\n";


pop/push/shift/unshift

array = [5 6 7 8 9]

==========pop==========

item = [9]

array = [5 6 7 8]

==========push 9==========

array = [5 6 7 8 9]

==========shift==========

item = [5]

array = [6 7 8 9]

==========unshift 1..5==========

array = [1 2 3 4 5 6 7 8 9]

reverse / sort

#!/usr/bin/perl -w

# The "array_operator1" program

@array = qw /5 4 9 8 1 3 6 2 7 10/;

print "array = [@array]\n";

@array_reverse = reverse @array;

print "reverse array = [@array_reverse]\n";

@array_sorted = sort @array;

print "sort array = [@array_sorted]\n";

@array_reversesorted = reverse sort @array;

print "reverse sort array = [@array_reversesorted]\n";

@array_sortedreverse = sort reverse @array;

print "sort reverse array = [@array_sortedreverse]\n";


reverse / sort

array = [5 4 9 8 1 3 6 2 7 10]

========================================

reverse array = [10 7 2 6 3 1 8 9 4 5]

========================================

sort array = [1 10 2 3 4 5 6 7 8 9]

========================================

reverse sort array = [9 8 7 6 5 4 3 2 10 1]

========================================

sort reverse array = [1 10 2 3 4 5 6 7 8 9]
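
The results above place 10 right after 1 because sort compares its items as strings by default. A minimal sketch of a numeric sort, using a comparison block with <=> (the array names other than @array are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @array = qw/5 4 9 8 1 3 6 2 7 10/;

# The default sort compares strings, so "10" lands right after "1".
my @as_strings = sort @array;

# A comparison block with <=> sorts by numeric value instead.
my @ascending  = sort { $a <=> $b } @array;
my @descending = sort { $b <=> $a } @array;

print "string sort     = [@as_strings]\n";
print "numeric sort    = [@ascending]\n";
print "numeric reverse = [@descending]\n";
```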

split/join

#!/usr/bin/perl -w

# The "array_operator2" program - join / split

$string = "5 4 9 8 1 3 6 2 7 10";

@array = split/ /, $string;

print "array = [@array]\n";

$string = join ",", @array;

print "array = [$string]\n";

array = [5 4 9 8 1 3 6 2 7 10]
array = [5,4,9,8,1,3,6,2,7,10]


How to map between IP and domain name ?

IP Domain name

140.112.28.186 gene.csie.ntu.edu.tw

140.112.28.191 biominer.csie.ntu.edu.tw

140.112.28.190 knn.csie.ntu.edu.tw

Use 2 arrays to map between IP and domain name?

Index   @IP               @Domain_name
[0]     140.112.28.186    gene.csie.ntu.edu.tw
[1]     140.112.28.191    biominer.csie.ntu.edu.tw
[2]     140.112.28.190    knn.csie.ntu.edu.tw

How to search for a certain IP or domain name?

Index   @IP               @Domain_name
[0]     140.112.28.186    gene.csie.ntu.edu.tw
[1]     140.112.28.191    biominer.csie.ntu.edu.tw
[2]     140.112.28.190    knn.csie.ntu.edu.tw
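
With two parallel arrays, a search has to walk one array until it finds a match and then read the other array at the same index. The sketch below shows one way this might look; find_domain is a hypothetical helper name, not something from the course material.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @IP = qw( 140.112.28.186 140.112.28.191 140.112.28.190 );
my @Domain_name = qw( gene.csie.ntu.edu.tw
                      biominer.csie.ntu.edu.tw
                      knn.csie.ntu.edu.tw );

# find_domain (hypothetical helper) scans @IP for a match and returns
# the domain name stored at the same index, or nothing if it is absent.
sub find_domain
{
    my ($wanted) = @_;
    foreach my $index ( 0 .. $#IP )
    {
        return $Domain_name[$index] if $IP[$index] eq $wanted;
    }
    return;
}

my $found = find_domain('140.112.28.191');
print "140.112.28.191 => $found\n";
```

The loop visits every element in the worst case, and the two arrays must always be kept in the same order.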

Why Hash ?

%Domain_name

Key                 Value
[140.112.28.186]    gene.csie.ntu.edu.tw
[140.112.28.191]    biominer.csie.ntu.edu.tw
[140.112.28.190]    knn.csie.ntu.edu.tw

How to get a certain domain name?

%Domain_name

Key                 Value
[140.112.28.186]    gene.csie.ntu.edu.tw
[140.112.28.191]    biominer.csie.ntu.edu.tw
[140.112.28.190]    knn.csie.ntu.edu.tw

$Domain_name{"140.112.28.186"}
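
A minimal sketch of the same table as a Perl hash, using the three addresses from the slide. The %IP_of hash is an illustrative addition showing one way the opposite lookup might be obtained.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keys are IP addresses, values are domain names.
my %Domain_name = (
    '140.112.28.186' => 'gene.csie.ntu.edu.tw',
    '140.112.28.191' => 'biominer.csie.ntu.edu.tw',
    '140.112.28.190' => 'knn.csie.ntu.edu.tw',
);

# A single lookup by key; no loop over the entries is needed.
my $name = $Domain_name{'140.112.28.186'};
print "140.112.28.186 => $name\n";

# reverse swaps keys and values, giving a domain-name-to-IP hash
# (safe here because every value is unique).
my %IP_of = reverse %Domain_name;
print "knn.csie.ntu.edu.tw => $IP_of{'knn.csie.ntu.edu.tw'}\n";
```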


Examples of Hash


Hashes: Associating Data With Words

%nucleotide_bases

%nucleotide_bases = ( A, Adenine, T, Thymine );

%nucleotide_bases = ( A => Adenine, T => Thymine);

key value


print "The expanded name for 'A' is $nucleotide_bases{ 'A' }\n";

The expanded name for 'A' is Adenine

Working with hash entries


%nucleotide_bases = ( A, Adenine, T, Thymine );

@hash_names = keys %nucleotide_bases;

print "The names in the %nucleotide_bases hash are: @hash_names\n";

The names in the %nucleotide_bases hash are: A T

%nucleotide_bases = ( A, Adenine, T, Thymine );

$hash_size = keys %nucleotide_bases;

print "The size of the %nucleotide_bases hash is: $hash_size\n";

The size of the %nucleotide_bases hash is: 2

How big is the hash?


$nucleotide_bases{ 'G' } = 'Guanine';

$nucleotide_bases{ 'C' } = 'Cytosine';

%nucleotide_bases = ( A => Adenine, T => Thymine,

G => Guanine, C => Cytosine );

Adding entries to a hash


The Grown %nucleotide_bases Hash

delete $nucleotide_bases{ 'C' };

$nucleotide_bases{ 'C' } = undef;

Removing entries from a hash
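
The two statements on this slide are not equivalent: delete removes the entry, while assigning undef only clears the value and leaves the key in place. A short sketch using exists and the hash size to show the difference (the counter variable names are illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %nucleotide_bases = ( A => 'Adenine', T => 'Thymine',
                         G => 'Guanine', C => 'Cytosine' );

# Assigning undef keeps the key; only the value is cleared.
$nucleotide_bases{'C'} = undef;
my $exists_after_undef = exists $nucleotide_bases{'C'} ? 1 : 0;
my $size_after_undef   = keys %nucleotide_bases;

# delete removes the key and its value from the hash.
delete $nucleotide_bases{'C'};
my $exists_after_delete = exists $nucleotide_bases{'C'} ? 1 : 0;
my $size_after_delete   = keys %nucleotide_bases;

print "after undef : exists=$exists_after_undef size=$size_after_undef\n";
print "after delete: exists=$exists_after_delete size=$size_after_delete\n";
```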

#! /usr/bin/perl -w

# The 'slicing_hashes' program - extract a certain subset from a hash

%gene_counts = ( Human => 31000,

'Thale cress' => 26000,

'Nematode worm' => 18000,

'Fruit fly' => 13000,

Yeast => 6000,

'Tuberculosis microbe' => 4000 );

@counts = @gene_counts{ Human, "Fruit fly", 'Tuberculosis microbe' };

print "@counts\n";

Slicing hashes

31000 13000 4000


#! /usr/bin/perl -w

# The 'bases' program - a hash of the nucleotide bases.

%nucleotide_bases = ( A => Adenine, T => Thymine,

G => Guanine, C => Cytosine );

$sequence = 'CTATGCGGTA';

print "\nThe sequence is $sequence, which expands to:\n\n";

while ( $sequence =~ /(.)/g )

{

print "\t$nucleotide_bases{ $1 }\n";

}

Working with hash entries: a complete example


The sequence is CTATGCGGTA, which expands to:

Cytosine

Thymine

Adenine

Thymine

Guanine

Cytosine

Guanine

Guanine

Thymine

Adenine

Results from bases ...


#! /usr/bin/perl -w

# The 'genes' program - a hash of gene counts.

use constant LINE_LENGTH => 60;

%gene_counts = ( Human => 31000,

'Thale cress' => 26000,

'Nematode worm' => 18000,

'Fruit fly' => 13000,

Yeast => 6000,

'Tuberculosis microbe' => 4000 );

Processing every entry in a hash


print '-' x LINE_LENGTH, "\n";

while ( ( $genome, $count ) = each %gene_counts )

{

print "`$genome' has a gene count of $count\n";

}

print '-' x LINE_LENGTH, "\n";

foreach $genome ( sort keys %gene_counts )

{

print "`$genome' has a gene count of $gene_counts{ $genome }\n";

}

print '-' x LINE_LENGTH, "\n";

The genes program, cont.


------------------------------------------------------------

'Human' has a gene count of 31000

'Tuberculosis microbe' has a gene count of 4000

'Fruit fly' has a gene count of 13000

'Nematode worm' has a gene count of 18000

'Yeast' has a gene count of 6000

'Thale cress' has a gene count of 26000

------------------------------------------------------------

'Fruit fly' has a gene count of 13000

'Human' has a gene count of 31000

'Nematode worm' has a gene count of 18000

'Thale cress' has a gene count of 26000

'Tuberculosis microbe' has a gene count of 4000

'Yeast' has a gene count of 6000

------------------------------------------------------------

Results from genes ...

How to sort by the values?
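
One possible answer, sketched with the gene-count data from the genes program: sort the keys, but let the comparison block look at the values they point to. The @by_count name is illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %gene_counts = ( Human                  => 31000,
                    'Thale cress'          => 26000,
                    'Nematode worm'        => 18000,
                    'Fruit fly'            => 13000,
                    Yeast                  => 6000,
                    'Tuberculosis microbe' => 4000 );

# Sort the keys by the values they point to, largest count first.
my @by_count = sort { $gene_counts{$b} <=> $gene_counts{$a} } keys %gene_counts;

foreach my $genome (@by_count)
{
    print "'$genome' has a gene count of $gene_counts{$genome}\n";
}
```

Swapping $a and $b in the block would list the smallest count first.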


Exercise

Protein sequences


FASTA format

>P53_HUMAN (P04637) Cellular tumor antigen p53 (Tumor suppressor p53) (Phosphoprotein p53) (Antigen NY-CO-13) - Homo sapiens (Human).

MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP

DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAK

SVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHE

RCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNS

SCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTEEENLRKKGEPHHELP

PGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFRELNEALELKDAQAGKEPG

GSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD

Read a FASTA file

#!/usr/bin/perl -w

my ( $line, $queryname, $queryseq );
$queryseq = '';    # start empty so the first append raises no warning under -w

while ( $line = <> )
{
    if ( $line =~ />(.+?)\s.+/)
    {
        # header line: keep the text between '>' and the first whitespace
        $queryname = $1 ;
    }
    else
    {
        # sequence line: drop the newline and append
        chomp $line;
        $queryseq = $queryseq . $line;
    }
}

Exercise

- Read more than one sequence
- Store the protein names and sequences from disorder.fa in 2 arrays
- Show all of the protein names and sequences.
- Show the number of proteins and residues. ($len = length $seq;)
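
A possible starting point for this exercise, sketched under assumptions: the records below are made up and stand in for disorder.fa, they are read through an in-memory filehandle rather than from disk, and the header pattern /^>(\S+)/ is a simplification of the one used in the FASTA reader.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up records standing in for disorder.fa (assumption: the real file
# uses the same ">name description" header layout).
my $fasta_text = <<'END_FASTA';
>seq1 first hypothetical protein
MEEPQSDPSV
EPPLSQ
>seq2 second hypothetical protein
MKT
END_FASTA

my ( @names, @sequences );

# Read from the string through an in-memory filehandle.
open my $fh, '<', \$fasta_text or die "Cannot open in-memory data: $!";
while ( my $line = <$fh> )
{
    chomp $line;
    if ( $line =~ /^>(\S+)/ )
    {
        push @names, $1;         # a header starts a new record
        push @sequences, '';
    }
    elsif (@sequences)
    {
        $sequences[-1] .= $line;    # append to the current record
    }
}
close $fh;

my $residues = 0;
foreach my $index ( 0 .. $#names )
{
    print "$names[$index]: $sequences[$index]\n";
    $residues += length $sequences[$index];
}
print scalar(@names), " proteins, $residues residues\n";
```

To read the real file, the in-memory open could be replaced by the while ( $line = <> ) loop, with the file name given on the command line.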

Exercise

- Read more than one sequence
- Store the protein names and sequences from disorder.fa in a hash
- Show the protein names and sequences sorted by protein name
- Find the longest sequence
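
A sketch of the hash-based variant, again under assumptions: the input lines are made up in place of disorder.fa, and %sequence_of, $current and $longest are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up records standing in for disorder.fa (an assumption for this sketch).
my @lines = (
    '>seqB second hypothetical protein', 'MKTAYIAK',
    '>seqA first hypothetical protein',  'MEEP', 'QSD',
);

my %sequence_of;    # protein name => sequence
my $current;        # name of the record being read
foreach my $line (@lines)
{
    if ( $line =~ /^>(\S+)/ )
    {
        $current = $1;
        $sequence_of{$current} = '';
    }
    elsif ( defined $current )
    {
        $sequence_of{$current} .= $line;
    }
}

# Show the entries in alphabetical order of protein name.
foreach my $name ( sort keys %sequence_of )
{
    print "$name: $sequence_of{$name}\n";
}

# Longest sequence: sort the names by sequence length, longest first,
# and keep the first one.
my ($longest) = sort { length( $sequence_of{$b} ) <=> length( $sequence_of{$a} ) }
                keys %sequence_of;
print "Longest: $longest (", length( $sequence_of{$longest} ), " residues)\n";
```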