Perl
Part I: A Biology Primer
Conceptual Biology
H. sapiens did not create the genetic code – but they did invent the transistor
Biological life is not optimized – the modern synthesis
Nature vs. Nurture
What are the best ways to understand the important differences that make the difference?
A Molecular Primer
Hierarchy of the eukaryote
• Organism > System > Organ > Tissue > Cell > Organelle > Protein > RNA > DNA
Put Simply: DNA → RNA → Protein
The Building Blocks
DNA is composed of four building blocks
• Nucleic acids, nucleotides, bases
• Adenine, Cytosine, Guanine, Thymine
RNA also has four building blocks
• Adenine, Cytosine, Guanine, Uracil
Proteins are composed of 20 building blocks
• Amino acids, residues
• Fragments of proteins are called peptides
DNA, RNA and Proteins are polymers
Code  Nucleic Acid(s)  w/ Sugar    w/ Phosphate
A     Adenine          Adenosine   Adenylic Acid
C     Cytosine         Cytidine    Cytidylic Acid
G     Guanine          Guanosine   Guanylic Acid
T     Thymine          Thymidine   Thymidylic Acid
U     Uracil           Uridine     Uridylic Acid

Code  Nucleic Acid(s)
M     A or C (amino)
R     A or G (purine)
W     A or T (weak)
S     C or G (strong)
Y     C or T (pyrimidine)
K     G or T (keto)
V     A or C or G
H     A or C or T
D     A or G or T
B     C or G or T
N     A, G, C, T (any)
DNA RNA
A = T → A
C = G → C
G = C → G
C = G → C
T = A → U
T = A → U
M = K → M
W = W → ?
N = N → N
C = G → C
C = G → C
T = A → U
Y = R → ?
B = V → ?
N = N → N
K = M → ?
S = S → S
T = A → U
T = A → U
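The base pairings in the table above, including the ambiguity codes, can be sketched in Perl with the `tr` operator. This is only an illustration: the subroutine name `complement` is made up for this example, and the D/H pairing is not shown in the table but follows from the IUPAC definitions (D = A, G or T; H = A, C or T).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Complement a DNA strand, including IUPAC ambiguity codes.
# Pairings: A<->T, C<->G, M<->K, R<->Y, B<->V, D<->H;
# W, S and N pair with themselves.
sub complement {
    my ($dna) = @_;
    # Copy the input first so the caller's string is left unchanged
    (my $comp = $dna) =~ tr/ACGTMRWSYKVHDBN/TGCAKYWSRMBDHVN/;
    return $comp;
}

# The top strand read down the first column of the table
my $top = 'ACGCTTMWNCCTYBNKSTT';
print "Top strand:    $top\n";
print "Bottom strand: ", complement($top), "\n";
```

Because every code maps to exactly one partner, applying `complement` twice gives back the original strand.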
•One Dimensional
•Two Dimensional
•Three Dimensional
DNA RNA
A = T → A
T = A → U
G = C → G
C = G → C
T = A → U
T = A → U
M = K → M
W = W → ?
N = N → N
C = G → C
C = G → C
T = A → U
Y = R → ?
B = V → ?
N = N → N
K = M → ?
S = S → S
T = A → U
T = A → U
One-Letter Code  Amino Acid                  Three-Letter Code
A                Alanine                     Ala
C                Cysteine                    Cys
D                Aspartic acid               Asp
E                Glutamic acid               Glu
F                Phenylalanine               Phe
G                Glycine                     Gly
H                Histidine                   His
I                Isoleucine                  Ile
K                Lysine                      Lys
L                Leucine                     Leu
M                Methionine                  Met
N                Asparagine                  Asn
P                Proline                     Pro
Q                Glutamine                   Gln
R                Arginine                    Arg
S                Serine                      Ser
T                Threonine                   Thr
V                Valine                      Val
W                Tryptophan                  Trp
X                Unknown                     Xxx
Y                Tyrosine                    Tyr
Z                Glutamic acid or Glutamine  Glx
Reading frame 1:
AUG -> Met (Start)
CUU -> Leu
AA?, AU?, CA?, CU? -> Asn, Lys, Ile, Met, His, Gln, Leu
CCU -> Pro
UU?, UG?, UC?, CU?, CG?, CC? -> Phe, Leu, Cys, Stop, Trp, Ser, Leu, Arg, Pro
UCU, UGU, GCU, GGU -> Ser, Cys, Ala, Gly
Reading frame 2:
UGC -> Cys
UUC, UUA -> Phe, Leu
A?C, U?C -> Ile, Thr, Asn, Ser, Phe, Ser, Tyr, Cys
CU? -> Leu
U?U, U?G, C?U, C?G -> Phe, Ser, Tyr, Cys, Leu, Stop, Trp, Leu, Pro, His, Arg, Gln
GUU, CUU -> Val, Leu
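Reading an RNA in different frames, as in the codon lists above, can be sketched in Perl as follows. This is a rough illustration under stated assumptions: the names `%codon_table` and `translate_frame` are invented for this example, the table is the standard genetic code, and any codon containing an ambiguity code is simply reported as X (unknown) rather than expanded into all its possibilities. The slides show the transcribed ambiguity codes as "?"; the example keeps the IUPAC letters instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Standard genetic code, one letter per codon, with the bases ordered
# U, C, A, G at each position ('*' marks a stop codon).
my @bases       = qw(U C A G);
my $amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';

# Build the codon -> amino acid lookup table
my %codon_table;
my $index = 0;
for my $first (@bases) {
    for my $second (@bases) {
        for my $third (@bases) {
            $codon_table{"$first$second$third"} = substr($amino_acids, $index++, 1);
        }
    }
}

# Translate an RNA string starting at the given offset (0, 1 or 2).
# Codons that are not in the table (for example ambiguous ones) become 'X'.
sub translate_frame {
    my ($rna, $offset) = @_;
    my $protein = '';
    for (my $pos = $offset; $pos + 3 <= length($rna); $pos += 3) {
        my $codon = substr($rna, $pos, 3);
        $protein .= exists $codon_table{$codon} ? $codon_table{$codon} : 'X';
    }
    return $protein;
}

# The first twelve bases of the RNA from the slides
my $rna = 'AUGCUUMWNCCU';
for my $offset (0 .. 2) {
    print "Frame ", $offset + 1, ": ", translate_frame($rna, $offset), "\n";
}
```

Shifting the offset by one base changes every codon, which is why the two reading frames above give entirely different amino acids.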
DNA
RNA
Protein
Lecture II
Part II: One-Dimensional Strings
Hello World…
A few perls of wisdom
• Concatenating Sequences
• Making a reverse complement
• Read sequences from data files
Every journey starts with a first 10bp
#!/usr/bin/perl -w
#storing DNA in a variable, and printing it out
#First, storing DNA in a variable called $DNA
$DNA = 'CGGGCTATTC';
#Next, print the DNA onto the screen
print $DNA;
#Finally, specifically tell the program to end
exit;
Concatenating DNA Fragments
#!/usr/bin/perl -w
#Store DNA in 2 variables
$DNA1 = 'AGTGCGTCGCTAG';
$DNA2 = 'ACCGCATGCATTG';
#using string interpolation
$DNA3 = "$DNA1$DNA2";
print "$DNA3\n\n";
#dot operator
$DNA3 = $DNA1 . $DNA2;
print "$DNA3\n\n";
#print also accepts a comma-separated list
print $DNA1, $DNA2, "\n";
exit;
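When there are more than two fragments, the same idea extends to a list. The sketch below is only an illustration: the subroutine names `concat_with_join` and `concat_with_append` are made up for this example, and it uses `my`, `use strict` and `use warnings`, which the slides do not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Join any number of fragments with an empty separator
sub concat_with_join {
    my @fragments = @_;
    return join('', @fragments);
}

# Build the same string by appending one fragment at a time with .=
sub concat_with_append {
    my @fragments = @_;
    my $dna = '';
    $dna .= $_ for @fragments;
    return $dna;
}

my @fragments = ('AGTGCGTCGCTAG', 'ACCGCATGCATTG');
print concat_with_join(@fragments),   "\n";
print concat_with_append(@fragments), "\n";
```

Both subroutines return the same string as the dot operator and the interpolation shown in the slide.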
Transcription: DNA to RNA
#!/usr/bin/perl -w
$DNA = 'ACGACTGCACGATCGTACG';
#print the DNA onto the screen
print "$DNA\n\n";
#Transcribe the DNA->RNA by substituting all T's with U's
$RNA = $DNA;
$RNA =~ s/T/U/g;
#print the result to the screen
print "Here is the result of DNA->RNA:\t$RNA\n\n";
exit;
$RNA =~ s/T/U/g;
• $RNA: variable
• =~: binding operator
• s: substitute operator
• /: delimiters that separate the parts of the operator
• T: pattern to be replaced
• U: replacement text for the pattern
• g: pattern modifier
Pattern modifiers
g = globally
i = case insensitive
m = multiline
s = single line
x = permit comments
o = compile only once for speed
e = treat replacement as Perl code
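A few of the modifiers listed above can be compared side by side. The sketch below is illustrative only: the four subroutine names are invented for this example, and each one works on its own copy of the input string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No modifier: only the first uppercase T is replaced
sub replace_first    { my ($s) = @_; $s =~ s/T/U/;   return $s; }

# g: every uppercase T is replaced
sub replace_all      { my ($s) = @_; $s =~ s/T/U/g;  return $s; }

# g and i together: every t or T is replaced, whatever its case
sub replace_any_case { my ($s) = @_; $s =~ s/t/U/gi; return $s; }

# g and e together: the replacement is Perl code, here uc() on the matched base
sub upcase_bases     { my ($s) = @_; $s =~ s/([acgt])/uc $1/ge; return $s; }

my $dna = 'actgTTgcat';
print "original:  $dna\n";
print "s/T/U/:    ", replace_first($dna),    "\n";
print "s/T/U/g:   ", replace_all($dna),      "\n";
print "s/t/U/gi:  ", replace_any_case($dna), "\n";
print "uc via /e: ", upcase_bases($dna),     "\n";
```

Mixed-case input like this is common in sequence files, so the `i` modifier is often worth adding when transcribing.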
Calculating the Reverse Complement
#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';
#print the starting DNA onto the screen
print "Here is the starting DNA:\t$DNA\n\n";
#Calculate the reverse complement, first copying the reversed DNA into
#a new variable called $revcom
$revcom = reverse $DNA;
#substitute all bases by their complement
#(caution: these run one after another, so every T made by the first line
#is turned back into A by the second, and likewise for C and G)
$revcom =~ s/A/T/g;
$revcom =~ s/T/A/g;
$revcom =~ s/C/G/g;
$revcom =~ s/G/C/g;
print "$revcom\n";
Calculating the Reverse Complement
#!/usr/bin/perl -w
$DNA = 'ACGTCAGTCGAGCT';
#print the starting DNA onto the screen
print "Here is the starting DNA:\t$DNA\n\n";
#Calculate the reverse complement, first copying the reversed DNA into
#a new variable called $revcom
$revcom = reverse $DNA;
#translate all bases to their complement in a single pass
$revcom =~ tr/ACGTacgt/TGCAtgca/;
print "$revcom\n";
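The `tr` version can be wrapped in a subroutine so it is easy to reuse and to check. This is a minimal sketch: the subroutine name `revcom` is chosen for this example, and it uses `my`, `use strict` and `use warnings`, which the slides do not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return the reverse complement of a DNA string (upper or lower case)
sub revcom {
    my ($dna) = @_;
    # Assigning to a scalar makes reverse() reverse the characters of the string
    my $rc = reverse $dna;
    # tr swaps every base in one pass, so no base is translated twice
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

my $dna = 'ACGTCAGTCGAGCT';
my $rc  = revcom($dna);
print "Starting DNA:       $dna\n";
print "Reverse complement: $rc\n";

# A simple sanity check: reverse complementing twice restores the input
print "Round trip OK\n" if revcom($rc) eq $dna;
```

The round-trip check is a quick way to catch the overwriting problem that sequential substitutions run into.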
Reading Data from Files
#### Sample Data in FASTA Format ####
>NM_012345 | Sample Data | Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVLISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
Reading Files
#!/usr/bin/perl -w
#The filename of the file containing the sequence data
$proteinFilename = 'NM_012345.pep';
#open the file, and associate a 'filehandle' with it
open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!\n";
#read the first line of the file with the input operator <>
$muppetProtein = <PROTEINFILE>;
#print the protein file
print "Here is the protein:\t$muppetProtein\n\n";
exit;
Let's try this again …
#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';
open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!\n";
#each use of <PROTEINFILE> reads the next line of the file
$muppetProtein = <PROTEINFILE>;
print "Here is the first line:\t$muppetProtein\n\n";
$muppetProtein = <PROTEINFILE>;
print "Here is the second line:\t$muppetProtein\n\n";
$muppetProtein = <PROTEINFILE>;
print "Here is the third line:\t$muppetProtein\n\n";
close PROTEINFILE;
exit;
Using Arrays to Read Files
#!/usr/bin/perl -w
$proteinFilename = 'NM_012345.pep';
#open the file
open(PROTEINFILE, $proteinFilename) or die "Cannot open $proteinFilename: $!\n";
#Read the sequence data from the file, and store it in the array
#variable @protein (one line of the file per element)
@protein = <PROTEINFILE>;
#print the protein onto the screen
print @protein;
close PROTEINFILE;
exit;
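The file reading above returns the FASTA header and the sequence lines mixed together, each with its trailing newline. One way to separate them is sketched below. It is only an illustration: the subroutine name `read_fasta` is invented for this example, it assumes a file holding a single record, and an in-memory filehandle stands in for NM_012345.pep so the example runs without the file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one FASTA record from an open filehandle.
# Returns the header line and the sequence joined into one string.
sub read_fasta {
    my ($fh) = @_;
    my ($header, $sequence) = ('', '');
    while (my $line = <$fh>) {
        chomp $line;                 # remove the trailing newline
        if ($line =~ /^>/) {
            $header = $line;         # header lines start with '>'
        } else {
            $sequence .= $line;      # everything else is sequence
        }
    }
    return ($header, $sequence);
}

# Sample record copied from the slide
my $fasta = <<'END';
>NM_012345 | Sample Data | Muppet Stuffing Protein
MNIDDKLEFGDEMGOSSRTMV
FGDLVRSMPHOEILAADEVLISHEE
GLOYAKLEFGDEMGOGHDDEFGVY
END

# Open the string as if it were a file
open(my $fh, '<', \$fasta) or die "Cannot open in-memory data: $!\n";
my ($header, $sequence) = read_fasta($fh);
close $fh;

print "Header:   $header\n";
print "Sequence: $sequence\n";
print "Length:   ", length($sequence), "\n";
```

To read the real file, the `open` line would take the filename in place of `\$fasta`.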
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Now print each element of the array
print "\nFirst element: " , $bases[0];
print "\nSecond Element: " , $bases[1];
print "\nThird Element: " , $bases[2];
print "\nFourth Element: " , $bases[3];
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Now print each element of the array in a row
print "\nHere are all of the bases: " , @bases;
#This prints out: 'Here are all of the bases: ACGT'
#But, you can print them out with spaces in between
#by putting the array inside double quotes
print "\nHere they are with spaces: " , "@bases";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how to take an element off of the end
$base1 = pop @bases;
print "Here's the last element: ", $base1, "\n\n";
#The other elements still remain
print "\nHere are the remaining elements: " , "@bases";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how to take an element off of the front
$base2 = shift @bases;
print "Here's the first element: ", $base2, "\n\n";
#The other elements still remain
print "\nHere are the remaining elements: " , "@bases";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how you put an element at the beginning of an array
#Our example will put the last element at the beginning
$base1 = pop @bases;
unshift (@bases, $base1);
print "Here's the last element put first: " , "@bases\n\n";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how you put an element at the end of an array
#Our example will put the first element at the end
$base1 = shift @bases;
push (@bases, $base1);
print "Here's the first element put last: " , "@bases\n\n";
Arrays
#Here's one way to declare an array
@bases = ('A','C','G','T');
#Here's how to reverse an array
@reverse = reverse @bases;
#Here's how to get the length
print scalar(@bases), "\n\n";
#Here's how to insert an element at an arbitrary place
#(at index 2, removing 0 elements)
splice (@bases, 2, 0, 'X');
Arrays
#Arrays can be evaluated as lists and scalars
@bases = ('A','C','G','T');
#Here's how to print the array
print "@bases\n";
#Here's how to assign it to a scalar (this gives the number of elements)
$a = @bases; print "$a\n";
#Here's how to assign an array to a list (this gives the first element)
($a) = @bases; print "$a\n";
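The difference between scalar and list context can be applied to a DNA string once it is split into an array of bases. The sketch below is an illustration only: the subroutine name `base_report` is invented for this example, and it uses `my`, `use strict` and `use warnings`, which the slides do not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split a DNA string into bases and report three values:
# the number of bases, the first base, and the number of G's.
sub base_report {
    my ($dna) = @_;
    my @bases = split //, $dna;                   # one base per element
    my $count = @bases;                           # scalar context: element count
    my ($first) = @bases;                         # list context: first element
    my $g_count = grep { $_ eq 'G' } @bases;      # grep in scalar context: match count
    return ($count, $first, $g_count);
}

my ($count, $first, $g_count) = base_report('CGGGCTATTC');
print "Number of bases: $count\n";
print "First base:      $first\n";
print "Number of G's:   $g_count\n";
```

The parentheses around `$first` are what select list context; without them the variable would receive the element count instead.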