python for biologists

Jul 07, 2018

  • 8/19/2019 python for biologists

    1/227


    Copyright © 2013 Dr. Martin Jones

    This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (http://creativecommons.org/licenses/by-nc-sa/3.0/deed.en_GB).

    For more information, visit http://pythonforbiologists.com

    Set in PT Serif and Source Code Pro


     About the author 

    Martin started his programming career by learning Perl during the course of his PhD in evolutionary biology, and started teaching other people to program soon after. Since then he has taught introductory programming to hundreds of biologists, from undergraduates to PIs, and has maintained a philosophy that programming courses must be friendly, approachable, and practical.

    Martin has taught introductory programming as part of the Bioinformatics MSc course at Edinburgh University for the past five years, and is currently Lecturer in Bioinformatics.


     Preface

    Welcome to Python for Biologists.

    Before you read any further, make sure that this is the most recent version of the book. Python for Biologists is being continually updated and improved to take into account corrections, amendments and changes to Python itself, so it's important that you are reading the most up-to-date version.

    This file is revision number 189. The number of the most recent revision can always be found at:

    http://pythonforbiologists.com/index.php/version/

    If the revision number listed at the URL is higher than the one in bold, then this is an out-of-date copy, and you need to download the latest version from http://pythonforbiologists.com

    You'll notice from the copyright page that the contents of this book are licensed under a Creative Commons Attribution ShareAlike license. This means that you're free to do what you like with it (copy it, email it to your friends, wallpaper your lab with it) as long as you keep the attribution. You can also modify it, as long as you license your modification under the same terms. The only thing that the license doesn't allow is commercial use: if you'd like to use the contents of this course for commercial purposes, get in touch with me at

    martin@pythonforbiologists.com

    Happy programming!


    Table of Contents

    About the author
    Preface

    1: Introduction and environment
        Why have a programming book for biologists?
        Why Python?
        How to use this book
        Exercises and solutions
        Getting in touch
        Setting up your environment
        Text editors
        Reading the documentation

    2: Printing and manipulating text
        Why are we so interested in working with text?
        Printing a message to the screen
        Quotes are important
        Use comments to annotate your code
        Error messages and debugging
        Printing special characters
        Storing strings in variables
        Tools for manipulating strings
        Recap
        Exercises
        Solutions

    3: Reading and writing files
        Why are we so interested in working with files?
        Reading text from a file
        Files, contents and file names
        Dealing with newlines
        Missing files
        Writing text to files
        Closing files
        Paths and folders
        Recap
        Exercises
        Solutions

    4: Lists and loops
        Why do we need lists and loops?
        Creating lists and retrieving elements
        Working with list elements
        Writing a loop
        Indentation errors
        Using a string as a list
        Splitting a string to make a list
        Iterating over lines in a file
        Looping with ranges
        Recap
        Exercises
        Solutions

    5: Writing our own functions
        Why do we want to write our own functions?
        Defining a function
        Calling and improving our function
        Encapsulation with functions
        Functions don't always have to take an argument
        Functions don't always have to return a value
        Functions can be called with named arguments
        Function arguments can have defaults
        Testing functions
        Recap
        Exercises
        Solutions

    6: Conditional tests
        Programs need to make decisions
        Conditions, True and False
        if statements
        else statements
        elif statements
        while loops
        Building up complex conditions
        Writing true/false functions
        Recap
        Exercises
        Solutions

    7: Regular expressions
        The importance of patterns in biology
        Modules in Python
        Raw strings
        Searching for a pattern in a string
        Extracting the part of the string that matched
        Getting the position of a match
        Splitting a string using a regular expression
        Finding multiple matches
        Recap
        Exercises
        Solutions

    8: Dictionaries
        Storing paired data
        Creating a dictionary
        Iterating over a dictionary
        Recap
        Exercises
        Solutions

    9: Files, programs, and user input
        File contents and manipulation
        Basic file manipulation
        Deleting files and folders
        Listing folder contents
        Running external programs
        Running a program
        Saving program output
        User input makes our programs more flexible
        Interactive user input
        Command line arguments
        Recap
        Exercises
        Solutions

    1: Introduction and environment

    Why have a programming book for biologists? 

    If you're reading this book, then you probably don't need to be convinced that programming is becoming an increasingly essential part of the tool kit for biologists of all types. You might, however, need to be convinced that a book like this one, developed especially for biologists, can do a better job of teaching you to program than a general-purpose introductory programming book. Here are a few of the reasons why I think that is the case.

    A biology-specific programming book allows us to use examples and exercises that use biological problems. This serves two important purposes: firstly, it provides motivation and demonstrates the types of problems that programming can help to solve. Experience has shown that beginners make much better progress when they are motivated by the thought of how the programs they write will make their life easier! Secondly, by using biological examples, the code and exercises throughout the book can form a library of useful code snippets, which we can refer back to when we want to solve real-life problems. In biology, as in all fields of programming, the same problems tend to recur time and time again, so it's very useful to have this collection of examples to act as a reference - something that's not possible with a general-purpose programming book.

    A biology-specific programming book can also concentrate on the features of the language that are most useful to biologists. A language like Python has many features and in the course of learning it we inevitably have to concentrate on some and miss others out. The set of features which are important to us in biology are slightly different to those which are most useful for general-purpose programming - for example, we are much more interested in manipulating text (including things like DNA and protein sequences [...]

    [...] programming book, but which are very useful to biologists (for example, regular expressions and subprocesses). Having a biology-specific textbook allows us to include these features, along with explanations of why they are particularly useful to us.

    A related point is that a textbook written just for biologists allows us to introduce features in a way that allows us to start writing useful programs right away. We can do this by taking into account the sorts of problems that repeatedly crop up in biology, and prioritising the features that are best at solving them. This book has been designed so that you should be able to start writing small but useful programs using only the tools in the first couple of chapters.

    Why Python? 

    Let me start this section with the following statement: programming languages are overrated. What I mean by that is that people who are new to programming tend to worry far too much about what language to learn. The choice of programming language does matter, of course, but it matters far less than people think it does. To put it another way, choosing the "wrong" programming language is very unlikely to mean the difference between failure and success when learning. Other factors (motivation, having time to devote to learning, helpful colleagues) are far more important, yet receive less attention.

    The reason that people place so much weight on the "what language should I learn?" [...]

    [...] using multiple languages. Partly this is just down to the simple constraints of various languages - if you want to write a web application you'll probably do it in JavaScript, if you want to write a graphical user interface you'll probably use something like Java, and if you want to write low-level algorithms you'll probably use C.

    Secondly, learning a first programming language gets you 90% of the way towards learning a second, third, and fourth one. Learning to think like a programmer in the way that you break down complex tasks into simple ones is a skill that cuts across all languages - so if you spend a few months learning Python and then discover that you really need to write in C, your time won't have been wasted as you'll be able to pick it up much [...]

    • Its use of indentation, while annoying to people who aren't used to it, is great for beginners as it enforces a certain amount of readability

    Python also has a couple of points to recommend it to biologists and scientists specifically:

    • It's widely used in the scientific community

    • It has a couple of very well-designed libraries for doing complex scientific computing (although we won't encounter them in this book)

    • It lends itself well to being integrated with other, existing tools

    • It has features which make it easy to manipulate strings of characters (for example, strings of DNA bases and protein amino acid residues, which we as biologists are particularly fond of)

    Python vs. Perl

    For biologists, the [...]

     How to use this book

    Programming books generally fall into two categories: reference-type books, which are designed for looking up specific bits of information, and tutorial-type books, which are designed to be read cover-to-cover. This book is an example of the latter - code samples in later chapters often use material from previous ones, so you need to make sure you read the chapters in order. Exercises or examples from one chapter are sometimes used to illustrate the need for features that are introduced in the next.

    There are a number of fundamental programming concepts that are relevant to material in multiple different chapters. In this book, rather than introduce these concepts all in one go, I've tried to explain them as they become necessary. This results in a tendency for earlier chapters to be longer than later ones, as they involve the introduction of more new concepts.

    A certain amount of jargon is necessary if we want to talk about programs and programming concepts. I've tried to define each new technical term at the point where it's introduced, and then use it thereafter with occasional reminders of the meaning.

    Chapters tend to follow a predictable structure. They generally start with a few paragraphs outlining the motivation behind the features that the chapter will cover - why do they exist, what problems do they allow us to solve, and why are they useful in biology specifically? These are followed by the main body of the chapter in which we discuss the relevant features and how to use them. The length of the chapters varies [...]

    [...] footnotes [1] to provide additional information that is interesting to know but not crucial to understanding, or to give links to web pages.

    Example code is highlighted with a solid border:

    Some example code goes here

    and example output (i.e. what we see on the screen when we run the code) is highlighted with a dotted border:

    Some output goes here

    Often we want to look at the code and the output it produces together. In these situations, you'll see a red-bordered code block followed immediately by a blue-bordered output block.

    Sometimes it's necessary to refer in the text to individual lines of code or output, in which case I've used line numberings on the left:

    1 first line
    2 second line
    3 third line

    Other blocks of text (usually file contents or typed command lines) don't have any kind of border and look like this:

    contents of a file

    [1] Like this.

    Exercises and solutions

    The final part of each chapter is a set of exercises and solutions. The number and complexity of exercises differ greatly between chapters depending on the nature of the material. As a rule, early chapters have a large number of simple exercises, while later chapters have a small number of more complex ones. Many of the exercise problems are written in a deliberately vague manner and the exact details of how the solutions work are up to you (very much like real-life programming!). You can always look at the solutions to see one possible way of tackling the problem, but there are often multiple valid approaches.

    I strongly recommend that you try tackling the exercises yourself before reading the solutions; there really is no substitute for practical experience when learning to program. I also encourage you to adopt an attitude of curious experimentation when working on the exercises - if you find yourself wondering if a particular variation on a problem is solvable, or if you recognize a closely-related problem from your own work, try solving it! Continuous experimentation is a key part of developing as a programmer, and the [...]

    Getting in touch

    One of the most convincing arguments for presenting a course like this one in the form of an ebook is that it can be continually updated and tweaked based on reader feedback. So, if you find anything that is hard to understand, or you think may contain an error, please get in touch - just drop me an email at martin@pythonforbiologists.com and I promise to get back to you.

    Setting up your environment

    All that you need in order to follow the examples and exercises in this book is a standard Python installation and a text editor. All the code in this book will run on either Linux, Mac or Windows machines. The slight differences between operating systems are explained in the text (mostly in chapter 9). If you have a choice of operating systems on which to learn Python, I recommend Linux, Mac OS X and Windows in that order, simply because the UNIX-based operating systems (Linux and OS X) are more amenable to programming in general.

    Installing Python

    The process of installing Python depends on the type of computer you're running on. If you're running a mainstream Linux distribution like Ubuntu, Python is probably already installed. To find out, open a terminal and type

    python

    If you see some output along these lines:

    Python 2.7.3 (default, Apr 10 2013, 05:13:16)
    [GCC 4.7.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>>

    then you are ready to go. If your Linux installation doesn't already have Python installed, try installing it with your package manager (the command will probably be either sudo apt-get install python or sudo yum install python). If this doesn't work, then download the package from the Python download page (http://www.python.org/getit/).

    The official Python website has installation instructions for Mac (http://www.python.org/getit/mac/) and Windows (http://www.python.org/getit/windows/) computers as well; these are likely to be the most up-to-date instructions, so follow them closely.
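Once the interpreter is installed, the version can also be checked from inside a program. The following is a minimal sketch rather than anything from the original text; it relies only on the standard library sys module, and the names major and minor are made up for the example.

```python
import sys

# sys.version_info describes the interpreter that is running this script
major = sys.version_info[0]
minor = sys.version_info[1]

print("Running Python " + str(major) + "." + str(minor))

# tell the reader which branch of the language they are on
if major < 3:
    print("This is a Python 2 interpreter")
else:
    print("This is a Python 3 (or later) interpreter")
```

Saving these lines in a file and running it with each installed interpreter is one quick way to confirm which version a given command starts.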

    Running Python programs

    A Python program is just a normal text file that contains Python code. To run it we must first open up a command line. On Linux and Mac computers, the application to do this will be called something along the lines of "terminal". On Windows, it is known as "command prompt".

    To run a Python program, we just type the path to the Python executable followed by the name of the file that contains the code we want to run [4]. On a Linux or Mac machine, the path will be something like:

    /usr/local/bin/python

    On Windows, it will be something like:

    c:\Python27\python

    [4] When we refer to "a Python program" in this book, we are usually talking about the text file that holds the code.

    To run a Python program, it's generally easiest to be in the same folder as it. By convention, Python programs are given the extension .py, so to run a program called test.py, we just type:

    /usr/local/bin/python test.py

    There are a couple of tricks that can be useful when experimenting with programs. Firstly, you can run Python in an interactive (or "shell") mode by running it without the name of a program file. This allows you to type individual statements and see the result straight away.

    Secondly, you can run Python with the -i option, which will cause it to run your program and then enter interactive mode. This can be handy if you want to examine the state of variables after your code has run.

    Python 2 vs. Python 3

    As will [...]

    If you're going to use Python 2, there is just one thing that you have to do in order to make some of the code examples work: include this line at the start of all your programs:

    from __future__ import division

    We won't go into the explanation behind this line, except to say that it's necessary in order to correct a small [...]
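As a hedged illustration of what this line changes (an assumption about the intent, based on the documented behaviour of the import): in Python 2, dividing one whole number by another with / discards the fractional part, whereas with the import it behaves as in Python 3. The variable names below are made up for the example.

```python
from __future__ import division  # no effect on Python 3, where this is already the default

total_length = 7
sequence_count = 2

# With the import, / always keeps the fractional part,
# even on Python 2 where 7 / 2 would otherwise give 3.
mean_length = total_length / sequence_count
print(mean_length)

# Division that rounds down is still available as // on both versions.
print(total_length // sequence_count)
```

The import has to be the first statement in the file, which is why it goes at the very start of each program.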

    [...] fixes the problem by making it effectively impossible for you to type a tab character.

    The feature that is nice to have is syntax highlighting. This will apply different colours to different parts of your Python code, and can help you spot errors more easily.

    Recommended text editors are Notepad++ for Windows [1], TextWrangler for Mac OS X [2], and gedit for Linux [3], all of which are freely available.

    On the web and elsewhere you may see references to Python IDEs. IDE stands for Integrated Development Environment, and they typically combine a text editor with a collection of other useful programming tools. While they can speed up development for experienced programmers, they're not a good idea for beginners as they complicate things, so I don't recommend you use them.

    Reading the documentation

    Part of the teaching philosophy that I've used in writing this book is that it's better to introduce a few useful features and functions rather than overwhelm you with a comprehensive list. The best place to go when you do want a complete list of the options available in Python is the official documentation [4] which, compared to many languages, is very readable.

    [1] http://notepad-plus-plus.org/
    [2] http://www.barebones.com/products/TextWrangler/
    [3] https://projects.gnome.org/gedit/
    [4] http://www.python.org/doc/


    2: Printing and manipulating text

    Why are we so interested in working with text?

    Open the first page of a book about learning Python, and the chances are that the first examples of code you'll see involve numbers. There's a good reason for that: numbers are generally simpler to work with than text - there are not too many things you can do with them (once you've got basic arithmetic out of the way) and so they lend themselves well to examples that are easy to understand. It's also a pretty safe bet that the average person reading a programming book is doing so because they need to do some number-crunching.

    So what makes this book different - why is this first chapter about text rather than numbers? The answer is that, as biologists, we have a particular interest in dealing with text rather than numbers (though of course, we'll need to learn how to manipulate numbers too). Specifically, we're interested in particular types of text that we call sequences - the DNA and protein sequences [...]

    [...] the word we use to refer to a bit of text in a computer program (it just means a string of characters). From this point on we'll use the word string when we're talking about computer code, and we'll reserve the word sequence for when we're discussing biological sequences [...]

    The arguments tell Python what we want to do more specifically - in this case, the argument tells Python exactly what it is we want to print: a friendly greeting.

    Assuming you've followed the instructions in chapter 1 and set up your Python environment, type the line of code above into your favourite text editor, save it, and run it. You should see a single line of output like this:

    Hello world
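For reference, a one-line program that produces this output could look like the following. This is a minimal sketch, assuming the print function with parentheses that the rest of the chapter uses.

```python
# print a friendly greeting to the screen
print("Hello world")
```

Here print is the function name and the quoted text inside the parentheses is its single argument.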

    Quotes are important

    In normal writing, we only surround a bit of text in [...]

    print("She said, 'Hello world'")
    print('He said, "Hello world"')

    The above code will give the following output:

    She said, 'Hello world'
    He said, "Hello world"

    Be careful when writing and reading code that involves [...]

    25/227

    1H Chapter 2: #rinting an% manip&ating te

    • (e!a&se the !omments are part of the so&r!e !o%e, they !an never get mie%

    &p or separate%. 'n other or%s, if yo& are oo-ing at the so&r!e !o%e for a

    parti!&ar program, then yo& a&tomati!ay have the %o!&mentation as e.'n !ontrast, if yo& -eep the %o!&mentation in a separate fie, it !an easiy

    be!ome separate% from the !o%e.

    • 7aving the !omments right net to the !o%e a!ts as a remin%er to &p%ate the

    %o!&mentation henever yo& !hange the !o%e. $he ony thing orse than

    &n%o!&mente% !o%e is !o%e ith o% %o!&mentation that is no onger

    a!!&rate8
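As a minimal illustration of comments sitting right next to the code they describe: Python ignores everything from a # symbol to the end of the line. The sequence and variable names here are made up for the example.

```python
# store a short DNA sequence so it can be reused below
my_dna = "ATGCGTA"

# work out how many bases it contains
dna_length = len(my_dna)

print(dna_length)  # a comment can also follow code on the same line
```

If the sequence is later changed, the comments are in the same file and can be updated in the same edit.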

    Don't make the mistake, by the way, of thinking that comments are only useful if you are planning on showing your code to somebody else. When you start writing your own code, you will be amazed at how [...]

    Error messages and debugging

    It may seem depressing early in the book to be talking about errors! However, it's worth pointing out at this early stage that computer programs almost never work correctly the first time. Programming languages are not like natural languages - they have a very strict set of rules, and if you break any of them, the computer will not attempt to guess what you intended, but instead will stop running and present you with an error message. You're going to be seeing a lot of these error messages in your programming career, so let's get used to them as soon as possible.

    Forgetting quotes

    Here's one possible error we can make when printing a line of output - we can forget to include the [...]

    Referring to the line numbers on the left we can see that the name of the Python file is error.py (line 1) and that the error occurs on the first line of the file (line 2). Python's best guess at the location of the error is just before the close parentheses (line 3). Depending on the type of error, this can be wrong by [...]

    This time, Python doesn't try to show us where on the line the error occurred, it just shows us the whole line (line 4). The error message tells us which word Python doesn't understand (line 5), so in this case, it's [...]

    [...] "Hello" on one line and then the word "world" on the next line - like this:

    Hello
    world

    We might try putting a new line in the middle of our string like this:

    print("Hello
    world")

    but that won't work and we'll get the following error message:

    $ python error.py
      File "error.py", line 1
        print("Hello
                   ^
    SyntaxError: EOL while scanning string literal

    Python finds the error when it gets to the end of the first line of code (line 2 in the output). The error message (line 5) is a bit more cryptic than the others. EOL stands for End Of Line, and string literal means a string in [...]
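The same error can be reproduced safely from inside a working program. The sketch below is not from the original text: it uses the built-in compile function, which checks the syntax of a piece of program text without running it. Python 3.10 and later word the message differently ("unterminated string literal") but report the same line.

```python
# The text of a tiny program whose string is broken across two lines.
# The \n here becomes a real line break inside that program text.
broken_program = 'print("Hello\nworld")'

try:
    # compile() checks the syntax without running the program
    compile(broken_program, "error.py", "exec")
except SyntaxError as err:
    error_line = err.lineno
    error_text = err.msg
    print("SyntaxError on line " + str(error_line) + ": " + error_text)
```

The reported line number matches the explanation above: Python gives up at the end of the first line, where the opening quote has not yet been closed.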

    Printing special characters

    The reason that the code above didn't work is that Python got confused about whether the new line was part of the string (which is what we wanted) or part of the source code (which is how it was actually interpreted). What we need is a way to include a new line as part of a string, and luckily for us, Python has just such a tool built in. To include a new line, we write a backslash followed by the letter n - Python knows that this is a special character and will interpret it accordingly. Here's the code which prints "Hello world" across two lines:

    # how to include a new line in the middle of a string
    print("Hello\nworld")

    Notice that there's no need for a space before or after the new line.

    There are a few other useful special characters as well, all of which consist of a backslash followed by a letter. The only ones which you are likely to need for the exercises in this book are the tab character (\t) and the carriage return character (\r). The tab character can sometimes be useful when writing a program that will produce a lot of output. The carriage return character works a bit like a new line in that it puts the cursor back to the start of the line, but doesn't actually start a new line, so you can use it to overwrite output - this is sometimes useful for long-running programs.
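A short sketch of the tab and carriage return characters in use. The column names and values are invented for the example, and sys.stdout.write is used for the carriage return part only because, unlike print, it does not add a new line after each message.

```python
import sys

# \t inserts a tab, which helps line up simple columns of output
header = "name\tlength"
row = "seq_1\t8"
print(header)
print(row)

# \r moves the cursor back to the start of the line without starting a new one,
# so in a terminal each message overwrites the previous one
sys.stdout.write("working on step 1")
sys.stdout.write("\rworking on step 2")
sys.stdout.write("\rfinished         \n")
```

When run in a terminal, only the final message remains visible on the last line; if the output is saved to a file, all three messages are still there.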

    Storing strings in variables

    OK, we've been playing around with the print function for a while; let's introduce something new. We can take a string and assign a name to it using an [...]

    The variable my_dna now points to the string ATGCGTA. We call this assigning a variable, and once we've done it, we can use the variable name instead of the string itself - for example, we can use it in a print statement:

    # store a short DNA [...]
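A minimal sketch of the assignment and print steps described here, assuming the my_dna name and the ATGCGTA string mentioned in the text:

```python
# store a short DNA sequence in the variable my_dna
my_dna = "ATGCGTA"

# print the sequence by using the variable name instead of the string itself
print(my_dna)
```

Note that the variable name is written without quotes; with quotes, Python would print the text my_dna rather than the sequence it points to.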

    Tools for manipulating strings

    Now we know how to store and print strings, we can take a look at a few of the facilities that Python has for manipulating them. Python has many built-in tools for carrying out common operations, and in this next section we'll take a look at them one-by-one. In the exercises at the end of this chapter, we'll look at how we can use multiple different tools together in order to carry out more complex operations.

    Concatenation

    We can concatenate (stick together) two strings using the + symbol [1]. This symbol will join together the string on the left with the string on the right:

    my_dna = "AATT" + "GGCC"
    print(my_dna)

    Let's take a look at the output:

    AATTGGCC

    In the above example, the things being concatenated were strings, but we can also use variables that point to strings:

    upstream = "AAA"
    my_dna = upstream + "ATGC"
    # my_dna is now "AAAATGC"

    [1] We call this the concatenation operator.

    We can even join multiple strings together in one go:

    upstream = "AAA"
    downstream = "GGG"
    my_dna = upstream + "ATGC" + downstream
    # my_dna is now "AAAATGCGGG"

    It's important to realize that the result of concatenating two strings together is itself a string. So it's perfectly OK to use a concatenation inside a print statement:

    print("Hello" + " " + "world")

    As we'll see in the rest of the book, using one tool inside another is [...]

    dna_length = len("AGTC")
    print(dna_length)

    There's another interesting thing about the len function: the result (or return value) is not a string, it's a number. This is a very important idea so I'm going to write it out in bold: Python treats strings and numbers differently.

    We can see that this is the case if we try to concatenate together a number and a string. Consider this short program which calculates the length of a DNA sequence [...]
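A hedged sketch of the kind of program being described: it tries to join a string and a number with + and catches the resulting error so that the message can be inspected. The try/except wrapper is an addition for illustration, and the exact wording of the message varies between Python versions.

```python
my_dna = "ATGCGAGT"

# len returns a number, not a string
dna_length = len(my_dna)

try:
    # joining a string and a number with + is not allowed
    print("The length of the DNA sequence is " + dna_length)
except TypeError as err:
    error_text = str(err)
    print("TypeError: " + error_text)
```

Without the try/except lines, the program would stop at the print line and show the TypeError directly.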

  • 8/19/2019 python for biologists

    35/227

    2H Chapter 2: #rinting an% manip&ating te

    Happily, Python has a built-in solution: a function called str which turns a
    number into a string so that we can print it. Here's how we can modify our program
    to use it (I've removed the comments from this version to make it a bit more
    compact):

    my_dna = "ATGCGAGT"
    dna_length = len(my_dna)
    print("The length of the DNA sequence is " + str(dna_length))
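    To see why str is needed, here is a minimal sketch of what happens when we try to
    join a string and a number directly, followed by the fix. The try/except construct
    has not been introduced at this point in the book; it is used here only so that the
    sketch runs to completion, and the names message and fixed_message are made up
    for illustration.

```python
my_dna = "ATGCGAGT"
dna_length = len(my_dna)  # len returns a number, not a string

# joining a string and a number directly fails with a TypeError
try:
    message = "The length is " + dna_length
except TypeError as error:
    message = "could not concatenate: " + str(error)
print(message)

# converting the number with str first makes the concatenation work
fixed_message = "The length is " + str(dna_length)
print(fixed_message)
```

    The exact wording of the error text varies between Python versions, which is why
    the sketch does not rely on it.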


    Python language, it belongs to a particular type. The method we are talking about
    here is called lower, and we say that it belongs to the string type. Here's how we
    use it:

    my_dna = "ATGC"
    # print my_dna in lower case
    print(my_dna.lower())

    Notice how using a method looks different to using a function. When we use a
    function like print or len, we write the function name first and the arguments go
    in parentheses:

    print("ATGC")
    len(my_dna)

    When we use a method, we write the name of the variable first, followed by a
    period, then the name of the method, then the method arguments in parentheses.
    For the example we're looking at here, lower, there is no argument, so the opening
    and closing parentheses are right next to each other.

    It's important to notice that the lower method does not actually change the
    variable; instead it returns a copy of the variable in lower case. We can prove that it
    works this way by printing the variable before and after running lower. Here's the
    code to do so:

    my_dna = "ATGC"
    # print the variable
    print("before: " + my_dna)
    # run the lower method and store the result
    lowercase_dna = my_dna.lower()
    # print the variable again
    print("after: " + my_dna)

    and here's the output we get:


    before: ATGC
    after: ATGC

    Just like the len function, in order to actually do anything useful with the lower
    method, we need to store the result (or print it right away).

    Because the lower method belongs to the string type, we can only use it on
    variables that are strings. If we try to use it on a number:

    my_number = len("AGTC")
    # my_number is 4
    print(my_number.lower())

    we will get an error that looks like this:

    AttributeError: 'int' object has no attribute 'lower'

    The error message is a bit cryptic, but hopefully you can grasp the meaning:
    something that is a number (an int, or integer) does not have a lower method.
    This is a good example of the importance of types in Python code: we can only use
    methods on the type that they belong to.

    Before we move on, let's just mention that there is another method that belongs to
    the string type called upper; you can probably guess what it does!

    Replacement

    Here's another example of a useful method that belongs to the string type:
    replace. replace is slightly different from anything we've seen before: it takes
    two arguments (both strings) and returns a copy of the variable where all
    occurrences of the first string are replaced by the second string. That's [...]


    protein = "vlspadktnv"
    # replace valine with tyrosine
    print(protein.replace("v", "y"))
    # we can replace more than one character
    print(protein.replace("vls", "ymt"))
    # the original variable is not affected
    print(protein)

    And this is the output we get:

    ylspadktny
    ymtpadktnv
    vlspadktnv

    We'll take a look at more tools for carrying out string replacement in chapter 7.

    Extracting part of a string

    What do we do if we have a long string, but we only want a short portion of it? This
    is known as taking a substring, and it has its own notation in Python. To get a
    substring, we follow the variable name with a pair of square brackets [...]


    pa
    vlspad
    vlspadktnv

    There are two important things to notice here. Firstly, we actually start counting
    from position zero, rather than one; in other words, position 3 is actually the
    fourth character. This explains why the first character of the first line of output is
    p and not s as you might think. Secondly, the positions are inclusive at the start
    but exclusive at the stop. In other words, the expression protein[3:5] gives us
    everything starting at position 3 and stopping just before position 5 (i.e. the
    characters at positions 3 and 4, which are the fourth and fifth characters).
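    The three lines of output above are consistent with a program along the following
    lines. This is a sketch reconstructed from the output and the surrounding
    explanation, so the comments and the extra variable first_residue are illustrative
    rather than the original listing.

```python
protein = "vlspadktnv"

# positions 3 and 4: the start is included, the stop (5) is excluded
print(protein[3:5])

# positions start at zero, so this gives the first six characters
print(protein[0:6])

# leaving out the stop position runs to the end of the string
print(protein[0:])

# a single position gives a single character
first_residue = protein[0]
print(first_residue)
```

    A handy check on any start:stop pair is that the length of the result is the stop
    minus the start, as long as both positions fall inside the string.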

    If we just give a single number in the square brackets [...]


    Let's use our protein sequence [...]


    Remember that in Python we start counting from zero rather than one, so position
    0 is the first character, position 4 is the fifth character, etc. A couple of examples:

    protein = "vlspadktnv"
    print(str(protein.find('p')))
    print(str(protein.find('kt')))
    print(str(protein.find('w')))

    And the output:

    3
    6
    -1

    Notice the behaviour of find when we ask it to locate a substring that doesn't exist:
    we get back the answer -1.
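    Here is a minimal sketch of count and find used side by side on the same protein
    sequence; the variable names are illustrative. Because find signals a missing
    substring with -1 rather than with an error, it is worth checking for that value
    before treating the result as a real position. The if statement used for that check
    is not something this chapter has covered.

```python
protein = "vlspadktnv"

valine_count = protein.count("v")
kt_position = protein.find("kt")
missing_position = protein.find("w")

print("number of valines: " + str(valine_count))
print("position of kt: " + str(kt_position))

# -1 means "not found", so it must not be used as a real position
if missing_position == -1:
    print("w does not occur in the protein")
```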

    Both count and find have a pretty serious limitation: you can only search for
    exact substrings. If you need to count the number of occurrences of a variable
    protein motif, or find the position of a variable transcription factor binding site,
    they will not help you. The whole of chapter 7 is devoted to tools that can do those
    kinds of jobs.

    Of the tools we've discussed in this section, three (replace, count and find)
    require [...]

    Splitting up a string into multiple bits

    An obvious question: how do we split a string (e.g. a DNA sequence) into multiple
    bits? That's a common job in biology, but unfortunately we can't do it yet using the
    tools from this chapter. We'll talk about various different ways of splitting strings
    in chapter 4. I mention it here just to reassure you that we will learn how to do it
    eventually!

    Recap

    We started this chapter talking about strings and how to work with them, but along
    the way we had to take a lot of diversions, all of which were necessary to
    understand how the different string tools work. Thankfully, that means that we've
    covered most of the nuts and bolts of the Python language, which will make future
    chapters go much more smoothly.

    We've learned about some general features of the Python programming language
    like

    • the difference between functions, statements and arguments
    • the importance of comments and how to use them
    • how to use Python's error messages to fix bugs in our programs
    • how to store values in variables
    • the way that types work, and the importance of understanding them
    • the difference between functions and methods, and how to use them both

    And we've encountered some tools that are specifically for working with strings:

    • concatenation
    • different types of quotes


    • changing the case of a string
    • finding and counting substrings
    • replacing bits of a string with something new
    • extracting bits of a string to make a new string

    Many of the above topics will crop up again in future chapters, and will be
    discussed in more detail, but you can always return to this chapter if you want to
    brush up on the basics. The exercises for this chapter will allow you to practice
    using the string manipulation tools and to become familiar with them. They'll also
    give you the chance to practice building bigger programs by using the individual
    tools as building blocks.


    Exercises

    Reminder: the descriptions of the exercises are deliberately terse and may be
    somewhat ambiguous (just like requirements [...]


    Restriction fragment lengths

    Here's a short DNA sequence [...]


    Splicing out introns, part three

    Using the data from part one, write a program that will print out the original
    genomic DNA sequence [...]


    Solutions

    Calculating AT content

    This exercise is going to involve a mixture of strings and numbers. Let's remind
    ourselves of the formula for calculating AT content:

    $$\text{AT content} = \frac{A + T}{\text{length}}$$

    There are three numbers we need to figure out: the number of As, the number of Ts
    and the length of the sequence [...]


    my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
    length = len(my_dna)
    a_count = my_dna.count('A')
    t_count = my_dna.count('T')
    print("length: " + str(length))
    print("A count: " + str(a_count))
    print("T count: " + str(t_count))

    Let's take a look at the output from this program:

    length: 54
    A count: 16
    T count: 21

    That looks about right, but how do we know if it's exactly right? We could go
    through the sequence [...]


    length: 4
    A count: 1
    T count: 1

    Everything looks OK; we can probably go ahead and run the code on the long
    sequence [...]


    To fix it, all we need to do is add some parentheses around the addition, so that the
    line becomes:

    at_content = (a_count + t_count) / length
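    Putting the pieces together, the whole calculation can be sketched as below. This
    assumes Python 3, where / always gives a fractional result (older versions of
    Python handle division of whole numbers differently). The variable
    without_parentheses is added here purely to show what goes wrong when the
    parentheses are left out.

```python
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
length = len(my_dna)
a_count = my_dna.count("A")
t_count = my_dna.count("T")

# division happens before addition, so this divides only t_count by length
without_parentheses = a_count + t_count / length

# the parentheses force the addition to happen first
at_content = (a_count + t_count) / length

print("without parentheses: " + str(without_parentheses))
print("AT content is " + str(at_content))
```

    A quick sanity check is that a correct AT content must always lie between 0 and 1;
    the version without parentheses gives a number greater than 16.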

    Now we get the correct output for the test sequence [...]


    my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"
    # replace A with T
    replacement1 = my_dna.replace('A', 'T')
    # replace T with A
    replacement2 = replacement1.replace('T', 'A')
    # replace C with G
    replacement3 = replacement2.replace('C', 'G')
    # replace G with C
    replacement4 = replacement3.replace('G', 'C')
    # print the result of the final replacement
    print(replacement4)

    When we take a look at the output, however, something seems wrong:

    ACACAACCAAAACCAAAACAAAAACCAAACAAACAAAAAAAACCAACCCAACAA

    We can see just by looking at the original sequence [...]


    TCTGTTCGTTTTCGTTTTGTTTTTGCTTTCTTTCTTTTTTTTCGTTGCGTTCTT
    ACAGAACGAAAACGAAAAGAAAAAGCAAACAAACAAAAAAAACGAAGCGAACAA
    AGAGAAGGAAAAGGAAAAGAAAAAGGAAAGAAAGAAAAAAAAGGAAGGGAAGAA
    ACACAACCAAAACCAAAACAAAAACCAAACAAACAAAAAAAACCAACCCAACAA

    The first replacement (the result of which is shown in the first line of the output)
    works fine: all the As have been replaced with Ts (for example, look at the first
    character: it's A in the original sequence [...]


    case. Then, once all the replacements have been carried out, we can simply call
    upper and change the whole sequence [...]
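    The idea described above can be sketched as follows: each replacement writes its
    result in lower case, so that a base which has already been swapped can never be
    swapped back by a later step, and a single call to upper at the end restores the
    case. The variable names follow the earlier listing; complement is a name added
    for this sketch.

```python
my_dna = "ACTGATCGATTACGTATAGTATTTGCTATCATACATATATATCGATGCGTTCAT"

# write each replacement in lower case so later steps cannot undo it
replacement1 = my_dna.replace("A", "t")
replacement2 = replacement1.replace("T", "a")
replacement3 = replacement2.replace("C", "g")
replacement4 = replacement3.replace("G", "c")

# every base has now been replaced exactly once, so restore upper case
complement = replacement4.upper()
print(complement)
```

    This only works because the input is entirely upper case to begin with; a sequence
    that mixes cases would need to be converted with upper before the first replacement.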


    Restriction fragment lengths

    Let's start this exercise by solving the problem manually. If we look through the
    DNA sequence [...]


    If we wanted to run the same program using a different restriction enzyme, we'd
    have to change both the string that we used in the find method call, and the
    number that we add in order to take account of the cut site.

    It's worth noting that this program assumes that the DNA sequence [...]
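    The approach described here can be sketched as below. The sequence is an
    illustrative stand-in, since the one from the exercise is not reproduced in this
    section, and the names motif and cut_offset are hypothetical. The recognition
    site GAATTC, cut after the first base, is that of EcoRI. Keeping the motif and the
    offset in their own variables means that switching enzymes changes two lines only.

```python
# illustrative sequence containing a single GAATTC site
my_dna = "ACTGATCGATTACGTATAGTAGAATTCTATCATACATATATATCGATGCGTTCAT"

motif = "GAATTC"   # recognition site to search for
cut_offset = 1     # the cut falls after the first base of the site

motif_position = my_dna.find(motif)

# find returns -1 when the site is absent, so guard against that
if motif_position == -1:
    print("no recognition site found")
else:
    cut_position = motif_position + cut_offset
    fragment_one_length = cut_position
    fragment_two_length = len(my_dna) - cut_position
    print("length of fragment one is " + str(fragment_one_length))
    print("length of fragment two is " + str(fragment_two_length))
```

    Like the program discussed in the text, this sketch handles one cut site only; a
    sequence with several sites would need tools from later chapters.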


    my_dna = (
        "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACT"
        "CGATCGATCGATCGATCGAT"
        "CGATCGATCGATCGATC"
        "ATGCTATCATCGATCGATATCGATGCATCGACTACTAT"
    )
    exon1 = my_dna[1:63]
    exon2 = my_dna[91:10000]
    print(exon1 + exon2)

    The output from this code looks vaguely right:

    TCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACTCGATCGATCGATCGATCATCGATCGATATCGATGCATCGACTACTAT

    but when we look more closely we can see that something is not right. The printed
    coding sequence [...]


    Splicing out introns, part two

    This is a straightforward piece of number-crunching. There are a couple of ways to
    go about it. We could use the exon start/stop coordinates to calculate the length of
    the coding portion of the sequence [...]
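    One way to do the calculation is sketched below. It measures the extracted exons
    rather than doing arithmetic on the coordinates, and it assumes the exon positions
    0 to 62 and 90 onwards that appear in the later listings of this chapter. The
    sequence is written in several pieces only to keep the lines short, and Python 3
    division is assumed.

```python
my_dna = (
    "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACT"
    "CGATCGATCGATCGATCGAT"
    "CGATCGATCGATCGATC"
    "ATGCTATCATCGATCGATATCGATGCATCGACTACTAT"
)

exon1 = my_dna[0:62]
exon2 = my_dna[90:]

# the coding portion is whatever is left once the intron is removed
coding_length = len(exon1) + len(exon2)
coding_percentage = coding_length / len(my_dna) * 100

print(coding_percentage)
# round trims the result to one decimal place for display
print(round(coding_percentage, 1))
```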


    77.23577235772358

    although we probably don't really require [...]


    Or we could avoid using variables for the introns and exons altogether, and do
    everything in one big print statement:

    print(my_dna[0:62] + my_dna[62:90].lower() + my_dna[90:10000])

    This last option is very concise, but a bit harder to read than the more verbose way.
    As the exercises in this book get longer, you'll notice that there are more and more
    different ways to write the code, so you may end up with solutions that look very
    different to the example solutions. When trying to choose between different ways
    to write a program, always favour the solution that is clearest in intent and easiest
    to read.


    3: Reading and writing files

    Why are we so interested in working with files?

    As we start this chapter, we find ourselves once again doing things in a slightly
    different order to most programming books. The majority of introductory
    programming books won't consider working with external files until much further
    along, so why are we introducing it now?

    The answer, as was the case in the last chapter, lies in the particular jobs that we
    want to use Python for. The data that we as biologists work with is stored in files, so
    if we're going to write useful programs we need a way to get the data out of files
    and into our programs (and vice versa). As you were going through the exercises in
    the previous chapter, it may have occurred to you that copying and pasting a DNA
    sequence [...]


    Another reason for our interest in file input/output is the need for our Python
    programs to work as part of a pipeline or work flow involving other, existing tools.
    When it comes to using Python in the real world, we often want Python to either
    accept data from, or provide data to, another program. Often the easiest way to do
    this is to have Python read, or write, files in a format that the other program
    already understands.

    Reading text from a file

    Firstly, a [...]


    • compressed files (e.g. ZIP files)

    If you're not sure whether a particular file is text or binary, there's a very simple
    way to tell: just open it up in a text editor. If the file displays without any problem
    then it's text (regardless of whether you can make sense of it or not). If you get an
    error or a warning from your text editor, or the file displays as a collection of
    indecipherable characters, then it's binary.

    The examples and exercises in this chapter are a little different from those in the
    previous one, because they rely on the existence of the files that we are going to
    manipulate. If you want to try running the examples in this chapter, you'll need to
    make sure that there is a file in your working directory called dna.txt which has a
    single line containing a DNA sequence [...]


    The way that we use file objects is a bit different to strings and numbers as well. If
    you glance back at the examples from the previous chapter you'll see that most of
    the time when we want to use a variable containing a string or number we just use
    the variable name:

    my_string = 'abcdefg'
    print(my_string)
    my_number = 42
    print(my_number + 1)

    In contrast, when we're working with file objects most of our interaction will be
    through methods. This style of programming will seem unusual at first, but as we'll
    see in this chapter, the file type has a well thought-out set of methods which let us
    do lots of useful things.

    The first thing we need to be able to do is to read the contents of the file. The file
    type has a read method which does this. We call it without any arguments, and the
    return value is a string, which we can store in a variable. Once we've read the file
    contents into a variable, we can treat them just like any other string; for example,
    we can print them:

    my_file = open("dna.txt")
    file_contents = my_file.read()
    print(file_contents)

    Files, contents and file names

    When learning to work with files it's very easy to get confused between a file object,
    a file name, and the contents of a file. Take a look at the following bit of code:


    my_file_name = "dna.txt"
    my_file = open(my_file_name)
    my_file_contents = my_file.read()

    What's going on here? On line 1, we store the string dna.txt in the variable
    my_file_name. On line 2, we use the variable my_file_name as the argument
    to the open function, and store the resulting file object in the variable my_file.
    On line 3, we call the read method on the variable my_file, and store the
    resulting string in the variable my_file_contents.

    The important thing to understand about this code is that there are three separate
    variables which have different types and which are storing three very different
    things. my_file_name is a string, and it stores the name of a file on disk.
    my_file is a file object, and it represents the file itself. my_file_contents is a
    string, and it stores the text that is in the file.

    Remember that variable names are arbitrary: the computer doesn't care what you
    call your variables. So this piece of code is exactly the same as the previous
    example:

    apple = "dna.txt"
    banana = open(apple)
    grape = banana.read()

    except it is harder to read! In contrast, the file name (dna.txt) is not arbitrary: it
    must correspond to the name of a file on the hard drive of your computer.

    A common error is to try to use the read method on the wrong thing. Recall that
    read is a method that only works on file objects. If we try to use the read method
    on the file name:

    my_file_name = "dna.txt"
    my_contents = my_file_name.read()


    we get an AttributeError: Python will complain that strings don't have a
    read method:

    AttributeError: 'str' object has no attribute 'read'

    Another common error is to use the file object when we meant to use the file
    contents. If we try to print the file object:

    my_file_name = "dna.txt"
    my_file = open(my_file_name)
    print(my_file)

    we won't get an error, but we'll get an odd-looking line of output (the exact details
    vary between Python versions and computers):

    <open file 'dna.txt', mode 'r' at 0x7fcff7784b0>

    We won't discuss the meaning of this line now: just remember that if you try to
    print the contents of a file but instead you get some output that looks like the
    above, you have almost definitely printed the file object rather than the file
    contents.

    Dealing with newlines

    Let's take a look at the output we get when we try to print some information from a
    file. We'll use the dna.txt file from the chapter_3 exercises folder. This file contains
    a single line with a short DNA sequence [...]


    from this chapter, and the material we saw in the previous chapter, we get the
    following code:

    # open the file
    my_file = open("dna.txt")
    # read the contents
    my_dna = my_file.read()
    # calculate the length
    dna_length = len(my_dna)
    # print the output
    print("sequence is " + my_dna + " and length is " + str(dna_length))

    When we look at the output, we can see that the program is working almost
    perfectly, but there is something strange: the output has been split over two lines:

    sequence is ACTGTACGTGCACTGATC
     and length is 19

    The explanation is simple once you know it: Python has included the new line
    character at the end of the dna.txt file as part of the contents. In other words, the
    variable my_dna has a new line character at the end of it. If we could view the
    my_dna variable directly, we would see that it looks like this:

    'ACTGTACGTGCACTGATC\n'

    The solution is also simple. Because this is such a common problem, strings have a
    method for removing new lines from the end of them. The method is called
    rstrip, and it takes one string argument containing the character (or characters)
    that you want to remove from the end of the string. In this case, we want to remove
    the newline character (\n). Here's a modified version of the code; note that the
    argument to rstrip is itself a string so needs to be enclosed in quotes:


    my_file = open("dna.txt")
    my_file_contents = my_file.read()
    # remove the newline from the end of the file contents
    my_dna = my_file_contents.rstrip("\n")
    dna_length = len(my_dna)
    print("sequence is " + my_dna + " and length is " + str(dna_length))

    and now the output looks just as we expected:

    sequence is ACTGTACGTGCACTGATC and length is 18

    In the code above, we first read the file contents and then removed the newline, in
    two separate steps:

    my_file_contents = my_file.read()
    my_dna = my_file_contents.rstrip("\n")

    but it's more common to read the contents and remove the newline all in one go,
    like this:

    my_dna = my_file.read().rstrip("\n")

    This is a bit tricky to read at first as we are using two different methods (read and
    rstrip) in the same statement. The key is to read it from left to right: we take the
    my_file variable and use the read method on it, then we take the output of that
    method (which we know is a string) and use the rstrip method on it. The result
    of the rstrip method is then stored in the my_dna variable.

    If you find it difficult to write the whole thing as one statement like this, just break it
    up and do the two things separately; your programs will run just as well.
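    For readers who don't have dna.txt to hand, the effect of the trailing newline can
    be checked with the self-contained sketch below. It first creates a throwaway copy
    of the file in a temporary folder, which relies on file writing (covered later in this
    chapter) and on the standard os and tempfile modules; the names folder,
    file_name and raw_contents are made up for this sketch.

```python
import os
import tempfile

# create a throwaway dna.txt so that the sketch is self-contained
folder = tempfile.mkdtemp()
file_name = os.path.join(folder, "dna.txt")
setup_file = open(file_name, "w")
setup_file.write("ACTGTACGTGCACTGATC\n")
setup_file.close()

my_file = open(file_name)
raw_contents = my_file.read()
my_file.close()

# rstrip returns a copy without the trailing newline
my_dna = raw_contents.rstrip("\n")

print("length with newline: " + str(len(raw_contents)))
print("length without newline: " + str(len(my_dna)))
```

    Comparing the two lengths is a quick way to confirm whether a string read from a
    file still carries its newline.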


    Missing files

    What happens if we try to read a file that doesn't exist?

    my_file = open("nonexistent.txt")

    We get a new type of error that we've not seen before:

    IOError: [Errno 2] No such file or directory: 'nonexistent.txt'
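    If a program needs to carry on when a file is missing, one option is to catch the
    error. The sketch below uses try/except, which this chapter has not introduced,
    so treat it as a preview rather than something you need at this stage. It builds a
    path inside an empty temporary folder so that the file is guaranteed not to exist;
    missing_name and outcome are illustrative names.

```python
import os
import tempfile

# a path inside a brand new, empty folder cannot already exist
missing_name = os.path.join(tempfile.mkdtemp(), "nonexistent.txt")

try:
    my_file = open(missing_name)
    outcome = "opened the file"
    my_file.close()
except IOError:
    # recent Python versions raise FileNotFoundError, which IOError also catches
    outcome = "could not open the file"

print(outcome)
```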


    Most importantly, terminal output vanishes when you close your terminal program.
    For small programs like the examples in this book, that's not a problem: if you
    want to see the output again you can just rerun the program. If you have a
    program that requires [...] "r" for reading.

    The difference between "w" and "a" is subtle, but important. If we open a file that
    already exists using the mode "w", then its current contents are erased as soon as it
    is opened, and replaced with whatever data we write to it. If we open an existing
    file with the mode "a", it will add new data onto the end of the file, but will not
    remove any existing content. If there doesn't already exist a file with the specified
    name, then "w" and "a" behave identically: they will both create a new file to hold
    the output.
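    The difference between the two modes can be checked with a short experiment. The
    sketch below writes to a throwaway file in a temporary folder, using the write and
    close methods covered later in this chapter; all of the variable names are made up
    for illustration.

```python
import os
import tempfile

file_name = os.path.join(tempfile.mkdtemp(), "out.txt")

# the file does not exist yet, so mode "w" creates it
first = open(file_name, "w")
first.write("ATGC\n")
first.close()

# mode "a" keeps the existing line and adds a new one after it
second = open(file_name, "a")
second.write("GGCC\n")
second.close()
check = open(file_name)
after_append = check.read()
check.close()

# mode "w" discards everything that was there before
third = open(file_name, "w")
third.write("TTAA\n")
third.close()
check = open(file_name)
after_overwrite = check.read()
check.close()

print(after_append)
print(after_overwrite)
```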

    Quite a lot of Python functions and methods have these optional arguments. For
    the purposes of this book, we will only mention them when they are directly
    relevant to what we're doing. If you want to see all the optional arguments for a
    particular method or function, the best place to look is the official Python
    documentation (see chapter 1 for details).

    1 We call this the mode of the file.
    2 These are the most commonly used options; there are a few others.


    Once we've opened a file for writing, we can use the file write method to write
    some text to it. write works a lot like print: it takes a single string argument,
    but instead of printing the string to the screen it writes it to the file.

    Here's how we use open with a second argument to open a file and write a single
    line of text to it:

    my_file = open("out.txt", "w")
    my_file.write("Hello world")

    Because the output is being written to the file in this example, you won't see any
    output on the screen if you run it. To check that the code has worked, you'll have to
    run it, then open up the file out.txt in your text editor and check that its contents
    are what you expect. (.txt is the standard file name extension for a plain text file.
    Later in this book, when we generate output files with a particular format, we'll use
    different file name extensions.)

    Remember that with write, just like with print, we can use any string as the
    argument. This also means that we can use any method or function that returns a
    string. The following are all perfectly OK:

    # write "abcdef"
    my_file.write("abc" + "def")
    # write "8"
    my_file.write(str(len('AGTGCTAG')))
    # write "TTGC"
    my_file.write("ATGC".replace('A', 'T'))
    # write "atgc"
    my_file.write("ATGC".lower())
    # write contents of my_variable
    my_file.write(my_variable)


    Closing files

    There's one more important file method to look at before we finish this chapter:
    close. Unsurprisingly, this is the opposite of open (but note that it's a method,
    whereas open is a function). We should call close after we're done reading or
    writing to a file; we won't go into the details here, but it's a good habit to get into
    as it avoids some types of bugs that can be tricky to track down. close is an
    unusual method as it takes no arguments (so it's called with an empty pair of
    parentheses) and doesn't return any useful value:

    my_file = open("out.txt", "w")
    my_file.write("Hello world")
    # remember to close the file
    my_file.close()
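    One way to see what close does is to look at the closed attribute that file objects
    carry, which is False while the file is open and True afterwards. The sketch below
    writes to a throwaway file in a temporary folder; closed_before and closed_after
    are names chosen for this illustration.

```python
import os
import tempfile

# a throwaway location so the sketch does not touch any real files
file_name = os.path.join(tempfile.mkdtemp(), "out.txt")

my_file = open(file_name, "w")
my_file.write("Hello world")
closed_before = my_file.closed

# remember to close the file
my_file.close()
closed_after = my_file.closed

print(closed_before)
print(closed_after)
```

    Once a file has been closed, calling read or write on it raises an error, so close
    should come after the last use of the file object.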

    Paths and folders

    So far, we have only dealt with opening files in the same folder that we are running
    our program. What if we want to open a file from a different part of the file system?

    The open function is [...]


    my_file = open(r"c:\windows\Desktop\myfolder\myfile.txt")

    and if you're on a Mac, like this:

    my_file = open("/Users/martin/Desktop/myfolder/myfile.txt")

    Recap

    We've taken a whole chapter to introduce the various ways of reading and writing
    to files, because it's such an important part of building programs that are useful in
    biology. We've seen how working with file contents is always a two-step process
    (we must open a file before reading or writing) and looked at several common
    pitfalls. We'll return to the theme of file manipulation in later chapters where we'll
    address some of the shortcomings of the techniques [...]


    Exercises

    Splitting genomic DNA

    Look in the chapter_3 folder for a file called genomic_dna.txt; it contains the same
    piece of genomic DNA that we were using in the final exercise from chapter 2. Write
    a program that will split the genomic DNA into coding and non-coding parts, and
    write these sequences [...]


    Write a program that will create a FASTA file for the following three sequences [...]


    Solutions

    Splitting genomic DNA

    We have a head-start on this problem, because we have already tackled a similar
    problem in the previous chapter. Let's remind ourselves of the solution we ended
    up with for that exercise:

    my_dna = (
        "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACT"
        "CGATCGATCGATCGATCGAT"
        "CGATCGATCGATCGATC"
        "ATGCTATCATCGATCGATATCGATGCATCGACTACTAT"
    )
    exon1 = my_dna[0:62]
    intron = my_dna[62:90]
    exon2 = my_dna[90:10000]
    print(exon1 + intron.lower() + exon2)

    What changes do we need to make? Firstly, we need to read the DNA sequence [...]


    Let's put it all together, with some blank lines to separate out the different parts of
    the program:

    # open the file and read its contents
    dna_file = open("genomic_dna.txt")
    my_dna = dna_file.read()

    # extract the different bits of DNA [...]
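    Since the rest of this listing is missing here, the following is a sketch of how the
    complete program might look, not the original solution. It assumes the exon
    coordinates from the chapter 2 listing, and the output file names coding_dna.txt
    and noncoding_dna.txt are hypothetical. So that it runs anywhere, it first writes a
    stand-in genomic_dna.txt into a temporary folder.

```python
import os
import tempfile

folder = tempfile.mkdtemp()
genomic_path = os.path.join(folder, "genomic_dna.txt")

# stand-in input file holding the chapter 2 sequence on a single line
setup_file = open(genomic_path, "w")
setup_file.write(
    "ATCGATCGATCGATCGACTGACTAGTCATAGCTATGCATGTAGCTACT"
    "CGATCGATCGATCGATCGAT"
    "CGATCGATCGATCGATC"
    "ATGCTATCATCGATCGATATCGATGCATCGACTACTAT" + "\n"
)
setup_file.close()

# open the file and read its contents, dropping the trailing newline
dna_file = open(genomic_path)
my_dna = dna_file.read().rstrip("\n")
dna_file.close()

# extract the different bits of DNA sequence
exon1 = my_dna[0:62]
intron = my_dna[62:90]
exon2 = my_dna[90:]

# write the coding and non-coding parts to separate files
coding_file = open(os.path.join(folder, "coding_dna.txt"), "w")
coding_file.write(exon1 + exon2)
coding_file.close()

noncoding_file = open(os.path.join(folder, "noncoding_dna.txt"), "w")
noncoding_file.write(intron)
noncoding_file.close()
```

    Removing the newline before slicing matters here: without rstrip, the final exon
    would carry the newline along with it into the output file.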


    [...] that will make it easier to see the output right away. Once we've got it working,
    we'll switch over to file output. Here's a few lines which will print data to the
    screen:

    print(header_1)
    print(seq_1)
    print(header_2)
    print(seq_2)
    print(header_3)
    print(seq_3)

    and here's what the output looks like:

    ABC123
    ATCGTACGATCGATCGATCGCTAGACGTATCG
    DEF456
    actgatcgacgatcgatcgatcacgact
    HIJ789
    ACTGAC-ACTGT--ACTGTA----CATGTG

    Not far off: the lines are in the right order, but we forgot to include the greater-
    than symbol at the start of the header. Also, we don't really need to print the
    header and the sequence [...]


    >ABC123
    ATCGTACGATCGATCGATCGCTAGACGTATCG
    >DEF456
    actgatcgacgatcgatcgatcacgact
    >HIJ789
    ACTGAC-ACTGT--ACTGTA----CATGTG

    Next, let's tackle the problems with the sequences [...]


    output = open("sequences.fasta", "w")
    output.write('>' + header_1 + '\n' + seq_1)
    output.write('>' + header_2 + '\n' + seq_2.upper())
    output.write('>' + header_3 + '\n' + seq_3.replace('-', ''))

    After making these changes the code doesn't produce any output on the screen, so
    to see what's happened we'll need to take a look at the sequences.fasta file:

    >ABC123
    ATCGTACGATCGATCGATCGCTAGACGTATCG>DEF456
    ACTGATCGACGATCGATCGATCACGACT>HIJ789
    ACTGACACTGTACTGTACATGTG

    This doesn't look right: the second and third lines have been joined together, as
    have the fourth and fifth. What has happened?

    It looks like we've uncovered a difference between the print function and the
    write method. print automatically puts a new line at the end of the string,
    whereas write doesn't. This means we've got to be careful when switching
    between them! The fix is to add a newline onto the end of each string that we write.
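    The effect of the extra newline can be confirmed by reading the file back in. The
    sketch below writes two of the records to a throwaway file in a temporary folder
    and then checks the result; file_name, check_file and fasta_text are names made
    up for this sketch.

```python
import os
import tempfile

header_1 = "ABC123"
seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
header_2 = "DEF456"
seq_2 = "actgatcgacgatcgatcgatcacgact"

file_name = os.path.join(tempfile.mkdtemp(), "sequences.fasta")
output = open(file_name, "w")

# each record ends with its own "\n", so the next header starts on a new line
output.write(">" + header_1 + "\n" + seq_1 + "\n")
output.write(">" + header_2 + "\n" + seq_2.upper() + "\n")
output.close()

# read the file back in to see exactly what was written
check_file = open(file_name)
fasta_text = check_file.read()
check_file.close()
print(fasta_text)
```

    Two records should give four newline characters in total, one after each header
    and one after each sequence.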


    Here's the final code, including the variable definition at the beginning, with blank
    lines and comments:

    # set the values of all the header variables
    header_1 = "ABC123"
    header_2 = "DEF456"
    header_3 = "HIJ789"

    # set the values of all the sequence variables
    seq_1 = "ATCGTACGATCGATCGATCGCTAGACGTATCG"
    seq_2 = "actgatcgacgatcgatcgatcacgact"
    seq_3 = "ACTGAC-ACTGT--ACTGTA----CATGTG"

    # make a new file to hold the output
    output = open("sequences.fasta", "w")

    # write the header and sequence for seq1
    output.write('>' + header_1 + '\n' + seq_1 + '\n')

    # write the header and uppercase sequence for seq2
    output.write('>' + header_2 + '\n' + seq_2.upper() + '\n')

    # write the header and sequence for seq3 with hyphens removed
    output.write('>' + header_3 + '\n' + seq_3.replace('-', '') + '\n')

    # remember to close the file
    output.close()

    Writing multiple FASTA files

    We can solve this problem with a slight modification of our solution to the previous
    exercise. We'll need to create three new files to hold the output, and we'll construct
    the name of each file by using string concatenation:

    output_1 = open(header_1 + ".fasta", "w")
    output_2 = open(header_2 + ".fasta", "w")
    output_3 = open(header_3 + ".fasta", "w")

    Remember, the first argument to open is a string, so it's fine to use a concatenation
    because we know that the result of concatenating two strings is also a string.


    We'll also change the write statements so that we have one for each of the output
    files. We need to be careful with the numbers here in order to make sure that we get
    the right sequence [...]

    4: Lists and loops

    Why do we need lists and loops?

    Think back over the exercises that we've seen in the previous two chapters - they've
    all involved dealing with one bit of information at a time. In chapter 2, we used
    string manipulation tools to process single se


    The limitations of this approach became clear


    Creating lists and retrieving elements

    To make a new list, we put several strings or numbers inside s


    What if we want to get more than one element from a list? We can give a start and
    stop position, separated by a colon, to specify a range of elements:

    ranks = ["kingdom", "phylum", "class", "order", "family"]
    lower_ranks = ranks[2:5]
    # lower ranks are class, order and family

    Does this look familiar? It's the exact same notation that we used to get substrings
    back in chapter 2, and it works in exactly the same way - numbers are inclusive at
    the start and exclusive at the end. The fact that we use the same notation for
    strings and lists hints at a deeper relationship between the two types. In fact, what
    we were doing when extracting substrings in chapter 2 was treating a string as
    though it were a list of characters. This idea - that we can treat a variable as
    though it were a list when it's not - is a powerful one in Python and we'll come back
    to it later in this chapter.
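    To make the parallel concrete, here's a short sketch that uses the same start:stop
    notation on a list and on a string. The DNA sequence is a made-up example:

```python
ranks = ["kingdom", "phylum", "class", "order", "family"]
dna = "ATGCGTAAC"  # a hypothetical sequence, just for illustration

# positions 1 and 2 of the list (start is inclusive, stop is exclusive)
middle_ranks = ranks[1:3]

# positions 0, 1 and 2 of the string, using exactly the same notation
first_codon = dna[0:3]

print(middle_ranks)
print(first_codon)

# a range from 2 up to (but not including) 5 contains three elements
print(len(ranks[2:5]))
```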

    Working with list elements

    To add another element onto the end of an existing list, we can use the append
    method:

    apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
    apes.append("Pan paniscus")

    append is an interesting method because it actually changes the variable on which
    it's used - in the above example, the apes list goes from having three elements to
    having four. We can get the length of a list by using the len function, just like we
    did for strings:


    apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
    print("There are " + str(len(apes)) + " apes")

    apes.append("Pan paniscus")

    print("There are " + str(len(apes)) + " apes")


    the list. If we want to print out a list to see how this works, we need to use str
    (just as we did when printing out numbers):

    ranks = ["kingdom", "phylum", "class", "order", "family"]
    print("at the start : " + str(ranks))
    ranks.reverse()
    print("after reversing : " + str(ranks))
    ranks.sort()
    print("after sorting : " + str(ranks))

    If we take a look at the output, we can see how the order of the elements in the list
    is changed by these two methods:

    at the start : ['kingdom', 'phylum', 'class', 'order', 'family']
    after reversing : ['family', 'order', 'class', 'phylum', 'kingdom']
    after sorting : ['class', 'family', 'kingdom', 'order', 'phylum']

    By default, Python sorts strings in alphabetical order and numbers in ascending
    numerical order[1].
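    One detail that is easy to trip over: like append, the sort and reverse methods change
    the list itself rather than giving us a new one. The sketch below illustrates this,
    and also uses the built-in sorted function (not used elsewhere in this chapter, so
    treat it as an aside), which leaves the original list alone and returns a sorted copy:

```python
ranks = ["kingdom", "phylum", "class", "order", "family"]

# sorted() returns a new list and leaves ranks unchanged
sorted_copy = sorted(ranks)
print(ranks)
print(sorted_copy)

# sort() changes ranks in place, and the method itself returns None
result = ranks.sort()
print(result)
print(ranks)

# numbers are sorted in ascending numerical order
numbers = [10, 2, 33, 4]
numbers.sort()
print(numbers)
```

    Because sort returns None, writing something like ranks = ranks.sort() would lose
    the list, which is a common mistake.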

    [1] We can sort in other ways too, but that's beyond the scope of this book.

    Writing a loop

    Imagine we wanted to take our list of apes:

    apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]

    and print out each element on a separate line, like this:

    Homo sapiens is an ape
    Pan troglodytes is an ape
    Gorilla gorilla is an ape


    One way to do it would be to just print each element separately:

    print(apes[0] + " is an ape")
    print(apes[1] + " is an ape")
    print(apes[2] + " is an ape")

    but this is very repetitive and relies on us knowing the number of elements in the
    list. What we need is a way to say something along the lines of "for each element in
    the list of apes, print out the element, followed by the words ' is an ape'". Python's
    loop syntax allows us to express those instructions like this:

    for ape in apes:
        print(ape + " is an ape")

    Let's take a moment to look at the different parts of this loop. We start by writing
    for x in y, where y is the name of the list we want to process and x is the name
    we want to use for the current element each time round the loop.

    x is just a variable name (so it follows all the rules that we've already learned about
    variable names), but it behaves slightly differently to all the other variables we've
    seen so far. In all previous examples, we create a variable and store something in it,
    and then the value of that variable doesn't change unless we change it ourselves. In
    contrast, when we create a variable to be used in a loop, we don't set its value - the
    value of the variable will be automatically set to each element of the list in turn,
    and it will be different each time round the loop.

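    Importantly, the loop variable x is only meant to be used inside the loop: it is
    given a new value at the start of each loop iteration. Strictly speaking, Python does
    not get rid of it when the loop finishes (it simply keeps the value from the last
    iteration), but relying on that makes code hard to follow, so it's safest to treat x
    as though it were gone once the loop has finished running for the last time. The part
    of a program in which a variable can be used is called the variable's scope - we
    will see several more examples later in the book.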


    This first line of the loop ends with a colon, and all the subse


    Homo sapiens is an ape. Its name starts with H
    Its name has 12 letters
    Pan troglodytes is an ape. Its name starts with P
    Its name has 15 letters
    Gorilla gorilla is an ape. Its name starts with G
    Its name has 15 letters
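    A loop whose body contains several indented lines, along the lines of the sketch
    below, produces output of this shape. The variable names and the report list (used
    only to keep a copy of what was printed) are our own choices:

```python
apes = ["Homo sapiens", "Pan troglodytes", "Gorilla gorilla"]
report = []

for ape in apes:
    # every indented line belongs to the body of the loop
    name_length = len(ape)
    first_letter = ape[0]
    first_line = ape + " is an ape. Its name starts with " + first_letter
    second_line = "Its name has " + str(name_length) + " letters"
    print(first_line)
    print(second_line)
    report.append(first_line)
    report.append(second_line)
```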

    Why is the above approach better than printing out these six lines in six separate
    statements? Well, for one thing, there's much less redundancy - here we only
    needed to write two print statements. This also means that if we need to make a
    change to the code, we only have to make it once rather than three separate times.
    Another benefit of using a loop here is that if we want to add some elements to the
    list, we don't have to touch the loop code at all. Conse


    IndentationError: unindent does not match any outer indentation level

    When you encounter an IndentationError, go back to your code and double-check
    that all the lines in the block match up. Also double-check that you are using
    either tabs or spaces for indentation, not both. The easiest way to do this, as
    mentioned in chapter 1, is to enable tab emulation in your text editor.
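    If you'd like to see this error without breaking a real program, the sketch below
    stores a badly indented loop in a string and asks Python to check it using the
    built-in compile function. This is a slightly advanced trick that we're using purely
    for illustration; the code in the string is a made-up example:

```python
# the last line is indented by two spaces, which matches neither the "for" line
# (no indentation) nor the first line of the block (four spaces)
bad_code = (
    'for ape in ["Homo sapiens"]:\n'
    '    print(ape)\n'
    '  print("done")\n'
)

try:
    compile(bad_code, "example.py", "exec")
    message = "no error"
except IndentationError as error:
    message = str(error)

print(message)
```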

    Using a string as a list

    We've already seen how a string can pretend to be a list - we can use list index
    notation to get individual characters or substrings from inside a string. Can we also
    use loop notation to process a string as though it were a list? Yes - if we write a
    loop statement with a string in the position where we'd normally find a list, Python
    treats each character in the string as a separate element. This allows us to very
    easily process a string one character at a time:

    name = "martin"
    for character in name:
        print("one character is " + character)

    In this case, we're just printing each individual character:

    one character is m
    one character is a
    one character is r
    one character is t
    one character is i
    one character is n

    The process of repeating a set of instructions for each element of a list (or
    character in a string) is called iteration, and we often talk about iterating over a list
    or string.
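    The same idea works for sequence data. As a small sketch (with a made-up DNA
    sequence), here we iterate over a string one base at a time and use append to build
    up a list of lower-case bases:

```python
dna = "ATGC"  # a hypothetical sequence, just for illustration
bases = []

for base in dna:
    # base holds a single character each time round the loop
    bases.append(base.lower())

print(bases)
```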


    Splitting a string to make a list

    So far in this chapter, all our lists have been written manually. However, there are
    plenty of functions and methods in Python that produce lists as their output. One
    such method that is particularly interesting to biologists is the split method,
    which works on strings. split takes a single argument, called the delimiter, and
    splits the original string wherever it sees the delimiter, producing a list. Here's an
    example:

    names = "melanogaster,simulans,yakuba,ananassae"
    species = names.split(",")
    print(str(species))

    We can see from the output that the string has been split wherever there was a
    comma, leaving us with a list of strings:

    ['melanogaster', 'simulans', 'yakuba', 'ananassae']

    Of course, once we've created a list in this way we can iterate over it using a loop,
    just like any other list.
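    For instance, here's a sketch that splits the string and loops over the result in one
    go. Adding the genus name Drosophila in front of each species is our own addition for
    the sake of the example:

```python
names = "melanogaster,simulans,yakuba,ananassae"
full_names = []

# names.split(",") produces a list, so we can iterate over it directly
for species in names.split(","):
    full_name = "Drosophila " + species
    print(full_name)
    full_names.append(full_name)
```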

    Iterating over lines in a file

    Another very useful thing that we can iterate over is a file. Just as a string can
    pretend to be a list for the purposes of looping, a file object can do the same trick[1].
    When we treat a string as a list, each character becomes an individual element, but
    when we treat a file as a list, each line becomes an individual element. This makes
    processing a file line-by-line very easy:

    file = open("some_input.txt")
    for line in file:
        # do something with the line

    [1] If you're interested in how this "pretending" actually works, look up the Python documentation for iterators
    - but be prepared to do
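    As a concrete sketch of processing a file line by line, the example below first
    creates a small input file in a temporary folder (the file contents and the use of
    tempfile are our own assumptions, so that the example can run on its own) and then
    loops over its lines:

```python
import os
import tempfile

# create a small example file to read
folder = tempfile.mkdtemp()
path = os.path.join(folder, "some_input.txt")
example = open(path, "w")
example.write("ATCG\nGGCCTA\nTT\n")
example.close()

line_lengths = []
file = open(path)
for line in file:
    # each line still ends with a newline character, so remove it before measuring
    sequence = line.rstrip("\n")
    print(sequence + " has " + str(len(sequence)) + " bases")
    line_lengths.append(len(sequence))
file.close()
```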


    that would need to change is the stop position in the substring. But what are we
    going to iterate over? We can't just iterate over the protein string, because that will
    give us individual residues, which is not what we want. We can manually assemble a
    list of stop positions, and loop over that:

    stop_positions = [3, 4, 5, 6, 7, 8, 9, 10]
    for stop in stop_positions:
        substring = protein[0:stop]
        print(substring)

    but this seems cumbersome, and only works if we know the length of the protein
    se
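    One way around both problems, sketched here with Python's built-in range function
    and a made-up protein sequence, is to generate the stop positions rather than typing
    them out. Using len means the loop adapts to whatever sequence we give it:

```python
protein = "vlspadktnv"  # a hypothetical protein sequence, just for illustration
substrings = []

# range(3, len(protein) + 1) counts 3, 4, 5 ... up to and including the full length
for stop in range(3, len(protein) + 1):
    substring = protein[0:stop]
    print(substring)
    substrings.append(substring)
```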


    With two numbers, range will count up from the first number (inclusive[1]) to the
    second (exclusive):

    for number in range(3, 8):
        print(number)

    3
    4
    5
    6
    7

    With three numbers, range will count up from the first to the second with the step
    size given by the third:

    for number in range(2, 14, 4):
        print(number)

    2
    6
    10

    [1] The rules for ranges are the same as for array notation - inclusive on the low end, exclusive on the high end
    - so you only have to memorize them once!

    Recap

    In this chapter we've seen several tools that work together to allow our programs to
    deal elegantly with multiple pieces of data. Lists let us store many elements in a
    single variable, and loops let us process those elements one by one. In learning
    about loops, we've also been introduced to the block syntax and the importance of
    indentation in Python.

    We've also seen several useful ways in which we can use the notation we've learned
    for working with lists with other types of data. Depending on the circumstances, we
    can use strings, files, and ranges as if they were lists. This is a very helpful feature of
    Python, because once we've become familiar with the syntax for working with lists,
    we can use it in many different places. Learning about these tools has also helped us
    make sense of some interesting behaviour that we observed in earlier chapters.


    Exercises

    Note: all the files mentioned in these exercises can be found in the chapter_4 folder
    of the exercises download.

    Processing DNA in a file

    The file input.txt contains a number of DNA se


    Solutions

    Processing DNA in a file

    This seems a bit more complicated than previous exercises - we are being asked to
    write a program that does two things at once! - so let's tackle it one step at a time.
    First, we'll write a program that simply reads each se


    Here's what the code looks like with the substring part added:

    file = open("input.txt")
    for dna in file:
        last_character_position = len(dna)
        trimmed_dna = dna[14:last_character_position]
        print(trimmed_dna)

    As before, we are simply printing the trimmed DNA se


    Opening up the trimmed.txt file, we can see that the result looks good. It didn't
    matter that we never removed the newlines, because they appear in the correct
    place in the output file anyway:

    TCGATCGATCGATCGATCGATCGATCGATCGATCGATCGATC
    ACTGATCGATCGATCGATCGATCGATGCTATCGTCGT
    ATCGATCACGATCTATCGTACGTATGCATATCGATATCGATCGTAGTC
    ACTATCGATGATCTAGCTACGATCGTAGCTGTA
    ACTAGCTAGTCTCGATGCATGATCAGCTTAGCTGATGATGCTATGCA
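    Pulling the steps together, one possible complete program is sketched below. To keep
    it runnable on its own it creates a tiny input file first; the two sequences, the
    adapter at the start of each, and the temporary folder are made up for this
    illustration. Notice that the length has one subtracted from it so that the newline
    at the end of each line isn't counted:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
input_path = os.path.join(folder, "input.txt")
output_path = os.path.join(folder, "trimmed.txt")

# two made-up sequences, each starting with the same 14-base adapter
example = open(input_path, "w")
example.write("ATTCGATTATAAGCTCGATCGATC\n")
example.write("ATTCGATTATAAGCACTGA\n")
example.close()

lengths = []
file = open(input_path)
output = open(output_path, "w")
for dna in file:
    last_character_position = len(dna)
    # skip the first 14 characters; the newline at the end is kept
    trimmed_dna = dna[14:last_character_position]
    output.write(trimmed_dna)
    # subtract one so that the newline is not counted as a base
    length = len(trimmed_dna) - 1
    print("processed sequence with length " + str(length))
    lengths.append(length)
file.close()
output.close()
```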

    Now the final step - printing the lengths to the screen - re


    Multiple exons from genomic DNA

    This is very similar to the exercises from the previous two chapters, and so our
    solution to it is going to look very similar. Let's concentrate on the new bit of the
    problem first - reading the file of exon locations. As before, we can start by opening
    up the file and printing each line to the screen:

    exon_locations = open("exons.txt")
    for line in exon_locations:
        print(line)

    This gives us a loop in which we are dealing with a different exon each time round.
    If we look at the output, we can see that we still have a newline at the end of each
    line, but we'll not worry about that for now:

    5,58

    72,133

    190,276

    340,398

    Now we have to split up each line into a start and stop position. The split method
    is probably a good choice for this job - let's see what happens when we split each
    line using a comma as the delimiter:

    exon_locations = open("exons.txt")
    for line in exon_locations:
        positions = line.split(",")
        print(positions)

    The output shows that each line, when split, turns into a list of two elements.
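    To see exactly what those two elements are, here's a small sketch that splits a
    single line. The line is written out by hand (including the newline at the end)
    rather than read from the file, so that the example can run on its own:

```python
# one line as it might be read from the exon locations file
line = "72,133\n"

positions = line.split(",")
print(positions)

# both elements are strings, and the second still carries the newline
print(type(positions[0]))
```

    The important thing to notice is that both elements are strings rather than
    numbers.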


    genomic_dna = open("genomic_dna.txt").read()
    exon_locations = open("exons.txt")
    for line in exon_locations:
        positions = line.split(",")
        start = positions[0]
        stop = positions[1]
        exon = genomic_dna[start:stop]
        print("exon is: " + exon)

    Unfortunately, when we run this code we get an error at line 7:

      File "multiple_exons_from_genomic_dna.py", line 7, in <module>
        exon = genomic_dna[start:stop]
    TypeError: slice indices must be integers or
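    The error arises because split gives us strings, while the positions used to take a
    substring must be integers. One way to fix it, sketched below with a made-up genomic
    sequence and a single hand-written line, is to convert each position with the int
    function before using it:

```python
genomic_dna = "TTTTTACGTACGTTTTTGGGCCCTTTTT"  # a hypothetical sequence
line = "5,12\n"  # one line as it might be read from the exon locations file

positions = line.split(",")

# int turns the string "5" into the number 5; it also ignores the trailing newline
start = int(positions[0])
stop = int(positions[1])

exon = genomic_dna[start:stop]
print("exon is: " + exon)
```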


    but instead we have a single exon variable that stores one exon at a time. Here's
    one way to get the complete coding se


    coding sequence is : CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCG
    coding sequence is : CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGA
    coding sequence is : CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGACGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTA
    coding sequence is : CGTACCGTCGACGATGCTACGATCGTCGATCGTAGTCGATCATCGATCGATCGCGATCGATCGATATCGATCGATATCATCGATGCATCGATCATCGATCGATCGATCGATCGACGATCGATCGATCGTAGCTAGCTAGCTAGATCGATCATCATCGTAGCTAGCTCGACTAGCTACGTACGATCGATGCATCGATCGTACGATCGATCGATCGATCGATCGATCGATCGATCGATCGATCGTAGC
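    Output like this, where the coding sequence grows by one exon each time round the
    loop, can be produced by starting with an empty string and concatenating each exon
    onto the end of it. Here's a sketch of that approach; the genomic sequence and the
    exon positions are made up, and are stored in a list rather than read from files so
    that the example can run on its own:

```python
genomic_dna = "TTTTTACGTACGTTTTTGGGCCCTTTTT"  # a hypothetical sequence
exon_lines = ["5,12\n", "17,23\n"]  # stand-ins for the lines of the exon file

# start with an empty string and add one exon each time round the loop
coding_sequence = ""
for line in exon_lines:
    positions = line.split(",")
    start = int(positions[0])
    stop = int(positions[1])
    exon = genomic_dna[start:stop]
    coding_sequence = coding_sequence + exon
    print("coding sequence is : " + coding_sequence)
```

    Because coding_sequence is created before the loop starts, it keeps its value from
    one iteration to the next, unlike exon, which is replaced each time.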