CS 111 Green: Program Design I Lecture 14: BLAST, methods, encodings & text, more text files Robert H. Sloan (CS) & Rachel Poretsky(Bio) University of Illinois, Chicago October 13, 2017
MORE FUNCTIONS, MAINLY BUILT-IN CLASS METHODS
COLLABORATION POLICY (AGAIN)
From the syllabus
- Consulting with your classmates on assignments is encouraged, except where noted. However, submissions are individual, and copying code from your classmates is considered plagiarism.
- To avoid suspicion of plagiarism, you must specify your sources together with all submitted materials. List classmates you discussed your assignment with and webpages from which you got inspiration or copied (short) code snippets. All students are expected to understand and be able to explain their submitted materials. For example, given the question "How did you do X?", a great response would be "I used function Y, with W as the second argument. I tried Z first, but it doesn't work." An inappropriate response would be "Here is my code, look for yourself."
Meaning (portions from Brown CS 17, Fall 2016)
- You are encouraged to discuss lab/homework assignments with other students in the class. You may even work out solutions together. However, you are not allowed to take away any written notes, diagrams, or code from joint work sessions. Emails, IM conversations, and the like all constitute “notes”.
  - You must include comments in your turned-in code stating whom you worked with.
- We expect you to fully comprehend everything you hand in. To that end, you must write up your solutions entirely on your own, and you must debug your code entirely on your own. Learning to independently implement and debug solutions, possibly developed with your classmates, is a key CS 111 goal.
- Important: after participating in a joint work session, you must pause before writing up your solutions; a pause long enough to grab a cup of coffee with a friend is sufficient.
Enforcement
- Violations will be both
  - Reported to the Dean of Students as academic misconduct (cheating and plagiarism)
  - Penalized: grade penalties for a first violation; failure in the course up to suspension/expulsion from UIC for a second violation
How do I know which functions exist? Python documentation
Additional built-in functions from modules
- Functions useful for certain kinds of things (e.g., math, internet access, making graphs, random numbers) are available in modules that must be imported before they can be used
- We will discuss a few soon, as needed
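A minimal sketch of importing and using module functions. The two modules chosen here (math and random) are only examples from the standard library, and the variable names are made up for illustration.

```python
import math    # mathematical functions and constants
import random  # pseudo-random number generation

# After the import, a module's functions are called as module_name.function_name(...)
root = math.sqrt(16)          # 4.0
roll = random.randint(1, 6)   # a whole number from 1 to 6, inclusive

print(root)
print(roll)
```

Without the import lines, both calls would fail with a NameError.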
BUILT-IN functions for strings
- Strings and lists are examples of "built-in classes", and each comes with some built-in functions (these class functions are also called methods).
- Same as other built-in functions, except the calling syntax is .fn_name
  - st = "gtgcgagggtcg"
  - st.upper() → "GTGCGAGGGTCG"
  - st.find("a") → 5
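The string example above can be run as follows. The extra variable names and the lookup of "x" are additions for illustration only.

```python
st = "gtgcgagggtcg"

upper_st = st.upper()      # returns a new string; st itself is unchanged
first_a = st.find("a")     # index of the first "a", counting from 0
missing = st.find("x")     # find returns -1 when the substring is absent

print(upper_st)   # GTGCGAGGGTCG
print(first_a)    # 5
print(missing)    # -1
print(st)         # gtgcgagggtcg
```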
OBJECTS AND DOT NOTATION
Objects
- (Implicit in Chapters 2.1 Objects and variables, 3.2 List Basics, 7.3 String Methods, 8.2 List Methods, but not stated explicitly in anything we'll assign, so pay attention!)
- Everything in Python is an object
- An object combines
  - data (e.g., number, string, list) with
  - methods that can act on that object
Methods
- Methods: like (a special case of) functions, but not globally accessible
- Cannot call a method just by giving its name, the way we call print(), open(), abs(), type(), range(), etc.
- Method: function that can be accessed only through an object
  - Using dot notation
Dot notation
- To call a method, use dot notation:
  - object_name.method()
- String example:
>>> test = "This is my test string"
>>> test.upper()
'THIS IS MY TEST STRING'
If o is an object of a type that has a method do_it, where do_it needs an input in addition to o, and x is defined, what is the proper way to call do_it?
A. do_it(x)
B. do_it(o, x)
C. o.do_it(x)
D. o.do_it(o, x)
Recall that str converts to type string
>>> x = 42
>>> x == "42"
False
>>> str(x) == "42"
True
- Is str() a method of strings?
A. No, it's not a method
B. No, it's a method of something else
C. Yes
D. I have no clue
Methods depend on type of object
- answers = [52, 17, 43]
- answers.append(42)
- answers is now [52, 17, 43, 42]
- "test string".append("s") gives back an error because append is not a method of strings
- ["cat", "dog"].count("cat") → 1
- "catactcat".count("cat") → 2
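The examples above, collected into one runnable sketch. The try/except block is an addition (not covered on these slides) used only so the program keeps running after the error; the variable names error_raised, list_count, and str_count are made up for illustration.

```python
answers = [52, 17, 43]
answers.append(42)          # append is a list method; it changes the list in place
print(answers)              # [52, 17, 43, 42]

# Strings have no append method, so Python raises AttributeError.
try:
    "test string".append("s")
    error_raised = False
except AttributeError:
    error_raised = True
print(error_raised)         # True

# count exists for both types but counts different things.
list_count = ["cat", "dog"].count("cat")   # elements equal to "cat"
str_count = "catactcat".count("cat")       # non-overlapping occurrences of "cat"
print(list_count, str_count)               # 1 2
```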
Methods' importance
- Understanding key data types depends on understanding their methods
- We have already seen the append method for lists, and you will probably want to use it in the current lab and in the project
- We will come back to more list methods
- File reference methods: write(), read(), readline(), readlines()
  - But open is not a method
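A short sketch of the four file reference methods named above. The file name demo.txt and its contents are hypothetical; the file is created in a temporary directory so that nothing on your machine is overwritten.

```python
import os
import tempfile

# Hypothetical file name, placed in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

out = open(path, "w")          # open is a built-in function, not a method
out.write("acgt\nttga\n")      # write is a method of the file reference
out.close()

fileref = open(path, "r")
first = fileref.readline()     # one line, including its newline
rest = fileref.readlines()     # list of the remaining lines
fileref.close()

fileref = open(path, "r")
whole = fileref.read()         # entire file as one string
fileref.close()

print(repr(first))   # 'acgt\n'
print(rest)          # ['ttga\n']
print(repr(whole))   # 'acgt\nttga\n'
```

Note that open is called by name alone, while write, readline, readlines, read, and close are all reached through a file reference with dot notation.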
When you get to CS 341 & 342
- Or if you know Java or C++ now
- Methods are an Object Oriented (OO) concept
- In our CS 111
  - We do need to know the basics of dot notation and methods
  - We will otherwise be ignoring OO, and taking a primarily procedural approach (built on functions, also called functional decomposition)
ENCODINGS AGAIN
Encodings again
- Recall that the smallest unit in a computer is the bit
- One bit can take on 2 possible values: 0 or 1
- Two bits can take on 4 possible values: 00, 01, 10, or 11
- Three bits can take on 8 possible values: 000, 001, 010, 011, 100, 101, 110, 111
How many distinct values can 4 bits take on?
A. 4
B. 8
C. 9
D. 13
E. 16
Ben Bitdiddle says
- n bits can take on 2 times as many values as n-1 bits, for a total of 2^n values
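Ben's doubling argument can be checked with a short sketch. The function name bit_patterns is hypothetical; it builds every pattern of length n by extending each shorter pattern with a 0 and with a 1.

```python
def bit_patterns(n):
    """Return a list of all bit strings of length n."""
    patterns = [""]                  # one pattern of length 0: the empty string
    for _ in range(n):
        longer = []
        for p in patterns:
            longer.append(p + "0")   # each shorter pattern yields two longer ones
            longer.append(p + "1")
        patterns = longer
    return patterns

print(bit_patterns(2))   # ['00', '01', '10', '11']

# The number of patterns matches 2 ** n for each n tried here.
for n in range(1, 4):
    print(n, len(bit_patterns(n)), 2 ** n)
```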
How many distinct values can a byte take on?
- (Recall that a byte = 8 bits)
A. 2
B. 8
C. 64
D. 128
E. 256
Encoding characters in bytes: 1960s
- ASCII: Use 1 byte to encode 95 printing characters
- The ones on every computer keyboard to this day
- Pretty much all encodings agree with ASCII on those 95 characters
- ASCII also has some nonprinting characters like newline and tab
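A small sketch for looking at character codes using Python's built-in ord() and chr(). The particular characters and variable names are chosen only for illustration.

```python
code_A = ord("A")          # character -> code number
letter = chr(71)           # code number -> character
newline_code = ord("\n")   # nonprinting newline character
tab_code = ord("\t")       # nonprinting tab character

print(code_A)         # 65
print(letter)         # G
print(newline_code)   # 10
print(tab_code)       # 9

# The 95 printing ASCII characters have codes 32 (space) through 126 (~).
printable_count = len(range(32, 127))
print(printable_count)   # 95
```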
Communicating common non-printing characters
- \n is used to denote the newline in a string literal
- \t is used to denote the tab in a string literal
- And so a double backslash is used to denote a backslash in a string literal.
- What is len("\\")?
A. 0
B. 1
C. 2
But
- What about René Antoine Ferchault de Réaumur?
- А что насчет Aрабского? (Russian: "And what about Arabic?")
Encoding more characters
- Unicode: over 128,000 characters covering 135 modern and historical scripts, and symbols
- Python uses Unicode
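A sketch of how non-ASCII characters behave in Python 3 strings. UTF-8 is used here as one common way to turn Unicode text into bytes; that choice is an assumption, since the slides do not name a specific byte encoding. The characters are written with \u escapes so the example does not depend on how this file itself is saved.

```python
name = "R\u00e9aumur"             # \u00e9 is the letter e with an acute accent
e_acute_code = ord("\u00e9")      # code number is outside the ASCII range 0-127

name_chars = len(name)                    # length counts characters
name_bytes = len(name.encode("utf-8"))    # the accented letter needs 2 bytes

cyrillic = "\u0447\u0442\u043e"           # the Russian word for "what"
cyr_chars = len(cyrillic)
cyr_bytes = len(cyrillic.encode("utf-8"))

print(e_acute_code)            # 233
print(name_chars, name_bytes)  # 7 8
print(cyr_chars, cyr_bytes)    # 3 6
```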
FILES: REVIEW & A BIT MORE DETAIL
(Text) File reading, a little more slowly
- Recall text file = sequence of lines
- Line = sequence of characters up to and including the special newline character \n
  - (Special case: the last set of characters at the end of the file will probably work okay even if the text file doesn't end with a newline as it should.)
  - (How could we find out?)
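One way to find out is to try it. This sketch writes a small file whose last line has no newline and then reads it back; the file name and contents are made up, and the file lives in a temporary directory.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "no_newline.txt")   # hypothetical name

out = open(path, "w")
out.write("first line\nlast line")    # note: no \n after the last line
out.close()

fileref = open(path, "r")
lines = fileref.readlines()
fileref.close()

print(lines)                      # ['first line\n', 'last line']
print(lines[-1].endswith("\n"))   # False
```

The last line is still returned; it simply lacks the trailing newline that the other lines carry.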
Speaking of text
- afile.txt:
1234
Can I have a little more?
5678910
I love you!
ABCD
Can I bring my friend to tea?

fileref = open("afile.txt", "r")
line = fileref.readline()

What is len(line)?
A. 0
B. 1
C. 4
D. 5
E. 6
Speaking of files and programming with them
- Your execution environment (i.e., the console, the lower right panel of Spyder) needs to be working in the directory that contains the file you want to open
- Use the working directory button in the upper right corner
Can iterate over text file reference (not in book)
fileref = open("afile.txt", "r")
for line in fileref:
    # process each line as we wish in this block
    print(line, end="")   # placeholder: replace with your own processing
# rest of program
fileref.close()
- Typically the easiest way to read a text file, all other things being equal
Strategies for read_seq(fname, n)
- Can't just use the method from the previous slide and process each line in the same way
- Because we need to process the ">" comment lines differently from the content lines that follow them
- Could use:
  - Previous slide's method with a nested conditional
  - for loop over the number n
  - readlines() and then deal with the list it returns
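A rough sketch of the first strategy (iterate over the file reference with a conditional inside the loop). The slides do not say what read_seq should return, so this assumes it returns the n-th sequence in the file, counting from 1, as one string with newlines removed. The demo file name and contents are made up. Treat this as one possible reading, not the required solution.

```python
import os
import tempfile

def read_seq(fname, n):
    """Return the n-th sequence (counting from 1) as a single string.

    Assumption: each record starts with a ">" comment line, followed by
    one or more content lines. The return value is a guess at the intent.
    """
    seq = ""
    count = 0                      # number of ">" lines seen so far
    fileref = open(fname, "r")
    for line in fileref:
        if line.startswith(">"):   # comment line: a new record begins
            count = count + 1
        elif count == n:           # content line of the record we want
            seq = seq + line.strip()
    fileref.close()
    return seq

# Small made-up file for demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.fasta")
out = open(path, "w")
out.write(">seq one\nacgt\nttga\n>seq two\nggcc\n")
out.close()

print(read_seq(path, 1))   # acgtttga
print(read_seq(path, 2))   # ggcc
```

This version reads the whole file even after the wanted record ends; stopping early is a possible refinement.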