9/23/2015 BCHB524 - 2015 - Edwards Advanced Python Data Structures BCHB524 2015 Lecture 7


9/23/2015 BCHB524 - 2015 - Edwards

Advanced Python Data Structures

BCHB524 2015

Lecture 7


Outline

Revision of list data-structures

Advanced Data-structures: Dictionaries, Sets, Files

Reading, parsing files (codon tables)

Exercises


Data-structures: Lists

Compound data-structure: many objects, in order, numbered from 0. [] indicates a list.

Item access and iteration: same as for strings; "l[i]" for item i, "for item in l" for each item of the list.

List modification: items can be changed, added, or deleted.

Range is a list.

String ↔ List
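The list operations recapped above can be sketched in a few lines. This is an illustrative example, not taken from the slides: the variable names and sample data are made up, and it uses print() calls so that it runs under Python 3, where range() has to be wrapped in list() to produce an actual list.

```python
# Illustrative recap of list operations (sample data is made up).
bases = ['A', 'C', 'G', 'T']      # [] indicates a list

first = bases[0]                  # item access, numbered from 0
for item in bases:                # iteration, same as for a string
    print(item)

bases[3] = 'U'                    # change an item
bases.append('N')                 # add an item
del bases[0]                      # delete an item

# In Python 2, range(5) is already a list; list() makes this work in Python 3 too.
numbers = list(range(5))

# String <-> list conversion
chars = list("ACGT")              # string to list of characters
joined = ''.join(chars)           # list of strings back to a string
print(bases, numbers, chars, joined)
```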


Python Data-structures: Dictionaries

Compound data-structure, stores any number of arbitrary key-value pairs.
Keys and/or values can be different types.
Can be empty.
Values can be accessed by key.
Keys, values, or pairs can be accessed by iteration.
Values can be changed.
Key, value pairs can be added.
Key, value pairs can be deleted.


Dictionaries: Syntax and item access

# Simple dictionary
d = {'a': 1, 'b': 2, 'acdef': 3}
print d

# Access value using its key
print d['a']

# Change value associated with a key
d['acdef'] = 5
print d

# Add value by assigning to a dictionary key
d['newkey'] = 10
print d
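Deleting key-value pairs is listed among the dictionary operations but is not demonstrated alongside access, change, and add. A minimal sketch of two standard ways to do it follows; the dictionary contents mirror the example above, the key 'nosuchkey' is made up, and print() is used so the code runs under Python 3.

```python
# Illustrative sketch: removing key-value pairs from a dictionary.
d = {'a': 1, 'b': 2, 'acdef': 5, 'newkey': 10}

# Delete a pair with the del statement (raises KeyError if the key is missing)
del d['newkey']

# Remove a pair and get its value back with pop()
removed = d.pop('acdef')

# pop() with a default does not fail when the key is absent
missing = d.pop('nosuchkey', None)

print(d)
print(removed)
print(missing)
```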


Dictionaries: Iteration

# Initialize
d = {'a': 1, 'b': 2, 'acdef': 5, 'newkey': 10}

# keys from d
print d.keys()

# values from d
print d.values()

# key-value pairs from d
print d.items()

# Iterate through the keys of d
for k in d.keys():
    print k,
print

# Iterate through the key-value pairs of d
for k,v in d.items():
    print k,"=",v,
print
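In the Python 2 used for these slides, the order in which keys() and items() return their contents is arbitrary and should not be relied on. When a reproducible order matters, one option is to sort first. The sketch below assumes sorting by key is what is wanted, and is written with print() so it runs under Python 3.

```python
d = {'a': 1, 'b': 2, 'acdef': 5, 'newkey': 10}

# sorted(d) returns a sorted list of the dictionary's keys
ordered_keys = sorted(d)
for k in ordered_keys:
    print("%s = %s" % (k, d[k]))

# sorted(d.items()) sorts the (key, value) pairs by key
ordered_pairs = sorted(d.items())
print(ordered_pairs)
```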


Dictionaries: Different from lists?

# Initialize
d = {}

# Add some values, integer keys!
d[0] = 1
d[1] = 2
d[10] = 1000

# See how the dictionary looks
print d

# Test whether a key is in the dictionary
print "Is key 15 in d?",d.has_key(15)

# Access value with key 15 with default -1
print "Value for key 15, or -1:",d.get(15,-1)

# Access value with key 15 - error!
print "Value for key 15:",d[15]
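The three ways of handling a possibly missing key can also be written so that they run under Python 3, where has_key() no longer exists and the in operator is used instead. The following is a sketch under that assumption, not the slide's own code; it additionally shows catching the KeyError rather than letting the program stop.

```python
d = {0: 1, 1: 2, 10: 1000}

# The in operator tests for a key; it works in Python 2 and Python 3,
# whereas has_key() was removed in Python 3.
has_15 = 15 in d
has_10 = 10 in d

# get() returns the default instead of raising an error
value_or_default = d.get(15, -1)

# Indexing with a missing key raises KeyError, which can be caught
try:
    value = d[15]
except KeyError:
    value = None

print(has_15, has_10, value_or_default, value)
```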


Python Data-structures: Sets

Compound data-structure, stores any number of arbitrary distinct data-items.
Data-items can be different types.
Can be empty.
Items can be accessed by iteration only.
Items can be tested for membership.
Items can be added.
Items can be deleted.


Sets: Add and Test Elements

# Make an empty set
s = set()
print s

# Add an element, and then a list of elements
s.add('a')
s.update(['b','c','d'])
print s

# Test for membership
print "e is in s",('e' in s)
print "e is not in s",('e' not in s)
print "c is in s",('c' in s)
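The set slides show adding and testing, but not deleting items or what "distinct" means in practice. A short illustrative sketch of both, using the standard set methods remove() and discard(); the element 'z' is made up, and print() is used so the code runs under Python 3.

```python
s = set(['a', 'b', 'c', 'd'])

s.remove('a')        # raises KeyError if 'a' is not in the set
s.discard('z')       # does nothing if 'z' is not in the set

# Adding an element that is already present leaves the set unchanged
s.add('b')

# Items are reached by iteration only; sorting gives a stable order for printing
for item in sorted(s):
    print(item)
```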


Python Data-structures: Files

Read strings from file, or write strings to file.
Get access to strings by iteration.
Write by printing strings to file.
Need to open and close files: need to indicate whether we want to read or write.
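The read/write distinction is made with the second argument to open(). The sketch below is illustrative only: it writes to a made-up file name in a temporary directory so nothing real is overwritten, and it uses write() rather than printing to the file, because the print-to-file form is spelled differently in Python 2 and Python 3.

```python
import os
import tempfile

# Hypothetical file name in a temporary directory, so nothing real is overwritten
path = os.path.join(tempfile.mkdtemp(), 'example.nuc')

# Open for writing: mode 'w' creates the file (or empties an existing one)
f = open(path, 'w')
f.write('ATGGCC\n')
f.write('TAA\n')
f.close()

# Open for reading: mode 'r' is the default
f = open(path, 'r')
contents = f.read()
f.close()

print(contents)
```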


Files: Reading

# Open a file, store "handle" in f
f = open('anthrax_sasp.nuc')
# MAGIC!
print ''.join(f.read().split())
# Close the file.
f.close()

# Slowly, now...
f = open('anthrax_sasp.nuc')
# Store the entire file's contents in s (as string)
s = f.read()
print s
# Split s at whitespace
sl = s.split()
print sl
# Join split s with nothing in between
jl = ''.join(sl)
print jl
# Close the file
f.close()


Files: Reading

# Open a file
f = open('anthrax_sasp.nuc')
# Iterate line-by-line
for line in f:
    print line
# Close the file
f.close()

# Open a file
f = open('anthrax_sasp.nuc')
# Iterate line-by-line, and accumulate the sequence
seq = ""
for line in f:
    seq += line.strip()
print "The sequence is",seq
# Close the file
f.close()
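An alternative to calling close() by hand is Python's with statement, which closes the file when the block ends. This is not the technique used in the slides; it is shown here as a sketch. The file name and sequence are made up and written to a temporary directory so the example is self-contained, and print() is used so it runs under Python 3.

```python
import os
import tempfile

# Write a small made-up sequence file so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), 'example.nuc')
with open(path, 'w') as out:
    out.write('ATGGCC\nGGTTAA\n')

# The with statement closes the file when the block ends, even on error
seq = ""
with open(path) as f:
    for line in f:
        seq += line.strip()

print("The sequence is " + seq)
```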


DNA Translation

First read a codon table from a file.
Codon table from NCBI's on-line taxonomy resource.
Read line by line and use initial word to store 3rd word appropriately.

AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG

Starts = ---M---------------M---------------M----------------------------

Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG

Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG

Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG


DNA Translation

# Read the codon table: first word of each line is the key, third word is the value
f = open('standard.code')
data = {}
for l in f:
    sl = l.split()
    key = sl[0]
    value = sl[2]
    data[key] = value
# Close the file
f.close()

b1 = data['Base1']
b2 = data['Base2']
b3 = data['Base3']
aa = data['AAs']
st = data['Starts']

# Position i of each string describes one codon
codons = {}
init = {}
n = len(aa)
for i in range(n):
    codon = b1[i] + b2[i] + b3[i]
    codons[codon] = aa[i]
    init[codon] = (st[i] == 'M')
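To try the table-building step without the standard.code file, the five lines from the codon-table slide can be parsed from a string instead. The sketch below makes that substitution and adds a guard for blank lines. The names table_text and start_codons are illustrative, and print() is used so the code runs under Python 3.

```python
# The five lines of the standard codon table, as shown on the earlier slide
table_text = """\
AAs = FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
Starts = ---M---------------M---------------M----------------------------
Base1 = TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG
Base2 = TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG
Base3 = TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG
"""

# First word is the key, third word is the value (second word is "=")
data = {}
for line in table_text.splitlines():
    words = line.split()
    if len(words) < 3:      # skip blank or malformed lines
        continue
    data[words[0]] = words[2]

# Position i of each string describes one codon
codons = {}
init = {}
for i in range(len(data['AAs'])):
    codon = data['Base1'][i] + data['Base2'][i] + data['Base3'][i]
    codons[codon] = data['AAs'][i]
    init[codon] = (data['Starts'][i] == 'M')

start_codons = sorted(c for c in init if init[c])
print(len(codons))
print(start_codons)
```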


DNA Translation

f = open('anthrax_sasp.nuc')
seq = ''.join(f.read().split())
f.close()

seqlen = len(seq)
aaseq = []
for i in range(0,seqlen,3):
    codon = seq[i:i+3]
    aa = codons[codon]
    aaseq.append(aa)
print ''.join(aaseq)
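The translation loop above raises a KeyError if the sequence length is not a multiple of three or if a codon is missing from the table (for example, one containing N). The following is a sketch of one way to guard against both, not the lecture's method. The function name translate, the unknown parameter, and the choice of 'X' as placeholder are assumptions made for illustration. The table is rebuilt inline from the standard code's amino-acid string so the example runs on its own under Python 3.

```python
# Standard genetic code; amino acids listed with bases ordered T, C, A, G
aas = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
bases = "TCAG"
codons = {}
i = 0
for b1 in bases:
    for b2 in bases:
        for b3 in bases:
            codons[b1 + b2 + b3] = aas[i]
            i += 1

def translate(seq, table, unknown='X'):
    """Translate seq codon by codon (illustrative helper, not from the slides)."""
    aaseq = []
    # Stop before a trailing partial codon of one or two bases
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i+3].upper()
        # get() avoids a KeyError for codons not in the table
        aaseq.append(table.get(codon, unknown))
    return ''.join(aaseq)

print(translate("ATGGCCTAA", codons))
```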


Exercise 1

Using just the concepts introduced so far, find as many ways as possible to code DNA reverse complement (at least 3!).
You may use any built-in function or string or list method.
You may use only basic data-types and lists and dictionaries.
Compare and critique each technique for robustness, speed, and correctness.


Exercise 2

Write a program that takes a codon table file (such as standard.code from the lecture) and a file containing nucleotide sequence (anthrax_sasp.nuc) as command-line arguments, and outputs the amino-acid sequence.
Modify your program to indicate whether or not the initial codon is consistent with the codon table's start codons.
Use NCBI's taxonomy resource to look up and download the correct codon table for the anthrax bacterium.
Re-run your program using the correct codon table.
Is the initial codon of the anthrax SASP gene a valid translation start site?


Homework 4

Due Monday, September 28.
Submit using Blackboard.
Use only the techniques introduced so far.
Make sure you can run the programs demonstrated in lecture(s).
Exercises 1, 2 from Lecture 7.
