
NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Feb 02, 2016

Page 1: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

NMR Spectroscopy and Protein Structures

Chem 991A Special Topics in Physical Chemistry

Lectures: MWF 10:30am-11:20am, Rm 733 Hamilton Hall

Class Projects & Exams: Thur. 6:00-8:00pm, Rm 733 Hamilton Hall

COURSE OUTLINE

Instructor: Dr. Robert Powers

            Office       Labs
Address:    722 HaH      721 HaH
Phone:      472-3039     472-5316

e-mail: [email protected]
web page: http://bionmr.unl.edu/

Office Hours: 11:30 am-12:30 pm MWF or by Special Appointment.

Required Text: J. N. S. Evans, Biomolecular NMR Spectroscopy, Oxford University Press

Recommended Text: M. H. Levitt, Spin Dynamics – Basics of Nuclear Magnetic Resonance, Wiley

Page 2: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Course Outline (cont.)

Some Other Recommended Resources

“NMR of Proteins and Nucleic Acids” Kurt Wüthrich

“Protein NMR Spectroscopy: Principles and Practice” John Cavanagh, Arthur Palmer, Nicholas J. Skelton, Wayne Fairbrother

“Principles of Protein Structure” G. E. Schulz & R. H. Schirmer

“Introduction to Protein Structure” C. Branden & J. Tooze

“Enzymes: A Practical Introduction to Structure, Mechanism, and Data Analysis” R. Copeland

“Biophysical Chemistry” Parts I to III, C. Cantor & P. Schimmel

“Principles of Nucleic Acid Structure” W. Saenger

Page 3: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Course Outline (cont.)

Some Important Web Sites:

RCSB Protein Data Bank (PDB): Database of NMR & X-ray Structures
http://www.rcsb.org/pdb/

BMRB (BioMagResBank): Database of NMR resonance assignments
http://www.bmrb.wisc.edu/

CATH Protein Structure Classification: Classification of All Proteins in PDB
http://www.cathdb.info/

SCOP: Structural Classification of Proteins: Classification of All Structures into Families, Super Families etc.
http://scop.berkeley.edu

PDBeFold: Compares 3D Structures of Proteins to Determine Structural Similarities of New Structures
http://www.ebi.ac.uk/msd-srv/ssm/

NMR Information Server: NMR Groups, News, Links, Conferences, Jobs
http://www.spincore.com/nmrinfo/

NMR Knowledge Base: A lot of useful NMR links
http://www.spectroscopynow.com/

Page 4: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Course Outline (cont.)

Course Work:

Oral Reports (2):       100 pts  (variable due dates)
Ubiquitin Assignment:   100 pts  (due Dec. 13)
Problem Set:            100 pts  (due Dec. 13)
Exam 1:                 100 pts  (Thur., Oct. 3)
Exam 2:                 100 pts  (Thur., Nov. 7)
Final Exam:             200 pts  (Fri., Dec. 20, 10am-12pm)

Total:                  700 pts

Answer keys for the problem sets and exams will be posted on BlackBoard.

Grading scale: A+=95%; A=90%; A-=85%; B+=80%; B=75%; B-=70%; C+=65%; C=60%; C-=55%; D=50%; D-=45%; F=40%

Page 5: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Class Participation

•Reading assignments should be completed prior to each lecture. The required text will only supplement the lecture material. A vast majority of the material for the class will come from the lectures.

•You are expected to participate in ALL classroom discussions

Exams

•All exams (except the final) will take place at 6 pm in Hamilton Hall Rm. 733 on the scheduled date.

•The length of each exam (except the final) will be open-ended. You will have as much time as needed to complete the exam.

•Bring a TI-89 style calculator or a simpler model, and an approved translator if required.

•A review session will take place during the normal class time prior to each exam.

•ALWAYS SHOW ALL WORK!!!!

Course Outline (cont.)

Page 6: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Lecture Topics (Tentative Schedule)

Date               Topic                                        Chapter

I. Overview of Protein Structures
Aug 26             Introduction
Aug 28             Linux and Awk
Aug 30             Protein Structures from an NMR Perspective   4
Sept 4
Sept 6
Sept 9
Sept 11
Sept 13
Sept 16
Sept 18
Sept 20
Sept 23
Sept 25            Protein Modeling Software                    3.9
Sept 27
Sept 30
Oct 2
Oct 3              EXAM 1
Oct 4              Molecular Mechanics and Dynamics             3.5-3.9
Oct 7
Oct 9              Comparison of X-ray and NMR Structures
Oct 11
Oct 14             Isotope Labeling of Proteins                 4.2.2 – 4.2.3
Oct 16

II. NMR Assignment Problem                                      2
Oct 18             NMR Software                                 3.9
Oct 21 to Oct 22   Fall Break

Page 7: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Lecture Topics (continued)

Date               Topic                                        Chapter
Oct 23
Oct 25             2D NMR                                       2.1
Oct 28
Oct 30             3D NMR                                       2.2
Nov 1              4D NMR                                       2.3

III. NMR Structure Determination                                3
Nov 4              NOEs                                         3.1
Nov 6
Nov 7              EXAM 2
Nov 8
Nov 11             Chemical shifts, Coupling constants,         4.1.4, 3.2, 4.1.3, 5.2
                   Amide Exchanges
Nov 13
Nov 15             Stereospecific assignments, RDCs             4.1.2
Nov 18             Quality of NMR Structures                    3.10
Nov 20

IV. Protein Dynamics                                            1.3, 1.4
Nov 22             T1, T2, NOE & S2
Nov 25
Nov 27 to Nov 29   Thanksgiving

V. Protein-Ligand Structures                                    6.3
Dec 2              SAR by NMR, Other 1D and 2D Methods
Dec 4              Transfer NOE                                 6.5
Dec 6              Filtered & edited NMR experiments
Dec 9              Metabolomics                                 6.7
Dec 11
Dec 13             Problem Set & Ubiquitin Assignment due
Dec 20             FINAL EXAM

Page 8: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

ORAL PRESENTATION OF STRUCTURE PAPERS

– Two 20 minute Oral Presentations

• Thursday Evenings at 6pm in HaH 733

• Audience Participation is Expected (like a journal club)

• Presentation Dates Randomly Assigned (see syllabus page 4)

• 50 points per presentation – total of 100 points

– Paper of Your Choice

• A Protein Structure Should be a Major Focus of the Paper

• The Paper Topic Should be of General Interest and of Significant Impact

• Send an Electronic Copy of the Paper to the Class Prior to Your Presentation

– Some Recommended Sources

• Nature Structural Biology, Science, Nature, Cell, Molecular Cell, Structure, Protein Science, PNAS, Journal of Molecular Biology, Biochemistry, and Journal of Biomolecular NMR.

• The paper may cover a protein structure or a protein-complex (small molecule, protein, DNA, RNA, etc).

Page 9: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

ORAL PRESENTATION OF STRUCTURE PAPERS

– Presentation Goal

• Present a Clear Understanding of the Goals and Findings of the Paper to the Class

• Why was the particular protein the target of the paper?

• How was the structure determined? Were there any challenging issues?

• What structure was determined for the protein (fold?)

• What are some interesting features of the structure (dynamics)?

• Are there any unique structural differences compared to other members of the family?

• What structural features are important to function?

• How was the structure used to support or refute the biological focus of the paper?

• Does the structure actually support the conclusion or did the authors overinterpret the data?

• Does the data/structure suggest other equally plausible conclusions?

Page 10: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

ORAL PRESENTATION OF STRUCTURE PAPERS

– Grading

• Combination of My Assessment and the Other Students’ Assessment

• Each Student will be Limited to Giving Approximately 30% As, 55% Bs, And 15% Cs

• Default Grade is a B, an A or C will Require Justification

• All the assessments will be averaged together to determine the number of points

– Assessing the Presenter

• How well did the presenter understand the material?

• How clearly did the presenter discuss the material?

• Was the chosen paper of general interest and biologically significant?

• Was the structure relevant and important to the paper?

• How well did the presenter answer questions?

• Did the paper lead to an interesting discussion?

Average Assessed Grade: A: 50pts, B+: 45pts, B: 40pts, B-: 35pts, C+: 30pts, C: 25pts

Page 11: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

ORAL PRESENTATION OF STRUCTURE PAPERS

Oral Presentation Schedule

9/5                 9/12              9/19             9/26                   10/10
Jonathan Catazaro   Mark Carter       Bradley Worley   Teklab Gebregiworgis   Jonathan Catazaro
Jeffrey Jeppson     Jessica Periago   Shulei Lei       Darrell Marshall       Jeffrey Jeppson

         10/17 10/24 11/14 11/21 12/5

Mark Carter   Bradley Worley Teklab Gebregiworgis  Jessica Periago   Shulei Lei Darrell Marshall  

         12/12        

                           

Tentative Schedule

Page 12: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Course Assignments

– Two Separate Graded Assignments

• A standard problem set – included at the end of the syllabus

• An NMR assignment problem

• Due date for both assignments is the beginning of class on Fri. Dec. 13

• Late Problem Sets will NOT be accepted

– Grading - General

• Each Assignment is worth 100 pts. (200 pts. total)

• Show ALL work to receive full credit

• You must submit your own set of answers

– Some Additional Considerations

• Please start both assignments NOW!

• Please work together

• Please visit my office hours for assistance

Page 13: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Course Assignments

– The Standard Problem Set Has Two Sections

• Writing simple AWK programs to manipulate files

• Using Xplor and other software to analyze protein structures

• Due date for both assignments is the beginning of class on Fri. Dec. 13

• Late Problem Sets will NOT be accepted

– Grading – Standard Problem Set

• No unique answer for the programming section; either it works or it doesn’t

• E-mail me your scripts and I will run them

• If it works full credit, if not zero points

• The analysis of the protein structures section will have defined answers

• Please submit the answers to the protein structure section on the due date

Page 14: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Course Assignments

– NMR Assignment Problem Set

• Determine the backbone NMR Assignments for Ubiquitin

• The completed project should include a cover page that summarizes your assignments using the following template:

Res   HN   15N   Ca   Cb   Ca(i-1)   Cb(i-1)   CO(i-1)
M1
Q2
I3
...
G76

• Include peak-pick list from the six spectra used to assign the protein

Sequence:

1       10         20         30         40
MQIFVKTLTG KTITLEVEPS DTIENVKAKI QDKEGIPPDQ

        50         60         70     76
QRLIFAGKQL EDGRTLSDYN IQKESTLHLV LRLRGG

Page 15: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Course Assignments

– NMR Assignment Problem Set

• You will ALL have access to a standard dataset of NMR spectra:

• 2D 1H-15N HSQC, 2D 1H-13C HSQC, 3D HNCO, 3D HNCA, 3D CBCANH, and 3D CBCACONH

• Data will be available on the computers in the Research Instrument NMR Facility (HaH 832)

• All the necessary software for the processing and analyzing of the data will also be available on these computers

– Goal

• Assign the minimal set of backbone resonances (HN, 15N, 13CO, Cα, Cβ)

• Provide practical experience with using NMR data to assign a protein

• Complete as much of the backbone assignments as possible

– Grading – NMR Assignment Problem Set

• Based on how complete the assignments are

• Scaled based on overall success of the class

Page 16: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Introduction to Linux/Unix

Linux: A UNIX-like operating system developed as free and open-source software

The user interface is a traditional and cumbersome command line in a shell (window)

There are a number of flavors (distributions) of Linux with different graphical user interfaces (GUI) or desktop interfaces (which attempt to be Mac- or Windows-like)

- Debian, Fedora, Ubuntu, Mageia, Mint Linux, etc.

Similarly, there are a number of PC-look-alike software programs (free & commercial) (WORD, EXCEL, etc.)

It was initially thought that Linux would replace the Windows PC

Very popular in academia because it is free and open for development

“Linux is for Adults” – Stephan Grzesiek

Page 17: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Introduction to Linux/Unix

[Figure: two screenshots]

Typical Linux Shell Environment: simple “command line” execution of programs or editing of files

Typical Linux “Windows” Environment: mimics PC/Mac desktop GUI environment

Page 18: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Introduction to Linux/Unix

[Figure: two screenshots]

Connecting from a PC by a Terminal Emulation Software (PuTTY): command line environment

Connecting from a PC by Samba: PC/Mac folder environment

Page 19: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Introduction to Linux/Unix

– Graphical User Interface (GUI) or PC/MAC Desktop Environment

• You can use the Desktop like a PC, but it can be cumbersome

• Minimal (if any) standards; everything in the environment needs to be configured

• Downside of open-source (free) software: many contributors with little to no managers

– More common to work in a shell using the command line

• Primitive (“Old School”)

• Minimal mouse functions, pull-down menus or other common features we are accustomed to

• Need to memorize commands and options (“flags”)

• Need to open a Terminal, Window or Shell: right-click the mouse and select “open terminal”

Page 20: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Introduction to Linux/Unix

– Three Common Linux Commands: pwd, ls and cd

• pwd – identifies the current path or directory

• ls – lists the files and folders in the current directory

• cd path – moves to the defined path (change directory)

‒ cd .. (move up one directory)

‒ cd ../.. (move up two directories)
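A short session using the three commands might look like the sketch below. The scratch directory and the folder names project and data are hypothetical, and mktemp and mkdir are used only to set up something to navigate.

```shell
# Hypothetical scratch area so the example does not touch real files
workdir=$(mktemp -d)
mkdir -p "$workdir/project/data"

cd "$workdir/project/data"    # move to the defined path
pwd                           # show the current directory
cd ..                         # up one directory: now in project
one_up=$(basename "$(pwd)")
ls                            # lists the single folder "data"
cd data
cd ../..                      # up two directories: back in the scratch area
two_up=$(pwd)
echo "$one_up"
```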

Page 21: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Introduction to Linux/Unix

‒ For a Complete List of Linux Commands and Explanations see

• http://linuxcommand.org/

• Or the book “Linux in a Nutshell”

‒ Some Other Common Commands

• echo "text" – display or print text

• exit – close a terminal

• clear – clear all text in a terminal

• mkdir – make a new directory

• rm – remove/delete file

• mv – moves files

• cp – copies files

• ps – lists all active user programs and displays a PID (process identification number)

• kill pid – will kill (stop) the process with the listed pid number

• man command – will display the manual for the listed command

• cat file – display the contents of a file (also used to combine or concatenate multiple files)

• vi file – will open file with a primitive text editor

• chmod [flags] file – will change or set permissions for file as defined by flags
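As a rough illustration, several of these commands can be chained into a small file-handling session. The directory and file names are made up for the example, and the ">" redirection is used only to create and combine files.

```shell
demo=$(mktemp -d)                      # hypothetical scratch directory
mkdir "$demo/notes"                    # make a new directory
echo "first line" > "$demo/notes/a.txt"
cp "$demo/notes/a.txt" "$demo/notes/b.txt"    # copy a file
mv "$demo/notes/b.txt" "$demo/notes/c.txt"    # move (rename) the copy
# cat with two files concatenates them
cat "$demo/notes/a.txt" "$demo/notes/c.txt" > "$demo/notes/both.txt"
rm "$demo/notes/c.txt"                 # delete a file
ls "$demo/notes"                       # a.txt and both.txt remain
cat "$demo/notes/both.txt"
```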

Page 22: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Introduction to Linux/Unix

‒ It Gets More Complicated!

‒ A number of commands have a range of options that are implemented on the command line with a “flag”

• ls -l – lists files and folders with associated permissions

• rm -R – remove/delete folder

• mv -i – prompt before overwriting an existing file with the same name

• cp -n – do not overwrite an existing file with the same name

• cp -u – only overwrite an older file with the same name

• ps -axu – lists the detailed status of every process on the system with the name of the user

• chmod 755 file – change file’s permission such that the file's owner may read, write, and execute the file. All others may only read and execute the file.

‒ Multiple flags can be used simultaneously

• Again, man pages, Linux web sites and reference books provide more details
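A few of these flags can be tried safely in a scratch directory, as in the sketch below. The names demo_dir, sub and run.sh are hypothetical, and cut is used only to isolate the permission string from the ls -l output.

```shell
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
mkdir -p demo_dir/sub
echo 'echo hello' > demo_dir/run.sh

chmod 755 demo_dir/run.sh         # owner: rwx, everyone else: r-x
ls -la demo_dir                   # two flags at once: long listing, hidden files too
# keep only the 10-character permission string of the long listing
perms=$(ls -l demo_dir/run.sh | cut -c1-10)
echo "$perms"

rm -R demo_dir/sub                # -R is needed to remove a folder
```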

Page 23: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Introduction to Linux/Unix

‒ One More Very Useful Command

‒ sort

• Quickly re-order or sort the rows of a tabular file with any number of columns

sort -rn -k n filename > newfilename

- n – the number of the column that will be sorted (given with the -k flag)

- r – sort in reverse order

- n (as the -n flag) – sort based on the numeric value of the string
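A minimal sketch of a reverse numeric sort on one column follows. It assumes the -k option of sort is used to select the column, and the file peaks.txt with its three rows is made up for the example.

```shell
demo=$(mktemp -d)                                  # hypothetical scratch directory
printf '1 9.35 53.19\n2 9.10 59.42\n3 9.73 60.73\n' > "$demo/peaks.txt"

# -k 3,3 restricts the sort key to column 3; -r reverses; -n compares numerically
sort -rn -k 3,3 "$demo/peaks.txt" > "$demo/sorted.txt"
cat "$demo/sorted.txt"
top=$(head -n 1 "$demo/sorted.txt")                # row with the largest column 3
```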

Page 24: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Permissions

‒ You can’t read, write, edit or execute a file without permission!

[Figure: annotated long file listing. The labeled fields are: Directory, Number of files in Directory, File Owner, Group Owner belongs to, Size of file in kilobytes, File Date or Time Stamp, Filename]

Page 25: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Permissions

‒ Reading and understanding permissions

[Figure: file listing with the Permissions field labeled]

Page 26: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Permissions

‒ Where did the 755 come from in the chmod command?

Think of the permission settings as a series of bits:

rwx rwx rwx = 111 111 111
rw- rw- rw- = 110 110 110
rwx --- --- = 111 000 000

and so on...

rwx = 111 in binary = 7
rw- = 110 in binary = 6
r-x = 101 in binary = 5
r-- = 100 in binary = 4
-wx = 011 in binary = 3
-w- = 010 in binary = 2
--x = 001 in binary = 1
--- = 000 in binary = 0
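The bit arithmetic can be sketched as a small shell function that adds 4 for r, 2 for w and 1 for x. The function name triplet_to_digit is hypothetical and only serves the illustration.

```shell
# Convert one rwx triplet (for example "r-x") to its octal digit
triplet_to_digit() {
  digit=0
  case "$1" in r??) digit=$((digit + 4)) ;; esac   # read bit
  case "$1" in ?w?) digit=$((digit + 2)) ;; esac   # write bit
  case "$1" in ??x) digit=$((digit + 1)) ;; esac   # execute bit
  echo "$digit"
}

# owner rwx, group r-x, others r-x
mode="$(triplet_to_digit rwx)$(triplet_to_digit r-x)$(triplet_to_digit r-x)"
echo "$mode"
```

The three digits, one each for owner, group and others, form the number passed to chmod.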

Page 27: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Pipes and Redirection

| (pipe) - passes output of one Linux command to the input of a second command

• Example: ls | wc (wc counts the number of lines, words and characters)

• Not limited to just one pipe; multiple pipes can be strung together

>, < - redirection of files

• command > filename – output of command (or program) is sent to a file called filename instead of being displayed on the screen

Example: ls > file_list

• command < filename – the file filename is the input to the command or program

Example: xplor < psf.inp
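Pipes and both kinds of redirection can be combined as in the sketch below. The file names are invented for the example, and grep is used only as a convenient middle stage in a two-pipe chain.

```shell
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
touch a.pdb b.pdb c.txt

ls > file_list                    # output redirection: listing goes to a file
nlines=$(wc -l < file_list)       # input redirection: wc reads from the file
npdb=$(ls | grep pdb | wc -l)     # two pipes strung together
echo "$nlines" "$npdb"
```

Note that file_list is created before ls runs, so the listing includes file_list itself.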

Page 28: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Background Calculations

‒ For long calculations, you don’t want the process directly associated with the window or shell

• Window must remain open and active during calculation

• Window is “locked” until the program is finished

• Calculations will be stopped if the window is closed

• An intense calculation can overwhelm the shell environment, leading to the window crashing, or even slow down your computer

• Output displayed in the window can be lost, lock the window or crash the computer

‒ Instead, submit your “job” to the “background”

• Lowers the calculation’s priority to access the CPU

• Any interactive calculation has the highest priority

• Example:

background - xplor < psf.inp > psf.out &

interactive - xplor < psf.inp

‒ Use the ps command to monitor the status of background jobs
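A background job can be sketched with a stand-in command, since xplor may not be installed. Here sleep followed by echo plays the role of the long calculation, and job.out is a hypothetical output file.

```shell
demo=$(mktemp -d)                                   # hypothetical scratch directory
# The trailing & sends the job to the background; output goes to a file
( sleep 1; echo "done" ) > "$demo/job.out" &
jobpid=$!                 # PID of the most recent background job

ps -p "$jobpid"           # monitor the status of the job
wait "$jobpid"            # block until the background job has finished
result=$(cat "$demo/job.out")
echo "$result"
```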

Page 29: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

vi – Primitive Text Editor

‒ Opens any text-based file for reading, editing and writing

• Only simple text or ASCII files can be edited with vi

• You will see gibberish with *.doc, *.pdf, etc.

‒ Like Linux, vi uses a number of simple command line functions

• A number of the functions require a key combination (ctrl key + another key)

• For a Complete List of Vi Commands and Explanations see “The Vi Lovers Home Page” http://thomer.com/vi/vi.html or “Learning the vi Editor” by L. Lamb, O’Reilly & Associates, Inc.

‒ vi filename

• If filename exists, vi will open the file for editing

• If filename doesn’t exist, vi will create the file for editing

Page 30: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

vi – Primitive Text Editor

[Figure: vi window with the status line annotated: Editing Mode, Cursor Location (Line number, Column number), Cursor, and what part of the text is shown (All, Top, Bot, or Percentage)]

Page 31: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

vi – Primitive Text Editor

‒ Working with files

• :q – quits only if no changes to the file have been made

• :q! – force vi to quit without saving any changes

• :wq filename – quits and writes the contents of the file to a new file named filename

• :wq! – quits and writes the file to the current filename

• :r filename – inserts the contents of the file filename into the current file at the cursor location

‒ Moving around the file

• :number – jumps to the specified line number in the text

• G or :$ – jump to last line

• Ctrl-g – gives current line number

• Ctrl-f or Ctrl-d – move forward

• Ctrl-b or Ctrl-u – move back

• Arrow Keys – allow you to move around the file and position the cursor

Page 32: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

vi – Primitive Text Editor

‒ Adding to a file

• Enter Key – adds a blank line at the cursor position

• Esc key – exits or leaves the active vi function

• i or a – enters insert mode, allows text to be typed into the file at the location of the cursor

• R – enters replace mode, allows text to be typed into the file at the location of the cursor, replacing any existing text

‒ Deleting

• dd – deletes the line at the position of the cursor

• dw – deletes the word at the position of the cursor

• x – deletes the character under the cursor

• r – replaces the character under the cursor

• D – deletes from the cursor position to the end of the line

• u – undo the last edit or change

• U – undo all the edits on a single line

Place a number in front of a command and the command will be executed that many times

Page 33: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

vi – Primitive Text Editor

‒ Copying and Pasting Text

• number yy – yanks (copies) the specified number of lines (starting at the cursor)

• p – puts (pastes) the previously yanked (copied) lines in the text after the cursor

• J – joins two lines at the position of the cursor

‒ Global Search and Replace

• /text – moves the cursor to the next location of text in the file

• n – moves to the next occurrence of text in the file

• :%s/search_string/replacement_string/g – globally replace search_string with replacement_string

Page 34: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Awk/Nawk – Primitive (but Powerful) Programming Language

‒ Interpreted (not compiled) language

• C-like

• A file containing the software code needs to be passed to Awk

awk -f awk_script.awk infilename > outfilename

- awk_script.awk – the Awk program

- infilename – the file used by the Awk program

- outfilename – the output generated by the Awk program

‒ Awk significantly simplifies writing a quick program

• Automatically handles opening and reading files and inputting data into standard variables

• Structured to read a file composed of rows and columns

• IMPORTANT – sequentially reads each row as it executes the program. If there are 10 rows, the program gets executed 10 times – a major source of confusion
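The once-per-row behavior can be seen with a tiny script file passed to awk with -f. The three-row input and the message text are made up for the example; the file names follow the ones used on the slide.

```shell
demo=$(mktemp -d)                 # hypothetical scratch directory
cd "$demo"
printf 'a 1\nb 2\nc 3\n' > infilename

# Write the Awk program to its own file
cat > awk_script.awk <<'EOF'
# The body below runs once for every row of the input file
{ print "row", NR, "first column:", $1 }
EOF

awk -f awk_script.awk infilename > outfilename
cat outfilename
```

With three input rows the body runs three times, so outfilename holds three lines.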

Page 35: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Awk Program Structure

[Diagram: INPUT feeds a program made of a BEGIN section, a main body (logic statements (if, and, or, not), arithmetic, looping, table arrays, printing) and an END section, which produces OUTPUT]

Page 36: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

BEGIN/END

• All of the commands in the section defined by BEGIN occur BEFORE the file is read

• All of the commands in the section defined by END occur AFTER the file is read

• To comment out a line of text from a script add “#” before the text

- The line is skipped by Awk

# This script politely introduces itself
BEGIN {print "Hello, world"}

{
#Main – Does Nothing, but still reads file
}

END {print "Bye, world"}
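The ordering of BEGIN, main and END can be checked by running the script on any small input. In this sketch the two input rows are arbitrary and the program is given inline rather than with -f.

```shell
out=$(printf 'x\ny\n' | awk '
BEGIN { print "Hello, world" }   # runs before any input is read
{ }                              # main: executed for each row, does nothing
END   { print "Bye, world" }     # runs after the last row has been read
')
echo "$out"
```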

Page 37: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

BEGIN/END

• The BEGIN section is commonly used to set or define the value of variables used by the MAIN program

• It is also used to read in data or information from other files

• The END section is commonly used to print out the results of the Awk Program

BEGIN {
  CAmax[0] = "65.52"
  CAmin[0] = "43.00"
  CBmax[0] = "38.70"
  CBmin[0] = "0.00"
  Res[0] = "A"
  i = 1

  # read reference values from a second file, one row at a time
  while ((getline < "ref.pck") > 0) {
    CAref[i] = $1
    CBref[i] = $2
    i++
  }
}

{
#Main – Does Nothing, but still reads file
}

END {
  # CA[] and CB[] would be filled in by the main section
  for (i = 1; i <= NR; i++) {
    print CA[i], CB[i]
  }
}
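Reading a second file inside BEGIN can be sketched as follows. The contents of ref.pck and peaks.txt are invented for the example, and the END section prints the reference arrays simply to show that they were filled.

```shell
demo=$(mktemp -d)                              # hypothetical scratch directory
cd "$demo"
printf '58.2 30.1\n61.0 38.5\n' > ref.pck      # made-up reference values
printf 'ignored\n' > peaks.txt                 # main input, not used here

out=$(awk '
BEGIN {
  n = 0
  # getline returns 1 for each row read and 0 at the end of the file
  while ((getline < "ref.pck") > 0) {
    n++
    CAref[n] = $1
    CBref[n] = $2
  }
  close("ref.pck")
}
END { for (i = 1; i <= n; i++) print CAref[i], CBref[i] }
' peaks.txt)
echo "$out"
```

The file name must be a quoted string, and the parentheses around the getline expression keep the "> 0" comparison from being read as part of the file name.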

Page 38: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

MAIN

• The various functions of AWK perform the tasks you want as the program sequentially reads the input file

Consider the following input file:

$1    $2    $3      $4     $5     $6     $7     $8
PkID  NH    N15     CA     CB     CAi    CBi    COi
1.00  9.35  126.75  53.19  40.06  63.53  69.87  172.90
2.00  9.10  126.69  59.42  31.90  52.92  43.03  174.94
3.00  9.73  126.68  60.73  38.11  54.64  31.38  171.92
4.00  7.80  126.57  57.28  33.99  56.10  30.75  172.60
5.00  8.84  126.52  58.35  28.58  53.25  40.03  173.12
6.00  8.14  125.85  65.85  31.89  53.03  41.07  173.12
7.00  9.01  125.35  62.57  42.15  52.70  41.84  171.99
8.00  8.15  125.24  54.86  40.69  55.79  30.35  171.17

Awk sequentially reads each row, redefining the value of each standard variable ($1 to $8)

- NF is set to the number of fields (columns), 8 in this example

- NR is set to the number of rows read so far, 9 after the last row in this example (header included)

- $0 is a string corresponding to the entire row
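The standard variables can be inspected directly. This sketch uses only the header and the first two data rows of the table above, saved under the hypothetical name peaks.txt.

```shell
demo=$(mktemp -d)                 # hypothetical scratch directory
cat > "$demo/peaks.txt" <<'EOF'
PkID NH N15 CA CB CAi CBi COi
1.00 9.35 126.75 53.19 40.06 63.53 69.87 172.90
2.00 9.10 126.69 59.42 31.90 52.92 43.03 174.94
EOF

# For every row print the row number, the number of fields and column 4;
# in END, NR holds the total number of rows that were read
out=$(awk '{ print NR, NF, $4 } END { print "rows:", NR }' "$demo/peaks.txt")
echo "$out"
```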

Page 39: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

MAIN

• The primary Awk functions can be grouped into 5 categories

– Logic statements

– Arithmetic

– Looping

– Arrays

– Printing

Page 40: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

MAIN

• As the file is being read in, you can now write instructions to test, change or manipulate the original data

• You can define your own variable names

• You can do any number of arithmetic functions

– Basic math: +, -, *, /, ^

– General functions: cos(x), exp(x), sqrt(x), etc.

BEGIN {
  CAmax[0] = "65.52"
  CAmin[0] = "43.00"
  CBmax[0] = "38.70"
  CBmin[0] = "0.00"
  Res[0] = "A"
}

{
  PkID = $1
  NH = $2
  N15 = $3
  CAiatom = $4
  CBiatom = $5
  CAatom = $6
  CBatom = $7
  COi = $8

  # variable names are case sensitive: CAiatom and Caiatom are different variables
  CAup = sqrt(CAmax[0] - CAiatom)
  CBdn = CBiatom/CBmax[0]    # CBmin[0] is 0.00, so dividing by it would fail
  CO2 = COi^2
}

Page 41: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

• Logic statements

– if (logical test of a parameter/variable)

‒ Probably the most important logic command

‒ General call structure is

• if (statement to test) {action}

• Example: if ($1 == "HAPPY")

– Reads “if column 1 equals HAPPY”

– If this is true then we do something

– else

‒ Used to perform an action when the if statement is false

‒ else {action}

‒ Example

BEGIN {
  $1 = "HAPPY"
  if ($1 == "HAPPY") print "I am HAPPY"
  else print "I am SAD"
}

Page 42: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

• Logic statements

– ! (not) true if not a match

• Example: if ($1 != "HAPPY")

• True if $1 is NOT EQUAL to "HAPPY"

– && (and) true only if both conditions are met

• Example: if ($1 > $2 && $1 > $3)

• True if $1 is larger than BOTH $2 and $3

– || (or) true if at least one of multiple conditions is met

• Example: if ($1 > $2 || $1 > $3)

• $1 only needs to be larger than either $2 or $3 for the statement to be true
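The logic operators can be exercised on a two-row input, as in the sketch below. The rows and the printed messages are made up, and the numeric comparisons are applied to columns 2 to 4 rather than 1 to 3 because column 1 holds the text.

```shell
out=$(printf 'HAPPY 3 1 2\nSAD 1 5 0\n' | awk '{
  if ($1 == "HAPPY") print "I am HAPPY"
  else print "I am SAD"
  if ($2 > $3 && $2 > $4) print "column 2 is larger than both"
  if ($2 > $3 || $2 > $4) print "column 2 is larger than at least one"
  if ($1 != "HAPPY") print "not HAPPY"
}')
echo "$out"
```

The first row passes the first three tests; the second row fails the && test but passes the || and != tests.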

Page 43: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

• Loops – allow you to repeat a set of instructions until a condition is met

• Major source of problems – the infinite loop

– The exit condition is never met

• Two loop functions

– for

– while

END {
  for (i = 1; i <= NR; i++) {
    print CA[i], CB[i]
  }
}

BEGIN {
  i = 1
  while ((getline < "ref.pck") > 0) {
    CAref[i] = $1
    CBref[i] = $2
    i++
  }
}

{
  for (i = 1; i <= NF; i++) {
    if ($i >= 54.0 && $i <= 55.0) count++
  }
}
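A for loop over the columns of a row can be sketched as below, counting the values that fall between 54.0 and 55.0. The five input values are chosen only for illustration.

```shell
out=$(echo "53.19 54.64 54.86 55.79 54.00" | awk '{
  count = 0
  # visit every column of the current row: $i is the value in column i
  for (i = 1; i <= NF; i++) {
    if ($i >= 54.0 && $i <= 55.0) count++
  }
  print count
}')
echo "$out"
```

Both sides of && need the full comparison; writing only "<= 55.0" after && is a syntax error.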

Page 44: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

• Arrays – allow you to assign multiple values to a single variable

• Effectively allows you to sort or group information

• Two types of Arrays

– 1D: CA[0]

– 2D: CB[0,0]

BEGIN {
  i = 1
}

{
  PkID[i] = $1
  NH[i] = $2
  N15[i] = $3
  CAiatom[i] = $4
  CBiatom[i] = $5
  CAatom[i] = $6
  CBatom[i] = $7
  COi[i] = $8
  i++
}
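Storing each row in arrays makes the whole file available in the END section. In this sketch the rows are stored by row number and printed in reverse order; the 2D array grid and the doubled value are hypothetical additions to show the two-index form.

```shell
out=$(printf '1 9.35\n2 9.10\n3 9.73\n' | awk '
{
  PkID[NR] = $1            # 1D arrays indexed by row number
  NH[NR] = $2
  grid[NR, 1] = $2 * 2     # 2D array: row number and a column index
}
END {
  # all rows are still available after the file has been read
  for (i = NR; i >= 1; i--) print PkID[i], NH[i], grid[i, 1]
}')
echo "$out"
```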

Page 45: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

• printf – primary mechanism of reporting the results of the Awk program to the user

• Extremely flexible, with a large number of options available to format output

– Can do calculations within the print statement

– Can be frustrating to get it right.

• Two types of print statements

– print: no formatting, just prints the value of the variable

– printf: full range of formats available

BEGIN {
  state = "HAPPY"
}

{
  for (i = 1; i <= 10; i++) {
    print i
    print i*i
    printf ("%s\n", state)
  }
}

Page 46: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

‒ Examples of different formatting options with printf

• Each variable needs a type definition:

%d – decimal (integer)

%s – string

%f – floating point

%e – floating point with scientific notation

• Formatting is “literal”

printf ("%s%s\n", $1,$2)

– print all the characters in column 1 (%s) and column 2 (%s)

– \n prints a new line

– no spacing

» $1 = HAPPY and $2 = SAD the output would be HAPPYSAD

Page 47: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

‒ Examples of different formatting options with printf

• Spacing, Tabs and justifications

- The number of spaces between type definitions will be printed

- \t – Tab, using system defined tab locations

- \n – print new line

- Can use any number or combination of tabs, spaces and new lines

- Default printing is right justified

- For left justification, place a - in front of the type classification (e.g. %-10s)

printf ("%s %s\n", $1,$2)

– single space

» $1 = HAPPY and $2 = SAD the output would be HAPPY SAD

printf ("%s     \t%s\n\n", $1,$2)

– five spaces then tab

» $1 = HAPPY and $2 = SAD the output would be HAPPY, five spaces, a tab, then SAD

» Followed by two new lines

Page 48: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

‒ Examples of different formatting options with printf

• Precision Modifier

- “Fine tunes” how the variable is printed

- Defines both the spacing and the number of characters or significant figures printed

- Simply place a number in front of the type classification (e.g. %5.3f)

printf ("%10s%5s\n", $1,$2)

– 10 spaces for the first string and 5 spaces for the second string

– Spaces include the number of characters in the string

» $1 = HAPPY and $2 = SAD the output would be “     HAPPY  SAD”

» 5 spaces in front of HAPPY (5 spaces + 5 characters in HAPPY = 10)

» 2 spaces in front of SAD (2 spaces + 3 characters in SAD = 5)

» OR printing of $1 will end on column 10 and printing of $2 will end on column 15

printf ("%f %5.3f\n", $1,$1)

» $1 = 1/3 (0.333333), the output would be 0.333333 0.333

» %f – six digits are printed after the decimal point by default

» 5 in %5.3f indicates that at least 5 characters are printed in total (including the decimal point)

» 3 in %5.3f indicates that 3 digits are printed to the right of the decimal point
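The width and precision modifiers can be checked from the command line. The third format, %8.2f, and the "|" marker in the left-justified case are additions for illustration only.

```shell
# right justified: widths of 10 and 5 characters
out1=$(echo "HAPPY SAD" | awk '{ printf("%10s%5s\n", $1, $2) }')
# left justified with "-"; the | marks where the padding ends
out2=$(echo "HAPPY SAD" | awk '{ printf("%-10s%-5s|\n", $1, $2) }')
# default %f, then explicit width.precision
out3=$(awk 'BEGIN { x = 1/3; printf("%f %5.3f %8.2f\n", x, x, x) }')

echo "[$out1]"
echo "[$out2]"
echo "[$out3]"
```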

Page 49: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

‒ Examples of different formatting options with printf

• Printing is “literal”

- Anything within the quotes is printed

printf ("%s HELLO %s\n", $1,$2)

» $1 = HAPPY and $2 = SAD the output would be HAPPY HELLO SAD

printf ("Hello World\n")

» Don’t need to print a variable

» The output would simply be: Hello World

• Print to a File

- Simply redirect the output of the print or printf statement to a file name (the file name must be in quotes)

printf ("Hello World\n") > "helloworld.txt"

Page 50: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Functions

‒ Examples of different formatting options with printf

• Can do Math within the print and printf statement

printf ("%f %5.3f\n", $1^2,sqrt($2))

» $1 = $2 = 1/3 (0.333333), the output would be 0.111111 0.577

» %d would print only the integer part of each value (0 0)

• This is a general feature of Awk, functions can be embedded within other functions

• For More information on Awk, see

• The book “sed and awk” by Dale Dougherty, O’Reilly and Associates

• The GNU Awk User’s Guide: http://www.gnu.org/software/gawk/manual/gawk.html

• Effective Awk Programming: http://www.gnu.org/software/gawk/manual/
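Math inside printf can be tried as in the sketch below, which also shows why the choice of type definition matters. The input value 0.333333 stands in for 1/3 in both columns.

```shell
# floating point formats keep the fractional part
floats=$(echo "0.333333 0.333333" | awk '{ printf("%f %5.3f\n", $1^2, sqrt($2)) }')
# %d keeps only the integer part of each result
ints=$(echo "0.333333 0.333333" | awk '{ printf("%d %d\n", $1^2, sqrt($2)) }')

echo "$floats"
echo "$ints"
```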

Page 51: NMR Spectroscopy and Protein Structures Chem 991A Special Topics in Physical Chemistry

Linux & AWK – Final Thoughts

• These lectures are only meant to serve as a general introduction to both Linux and Awk

• There is a lot more detail and other topics that simply were not covered. Entire courses are dedicated to these topics. I did not present everything there is to know about Linux and Awk or programming in general

• Mastering an operating system and computer programming will only come from extensive effort and practice

• The best way to learn is by doing!!