BITS Training: Protein Structure
Joost Van Durme, VIB Switch Laboratory
Vrije Universiteit Brussel
http://www.bits.vib.be/training
Topics for today
• Exploring the protein structure databank (PDB)
• Viewing and analyzing protein structures with YASARA
• Comparing similar protein structures
• In silico mutagenesis with FoldX
• Homology modeling with FoldX
Sequences and structures
• PDB contains 65,000 structures
• EMBL-Bank contains 114,475,051 sequences, or 215,540,553,360 nucleotides!
The sequence-structure gap
[Figure: number of sequences and number of structures per year, 1976 to 2006, on a logarithmic axis from 1 to 100,000,000]
Structures can be solved
• X-ray crystallography (crystals)
• Nuclear Magnetic Resonance (NMR) (in solution)
• Electron microscopy (in native tissue)
But ...
• Solving structures is a lot of work (6 months to years)
• Lots of material and reagents are needed, and solubility is a problem
• Some protein structures are really difficult to solve: membrane proteins, extremely large proteins, protein complexes
• The field evolves fast: techniques improve, software becomes more user-friendly, and automation increases (X-ray infrastructure, crystal growth)
• Despite this progress, the sequence-structure gap is not expected to ever be closed
What can we learn from models/structures?
• Active site structure: structure-based drug design
• Protein-protein interactions
• Function
• Antigenic behavior / vaccine development
• Stabilising proteins using structural knowledge
Human vs parasite
Parasite
Active site
1918 Influenza Epidemic Influenza Virus
NEURAMINIDASE POCKET
SIALIC ACID
RELENZA SIALIC ACID
RELENZA TAMIFLU
5,000,000+ doses in NL
Trouw, 3 March 2009
Ki for wild type (WT) and the H274Y mutant:
RELENZA: WT Ki = 1.0, H274Y Ki = 1.9
TAMIFLU: WT Ki = 1.0, H274Y Ki = 265
PDB structures come from ...
• X-Ray crystallography experiments
• NMR structure determination
The PDB no longer contains:
• EM structures (too low resolution)
• Models (too unreliable)
Principle of X-Ray crystallography
initial model
electron densities
X-Ray structure
X-ray model components
• x, y, z coordinates: define the mean atom position
• Disorder about this mean: B-factor and occupancy
  • variations in time and space
• B-factor:
  • models the ‘smearing out’ of disorder around the mean atom position (ellipsoids)
  • a higher B-factor means more uncertainty about the position
• Occupancy:
  • accounts for alternative conformations of the same side chain
  • how often do we find this side chain in one conformation, and how often in the other?
Occupancy
ATOM 625 C ILE A 77 -11.322 28.374 -1.179 1.00 28.77 C
ATOM 626 O ILE A 77 -11.946 29.453 -1.112 1.00 28.84 O
ATOM 627 CA AILE A 77 -11.432 27.329 -0.087 0.70 28.15 C
ATOM 628 CB AILE A 77 -12.918 26.874 0.087 0.70 28.64 C
ATOM 629 CG1AILE A 77 -13.042 25.758 1.141 0.70 26.75 C
ATOM 630 CG2AILE A 77 -13.516 26.421 -1.241 0.70 28.13 C
ATOM 631 CD1AILE A 77 -13.378 26.302 2.501 0.70 26.47 C
ATOM 632 CA BILE A 77 -11.423 27.327 -0.082 0.30 28.50 C
ATOM 633 CB BILE A 77 -12.874 26.775 0.117 0.30 28.79 C
ATOM 634 CG1BILE A 77 -13.519 26.423 -1.227 0.30 28.62 C
ATOM 635 CG2BILE A 77 -13.748 27.739 0.916 0.30 28.40 C
ATOM 636 CD1BILE A 77 -14.720 25.518 -1.100 0.30 28.69 C
ATOM 637 N ARG A 78 -10.521 28.048 -2.183 1.00 28.70 N
ATOM 638 CA ARG A 78 -10.258 28.952 -3.268 1.00 28.47 C
ATOM 639 C ARG A 78 -10.857 28.469 -4.584 1.00 28.22 C
2VWC
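As a minimal sketch of how occupancies can be read programmatically: the snippet below rebuilds the two CA records of ILE 77 shown above in fixed-column layout (the slide text has lost its column alignment) and sums the occupancies of the alternate conformations. The helpers `format_atom` and `parse_atom` are illustrative names, not part of any library; the column positions are assumed to follow the standard PDB ATOM record layout.

```python
from collections import defaultdict

def format_atom(serial, name, altloc, resname, chain, resseq,
                x, y, z, occ, bfac, element):
    """Write one ATOM record in fixed-column PDB layout (illustrative helper)."""
    atom_field = f" {name:<3}" if len(name) < 4 else name
    return (f"ATOM  {serial:>5} {atom_field}{altloc}{resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{bfac:6.2f}          {element:>2}")

def parse_atom(line):
    """Read fields by column position rather than by splitting on whitespace."""
    return {
        "name": line[12:16].strip(),
        "altloc": line[16].strip(),      # alternate location indicator (A, B, ...)
        "resname": line[17:20].strip(),
        "chain": line[21],
        "resseq": int(line[22:26]),
        "occupancy": float(line[54:60]),
        "bfactor": float(line[60:66]),
    }

# Values copied from the two CA records of ILE 77 on the slide.
records = [
    (627, "CA", "A", "ILE", "A", 77, -11.432, 27.329, -0.087, 0.70, 28.15, "C"),
    (632, "CA", "B", "ILE", "A", 77, -11.423, 27.327, -0.082, 0.30, 28.50, "C"),
]
lines = [format_atom(*record) for record in records]

# Sum occupancies per (chain, residue number, atom name).
totals = defaultdict(float)
for atom in map(parse_atom, lines):
    totals[(atom["chain"], atom["resseq"], atom["name"])] += atom["occupancy"]

for key, total in totals.items():
    print(key, round(total, 2))
```

The occupancies of conformations A and B of the same atom are expected to add up to 1.00, which is what the sum checks.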
Atomic B-factors
• Value that indicates the precision of an atom’s given position
• Atoms with the largest B-factors have the largest positional uncertainty
• Indication of the mobility of an atom
• 0 < B < 20: atom is most likely OK
• 20 < B < 40: atom is probably OK, but positional errors up to 0.5 Ångstrom are normal
• 40 < B < 60: atom is probably reasonably OK, but be careful, because positional errors up to 1.0 Ångstrom can be observed
• B > 60: atom is not likely to be within 1.0 Ångstrom of where you see it
• B around 100: atom is guaranteed not to be within 1.0 Ångstrom of where you see it
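The rule of thumb above can be sketched as a small lookup. How the exact boundary values (B = 20, 40, 60) are assigned is an assumption, since the slide leaves them open. The second helper uses the standard isotropic relation B = 8π²⟨u²⟩, which is not on the slide, to turn a B-factor into a root-mean-square displacement.

```python
import math

def bfactor_reliability(b):
    """Map an atomic B-factor to the rule-of-thumb categories listed above."""
    if b < 20:
        return "most likely OK"
    if b < 40:
        return "probably OK, errors up to 0.5 Angstrom are normal"
    if b < 60:
        return "reasonably OK, errors up to 1.0 Angstrom possible"
    return "not likely within 1.0 Angstrom of the plotted position"

def rms_displacement(b):
    """Solve B = 8 * pi^2 * <u^2> for sqrt(<u^2>), in Angstrom (isotropic case)."""
    return math.sqrt(b / (8 * math.pi ** 2))

for b in (13.79, 28.77, 65.0):
    print(b, bfactor_reliability(b), round(rms_displacement(b), 2))
```

With this relation, B = 20 corresponds to a displacement of roughly 0.5 Ångstrom, which is in line with the error estimates quoted in the list.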
B-factor
www.YASARA.org
Low
High
Resolution (Angstrom)
• Level of detail that can be observed in the electron density map
• The greater the disorder in the crystal, the lower (poorer) the resolution, i.e. the larger its value in Ångstrom (proportional to the protein size)
3.0 Å 2.0 Å 1.2 Å
R-factor
• The difference between the observed and computed diffraction pattern
• A measure of how well the refined structure predicts the observed data
• Higher values mean less agreement
• 0.40-0.60: very unreliable
• 0.20 seems to be the standard threshold
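A minimal sketch of the calculation, assuming the conventional definition R = Σ| |F_obs| - |F_calc| | / Σ|F_obs| over structure factor amplitudes, which the slide does not spell out. The amplitudes below are made-up numbers for illustration only.

```python
def r_factor(f_obs, f_calc):
    """Conventional crystallographic R-factor over paired amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator

# Hypothetical observed and calculated amplitudes.
f_obs = [100.0, 80.0, 60.0, 40.0]
f_calc = [90.0, 85.0, 50.0, 45.0]

r = r_factor(f_obs, f_calc)
print(f"R = {r:.3f}")
```

A perfect model would give R = 0; the larger the summed disagreement relative to the observed amplitudes, the higher R becomes.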
electron density map
NMR Structure determination
NMR model components
• In solution
  • study protein dynamics
  • solve protein structures that are difficult to crystallize
• Nuclear Overhauser Effect (NOE)
  • intensities of signal peaks correspond to short inter-atomic distances between spatially close protons (NOE distances)
• NOE constraints are known with low precision, e.g. NOEs are binned into 2.5-4.0, 4.0-5.5, and 5.5-7.0 Ångstrom
• Multiple models that are consistent with the distance and angle constraints are generated, e.g. using molecular dynamics: the NMR ensemble
• Take the average or best model from the ensemble for PDB deposition, or just deposit a selected ensemble of superposed structures
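To illustrate what "consistent with the distance constraints" means, here is a small sketch that counts how many restraints a model violates. It assumes each NOE is used as an upper distance bound taken from the top of its bin; the proton labels and coordinates are hypothetical.

```python
import math

# Distance bins from the slide, in Angstrom: (lower, upper).
NOE_BINS = [(2.5, 4.0), (4.0, 5.5), (5.5, 7.0)]

def count_violations(coords, restraints, tolerance=0.0):
    """coords: {label: (x, y, z)}; restraints: [(label_a, label_b, upper_bound)]."""
    violations = 0
    for a, b, upper in restraints:
        if math.dist(coords[a], coords[b]) > upper + tolerance:
            violations += 1
    return violations

# Hypothetical proton positions in one model.
model = {"HA": (0.0, 0.0, 0.0), "HB": (3.0, 0.0, 0.0), "HC": (0.0, 6.0, 0.0)}

# Each restraint uses the upper edge of one NOE bin.
restraints = [
    ("HA", "HB", NOE_BINS[0][1]),  # strong NOE, at most 4.0
    ("HA", "HC", NOE_BINS[1][1]),  # medium NOE, at most 5.5
    ("HB", "HC", NOE_BINS[2][1]),  # weak NOE, at most 7.0
]

print(count_violations(model, restraints))
```

Here only the HA-HC pair (6.0 Ångstrom apart against a 5.5 Ångstrom bound) is violated; a structure calculation would keep the models with few or no such violations.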
Structure superposition (root mean square deviation)
$$\mathrm{RMSD} = \sqrt{\frac{\sum_{i=1}^{n} d_i^2}{n}}$$
n = number of atoms
d_i = distance between 2 corresponding atoms i in 2 structures
The better the atoms superpose on each other, the lower the RMSD
Unit of RMSD => Ångstroms
identical structures => RMSD = 0
similar structures => RMSD is small (1-3 Å)
distant structures => RMSD > 3 Å
However, care has to be taken, as RMSD is length dependent and dominated by outliers:
• comparison of two short peptide structures can result in a small RMSD even if their structures are visibly different
• very similar structures can have a bad RMSD due to a short part of the structures that is very different (loops)
• insertions and deletions are not accounted for in the RMSD calculation, since we only look at equivalent atoms/residues (see figure)
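A minimal sketch of the RMSD calculation for two coordinate lists that are assumed to be already superposed and paired atom by atom (no fitting is done here). The toy coordinates are made up to show how a single outlier dominates the value.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation over paired (x, y, z) coordinates."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = [math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b)]
    return math.sqrt(sum(squared) / len(squared))

# Ten atoms on a line; the second structure is identical except for one atom.
reference = [(float(i), 0.0, 0.0) for i in range(10)]
shifted = list(reference)
shifted[-1] = (9.0, 10.0, 0.0)  # one atom displaced by 10 Angstrom

print(round(rmsd(reference, reference), 3))
print(round(rmsd(reference, shifted), 3))
```

Nine of the ten atoms coincide exactly, yet the RMSD is above 3 Å, which by the rule of thumb above would already count as "distant".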
NMR ensemble RMSD (root mean square deviation)
• Superpose the NMR models
• Calculate RMSD of local regions and also whole models
• Regions with high RMSD are less well defined by the data
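One possible way to locate poorly defined regions, sketched under the assumption that the models are already superposed and list their atoms in the same order: for every atom, compute the root mean square deviation of its positions from their mean across the ensemble. The three toy models are hypothetical.

```python
import math

def per_atom_spread(models):
    """RMS deviation from the mean position, per atom, across superposed models."""
    n_models = len(models)
    n_atoms = len(models[0])
    spread = []
    for i in range(n_atoms):
        positions = [model[i] for model in models]
        mean = tuple(sum(axis) / n_models for axis in zip(*positions))
        msd = sum(math.dist(p, mean) ** 2 for p in positions) / n_models
        spread.append(math.sqrt(msd))
    return spread

# Three models of two atoms: atom 0 is fixed, atom 1 moves between models.
ensemble = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, -1.0, 0.0)],
]

spread = per_atom_spread(ensemble)
print([round(s, 3) for s in spread])
```

Atoms with a large spread belong to the regions that are less well defined by the data.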
Protein Data Bank (PDB)
Structural data is stored in the Protein Data Bank (PDB)
http://www.pdb.org
©CMBI 2009
• Databank for 3-dimensional structures of biomolecules:
  • Protein
  • DNA
  • RNA
  • Ligands
• Obligatory deposit of coordinates in the PDB before publication
• ~65,000 entries (April 2010) (~27,000 “unique” structures)
• A PDB file is a keyword-organised flat file (80 columns):
  1) human readable
  2) every line starts with a keyword (3-6 letters)
  3) platform independent
PDB important records (1)
• PDB nomenclature
  Filename = accession number = PDB code
  Filename is 4 positions (often 1 digit & 3 letters, e.g. 1CRN.pdb)
• HEADER
  describes molecule & gives deposition date
  HEADER PLANT SEED PROTEIN 30-APR-81 1CRN
• COMPND
  name of molecule
  COMPND CRAMBIN
• SOURCE
  organism
  SOURCE ABYSSINIAN CABBAGE (CRAMBE ABYSSINICA) SEED
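As a small illustration of the keyword-per-line layout, the sketch below splits the HEADER line shown above into classification, deposition date and PDB code. It splits on whitespace because the slide text has lost its column alignment; real files use fixed columns. `parse_header` is an illustrative name, not a library function.

```python
def parse_header(line):
    """Split a HEADER record, as printed on the slide, into its three parts."""
    tokens = line.split()
    if len(tokens) < 4 or tokens[0] != "HEADER":
        raise ValueError("not a HEADER record")
    pdb_code = tokens[-1]
    if len(pdb_code) != 4 or not pdb_code.isalnum():
        raise ValueError("PDB code should be 4 alphanumeric characters")
    return {
        "classification": " ".join(tokens[1:-2]),
        "deposition_date": tokens[-2],
        "pdb_code": pdb_code,
        "filename": pdb_code + ".pdb",
    }

header = parse_header("HEADER PLANT SEED PROTEIN 30-APR-81 1CRN")
print(header)
```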
PDB important records (2)
• SEQRES
  Sequence of protein; be aware: 3D coordinates are not always present for all the amino acids in SEQRES!
  SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51
  SEQRES 2 46 ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS 1CRN 52
  SEQRES 3 46 ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR 1CRN 53
  SEQRES 4 46 CYS PRO GLY ASP TYR ALA ASN 1CRN 54
• SSBOND
  disulfide bridges
  SSBOND 1 CYS 3 CYS 40
  SSBOND 2 CYS 4 CYS 32
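The SEQRES records above can be turned into a one-letter sequence with a simple lookup. This is a sketch for the old-style lines printed on the slide: tokens that are not standard three-letter residue names (serial number, residue count, the trailing `1CRN 51` identifiers) are skipped. `seqres_to_sequence` is an illustrative name.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def seqres_to_sequence(lines):
    """Concatenate the residues of all SEQRES lines into a one-letter string."""
    sequence = []
    for line in lines:
        tokens = line.split()
        if not tokens or tokens[0] != "SEQRES":
            continue
        # Skip the serial number and residue count, keep only residue names.
        sequence.extend(THREE_TO_ONE[t] for t in tokens[3:] if t in THREE_TO_ONE)
    return "".join(sequence)

seqres_lines = [
    "SEQRES 1 46 THR THR CYS CYS PRO SER ILE VAL ALA ARG SER ASN PHE 1CRN 51",
    "SEQRES 2 46 ASN VAL CYS ARG LEU PRO GLY THR PRO GLU ALA ILE CYS 1CRN 52",
    "SEQRES 3 46 ALA THR TYR THR GLY CYS ILE ILE ILE PRO GLY ALA THR 1CRN 53",
    "SEQRES 4 46 CYS PRO GLY ASP TYR ALA ASN 1CRN 54",
]

sequence = seqres_to_sequence(seqres_lines)
print(sequence, len(sequence))
```

The resulting length can be compared with the residue count in the third field (46) as a quick sanity check.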
PDB important records (3)
and at the end of the PDB file the “real” data:
• ATOM
  one line for each atom with its unique name and its x, y, z coordinates
  ATOM 1 N THR 1 17.047 14.099 3.625 1.00 13.79 1CRN 70
  ATOM 2 CA THR 1 16.967 12.784 4.338 1.00 10.80 1CRN 71
  ATOM 3 C THR 1 15.685 12.755 5.133 1.00 9.19 1CRN 72
  ATOM 4 O THR 1 15.268 13.825 5.594 1.00 9.85 1CRN 73
  ATOM 5 CB THR 1 18.170 12.703 5.337 1.00 13.02 1CRN 74
  ATOM 6 OG1 THR 1 19.334 12.829 4.463 1.00 15.06 1CRN 75
  ATOM 7 CG2 THR 1 18.150 11.546 6.304 1.00 14.23 1CRN 76
  ATOM 8 N THR 2 15.115 11.555 5.265 1.00 7.81 1CRN 77
  ATOM 9 CA THR 2 13.856 11.469 6.066 1.00 8.31 1CRN 78
  ATOM 10 C THR 2 14.164 10.785 7.379 1.00 5.80 1CRN 79
  ATOM 11 O THR 2 14.993 9.862 7.443 1.00 6.94 1CRN 80
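Once the coordinates are read, simple geometry follows directly. The sketch below parses the first two records above by whitespace (an assumption that only holds for these old-style lines without a chain identifier) and computes the N-CA bond length of THR 1. `parse_old_atom` is an illustrative name.

```python
import math

def parse_old_atom(line):
    """Parse an old-style ATOM line as printed on the slide (no chain ID)."""
    tokens = line.split()
    return {
        "serial": int(tokens[1]),
        "name": tokens[2],
        "resname": tokens[3],
        "resseq": int(tokens[4]),
        "xyz": tuple(float(t) for t in tokens[5:8]),
        "occupancy": float(tokens[8]),
        "bfactor": float(tokens[9]),
    }

atom_lines = [
    "ATOM 1 N THR 1 17.047 14.099 3.625 1.00 13.79 1CRN 70",
    "ATOM 2 CA THR 1 16.967 12.784 4.338 1.00 10.80 1CRN 71",
]
atoms = {a["name"]: a for a in map(parse_old_atom, atom_lines)}

# Distance between backbone N and CA of the first residue, in Angstrom.
n_ca = math.dist(atoms["N"]["xyz"], atoms["CA"]["xyz"])
print(f"N-CA distance: {n_ca:.3f}")
```

The value comes out close to 1.5 Å, the order of magnitude expected for a covalent bond between these backbone atoms.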
PDB entry
Structure Visualization
Structures from PDB can be visualized with:
1. YASARA (http://www.yasara.org)
2. SwissPDBViewer (http://spdbv.vital-it.ch/)
3. PyMOL (http://www.pymol.org)
4. Chimera (http://www.cgl.ucsf.edu/chimera)
YASARA View nomenclature
Atom
Residue = any continuous stretch of atoms sharing the same residue name, residue number and molecule name
Molecule = any continuous stretch of residues sharing the same molecule name (PDB calls this a CHAIN)
Object = a collection of molecules and additional items
Standard atom colors
• C = cyan
• O = red
• N = blue
• H = white
• S = green
Atom nomenclature
[Figure: atom names on a peptide, from N-term to C-term: backbone atoms N, Cα, C, O; side-chain atoms Cβ, Cγ, Oγ, Cδ1, Cδ2; terminal oxygens OT1 and OT2]
FoldX: a molecular design toolkit
• Predict the effect of point mutation on the protein stability
• Predict the 3D structure of a sequence: homology modeling
Predict effect of point mutation
• FoldX is an empirical force field
  • It is validated with calorimetric experiments
  • E.g. if such an experiment concludes that breaking a hydrogen bond costs 1.5 kcal/mol, FoldX uses this knowledge rather than theoretical physics equations
• FoldX compares WT and mutant for:
  • hydrogen bonds, electrostatics, Van der Waals clashes and contacts, entropy, desolvation, ...
• FoldX energies
  • The energy of a single molecule is meaningless
  • The difference in energy between two molecules (such as WT and a point mutant) approaches realistic values
Predict effect of point mutation
• FoldX calculates the stability of the wild type (WT) and the mutant (MT) and takes the difference (net effect of the mutation):
  • ΔG_MT - ΔG_WT = ΔΔG_mutation
• If ΔΔG_mutation is
  • > 0: the mutation is bad for stability
  • < 0: the mutation is good for stability
• The FoldX error margin is 0.5 kcal/mol, so changes within this margin are meaningless
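The interpretation rule above can be written down in a few lines. This sketch does not call FoldX; it only takes two stability values, which are made-up numbers here, and applies the sign convention and the 0.5 kcal/mol margin from the slide. The function names are illustrative.

```python
FOLDX_ERROR_MARGIN = 0.5  # kcal/mol, as stated on the slide

def ddg(dg_wt, dg_mt):
    """Net effect of the mutation: stability of mutant minus wild type."""
    return dg_mt - dg_wt

def interpret_ddg(value, margin=FOLDX_ERROR_MARGIN):
    """Classify a stability change, ignoring changes inside the error margin."""
    if abs(value) <= margin:
        return "within error margin"
    return "destabilizing" if value > 0 else "stabilizing"

# Hypothetical stabilities in kcal/mol.
change = ddg(dg_wt=-12.3, dg_mt=-10.1)
print(round(change, 2), interpret_ddg(change))
```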
Introduction to homology modeling
• Goal: predict a structure from its sequence with an accuracy that is comparable to the best results achieved experimentally (X-Ray)
• Protein modeling is the only way to obtain structural information when experimental techniques (x-ray, NMR, EM) fail
Homology Modeling
Principles of Homology Modeling
• Search for a sequence with a known structure that is very similar to the sequence with the unknown structure. Build model using known structure as template
• The structure of a protein is uniquely determined by its amino acid sequence
• Structure is more conserved than sequence
  • Similar sequences adopt nearly the same structure
  • Distantly related sequences can still fold into a similar structure
Sequence similarity rule
• Rost (1999) modeled lots of structures and compared them to the real ones in the PDB
• Derived precise limits for homology modeling
• This rule tells you whether a model will be reliable or unreliable
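A hedged sketch of how such a rule can be applied. The slide does not give the formula; the curve below assumes the commonly quoted form of the Rost (1999) identity threshold as a function of alignment length, and should be checked against the paper before use. The aligned sequences are made up for illustration.

```python
import math

def rost_identity_threshold(aligned_length, n=0.0):
    """Assumed form of the length-dependent percent identity cutoff."""
    if aligned_length <= 11:
        base = 100.0
    elif aligned_length <= 450:
        exponent = -0.32 * (1.0 + math.exp(-aligned_length / 1000.0))
        base = 480.0 * aligned_length ** exponent
    else:
        base = 19.5
    return n + base

def percent_identity(seq_a, seq_b):
    """Identity over aligned, non-gap positions of two equal-length strings."""
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs), len(pairs)

# Hypothetical pairwise alignment ('-' marks a gap).
identity, length = percent_identity("TTCCPSIVAR-SNF", "TSCCPSLVARASNF")
print(round(identity, 1), length, round(rost_identity_threshold(length), 1))
```

An alignment whose identity lies above the threshold for its length would be considered a safe basis for a model; short alignments need a much higher identity than long ones.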
FoldX plugin for YASARA
Acknowledgements
• Gert Vriend, Radboud Universiteit Nijmegen, NL (www.cmbi.ru.nl)
• Sander Nabuurs, Lead Pharma, Nijmegen, NL
• Greet De Baets, VIB Switch Laboratory