Within Structural Bioinformatics
Plant Bioinformatics, Systems and Synthetic Biology Summer School
University of Nottingham, UK
July 2009
Eran Eyal
Cancer Research Center
Sheba Medical Center
Tel Hashomer
Israel
2
VAL LEU SER PRO ALA ASP LYS THR ASN VAL LYS ALA ALA TRP GLY LYS VAL GLY ALA HIS ALA GLY GLU TYR GLY ALA GLU ALA LEU GLU ARG MET PHE LEU SER PHE PRO THR THR LYS THR TYR PHE PRO HIS PHE ASP LEU SER HIS GLY SER ALA GLN VAL LYS GLY HIS GLY LYS LYS VAL ALA ASP ALA LEU THR ASN ALA VAL ALA HIS VAL ASP ASP MET PRO ASN ALA LEU SER ALA LEU SER ASP LEU HIS ALA HIS LYS LEU
The PDB database is the main repository for the processing and distribution of 3-D biological macromolecular structures
6
7
Data Source
X-Ray Crystallography
Clone / Express / Purify
Crystallize
X-ray diffraction data + solve phase problem
Interpret electron density map
Coordinates of atoms in protein molecule
8
Data Source
NMR Spectroscopy
NOESY experiment: information about spatially close atoms
J-couplings
List of distance constraints + dihedral angles + …
Multiple models of protein structure
Coordinates of atoms in protein molecule
9
Human thioredoxin structure determined by X-ray (pdb 1ert) and NMR (pdb 3trx): superimposition
10
X-ray crystallography vs. NMR
Atomic resolution: X-ray: good; NMR: reasonable
Hydrogen: X-ray: rarely determined; NMR: determined
Molecule size: X-ray: no restriction; NMR: small proteins
Dynamics: X-ray: snapshot; NMR: multiple models
Membrane proteins: X-ray: problematic; NMR: problematic
Procedure: X-ray: long; NMR: long
11
File Format
Header section
Coordinate section
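As a rough illustration of the coordinate section, the sketch below reads one ATOM record, assuming the fixed-column layout of the standard PDB file format. The function name `parse_atom_line` and the sample record (built from made-up values) are illustrative and do not come from the slides.

```python
def parse_atom_line(line):
    """Parse one fixed-width ATOM/HETATM record of a PDB file into a dict."""
    return {
        "record": line[0:6].strip(),      # columns 1-6: record name
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "name": line[12:16].strip(),      # columns 13-16: atom name
        "res_name": line[17:20].strip(),  # columns 18-20: residue name
        "chain": line[21],                # column 22: chain identifier
        "res_seq": int(line[22:26]),      # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38: x coordinate (Å)
        "y": float(line[38:46]),          # columns 39-46: y coordinate (Å)
        "z": float(line[46:54]),          # columns 47-54: z coordinate (Å)
        "occupancy": float(line[54:60]),  # columns 55-60: occupancy
        "b_factor": float(line[60:66]),   # columns 61-66: temperature factor
    }


# Illustrative record assembled field by field so the column widths are explicit.
sample = (
    "ATOM  " + f"{1:5d}" + " " + f"{' N':<4s}" + " " + "VAL" + " " + "A"
    + f"{1:4d}" + "    " + f"{6.204:8.3f}{16.869:8.3f}{4.854:8.3f}"
    + f"{1.00:6.2f}{49.05:6.2f}"
)
atom = parse_atom_line(sample)
print(atom["res_name"], atom["x"], atom["y"], atom["z"], atom["b_factor"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in real entries can touch without a separating space.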
12
How to search in the PDB?
The OCA browser, developed at the Weizmann Institute of Science (WIS) by Jaime Prilusky, is one of the best interfaces to the PDB.
Entries can be retrieved by a variety of criteria.
http://bip.weizmann.ac.il/oca-bin/ocamain
13
14
Problems in the PDB database
• Missing data
• Format problems – residue numbers
• Quality of data
• Data is often not independent
15
Diffraction pattern and Bragg plane separation
Poor resolution: 3 Å
Good resolution: 2 Å
16
R-factor
Original diffraction pattern
Model
Calculated diffraction pattern based on the model
The R-factor measures how different the original diffraction pattern is from one recalculated from a putative model.
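The slide gives no formula. A minimal sketch, assuming the conventional crystallographic definition R = Σ| |F_obs| − |F_calc| | / Σ|F_obs| over structure-factor amplitudes; the function name and the numbers are illustrative.

```python
def r_factor(f_obs, f_calc):
    """R-factor from observed and model-calculated structure-factor amplitudes."""
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty lists of equal length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# A model that reproduces the data exactly gives R = 0; larger values mean worse agreement.
print(r_factor([10.0, 20.0, 30.0], [10.0, 20.0, 30.0]))
print(r_factor([10.0, 20.0], [9.0, 22.0]))
```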
17
B-factor
A measure of the uncertainty in the position of individual atoms
[Figure: tyrosine-protein kinase colored by B-factor]
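To give the B-factor a physical scale, the sketch below assumes the standard isotropic relation B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean square displacement of the atom. The relation is not stated on the slide, and the function name is illustrative.

```python
import math


def rms_displacement(b_factor):
    """RMS atomic displacement (Å) implied by an isotropic B-factor (Å^2)."""
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


# Higher B-factors correspond to larger positional uncertainty.
for b in (10.0, 30.0, 60.0):
    print(b, round(rms_displacement(b), 2))
```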
18
Structural alignment
Structures are more conserved throughout evolution than sequences. Two homologous proteins have the same overall structure. It is possible for two proteins without detectable sequence similarity to have the same structure.
In the twilight zone of sequence similarity, structural alignment might help to correctly determine the relationship between two proteins.
Structural similarity is a more sensitive method than sequence alignment for determining protein function.
  1   2   3   4   5   6   7   8   9  10  11  12  13  14
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS
20
Kuttner et al., 2003
21
Superposition vs. structural alignment
There are two types of problems related to structural comparison:
• Superposition problem
• Structural alignment problem
In the superposition problem we know in advance the correspondence between the points in the two structures we want to align. In the structural alignment problem this correspondence is not known and has to be found.
22
What properties of the protein might be used to detect structural similarity to other proteins?
• Sequence
• Type and number of secondary structures (sheets, helices)
• Structural arrangement of secondary structures
• Structural attributes of individual amino acids
• Distances between amino acids in the protein
23
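RMSD = root mean square deviation
The standard way to quantify similarity between molecules is to measure the positional deviation of the atoms (RMSD):
$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(X_{i1}-X_{i2})^2+(Y_{i1}-Y_{i2})^2+(Z_{i1}-Z_{i2})^2\right]}$$
where N is the number of compared atom pairs and (X_{i1}, Y_{i1}, Z_{i1}), (X_{i2}, Y_{i2}, Z_{i2}) are the coordinates of atom i in the first and second structure.
Because the deviations are squared, this measure amplifies large deviations in local regions of the protein.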
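A minimal sketch of the RMSD calculation, assuming the two structures are already superimposed and the atom correspondence is known. The function name and the toy coordinates are illustrative.

```python
import math


def rmsd(coords1, coords2):
    """Root mean square deviation between two equally long lists of (x, y, z)."""
    if len(coords1) != len(coords2) or not coords1:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords1, coords2):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords1))


# Four atoms, three of which coincide exactly: the single 2 Å outlier
# alone produces an RMSD of 1 Å, illustrating the amplification effect.
reference = [(0.0, 0.0, 0.0)] * 4
model = [(2.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
print(rmsd(reference, model))
```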
The method includes two steps of dynamic programming: an initial step that obtains the score between each pair of amino acids, and a second step that determines the best overall alignment of the proteins.
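The second step can be sketched as a standard global-alignment dynamic program over a residue-pair score matrix. This assumes the pair scores from the first step are already available; the linear gap penalty and the function name are assumptions, not details from the slides.

```python
def best_alignment_score(score, gap=-1.0):
    """Best global alignment score given score[i][j] for residue i vs residue j."""
    n, m = len(score), len(score[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap          # leading gaps in the second protein
    for j in range(1, m + 1):
        dp[0][j] = j * gap          # leading gaps in the first protein
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[i - 1][j - 1],  # align residue i with j
                dp[i - 1][j] + gap,                      # gap in the second protein
                dp[i][j - 1] + gap,                      # gap in the first protein
            )
    return dp[n][m]


# Toy pair scores: residue 0 matches residue 0, residue 1 matches residue 2.
pair_scores = [[2.0, 0.0, 0.0],
               [0.0, 0.0, 2.0]]
print(best_alignment_score(pair_scores))
```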
26
SARF2
http://123d.ncifcrf.gov/sarf2.html
http://carten.gmd.de/ToPign.html
An algorithm to find structural similarity based on comparison of secondary structures.
As such, it can be used only to compare proteins, and only proteins with at least a minimal content of defined secondary structure.
27
Every secondary structure element (SSE) is represented by a vector.
A single SSE does not give any information about the structure of the protein. Two or more SSEs are therefore required.
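To show why at least two SSEs are needed, the sketch below computes the angle between two SSE vectors: the relative geometry of a pair is what carries structural information. Defining each vector from the first to the last Cα of the element is an assumption made here, and the coordinates are made up.

```python
import math


def sse_vector(start, end):
    """Vector from the first to the last C-alpha of an SSE (assumed definition)."""
    return tuple(e - s for s, e in zip(start, end))


def angle_deg(u, v):
    """Angle in degrees between two 3D vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # guard against rounding
    return math.degrees(math.acos(cosine))


helix_a = sse_vector((0.0, 0.0, 0.0), (15.0, 0.0, 0.0))
helix_b = sse_vector((0.0, 8.0, 0.0), (0.0, 8.0, 12.0))
print(angle_deg(helix_a, helix_b))
```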
28
DALI: search for common 3D patterns in Cα distance maps
3-helix bundle: pairwise 3D alignment
There are hundreds of thousands of protein sequences but only several thousand protein folds.
For every second protein picked at random, the structural database holds a “close” homolog (identity > 30%). This homolog almost always has the same fold.
In current projects for experimental determination of protein structures, priority is given to proteins without homologs in the structural databases (‘structural genomics’).
We believe that in several years we will have almost all the basic folds.
Building by homology
42
Stevens RC, Yokoyama S, Wilson IA. Science 294, 89-93 (October 2001)
43
Find proteins with known structure which are similar to your sequence
Advanced program for homology modeling. Implemented in several popular modeling packages such as InsightII
“Quick and dirty”: the easiest way to do homology modeling
46
Threading (fold recognition)
The input sequence is threaded onto many different folds from a library of known folds.
Using scoring functions, we get a score for the compatibility between the sequence and each structure.
A statistically significant score indicates that the input protein adopts a 3D structure similar to that fold.
47
This method is less accurate but can be applied in more cases.
When the “real” fold of the input sequence is not represented in the structural database, this method can never give a correct solution.
The most important part is the accuracy of the scoring function. The scoring function is the major difference between the various fold recognition programs.
48
Contact potentials
This method is based on predefined tables which hold a pseudo-energetic score for each pairwise interaction of two amino acids.
For each conformation to be evaluated, a distance matrix can be constructed.
For each pair of amino acids which are close in space, the interaction energy is added to a sum. The total indicates how well the sequence fits that structure.
[Figure: distance matrix; both axes are the amino acid index, from 1 to N]
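The scoring described above can be sketched as follows. The two-letter alphabet (H = hydrophobic, P = polar), the energy values, the 6.5 Å cutoff and the minimum sequence separation are all hypothetical choices for illustration; real contact potentials use tables over the 20 amino acids.

```python
import math


def distance_matrix(coords):
    """All-against-all distances between residue positions (e.g. C-alpha atoms)."""
    n = len(coords)
    return [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]


def contact_energy(sequence, coords, potential, cutoff=6.5, min_separation=2):
    """Sum the tabulated pair energies over residue pairs that are close in space."""
    dmat = distance_matrix(coords)
    total = 0.0
    for i in range(len(sequence)):
        for j in range(i + min_separation, len(sequence)):
            if dmat[i][j] <= cutoff:
                pair = tuple(sorted((sequence[i], sequence[j])))
                total += potential[pair]
    return total


# Hypothetical pseudo-energy table: hydrophobic contacts are favourable.
toy_potential = {("H", "H"): -1.0, ("H", "P"): 0.0, ("P", "P"): 0.5}
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0)]
print(contact_energy("HPH", toy_coords, toy_potential))
```

Threading the same coordinates with a different sequence changes only the table lookups, which is what makes this kind of score cheap to evaluate across a fold library.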
49
Input: sequence
Legend: H-bond donor, H-bond acceptor, glycine, hydrophobic
Library of folds of known proteins
50
Fold 1: S = 20, Z = 5
Fold 2: S = 5, Z = 1.5
Fold 3: S = -2, Z = -1
Legend: H-bond donor, H-bond acceptor, glycine, hydrophobic
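The slides do not say how Z is obtained from the raw score S. A common approach, assumed here, is to standardize S against a background distribution of scores for the same fold (for example, scores of shuffled sequences). The background values below are made up.

```python
import statistics


def z_score(score, background):
    """Number of standard deviations by which `score` exceeds the background mean."""
    mean = statistics.mean(background)
    sd = statistics.pstdev(background)
    return (score - mean) / sd


# Hypothetical background scores for one fold (mean 5, standard deviation 2).
background_scores = [2, 4, 4, 4, 5, 5, 7, 9]
print(z_score(15, background_scores))
```

Under this assumption each fold has its own background, so raw scores from different folds are not directly comparable while Z-scores are.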
51
Ab initio methods for modeling
This field is of great theoretical interest. It makes no use of sequence alignments and no direct use of known structures.
The basic idea is to build an empirical function that simulates the real physical forces and the potentials of chemical contacts.
If we had a perfect scoring function and could scan all possible conformations, we would be able to detect the correct fold.
52
Algorithms for ab initio prediction include:
A. Searching procedure that scans many possible structures (conformations)
B. Scoring function to evaluate and rank the structures
Due to the large search space, heuristic methods are usually applied.
The parameters in the searching procedure are the dihedral angles, which specify the exact fold of the polypeptide chain.
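A minimal sketch of the two components, using a Metropolis Monte Carlo search as one example of a heuristic procedure. The scoring function is a hypothetical toy with its minimum at -60 degrees for every angle; it stands in for a real empirical energy function, and the step size and temperature are arbitrary.

```python
import math
import random


def toy_score(angles):
    """Hypothetical scoring function: 0 when every dihedral angle is -60 degrees."""
    return sum(1.0 - math.cos(math.radians(a + 60.0)) for a in angles)


def monte_carlo_search(n_angles, steps=5000, temperature=0.5, seed=0):
    """Heuristic search over dihedral angles; returns (best, best_score, start_score)."""
    rng = random.Random(seed)
    current = [rng.uniform(-180.0, 180.0) for _ in range(n_angles)]
    current_score = toy_score(current)
    start_score = current_score
    best, best_score = list(current), current_score
    for _ in range(steps):
        trial = list(current)
        k = rng.randrange(n_angles)
        # Perturb one angle and wrap the result back into [-180, 180).
        trial[k] = (trial[k] + rng.gauss(0.0, 30.0) + 180.0) % 360.0 - 180.0
        trial_score = toy_score(trial)
        # Metropolis criterion: always accept improvements, sometimes accept worse moves.
        accept = trial_score <= current_score or rng.random() < math.exp(
            (current_score - trial_score) / temperature
        )
        if accept:
            current, current_score = trial, trial_score
            if trial_score < best_score:
                best, best_score = list(trial), trial_score
    return best, best_score, start_score


best_angles, best_score, start_score = monte_carlo_search(n_angles=10)
print(start_score, best_score)
```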
53
Side chain construction
When there is high similarity between the modeled protein and the templates, the side chains are constructed using the template structures.
Without such similarity, the construction can be done using rotamer libraries.
The score of a rotamer is a compromise between its probability and its fit at the specific position. Comparing the scores of all the rotamers for a given amino acid determines the preferred rotamer.
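One way to express this compromise, assumed here for illustration, is to add the negative log of the library probability to a position-specific fit energy and pick the rotamer with the lowest total. The rotamer names, probabilities and energies below are hypothetical.

```python
import math


def pick_rotamer(rotamers, weight=1.0):
    """Choose the rotamer with the lowest combined score.

    rotamers: list of (name, library_probability, fit_energy) tuples.
    """
    def score(rotamer):
        _, probability, fit_energy = rotamer
        return -math.log(probability) + weight * fit_energy

    return min(rotamers, key=score)


# The most probable rotamer clashes at this position (high fit energy),
# so a less probable one is preferred.
candidates = [("g+", 0.6, 5.0), ("t", 0.3, 0.0), ("g-", 0.1, 0.0)]
print(pick_rotamer(candidates)[0])
```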
54
Phe
Asn
Conformation: a given set of dihedral angles which defines a structure.
After the model is built, we can check its validity in various ways. We can check that the model has a reasonable shape and that it obeys the usual geometric constraints.
If the model turns out to be bad, it is necessary to repeat several steps of the model building.
58
59
We can easily assess homology modeling procedures by building models for proteins whose structures are already solved and comparing the model with the native structure.
In such tests it is always possible that information from the native structure is used, directly or indirectly, for model building.
A more objective test is the prediction of structures before they are publicly distributed (this is the idea of the CASP competitions).
60
According to the molecules involved:
• Protein-ligand docking
• Protein-protein docking
Specific docking algorithms are usually designed to deal with one of these problems but not with both (different contact area, flexibility, level of representation, etc.).
Docking: finding the binding orientation of two molecules with known structures
• Understanding interactions, roles of specific amino acids, design of mutations and changes of activity
• Prediction of affinities
• Drug design
62
Ligand-protein docking
Finding the place and the orientation of the interactions
The general problem includes a search for the location of the binding site and a search for the exact orientation of the ligand in the binding site. A program that does both performs global docking.
Sometimes the location of the binding site is known. In this case we only need to orient the ligand in the binding site, and the problem is called local docking.
Global docking is more demanding in terms of computational time, and the results are less accurate.
63
[Figure: global docking vs. local docking]
64
Rigidity vs flexibility
• Most of the early algorithms assumed that the docked molecules do not change conformation. This assumption allows the molecules to be treated as rigid bodies, making the algorithm simpler and faster.
• This assumption is problematic and was proven to be wrong in many cases.
• Docking procedures that perform a rigid body search are termed rigid docking.
• New algorithms try to face the flexibility problem.
• Docking procedures that consider possible conformational changes are termed flexible docking.
• Other methods try to handle the flexibility problem indirectly, or at least to “minimize the damage” of not incorporating flexibility.
65
Bound and unbound docking
In bound docking the goal is to reproduce a known complex where the starting coordinates of the individual molecules are taken from the crystal of the complex
In unbound docking, which is a significantly more difficult problem, the starting coordinates are taken from the unbound molecules. Unfortunately, this is the more realistic problem.
66
Components of the problem
Algorithms to dock molecules need:
A. System representation
B. Searching procedure
C. Scoring function
D. Clustering procedure
The parameters of the problem for docking of 2 rigid bodies are 3 angles (rotations) and 3 distances (translations)
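A minimal sketch of how six such parameters place a rigid ligand. The parameter order (three translations followed by three rotation angles in degrees) and the rotation convention (about x, then y, then z, followed by the translation) are assumptions made for this example; docking programs differ in their conventions.

```python
import math


def rigid_transform(point, params):
    """Apply (tx, ty, tz, rx, ry, rz) to one (x, y, z) point; angles in degrees."""
    tx, ty, tz, rx, ry, rz = params
    ax, ay, az = (math.radians(a) for a in (rx, ry, rz))
    x, y, z = point
    # Rotation about the x axis.
    y, z = y * math.cos(ax) - z * math.sin(ax), y * math.sin(ax) + z * math.cos(ax)
    # Rotation about the y axis.
    x, z = x * math.cos(ay) + z * math.sin(ay), -x * math.sin(ay) + z * math.cos(ay)
    # Rotation about the z axis.
    x, y = x * math.cos(az) - y * math.sin(az), x * math.sin(az) + y * math.cos(az)
    # Translation.
    return (x + tx, y + ty, z + tz)


ligand_atom = (1.0, 0.0, 0.0)
print(rigid_transform(ligand_atom, (0, 0, 0, 0, 0, 0)))    # identity
print(rigid_transform(ligand_atom, (0, -1, 0, 0, 0, 0)))   # shift by -1 along y
print(rigid_transform(ligand_atom, (0, 0, 0, 0, 0, 90)))   # 90 degrees about z
```

Applying the same transform to every ligand atom moves the molecule as a rigid body, so a rigid docking search only has to explore this six-dimensional space.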
67
[Figure: ligand in the x, y, z frame for the parameter sets (0,0,0,0,0,0), (0,-1,0,0,0,0) and (0,0,1,0,0,0)]
68
[Figure: ligand in the x, y, z frame for the parameter set (0,0,0,0,0,30)]
69
[Figure: ligand in the x, y, z frame for the parameter set (0,0,0,30,0,0)]
70
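Usually the ligand is not rigid and a few more parameters are required.
N_p = 3 + 3 + N_fb
N_p is the number of parameters needed to fully describe the ligand position: 3 translations, 3 rotations, and N_fb additional parameters (presumably one per flexible bond of the ligand).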