Within Structural Bioinformatics (ico2s.org/summer_school/slides/eran_eyal.pdf)

1

Within Structural Bioinformatics

Plant Bioinformatics, Systems and Synthetic Biology Summer School

University of Nottingham, UK

July 2009

Eran Eyal

Cancer Research Center, Sheba Medical Center

Tel Hashomer, Israel

2

VAL LEU SER PRO ALA ASP LYS THR ASN VAL LYS ALA ALA TRP GLY LYS VAL GLY ALA HIS ALA GLY GLU TYR GLY ALA GLU ALA LEU GLU ARG MET PHE LEU SER PHE PRO THR THR LYS THR TYR PHE PRO HIS PHE ASP LEU SER HIS GLY SER ALA GLN VAL LYS GLY HIS GLY LYS LYS VAL ALA ASP ALA LEU THR ASN ALA VAL ALA HIS VAL ASP ASP MET PRO ASN ALA LEU SER ALA LEU SER ASP LEU HIS ALA HIS LYS LEU

Sequence

Structure

Function Dynamics

Drug

3

•Databases of 3D structures of macromolecules

•Structural alignment

•Structural classification

•Secondary structure prediction

•Tertiary structure prediction

•Molecular docking

•Dynamics

Structural Bioinformatics

4

The structural data – where, what

Description of the databases

Source of the data

Quality of the data

How to explore and query the data

•Databases
•Structural alignment
•Structural classification
•Secondary structure prediction
•Tertiary structure prediction
•Molecular docking
•Visualization
•Dynamics

5

http://www.rcsb.org/
http://www.wwpdb.org/

Source of data:

•Crystal structures
•NMR models
•Other

The PDB database is the main repository for the processing and distribution of 3-D biological macromolecular structures

6

7

Data Source

X-Ray Crystallography

Clone / Express / Purify

Crystallize

X-ray diffraction data + solve the phase problem

Interpret the electron density map

Coordinates of atoms in the protein molecule

8

Data Source

NMR Spectroscopy

NOESY experiment: information about spatially close atoms

J-couplings

List of distance constraints + dihedral angles + …

Multiple models of the protein structure

Coordinates of atoms in the protein molecule

9

X-ray

(pdb 1ert)

NMR

(pdb 3trx)

Human thioredoxin structure determined by X-ray and NMR

superimposition

10

                     X-ray crystallography    NMR
Atomic resolution    Good                     Reasonable
Hydrogen             Rarely determined        Determined
Molecule size        No restriction           Small proteins
Dynamics             Snapshot                 Multi models
Membrane proteins    Problematic              Problematic
Procedure            Long                     Long

11

File Format

Coordinate section

Header section
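The coordinate section of a legacy PDB file consists of fixed-width ATOM/HETATM records. The following is a minimal sketch of how one such record could be read, assuming the standard column layout of the wwPDB format; the function name `parse_atom_line` and the sample record are illustrative and not taken from the slides.

```python
def parse_atom_line(line):
    """Parse one fixed-width ATOM/HETATM record (legacy PDB format).

    Slices are 0-based Python equivalents of the 1-based PDB columns,
    e.g. x is in columns 31-38, the B-factor in columns 61-66.
    """
    return {
        "record": line[0:6].strip(),      # columns 1-6
        "serial": int(line[6:11]),        # columns 7-11
        "name": line[12:16].strip(),      # columns 13-16
        "res_name": line[17:20].strip(),  # columns 18-20
        "chain": line[21],                # column 22
        "res_seq": int(line[22:26]),      # columns 23-26
        "x": float(line[30:38]),          # columns 31-38
        "y": float(line[38:46]),          # columns 39-46
        "z": float(line[46:54]),          # columns 47-54
        "occupancy": float(line[54:60]),  # columns 55-60
        "b_factor": float(line[60:66]),   # columns 61-66
    }


# Hypothetical example record (values invented for illustration).
SAMPLE_LINE = "ATOM      1  N   VAL A   1      -8.901   4.127  -0.555  1.00 11.99"

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE_LINE))
```

Because residue numbers are read as plain integers here, entries with insertion codes or irregular numbering (one of the format problems of the PDB) would need extra handling.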

12

How to search in the PDB?

The OCA browser, developed at the Weizmann Institute of Science (WIS) by Jaime Prilusky, is one of the best interfaces to the PDB.

Entries can be retrieved by a variety of criteria

http://bip.weizmann.ac.il/oca-bin/ocamain

13

14

Problems in the PDB database

• Missing data

• Format problems – residue numbers

• Quality of data

• Data is often not independent

15

Diffraction pattern; Bragg plane separation

Poor Resolution

Good Resolution

3Å resolution

2Å resolution

16

R-factor

Original diffraction pattern

Model

Calculated diffraction pattern based on the model

The R-factor measures how much the original (observed) diffraction pattern differs from the one recalculated from a putative model
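As a rough illustration, the conventional crystallographic R-factor is R = Σ| |F_obs| − |F_calc| | / Σ|F_obs|, summed over all reflections. The sketch below assumes the observed and calculated structure-factor amplitudes are already on the same scale; the function name and the numbers in the usage comment are made up for the example.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor from observed and calculated amplitudes.

    Assumes both lists refer to the same reflections in the same order
    and that the amplitudes have already been put on a common scale.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("amplitude lists must have the same length")
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


if __name__ == "__main__":
    # Two invented reflections: (|10 - 9| + |20 - 22|) / (10 + 20)
    print(r_factor([10.0, 20.0], [9.0, 22.0]))
```

A model that reproduces the observed amplitudes exactly gives R = 0; larger values indicate a poorer fit between model and data.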

17

TYROSINE-PROTEIN KINASE colored by B-factor

B-factor

A measure of the uncertainty in the position of individual atoms
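To give the B-factor a physical scale: for an isotropic B-factor the standard relation is B = 8π²⟨u²⟩, where ⟨u²⟩ is the mean-square displacement of the atom. The following small sketch converts a B-factor (in Å²) to a root-mean-square displacement (in Å); the function name is an illustrative choice.

```python
import math


def rms_displacement(b_factor):
    """RMS atomic displacement (Å) from an isotropic B-factor (Å^2).

    Uses B = 8 * pi^2 * <u^2>, so sqrt(<u^2>) = sqrt(B / (8 * pi^2)).
    """
    if b_factor < 0:
        raise ValueError("B-factor must be non-negative")
    return math.sqrt(b_factor / (8.0 * math.pi ** 2))


if __name__ == "__main__":
    for b in (10.0, 20.0, 60.0):
        print(b, round(rms_displacement(b), 3))
```

Under this relation a B-factor of 20 Å² corresponds to a displacement of roughly 0.5 Å.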

18

Structural alignment

Structures are more conserved throughout evolution than sequences. Two homologous proteins generally share the same overall structure. It is also possible for 2 proteins without detectable sequence similarity to have the same structure.

In the twilight zone of sequence similarity, structural alignment might help to correctly determine the relations between 2 proteins.

Structural similarity is a more sensitive method than sequence alignment for inferring protein function.



19

  1   2   3   4   5   6   7   8   9  10  11  12  13  14
PHE ASP ILE CYS ARG LEU PRO GLY SER ALA GLU ALA VAL CYS
PHE ASN VAL CYS ARG THR PRO --- --- --- GLU ALA ILE CYS
PHE ASN VAL CYS ARG --- --- --- THR PRO GLU ALA ILE CYS

20

Kuttner et al., 2003

21

Superposition Structural alignment

There are two types of problems related to structural comparison:
• Superposition problem
• Structural alignment problem

In the superposition problem we know in advance the correspondence between the points in the two structures we want to align. In the structural alignment problem this correspondence is unknown and has to be found as part of the solution.

22

What properties of the protein might be used to detect structural similarity to other proteins?

• Sequence

• Type and number of secondary structures (sheets, helices)

• Structural arrangement of secondary structures

• Structural attributes of individual amino acids

• Distances between amino acids in the protein

23

RMSD = root mean square deviation

$$\mathrm{RMSD} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left[(X_{i1}-X_{i2})^2+(Y_{i1}-Y_{i2})^2+(Z_{i1}-Z_{i2})^2\right]}$$

The standard way to quantify similarity between molecules is to measure the positional deviation of the atoms (RMSD).

Because the deviations are squared, this measure amplifies large deviations in local regions of the protein.
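A minimal sketch of the RMSD calculation, assuming the two structures are already superimposed and that atom i in the first list corresponds to atom i in the second. The toy coordinates are invented to show how a single large local deviation dominates the value.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z).

    Assumes the structures are already superimposed and that the atoms
    are listed in corresponding order.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    reference = [(float(i), 0.0, 0.0) for i in range(10)]
    # Every atom shifted by 1 Å.
    uniform = [(x, y + 1.0, z) for x, y, z in reference]
    # Only one atom shifted, by 10 Å: same total displacement, larger RMSD.
    local = [reference[0][:1] + (10.0, 0.0)] + reference[1:]
    print(rmsd(reference, uniform), rmsd(reference, local))
```

In both toy cases the mean displacement per atom is 1 Å, but the RMSD of the second case is about three times larger, which is the amplification effect described above.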

24

http://www.biochem.ucl.ac.uk/cgi-bin/cath/GetSsapRasmol.pl

SSAP

25


The method includes 2 steps of dynamic programming: an initial step to obtain the score between each pair of amino acids, and a second step in which the best overall alignment of the proteins is determined.
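The second dynamic-programming step can be sketched as a global alignment over a matrix of residue-pair scores. This is not the SSAP implementation: it assumes the pair scores from the first step are already available as a matrix `score[i][j]`, and the linear gap penalty and all names are illustrative assumptions.

```python
def align(score, gap=-1.0):
    """Global alignment (Needleman-Wunsch style) over a residue-pair score matrix.

    score[i][j] is the (precomputed) score for matching residue i of
    protein A with residue j of protein B. Returns the best total score
    and the list of aligned index pairs.
    """
    n, m = len(score), len(score[0])
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score[i - 1][j - 1],  # match i-1 with j-1
                dp[i - 1][j] + gap,                      # gap in protein B
                dp[i][j - 1] + gap,                      # gap in protein A
            )
    # Trace back from the bottom-right corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if dp[i][j] == dp[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif dp[i][j] == dp[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return dp[n][m], pairs


if __name__ == "__main__":
    # Invented scores: residues 0, 1, 2 of A best match residues 0, 2, 3 of B.
    toy = [[5, 0, 0, 0],
           [0, 0, 5, 0],
           [0, 0, 0, 5]]
    print(align(toy))
```

In the toy matrix the best path skips residue 1 of protein B, paying one gap penalty in exchange for three high-scoring matches.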


26

http://123d.ncifcrf.gov/sarf2.html

SARF2

http://carten.gmd.de/ToPign.html

An algorithm to find structural similarity based on comparison of secondary structures.

As such, it can be used only to compare proteins, and only proteins with at least a minimal content of defined secondary structure.

27

Every secondary structure element (SSE) is represented by a vector.

A single SSE does not give any information about the structure of the protein. Two or more SSEs are therefore required.

28

DALI: Search for common 3D patterns in Cα distance maps

3-helix bundle: pairwise 3D alignment

http://www.ebi.ac.uk/dali/
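A Cα distance map is simply the matrix of all pairwise distances between Cα atoms. The sketch below only builds such a map and illustrates why it is convenient for comparison (it does not change when the molecule is translated or rotated); it does not reproduce DALI's scoring or search. Function names and coordinates are illustrative.

```python
import math


def distance_map(ca_coords):
    """Matrix of pairwise distances between C-alpha coordinates (x, y, z)."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]


def max_map_difference(map_a, map_b):
    """Largest absolute difference between two equally sized distance maps."""
    return max(
        abs(da - db)
        for row_a, row_b in zip(map_a, map_b)
        for da, db in zip(row_a, row_b)
    )


if __name__ == "__main__":
    # Invented coordinates of four C-alpha atoms.
    coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    shifted = [(x + 10.0, y - 5.0, z + 3.0) for x, y, z in coords]
    print(max_map_difference(distance_map(coords), distance_map(shifted)))
```

Since the map depends only on internal distances, two structures can be compared map against map without first superimposing them.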

29

Structural classification

•Using structural alignment it is feasible to construct a classification system

•Classification helps us understand relations between remote proteins

•Convergent evolution in structures can often hint at the function of the protein


30

Classification databases

SARF       http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html/
VAST       http://www.ncbi.nlm.nih.gov/Structure/VAST/vast.shtml
CE         http://cl.sdsc.edu/ce.html
3Dee       http://www.compbio.dundee.ac.uk/3Dee/
MMDB       http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.shtml
HOMSTRAD   http://www-cryst.bioc.cam.ac.uk/data/align/
SCOP       http://scop.mrc-lmb.cam.ac.uk/scop/
CATH       http://www.biochem.ucl.ac.uk/bsm/cath_new/index.html
FSSP       http://www2.ebi.ac.uk/dali/fssp/

31

CATH

Semi-automatic!

Class – secondary structure (2D) composition – automatic. 4 classes: α, β, αβ, FSS (few secondary structures).

Architecture – manual! Shape created by the orientation of the secondary structure units.

Topology – connectivity of the secondary structures.

Homologous superfamily – high structural and functional similarity.

Sequence similarity

http://www.cathdb.info/

32

CATH of 1hho

33

SCOP – Structural Classification of Proteins

Manual inspection of automatic output

1. Secondary structure (2D) content (class)
2. Structural similarity (fold)
3. Remote homology (superfamily)
4. Close homology (family)

34

Structure Prediction

A-C-H-Y-T-T-E-K-R-G-G-S-G-T-K-K-R-E-A

H-H-H-H-H-H-H-H-O-O-O-O-O-S-S-S-S-S-S

Secondary structure prediction Tertiary structure prediction



35

36

http://www.predictprotein.org/

37

Why make a structural model for your protein?

• The structure can provide clues to the function

• With a structure it is easier to guess the location of functional sites and to learn about the function

• We can do docking experiments (both with other proteins and with small molecules)

• With a structure we can plan more precise experiments in the lab


38

Building by homology (Homology modeling)

[Figure: the target sequence aligned with proteins of known structure, leading to a structural model]

39

Fold recognition (Threading)

sequence:

+known protein folds

SLVAYGAAM

structural model

40

Ab initio

sequence

SLVAYGAAM

structural model

41

Building by homology

There are hundreds of thousands of protein sequences but only several thousand protein folds.

For every second protein that we randomly pick from the structural database there is a “close” homolog (identity > 30%). This homolog almost always has the same fold.

In the current projects for experimental determination of protein structures, priority is given to determining structures of proteins without homologs in the structural databases (‘structural genomics’).

We believe that in several years we will have almost all the basic folds.

42

Stevens RC., Yokoyama S., Wilson IA.,

October, 2001, Science 294, 89-93

43

Find proteins with known structure which are similar to your sequence

Build alignment

Construct structural model

Check the model

Done

44

45

Swiss-Model

http://www.expasy.ch/swissmod/SWISS-MODEL.html

“Quick and dirty”. The easiest way to do homology modeling.

Modeller

http://salilab.org/modeller/

Advanced program for homology modeling. Implemented in several popular modeling packages such as InsightII.

46

Threading (fold recognition)

The input sequence is threaded onto many different folds from a library of known folds.

Using scoring functions we get a score for the compatibility between the sequence and each structure.

A statistically significant score suggests that the input protein adopts a 3D structure similar to that fold.
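One common way to judge whether a threading score is significant is to express it as a Z-score relative to the scores obtained over the whole fold library. The sketch below is only an illustration under that assumption; the score values are invented and the choice of the population standard deviation is arbitrary.

```python
import statistics


def z_scores(scores):
    """Z-score of each raw score relative to the whole set of scores.

    Z = (score - mean) / standard deviation, so a fold whose score lies
    far above the background gets a large positive Z.
    """
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)
    if sd == 0:
        raise ValueError("all scores are identical; Z-scores are undefined")
    return [(s - mean) / sd for s in scores]


if __name__ == "__main__":
    # Invented raw compatibility scores of one sequence against eight folds.
    raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    for s, z in zip(raw, z_scores(raw)):
        print(s, round(z, 2))
```

The raw score alone says little, because its magnitude depends on the sequence length and the scoring function; the Z-score places it on a scale relative to the other folds.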

47

This method is less accurate than homology modeling but can be applied in more cases.

When the “real” fold of the input sequence is not represented in the structural database we can never get a correct solution by this method.

The most important part is the accuracy of the scoring function. The scoring function is the major difference between different programs for fold recognition.

48

Contact potentials

This method is based on predefined tables which give a pseudo-energetic score for each pairwise interaction of two amino acids.

For each given conformation to be evaluated, a distance matrix can be constructed.

For each pair of amino acids which are close in space, the interaction energy is added to a running sum. The total indicates how well the sequence fits that structure.
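The summation described above can be sketched as follows. The 8 Å contact cutoff, the exclusion of sequence neighbours, the one-point-per-residue representation and the toy potential values are all assumptions made for illustration; real contact potentials are derived statistically from known structures.

```python
import math


def contact_energy(sequence, coords, potential, cutoff=8.0, min_separation=2):
    """Sum pairwise pseudo-energies over residue pairs that are close in space.

    sequence       -- list of residue names, one per position
    coords         -- one representative (x, y, z) point per residue
    potential      -- dict mapping an alphabetically sorted pair of residue
                      names to a pseudo-energy (missing pairs count as 0)
    cutoff         -- distance (Å) below which two residues are in contact
    min_separation -- minimal sequence separation for a pair to be counted
    """
    energy = 0.0
    n = len(sequence)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pair = tuple(sorted((sequence[i], sequence[j])))
                energy += potential.get(pair, 0.0)
    return energy


if __name__ == "__main__":
    # Hypothetical potential values and a four-residue toy conformation.
    toy_potential = {("LEU", "LEU"): -1.0, ("ASP", "LYS"): -0.5, ("ASP", "LEU"): 0.3}
    seq = ["LEU", "LYS", "LEU", "ASP"]
    xyz = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
    print(contact_energy(seq, xyz, toy_potential))
```

Threading the same sequence onto a different conformation changes which pairs are in contact and therefore the total, which is what allows alternative folds to be ranked.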

[Figure: contact map; both axes are the amino acid index, from 1 to N]

49

Input:

sequence

Legend: H bond donor, H bond acceptor, Glycine, Hydrophobic

Library of folds of known proteins

50

S = 20    Z = 5
S = 5     Z = 1.5
S = -2    Z = -1

Legend: H bond donor, H bond acceptor, Glycine, Hydrophobic

51

Ab initio methods for modeling

This field is of great theoretical interest. Here there is no use of sequence alignments and no direct use of known structures.

The basic idea is to build an empirical function that simulates real physical forces and potentials of chemical contacts.

If we had a perfect scoring function and were able to scan all the possible conformations, we would be able to detect the correct fold.

52

Algorithms for ab initio prediction include:
A. Searching procedure that scans many possible structures (conformations)
B. Scoring function to evaluate and rank the structures

Due to the large search space, heuristic methods are usually applied

The parameters in the searching procedure are the dihedral angles which specify the exact fold of the polypeptide chain
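For reference, a dihedral angle is defined by four consecutive points (for the backbone, four consecutive backbone atoms). The sketch below computes it with a standard vector formulation; the function name and test points are illustrative, and the sign follows the usual convention in which a cis arrangement gives 0° and a trans arrangement gives ±180°.

```python
import math


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees defined by four points (x, y, z)."""
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond b1.
    d0, d2 = dot(b0, b1), dot(b2, b1)
    v = (b0[0] - d0 * b1[0], b0[1] - d0 * b1[1], b0[2] - d0 * b1[2])
    w = (b2[0] - d2 * b1[0], b2[1] - d2 * b1[1], b2[2] - d2 * b1[2])
    x = dot(v, w)
    y = dot(cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


if __name__ == "__main__":
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

A search procedure can then treat each such angle as one free parameter and rebuild Cartesian coordinates from the angles when a conformation has to be scored.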

53

Side chain construction

When there is high similarity between the built protein and the templates, construction of the side chains is done using the template structures.

Without such similarity the construction can be done using rotamer libraries.

A compromise between the probability of the rotamer and its fit at a specific position determines the score. Comparing the scores of all the rotamers for a given amino acid determines the preferred rotamer.

54

Phe

Asn

Conformation - a given set of dihedral angles which defines a structure.

Rotamer - an energetically favourable conformation.

Example of a rotamer library:

Residue   χ1        χ2        probability
SER         59.6               1.0
SER        -62.5              26.4
SER        179.6              32.6
TYR         63.6     90.5     21.0
TYR         68.5    -89.6     16.4
TYR        170.7     97.8     13.3
TYR       -175.0   -100.7     20.0
TYR        -60.1     96.6     10.0
TYR        -63.0   -101.6     19.3
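The compromise between rotamer probability and local fit can be sketched as a simple scoring loop over the TYR entries of the example library. The particular score used here (−ln of the probability plus a weighted fit energy) and the `fit_energy` callback are assumptions made for illustration, not the scoring of any specific program.

```python
import math

# TYR entries of the example library: (chi1, chi2, probability in percent).
TYR_ROTAMERS = [
    (63.6, 90.5, 21.0),
    (68.5, -89.6, 16.4),
    (170.7, 97.8, 13.3),
    (-175.0, -100.7, 20.0),
    (-60.1, 96.6, 10.0),
    (-63.0, -101.6, 19.3),
]


def best_rotamer(rotamers, fit_energy, weight=1.0):
    """Pick the rotamer with the lowest combined score.

    fit_energy(chi1, chi2) is a hypothetical callback returning how badly
    the rotamer fits its environment (e.g. a clash penalty); lower is better.
    """
    def score(rotamer):
        chi1, chi2, probability = rotamer
        return -math.log(probability / 100.0) + weight * fit_energy(chi1, chi2)

    return min(rotamers, key=score)


if __name__ == "__main__":
    # With no environment term the most probable rotamer wins.
    print(best_rotamer(TYR_ROTAMERS, lambda chi1, chi2: 0.0))
    # A toy environment that penalises positive chi1 changes the choice.
    print(best_rotamer(TYR_ROTAMERS, lambda chi1, chi2: 5.0 if chi1 > 0 else 0.0))
```

The weight controls the compromise: a small weight favours the statistically common rotamer, a large weight favours whatever fits the specific position best.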

55

http://ignmtest.ccbb.pitt.edu/cgi-bin/sccomp/sccomp1.cgi

56

57

Model evaluation

After the model is built we can check its validity in various ways. We can check that the model has a reasonable shape and that it obeys the usual geometric constraints.

If the model turns out to be bad, it is necessary to repeat several steps of the model building.

58

59

We can easily assess homology modeling procedures by building models for proteins whose structure has already been solved and comparing the model with the native structure.

However, it is always possible that information from the native structure will be used, directly or indirectly, in building the model.

A more objective test is prediction of structures before they are publicly distributed (this is the idea of the CASP competitions).

60

Docking: finding the binding orientation of two molecules with known structures

According to the molecules involved:
•Protein-Ligand docking
•Protein-Protein docking

Specific docking algorithms are usually designed to deal with one of these problems but not with both (different contact area, flexibility, level of representation, etc.)


61

Why docking?

• Understanding interactions, roles of specific amino acids, design of mutations and changes of activity.

• Predicting affinities

• Drug design

62

Ligand-Protein docking

Finding the place and the orientation of the interactions

The general problem includes a search for the location of the binding site and a search for the exact orientation of the ligand in the binding site. A program that does both performs global docking.

Sometimes the location of the binding site is known. In this case we only need to orient the ligand in the binding site, and the problem is called local docking.

Global docking is more demanding in terms of computational time and the results are less accurate.

63

Global docking

Local docking

64

Rigidity vs flexibility

• Most of the early algorithms assumed that the docked molecules do not change conformation. This assumption allows the molecules to be treated as rigid bodies, making the algorithm simpler and faster

• This assumption is problematic and was proven to be wrong in many cases

• New algorithms try to face the flexibility problem directly

• Other methods try to handle the flexibility problem indirectly or at least to “minimize the damage” of not incorporating flexibility

• Docking procedures that perform a rigid-body search are termed rigid docking

• Docking procedures that consider possible conformational changes are termed flexible docking

65

Bound and unbound docking

In bound docking the goal is to reproduce a known complex, where the starting coordinates of the individual molecules are taken from the crystal structure of the complex.

In unbound docking, which is a significantly more difficult problem, the starting coordinates are taken from the unbound molecules. Unfortunately, this is also the more realistic problem.

66

Components of the problem

Algorithms to dock molecules need:

A. System representation
B. Searching procedure
C. Scoring function
D. Clustering procedure

The parameters of the problem for docking of 2 rigid bodies are 3 angles (rotations) and 3 distances (translations)
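The six rigid-body parameters can be illustrated by applying them to a single point. In this sketch the parameter order (three translations followed by three rotation angles in degrees) and the convention of rotating about the x, y and z axes in that order before translating are assumptions; actual docking programs differ in their conventions.

```python
import math


def rigid_transform(point, params):
    """Apply a 6-parameter rigid-body move to one (x, y, z) point.

    params = (tx, ty, tz, rx, ry, rz): translations along x, y, z and
    rotation angles in degrees about the x, y, z axes (applied in that
    order, about the origin, before the translation). This ordering is
    an assumption of the sketch.
    """
    tx, ty, tz, rx, ry, rz = params
    x, y, z = point
    a = math.radians(rx)  # rotation about the x axis
    y, z = y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a)
    b = math.radians(ry)  # rotation about the y axis
    x, z = x * math.cos(b) + z * math.sin(b), -x * math.sin(b) + z * math.cos(b)
    c = math.radians(rz)  # rotation about the z axis
    x, y = x * math.cos(c) - y * math.sin(c), x * math.sin(c) + y * math.cos(c)
    return (x + tx, y + ty, z + tz)


if __name__ == "__main__":
    atom = (1.0, 2.0, 3.0)
    print(rigid_transform(atom, (0, 0, 0, 0, 0, 0)))   # identity
    print(rigid_transform(atom, (0, -1, 0, 0, 0, 0)))  # shift by -1 along y
    print(rigid_transform(atom, (0, 0, 0, 0, 0, 30)))  # rotate 30 degrees about z
```

Applying the same six numbers to every atom of the ligand moves it as a rigid body; a rigid docking search explores this six-dimensional space.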

67

x

y z

(0,0,0,0,0,0)

(0,-1,0,0,0,0)

(0,0,1,0,0,0)

68

x

y z

(0,0,0,0,0,30)

69

x

y z

(0,0,0,30,0, 0)

70

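Usually the ligand is not rigid and a few additional parameters are required.

Number of parameters needed to fully describe the ligand position:

$$N_p = 3 + 3 + N_{fb}$$

where the first 3 parameters describe the position, the next 3 the orientation, and $N_{fb}$ is the number of flexible bonds.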
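A short worked example shows how quickly the search space grows with the number of flexible bonds. It assumes a naive exhaustive grid search with the same number of sampled values for every parameter, which is an illustrative simplification rather than how docking programs actually search.

```python
def n_parameters(n_flexible_bonds):
    """Np = 3 (position) + 3 (orientation) + number of flexible bonds."""
    return 3 + 3 + n_flexible_bonds


def grid_size(n_flexible_bonds, samples_per_parameter):
    """Number of ligand poses in an exhaustive grid over all parameters."""
    return samples_per_parameter ** n_parameters(n_flexible_bonds)


if __name__ == "__main__":
    for bonds in (0, 2, 4, 8):
        print(bonds, n_parameters(bonds), grid_size(bonds, 10))
```

With only 10 values per parameter, each additional flexible bond multiplies the number of poses by 10, which is one reason heuristic search procedures are used.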

71

receptor ligand

72

Library of chemical groups:

1

2

3

4 8

7

6

5

73

Chemical library

1

2

3

4 8

7

6

5

1 045 6 090 3

1 045 7 090 3

1 045 7 180 3

74

Visualization – Molecular graphics

What do we need?

• Rotation & translation

• Color specific parts of the molecule

• Labeling of residues and atoms

• Geometrical measurements (distances & angles)

• Schematic representation:

Atoms/Bonds/Secondary structures, …

• Molecular surfaces

• Compare structures

• Saving pictures


75

Representation of molecules (1)

              Stick-model    Ball & Stick    Space-filled model
Ball size     0              0.4             0.8
Stick size    0.2            0.2             0

76

Representation of molecules (2)

Backbone: only connections between C-alpha atoms

Schematic: helix – cylinder, strand – arrow

Surface: color indicates electrostatic potential

77

http://www.rcsb.org/pdb/static.do?p=software/software_links/molecular_graphics.html

78

79

Dynamics of proteins

• Dynamics of proteins is clearly related to their function.

• Understanding the relation between the two is a main challenge in the field of biophysics

• Molecular Dynamics provides a way to conduct non-equilibrium simulations but only for short time scales (10^-7 s)

• Normal Mode Analysis provides a way to analyze equilibrium motion for longer time scales


80

Time and amplitude scales

Local Motions: atomic fluctuation, side chain motion
    Time scale: fs - ps (10^-15 - 10^-12 s)
    Amplitude: less than 1 Å
    Functionality examples: ligand docking flexibility, temporal diffusion pathways

Medium Scale Motions: loop motion, terminal-arm motion, rigid-body motion (helices)
    Time scale: ns - μs (10^-9 - 10^-6 s)
    Amplitude: 1 - 5 Å
    Functionality examples: active site conformation adaptation, binding specificity

Large Scale Motions: domain motion, subunit motion
    Time scale: μs - ms (10^-6 - 10^-3 s)
    Amplitude: 5 - 10 Å
    Functionality examples: hinge bending motion, allosteric transitions

Global Motions: helix-coil transition, folding/unfolding, subunit association
    Time scale: ms - h (10^-3 - 10^4 s)
    Amplitude: more than 10 Å
    Functionality examples: hormone activation, protein functionality

Modified after: Becker & Watanabe (2001). Dynamic Methods. In Computational Biochemistry and Biophysics (edited by Becker et al.)

81

82

Thanks

•The organizers
•Dr. Jaume Bacardit

You!
