A review of the current protein structure prediction methods and software

BASIC PROTEIN STRUCTURE PREDICTION FOR THE BIOLOGIST

Marius MIHĂŞAN
Faculty of Biology, Biochemistry and Molecular Biology lab, Room B228
E-mail: [email protected]

Jun 28, 2020




Why do we need protein structure prediction?

Protein 3D structure → Protein function

Protein and drug design

Biotechnology, Medicine

- Genome-scale sequencing projects: 187 066 846 sequences (Benson, D. A., et al. (2015) GenBank. Nucleic Acids Res. 43, D30–5)

- Experimental protein structure determination: 111 956 structures

Let's use computers to fill the gap
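To make the size of the gap concrete, the two counts quoted above can be compared directly. This is only a back-of-the-envelope sketch; the function names are illustrative and not part of the original slides.

```python
# Counts quoted on the slide: known sequences vs. experimentally solved structures.
SEQUENCES = 187_066_846
STRUCTURES = 111_956


def sequences_per_structure(n_sequences: int, n_structures: int) -> float:
    """Return how many known sequences exist for each solved structure."""
    return n_sequences / n_structures


def structure_coverage_percent(n_sequences: int, n_structures: int) -> float:
    """Return the number of solved structures as a percentage of known sequences."""
    return 100.0 * n_structures / n_sequences


if __name__ == "__main__":
    ratio = sequences_per_structure(SEQUENCES, STRUCTURES)
    coverage = structure_coverage_percent(SEQUENCES, STRUCTURES)
    print(f"about {ratio:.0f} sequences per experimental structure")
    print(f"structures cover about {coverage:.2f}% of sequences")
```

With the slide's numbers there are roughly 1 670 sequences for every experimental structure, which is the gap that prediction methods try to fill.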

Anfinsen’s thermodynamic hypothesis: protein folding is a purely physical process and depends only on the specific amino acid sequence of the protein and the surrounding solvent.

Experimental structure determination methods: X-ray, NMR, EM


The protein structure prediction problem:

Given a known primary sequence of amino acids, predict its native, or folded, state in 3-dimensional space.

“holy grail of molecular biology”

“the second half of the genetic code”

Protein folding is a purely physical process, but depends on:

- solvent (water or lipid bilayer)
- salt concentration
- pH
- temperature
- molecular chaperones (GroEL, HSP)

This adds several layers of complexity to structure prediction.


Protein structure prediction methods:

Folding as a purely physical process

- Basis: laws of physics
- Based on the first principles of physics; no structural data or template required
- Methods: ab initio methods

Similar sequences will have similar folds

- Basis: theory and laws of protein evolution
- Uses one or more proteins of known structure as templates to build the structure of the unknown protein
- Methods: homology modeling, protein threading


1. Homology modeling = Comparative modeling

I. The 3D structure of proteins is more conserved than their amino acid sequences.

II. Most protein pairs with more than 30% identical residues were found to be structurally similar.

III. Proteins from the same family have a highly conserved structure.

The function, and not the sequence per se, is conserved in protein evolution.

Steps in modeling the 3D structure of a target sequence:

(a) identification of related sequences of known structure (template structures), e.g. by BLAST searches in PDB or wwPDB;

(b) alignment of the target sequence to the template structures;

(c) modeling of structurally conserved regions using known templates;

(d) modeling of side chains and loops that differ from the templates;

(e) refinement and evaluation of the model quality through conformational sampling.
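Once a target has been aligned to a template, the percentage of identical residues is the number that decides how far the template can be trusted. The sketch below computes it from two already-aligned strings. It assumes gaps are written as "-" and that identity is counted over columns where both sequences have a residue. Other tools use other denominators, so treat the exact value as convention-dependent. The function name and the example sequences are illustrative only.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues are identical."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")

    aligned_columns = 0
    identical_columns = 0
    for a, b in zip(aligned_target.upper(), aligned_template.upper()):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        aligned_columns += 1
        if a == b:
            identical_columns += 1

    if aligned_columns == 0:
        return 0.0
    return 100.0 * identical_columns / aligned_columns


if __name__ == "__main__":
    # Hypothetical toy alignment: 8 gap-free columns, 7 of them identical.
    print(percent_identity("KFIND-ERES", "KFLNDAER-S"))
```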


The probability of finding a related known structure for a randomly selected sequence from a genome ranges from 30% to 65%

Accuracy of predictions – depends on the target vs template sequence similarity:

- Identity > 40%: about 90% of main-chain atoms can be modeled with an RMSD error of about 1 Å; predictions are of very good to high quality, as accurate as low-resolution X-ray structures.

- Identity between 30% and 40%: about 80% of main-chain atoms can be predicted to an RMSD of 3.5 Å; the rest of the residues are modeled with larger errors.

- Identity < 30%: homologous structures are hard to find; alignments and models can be generated, but their significance is questionable (the twilight zone of protein sequence alignments).
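The accuracy figures above are quoted as RMSD (root-mean-square deviation) between equivalent atoms of the model and the experimental structure. A minimal sketch of that calculation follows. It assumes the two coordinate sets are already optimally superimposed and listed in the same atom order; real tools perform the superposition first. The function name and the coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD in the input units between two equal-length lists of (x, y, z) tuples.

    Assumes the structures are already superimposed and atoms are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")

    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


if __name__ == "__main__":
    # Hypothetical coordinates: every atom of the model is displaced by 1 Å along z.
    model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(model, reference))
```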


2. Protein threading = Fold recognition

Similar sequences imply similar structures, but similar structures are often found for proteins with low or no sequence similarity.

The actual number of different folded protein structures is limited: a few hundred to a few thousand.

90% of the new structures submitted to the PDB in the past three years have structural folds similar to ones already in the PDB.

"Threading" means placing (aligning) each amino acid of the target sequence onto a position in the template structure and evaluating how well the target fits the template:

http://www.gnf.org/assets/001/23013.jpg

(a) construction of the database of template folds, by selecting protein structures from databases such as PDB, FSSP, SCOP, or CATH;

(b) threading the target onto each template and evaluating how well the target fits the template;

(c) construction of the model by placing the identified folds at their positions on the target sequence.

Example target sequence: KFINDERESYKQLTWTDTRLATGSWSLAKDFPGSPAWNGKAVGGTATFWTG

Step (b) is computationally intensive, but threading works for low sequence identity (i.e. <25%).
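The idea behind step (b) can be illustrated with a deliberately simplified toy. Each template is reduced to a string of per-position environments ("B" for buried, "E" for exposed). The target is slid along it without gaps and scored by whether hydrophobic residues land in buried positions. Real threading programs use gapped alignments and far richer scoring functions, so this is not the method of any particular tool. The residue grouping, the scoring scheme and the template library are all assumptions made for illustration.

```python
# Assumed, simplified set of hydrophobic residues (one-letter codes).
HYDROPHOBIC = set("AVLIMFWC")


def position_score(residue: str, environment: str) -> int:
    """+1 if the residue type suits the environment ('B' buried, 'E' exposed), else -1."""
    is_hydrophobic = residue in HYDROPHOBIC
    fits = (environment == "B") == is_hydrophobic
    return 1 if fits else -1


def best_ungapped_fit(target: str, environments: str):
    """Slide the target along the template; return (best_score, best_offset)."""
    if len(target) > len(environments):
        raise ValueError("toy example needs a template at least as long as the target")

    best = None
    for offset in range(len(environments) - len(target) + 1):
        # zip stops at the end of the target, so only the overlapping window is scored
        score = sum(position_score(r, e) for r, e in zip(target, environments[offset:]))
        if best is None or score > best[0]:
            best = (score, offset)
    return best


def rank_templates(target: str, library: dict) -> list:
    """Return template names ordered from best to worst fit for the target."""
    scores = {name: best_ungapped_fit(target, env)[0] for name, env in library.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical two-fold library described only by burial patterns.
    library = {"fold_A": "BBEEE", "fold_B": "EEBBB"}
    print(rank_templates("LVKDE", library))
```

Even in this toy, every target must be scored against every template at every offset, which hints at why the real step is computationally expensive.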


3. Ab initio methods = de novo methods

The native structure corresponds to the global free energy minimum, which could be calculated.

I. No template
II. Computationally intensive
III. Low resolution and accuracy

Rosetta@home and T0283: two years and approximately 70,000 home computers

These methods are applicable to:

- proteins with no structural homolog
- sequences of less than approximately 100 residues


What to use: programs, servers and meta-servers

Programs: What If; SegMod/ENCAD; Biskit

Pros:
- full control of the method used

Cons:
- "exotic" operating systems: no GUI, only terminal commands
- skills in different programming languages required for scripts
- access to high computing power

Servers: SWISS-MODEL; Phyre2

Pros:
- computations are done elsewhere
- results sent by e-mail

Cons:
- each server uses only one method of prediction, with its corresponding flaws

Meta-servers: LOMETS; Meta-PP

Pros:
- computations are done elsewhere
- results sent by e-mail
- the results from several different servers are compared and ranked

Figure: Number of citations for some of the most common docking programs (italics) and servers (bold type), analyzed from the ISI Web of Science.

Figure: Trends in the number of citations per year for some of the most common docking programs and servers, analyzed from the ISI Web of Science (2014).


Selected programs and servers


Which one to choose?

1. Check out the CASP (Critical Assessment of protein Structure Prediction) experiments for the latest ranking of the available tools: http://predictioncenter.org/

2. Check the sequence similarity of your target with other known structures:

- at least 40% similar: homology modeling
- less than 40%: threading
- no homolog found: ab initio

3. Check the quality of the generated model by:

- assessing the stereochemistry (bonds, bond angles, dihedral angles, etc.) with programs such as PROCHECK (Laskowski et al., 1993), WHAT-IF (Vriend, 1990) and WHAT-CHECK (Hooft et al., 1996);

- calculating the pseudo-energy profile of the model with PROSA (Sippl, 1995), Verify3D (Eisenberg et al., 1997) and QMEAN (Benkert et al., 2008).
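The rule of thumb in step 2 can be written down as a small decision function. This sketch simply encodes the 40% cut-off stated above. The function name is illustrative, and using `None` to mean "no homolog found" is an assumption of this example, not something defined in the slides.

```python
from typing import Optional


def choose_method(similarity_percent: Optional[float]) -> str:
    """Suggest a prediction approach from the best target-to-known-structure similarity.

    similarity_percent is None when no homolog of known structure was found.
    """
    if similarity_percent is None:
        return "ab initio"
    if not 0.0 <= similarity_percent <= 100.0:
        raise ValueError("similarity must be a percentage between 0 and 100")
    if similarity_percent >= 40.0:
        return "homology modeling"
    return "threading"


if __name__ == "__main__":
    for value in (55.0, 25.0, None):
        print(value, "->", choose_method(value))
```

The cut-off is a guideline rather than a hard boundary, so borderline targets are worth trying with more than one approach.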


Applications of structure predictions

Helpfulness depends on accuracy (RMSD from the experimental structure):

Comparative modeling, 1–2 Å
- accuracy of medium-resolution NMR or low-resolution X-ray structures
- docking
- designing ligands
- designing mutants
- identifying active and binding sites

Threading, 2–4 Å
- errors mainly occurring in the loop regions
- identification of the spatial locations of functionally important residues
- interpreting mutagenesis experiments

Ab initio methods, >4 Å
- domain boundary identification
- topology recognition
- family/superfamily assignment


Take-away message

1. Servers make protein structure prediction available to the masses: only a sequence and an e-mail address are required!

2. The choice of one method or another still depends on the protein sequence, as well as on the expected quality of the result.

Presentation and article available at: [email protected]