The Protein Data Bank (PDB)

Feb 02, 2016

Gill Gill

Transcript
Page 1: The Protein Data Bank (PDB)

The Protein Data Bank (PDB)

Page 287

• PDB is the principal repository for protein structures
• Established in 1971
• Accessed at http://www.rcsb.org/pdb or simply http://www.pdb.org
• Currently contains over 32,000 structure entities

Updated 9/05

Page 2: The Protein Data Bank (PDB)

PDB content growth (www.pdb.org)

[Figure: number of structures plotted against year]

Fig. 9.6 Page 281

Page 3: The Protein Data Bank (PDB)

PDB holdings (September, 2005)

29,876  proteins, peptides
 1,338  protein/nucl. complexes
 1,500  nucleic acids
    13  carbohydrates
32,727  total

Table 9-2 Page 281
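
As a quick check on the holdings above, the four categories can be summed and expressed as shares of the total. This is only a minimal sketch: the counts are the September 2005 figures quoted on this slide, and the names `holdings` and `shares` are illustrative choices, not anything defined by PDB.

```python
# PDB holdings as quoted on the slide (September 2005).
holdings = {
    "proteins, peptides": 29876,
    "protein/nucl. complexes": 1338,
    "nucleic acids": 1500,
    "carbohydrates": 13,
}

total = sum(holdings.values())

# Share of each category as a percentage of all entries.
shares = {name: 100.0 * count / total for name, count in holdings.items()}

for name, count in holdings.items():
    print(f"{name:25s} {count:6d}  {shares[name]:5.1f}%")
print(f"{'total':25s} {total:6d}")
```

The sum reproduces the stated total of 32,727, and proteins and peptides make up roughly 91% of the entries.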

Page 4: The Protein Data Bank (PDB)

Protein Data Bank

Gateways to access PDB files: Swiss-Prot, NCBI, EMBL

Databases that interpret PDB files: CATH, Dali, SCOP, FSSP

Fig. 9.10 Page 285

Page 5: The Protein Data Bank (PDB)

Access to PDB through NCBI

Page 289

You can access PDB data at NCBI in several ways:

• Go to the Structure site from the NCBI homepage
• Use Entrez
• Perform a BLAST search, restricting the output to the PDB database
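
The Entrez route can also be scripted. The sketch below only builds a query URL and does not contact NCBI. It assumes the E-utilities ESearch endpoint and the `structure` database name, neither of which appears on the slide, and `build_structure_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Assumption: NCBI's E-utilities ESearch endpoint (not named on the slide).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_structure_search_url(term, retmax=20):
    """Return an ESearch URL that queries the Entrez Structure database."""
    # Assumption: "structure" is the Entrez database name for MMDB/PDB records.
    params = {"db": "structure", "term": term, "retmax": retmax}
    return ESEARCH_BASE + "?" + urlencode(params)


url = build_structure_search_url("hemoglobin")
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out on purpose, so the example runs without network access.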

Page 6: The Protein Data Bank (PDB)

Access to PDB through NCBI

Page 291

Molecular Modeling DataBase (MMDB)

Cn3D (“see in 3D” or three dimensions): structure visualization software

Vector Alignment Search Tool (VAST): view multiple structures

Page 7: The Protein Data Bank (PDB)

Fig. 9.15 Page 290

Page 8: The Protein Data Bank (PDB)

Fig. 9.15 Page 290

Page 9: The Protein Data Bank (PDB)

Fig. 9.16 Page 291

Page 10: The Protein Data Bank (PDB)

Fig. 9.16 Page 291

Page 11: The Protein Data Bank (PDB)

Fig. 9.16 Page 291

Page 12: The Protein Data Bank (PDB)

Fig. 9.16 Page 291

Page 13: The Protein Data Bank (PDB)

Fig. 9.16 Page 291

Page 14: The Protein Data Bank (PDB)

Fig. 9.17 Page 292

Page 15: The Protein Data Bank (PDB)

Access to structure data at NCBI: VAST

Page 294

Vector Alignment Search Tool (VAST) offers a variety of data on protein structures, including:

-- PDB identifiers
-- root-mean-square deviation (RMSD) values to describe structural similarities
-- NRES: the number of equivalent pairs of alpha carbon atoms superimposed
-- percent identity
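
To make the RMSD and NRES values concrete, here is a minimal sketch of how an RMSD is computed over equivalent alpha carbon pairs. It assumes the two structures are already superimposed, and it is not VAST's own procedure. The coordinates are hypothetical, and `rmsd` is an illustrative function name.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (in Å) over equivalent, already-superimposed atom pairs."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


# Hypothetical alpha carbon coordinates for three equivalent residue pairs.
structure_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
structure_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

nres = len(structure_a)  # number of superimposed pairs (NRES)
value = rmsd(structure_a, structure_b)
print(nres, value)
```

Each pair here is displaced by exactly 1 Å, so the RMSD is 1.0 Å over NRES = 3 pairs.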

Page 16: The Protein Data Bank (PDB)

Many databases explore protein structures

Page 293

SCOP

CATH

Dali Domain Dictionary

FSSP

Page 17: The Protein Data Bank (PDB)

Structural Classification of Proteins (SCOP)

Page 293

SCOP describes protein structures using a hierarchical classification scheme:

Classes
Folds
Superfamilies (likely evolutionary relationship)
Families
Domains
Individual PDB entries

http://scop.mrc-lmb.cam.ac.uk/scop/

Page 18: The Protein Data Bank (PDB)

Class, Architecture, Topology, and Homologous Superfamily (CATH) database

Page 293

CATH clusters proteins at four levels:

C Class (α, β, & α/β folds)
A Architecture (shape of domain, e.g. jelly roll)
T Topology (fold families; not necessarily homologous)
H Homologous superfamily

http://www.biochem.ucl.ac.uk/basm/cath_new

Page 19: The Protein Data Bank (PDB)

SCOP statistics (September, 2005)

Class    # folds   # superfamilies   # families
All α      218          376              608
All β      144          290              560
α/β        136          222              629
α+β        279          409              717
…
Total      945         1539             2845

Table 9-4 Page 298

α/β = parallel β sheets
α+β = antiparallel β sheets
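
The four classes listed do not add up to the Total row, because the slide elides the remaining SCOP classes. The sketch below works out how much the elided rows contribute. It assumes the four listed rows are SCOP's All α, All β, α/β, and α+β classes, and the variable names are illustrative.

```python
# Rows shown on the slide: (folds, superfamilies, families).
shown = {
    "All α": (218, 376, 608),
    "All β": (144, 290, 560),
    "α/β": (136, 222, 629),
    "α+β": (279, 409, 717),
}
total = (945, 1539, 2845)

# Column-wise sums over the rows that are shown.
shown_sum = tuple(sum(column) for column in zip(*shown.values()))

# What the elided classes must contribute to reach the stated totals.
elided = tuple(t - s for t, s in zip(total, shown_sum))

print("shown :", shown_sum)
print("elided:", elided)
```

The elided classes therefore account for 168 folds, 242 superfamilies, and 331 families.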

Page 20: The Protein Data Bank (PDB)

Fig. 9.23 Page 298

Page 21: The Protein Data Bank (PDB)

Fig. 9.24 Page 299

Page 22: The Protein Data Bank (PDB)

Fig. 9.25 Page 300

Page 23: The Protein Data Bank (PDB)

Fig. 9.25 Page 300

Page 24: The Protein Data Bank (PDB)

Fig. 9.26 Page 301

Page 25: The Protein Data Bank (PDB)

Fig. 9.27 Page 302

Page 26: The Protein Data Bank (PDB)

Fig. 9.28 Page 303

Page 27: The Protein Data Bank (PDB)

Dali Domain Dictionary

Page 302

Dali contains a numerical taxonomy of all known structures in PDB. Dali integrates additional data for entries within a domain class, such as secondary structure predictions and solvent accessibility.

Page 28: The Protein Data Bank (PDB)

Fig. 9.29 Page 303

Page 29: The Protein Data Bank (PDB)

Fig. 9.30 Page 304

Page 30: The Protein Data Bank (PDB)

Fig. 9.30 Page 304

Page 31: The Protein Data Bank (PDB)

Fig. 9.30 Page 304

Page 32: The Protein Data Bank (PDB)

Fold classification based on structure-structure alignment of proteins (FSSP)

Page 293

FSSP is based on a comprehensive comparison of PDB proteins (greater than 30 amino acids in length). Representative sets exclude sequence homologs sharing > 25% amino acid identity.

The output includes a “fold tree.”

http://www.ebi.ac.uk/dali/fssp
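
The two filters described above (length greater than 30 residues, no pair of representatives sharing more than 25% identity) can be sketched as a simple greedy selection. This illustrates the criteria only and is not FSSP's actual procedure. The chain names, lengths, and identity values are hypothetical, and `select_representatives` is an illustrative name.

```python
def select_representatives(lengths, identity, min_length=30, max_identity=25.0):
    """Greedily pick chains longer than min_length such that no two picked
    chains share more than max_identity percent sequence identity."""
    representatives = []
    for name in sorted(lengths):  # fixed order keeps the result deterministic
        if lengths[name] <= min_length:
            continue  # too short to be considered
        # Pairs missing from the lookup are treated as unrelated (0% identity).
        if all(
            identity.get(frozenset((name, rep)), 0.0) <= max_identity
            for rep in representatives
        ):
            representatives.append(name)
    return representatives


# Hypothetical input: chain lengths and pairwise percent identities.
lengths = {"chainA": 150, "chainB": 148, "chainC": 25, "chainD": 210}
identity = {
    frozenset(("chainA", "chainB")): 90.0,
    frozenset(("chainA", "chainD")): 12.0,
    frozenset(("chainB", "chainD")): 14.0,
}

reps = select_representatives(lengths, identity)
print(reps)
```

In this toy input, chainB is dropped as a close homolog of chainA and chainC is dropped for length, leaving chainA and chainD. A greedy pass depends on the visiting order, so a different ordering could keep chainB instead of chainA.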

Page 33: The Protein Data Bank (PDB)

Fig. 9.31 Page 305

Page 34: The Protein Data Bank (PDB)

FSSP: fold tree

Fig. 9.32 Page 306

Page 35: The Protein Data Bank (PDB)

Fig. 9.33 Page 307

Page 36: The Protein Data Bank (PDB)

Fig. 9.34 Page 307

Page 37: The Protein Data Bank (PDB)

Approaches to predicting protein structures

Page 303-305

There are more than 20,000 structures in PDB, and about 1 million protein sequences in SwissProt/TrEMBL. For most proteins, structural models derive from computational biology approaches rather than experimental methods.

The most reliable method of modeling and evaluating new structures is by comparison to previously known structures. This is comparative modeling.

An alternative is ab initio modeling.

Page 38: The Protein Data Bank (PDB)

Approaches to predicting protein structures

obtain sequence (target)

fold assignment

comparative modeling

ab initio modeling

build, assess model

Fig. 9.35 Page 308

Page 39: The Protein Data Bank (PDB)

Comparative modeling of protein structures

Page 305

[1] Perform fold assignment (e.g. BLAST, CATH, SCOP); identify structurally conserved regions.

[2] Align the target (unknown protein) with the template. This is performed for >30% amino acid identity over a sufficient length.

[3] Build a model.

[4] Evaluate the model.
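
The identity criterion in step [2] can be sketched as follows. This is a minimal illustration, assuming the target and template are already aligned with `-` as the gap character. The slide does not say what counts as "a sufficient length", so the `min_aligned=50` default is purely an assumption, and the function names and toy sequences are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Return (percent identity, number of compared columns), counting only
    alignment columns where neither sequence has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for t, m in zip(aligned_target, aligned_template):
        if t == gap or m == gap:
            continue
        compared += 1
        if t == m:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned residue pairs")
    return 100.0 * matches / compared, compared


def usable_template(aligned_target, aligned_template,
                    min_identity=30.0, min_aligned=50):
    """Apply the >30% identity rule plus an assumed minimum aligned length."""
    identity, compared = percent_identity(aligned_target, aligned_template)
    return identity > min_identity and compared >= min_aligned


# Toy alignment: 9 gap-free columns, 8 of them identical.
target = "ACDEFGHIKL"
template = "ACDQFGH-KL"
identity, compared = percent_identity(target, template)
print(round(identity, 1), compared)
```

With the assumed default of 50 aligned residues, this toy alignment is rejected as too short even though its identity is high. Lowering `min_aligned` to 5 accepts it.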

Page 40: The Protein Data Bank (PDB)

Errors in comparative modeling

Page 306

Errors may occur for many reasons:

[1] Errors in side-chain packing

[2] Distortions within correctly aligned regions

[3] Errors in regions of target that do not match template

[4] Errors in sequence alignment

[5] Use of incorrect templates

Page 41: The Protein Data Bank (PDB)

Comparative modeling

Page 306

In general, accuracy of structure prediction depends on the percent amino acid identity shared between target and template.

For >50% identity, RMSD is often only 1 Å.

Page 42: The Protein Data Bank (PDB)

Baker and Sali (2000) Fig. 9.36 Page 308

Page 43: The Protein Data Bank (PDB)

Comparative modeling

Page 309

Many web servers offer comparative modeling services.

Examples are:
• SWISS-MODEL (ExPASy)
• Predict Protein server (Columbia)
• WHAT IF (CMBI, Netherlands)