EMBL-EBI the European Macromolecular Structure Database (EMSD)
http://www.ebi.ac.uk/msd/education/Tutorial.html
http://www.ebi.ac.uk/msd/roadshow.html
MSD Roadshow Co-ordinator. Janet Copeland
2nd November 2005, Oxford University
Jan 02, 2016

Darlene Benson
Transcript
Page 2: EMBL-EBI the European Macromolecular Structure Database (EMSD).  .

EMBL-EBI

USERNAME: cal

PASSWORD: warthog

Page 3

Introduction to MSD and to Quaternary Structures/Assemblies as Basis of MSD database

SSM Fold recognition

PISA Surface and assembly toolkit

MSDchem Chemistry reference data

MSDlite/MSDpro generalised search systems

MSDsite Active sites

MSDmotif small structural motifs

Page 4

Visualisation and Patterns

Integration Projects with Sequence and Domain data

Validation/Deposition

Clustering methods used at MSD

MSDmine – generalised data access to the MSD

PIMS – Protein Information System

Targets – Workflow for Target selection tools

NMR – NMR tools and data at MSD

Data Mining and an example MSDtemplates

DataBases at MSD including data warehouse technologies

DataBase Replication

Page 5

http://www.ebi.ac.uk/msd

Page 6

Genomes

Hypotheses and in silico models

Bioinformatics

Expression-profiling

Comparative genomics

Mutant/RNAi data

Metabolic data

Literature

Proteome data

Biochemistry

Bioinformatics

Page 7

Role of Bioinformatics

To Support Experimental Biology

To Collect and Archive Data

To provide Framework and Integration

To give Easy Access to Data

To make New Discoveries through Data Analysis

Page 8

http://www.wwpdb.org/

Page 9

WHAT IS THE PDB?

Page 10

Databanks and Databases

The PDB Archive is a “databank”: a series of flat files that have a format originally designed for Fortran card readers.

The MSD provides “databases”: collections of data (1000s of attributes) organized into relational tables and held within an RDBMS.
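The contrast between the two can be sketched in a few lines of Python. This is only an illustration, not the MSD's actual loading code: the helper names `format_atom` and `parse_atom` and the single `atom` table are assumptions made for the example, while the column positions follow the standard fixed-width ATOM record of the PDB format.

```python
import sqlite3

def format_atom(serial, name, res_name, chain, res_seq, x, y, z, occ, b):
    """Write one ATOM record using the fixed PDB column layout."""
    return (f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain}{res_seq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{occ:6.2f}{b:6.2f}")

def parse_atom(line):
    """Read one ATOM record back purely by column position (the 'databank' view)."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "x": float(line[30:38]),
        "y": float(line[38:46]),
        "z": float(line[46:54]),
        "occupancy": float(line[54:60]),
        "b_factor": float(line[60:66]),
    }

# Two flat-file lines, built here so the column alignment is guaranteed.
flat_file = [
    format_atom(2567, "N", "PHE", "B", 175, 7.821, -25.530, -22.848, 1.00, 8.71),
    format_atom(2568, "CA", "PHE", "B", 175, 8.845, -25.172, -21.877, 1.00, 9.41),
]
records = [parse_atom(line) for line in flat_file]

# The 'database' view: the same data as typed attributes in a relational table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atom (serial INTEGER PRIMARY KEY, name TEXT, res_name TEXT, "
    "chain TEXT, res_seq INTEGER, x REAL, y REAL, z REAL, "
    "occupancy REAL, b_factor REAL)"
)
conn.executemany(
    "INSERT INTO atom VALUES (:serial, :name, :res_name, :chain, :res_seq, "
    ":x, :y, :z, :occupancy, :b_factor)",
    records,
)
ca_rows = conn.execute("SELECT serial, x FROM atom WHERE name = 'CA'").fetchall()
```

Once the data sits in a table, a question such as "which atoms are alpha carbons?" becomes a one-line query rather than a scan over character columns.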

Page 11

PQS biological assemblies

MSDchem ligand data

Electron Density Visualisation

AstexViewer

MSDPro, MSDlite

SSM fold matching

Surface Matching

MSDsite Active sites

Linking to Domain data, eFamily

Sequence Mapping, SIFTS

Page 12

Data & information

ATOM 2567 N   PHE B 175  7.821 -25.530 -22.848 1.00  8.71
ATOM 2568 CA  PHE B 175  8.845 -25.172 -21.877 1.00  9.41
ATOM 2569 C   PHE B 175  9.449 -23.798 -22.169 1.00 10.02
ATOM 2570 O   PHE B 175 10.664 -23.613 -22.103 1.00 10.37
ATOM 2571 CB  PHE B 175  9.928 -26.251 -21.848 1.00  9.53
ATOM 2572 CG  PHE B 175 10.969 -26.137 -22.982 1.00 10.03
ATOM 2573 CD1 PHE B 175 12.356 -25.819 -22.988 1.00 10.51
ATOM 2574 CD2 PHE B 175 11.725 -27.211 -23.402 1.00 10.25
ATOM 2575 CE1 PHE B 175 11.821 -27.095 -22.869 1.00 11.17
ATOM 2576 CE2 PHE B 175 12.282 -26.086 -24.008 1.00 10.95
ATOM 2577 CZ  PHE B 175 10.953 -26.335 -23.622 1.00 11.38
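As a small illustration of turning such data into information, the sketch below reads the first two records as whitespace-separated fields and derives the N to CA bond length. It assumes the last two columns are occupancy and B-factor, as in a standard ATOM record; `parse_record` is a name chosen for this example only.

```python
import math

# The first two records from the slide, one per line.
slide_text = """\
ATOM 2567 N PHE B 175 7.821 -25.530 -22.848 1.00 8.71
ATOM 2568 CA PHE B 175 8.845 -25.172 -21.877 1.00 9.41
"""

def parse_record(line):
    """Split one whitespace-separated ATOM record into named, typed fields."""
    _, serial, name, residue, chain, res_seq, x, y, z, occ, b = line.split()
    return {
        "serial": int(serial),
        "name": name,
        "residue": residue,
        "chain": chain,
        "res_seq": int(res_seq),
        "xyz": (float(x), float(y), float(z)),
        "occupancy": float(occ),   # assumed meaning of the 10th field
        "b_factor": float(b),      # assumed meaning of the 11th field
    }

atoms = {a["name"]: a for a in map(parse_record, slide_text.splitlines())}

# Derived information: the distance between backbone N and CA of PHE B 175.
n_ca = math.dist(atoms["N"]["xyz"], atoms["CA"]["xyz"])
print(f"N-CA distance: {n_ca:.2f} angstrom")  # prints "N-CA distance: 1.46 angstrom"
```

The raw numbers say nothing on their own; the derived distance is the kind of value a chemist can compare against expectation.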

Page 13

MSD service provider

We provide a service to the scientific community 24/7 (almost):

Parallel DB with fail-over, etc.

Service “ping” baseline check several times/day

Data is incremented with new data weekly

Systems are extensible

Page 14

Query capabilities

Browsing (click and read)

Simple search: select records with some constraints

More elaborate search: select specific fields of some records with constraints on some fields

Complex querying: ability to return an answer that results from a "live" computation, and was not part of any record of the database
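The three search levels can be illustrated with a toy relational table. This is a minimal sketch using Python's built-in `sqlite3` module; the `atom` table and its columns are assumptions for the example and do not reflect the real MSD schema. The four rows reuse backbone atoms of PHE B 175 from the data slide, with the last value taken to be the B-factor.

```python
import sqlite3

rows = [
    (2567, "N", "PHE", "B", 175, 8.71),
    (2568, "CA", "PHE", "B", 175, 9.41),
    (2569, "C", "PHE", "B", 175, 10.02),
    (2570, "O", "PHE", "B", 175, 10.37),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE atom (serial INTEGER, name TEXT, residue TEXT, "
    "chain TEXT, res_seq INTEGER, b_factor REAL)"
)
conn.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?, ?)", rows)

# Simple search: whole records that satisfy a constraint.
simple = conn.execute("SELECT * FROM atom WHERE name = 'CA'").fetchall()

# More elaborate search: chosen fields, with a constraint on another field.
elaborate = conn.execute(
    "SELECT name, b_factor FROM atom WHERE b_factor > 10 ORDER BY serial"
).fetchall()

# Complex query: the answer is computed live and is stored in no record.
(mean_b,) = conn.execute(
    "SELECT AVG(b_factor) FROM atom WHERE residue = 'PHE' AND res_seq = 175"
).fetchone()
```

Here `mean_b` exists nowhere in the table; it is produced at query time, which is the distinction the last item draws.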

Page 15

What we cannot do well

“Give us sequence, we do rest”

Page 16

Page 17

Page 18

What is the function of this structure?

What is the function of this sequence?

What is the function of this motif?

The fold provides a scaffold, which can be decorated in different ways by different sequences to confer different functions. Knowing the fold & function allows us to rationalise how the structure effects its function at the molecular level.

Page 19

Complication – Multiprotein Complexes

Page 20

1H8E

(ADP.ALF4)2(ADP.SO4) BOVINE F1-ATPASE (ALL THREE CATALYTIC SITES OCCUPIED)

MENZ, R.I., WALKER, J.E., LESLIE, A.G.W.

ATPase

Page 21

Ground rules for bioinformatics

Don't always believe what programs tell you: they're often misleading & sometimes wrong!

Don't always believe what databases tell you: they're often misleading & sometimes wrong!

Don't always believe what lecturers tell you: they're often misleading & sometimes wrong!

In short, don't be a naive user.

When computers are applied to biology, it is vital to understand the difference between mathematical & biological significance.

Computers don’t do biology - they do sums quickly!

Page 22

General Evaluation Criteria

Be sceptical and cynical!

When you are searching for information you need to judge its quality and suitability.

Think critically about each piece of information you find and how you found it.

Relevance: Does the information you have found adequately support your research? Does it answer the question, or support one of your arguments? How general or specific is the information about the topic?

Page 23

http://harvester.embl.de/

“Harvester” collects information from selected public databases

Page 24

Appreciate how difficult it is to draw a complex 3-D object and appreciate the complexity of the requirements for storing sequence and structural information of molecules in a database.

There are a lot of interrelated pieces of information about a biomolecule, such as

sequence similarities

genome location

protein structure

expression

chemistry

Page 25

Some of the obstacles to searching databases are:

Data formats are not standard.

The nomenclature is not standard.

There is more than one database offering the same information (data redundancy).

Links between databases may not be easy to follow.

The number of databases available makes it confusing to choose between them.

Page 26

Accuracy or Validity: You need to determine whether the information is reliable or not.

Page 27

Quality Control Issues

The quality of archived data is no better than the data determined in the contributing laboratories.

Curation of the data can help to identify errors.

Disagreement between duplicate determinations is a clear warning of an error in one or the other.

Similarly, results that disagree with established principles may contain errors.

It is useful, for instance, to flag deviations from expected stereochemistry in protein structures, but such “outliers” are not necessarily wrong.
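Flagging stereochemical deviations can be sketched as a simple z-score check. Everything numeric here is an assumption for illustration: the ideal N to CA bond length (1.458 Å), its standard deviation (0.019 Å), the cutoff of 4 standard deviations, and the observed values other than PHE B 175 are not taken from the slides or from any real validation run.

```python
# Assumed reference values for the backbone N-CA bond, for illustration only.
IDEAL_N_CA = 1.458   # angstrom (assumption)
SIGMA_N_CA = 0.019   # angstrom (assumption)

def flag_outliers(bond_lengths, ideal=IDEAL_N_CA, sigma=SIGMA_N_CA, z_cutoff=4.0):
    """Return (label, z-score) for bonds deviating more than z_cutoff sigmas."""
    flagged = []
    for label, length in bond_lengths.items():
        z = (length - ideal) / sigma
        if abs(z) > z_cutoff:
            flagged.append((label, round(z, 1)))
    return flagged

# Hypothetical observations; only the PHE B 175 value derives from the slide data.
observed = {
    "PHE B 175": 1.456,
    "GLY A 12": 1.470,
    "SER A 40": 1.560,
}

flagged = flag_outliers(observed)
print(flagged)  # [('SER A 40', 5.4)]
```

A flagged bond is a prompt for a curator to look at the entry, not a verdict that the structure is wrong.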

Page 28

Data quality

Data consistency

Data models

Reliability

Evidence? Level of confidence?

Assignment of function by similarity: recursive process, propagation of errors
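How quickly errors can build up when function is assigned by similarity, and the new annotation is then reused as the basis for the next assignment, can be shown with a deliberately crude model. It assumes each transfer is correct with the same probability, independently of the others; the value 0.9 is arbitrary and `chained_confidence` is a name invented for this sketch.

```python
def chained_confidence(p_single, n_transfers):
    """Probability that an annotation is still correct after n chained transfers,
    assuming each transfer is independently correct with probability p_single."""
    return p_single ** n_transfers

# Arbitrary per-transfer reliability of 90%, for illustration only.
for n in (1, 3, 5, 10):
    print(n, round(chained_confidence(0.9, n), 3))
```

Under these assumptions, confidence falls from 0.9 after one transfer to about 0.35 after ten, which is why the evidence and confidence level behind each annotation matter.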

Page 29

Data quality

It’s hard to judge whether something “makes sense”.

The lack of labeling on many web pages makes it hard to know the source.

Calculations based on databases are even harder to deal with

Logical deductions may be worse.

“tacR gene regulates the human nervous system”

“tacQ gene is similar to tacR but is found in E. coli”

“so tacQ gene regulates the E. coli nervous system”

Page 30

E. coli nervous system

Who spotted it?

Page 31

Significance

Appreciating that mathematical & biological significance are different is crucial

Important in understanding the limitations of:

database search algorithms

multiple sequence alignment algorithms

pattern recognition techniques

functional site & structure prediction tools

Contrary to popular opinion, there is currently still:

no biologically-reliable automatic multiple alignment algorithm

no infallible pattern-recognition technique

no reliable gene, function or structure prediction algorithm

Page 32

New Problems

As a result, we will have to give up the “safe” idea of a stable databank composed of entries that are correct when they are first distributed in mature form and stay fixed thereafter.

Databanks are dynamic in information content, growing in size, and maturing in quality.

Maintaining local copies, largely by “top up”, is not sufficient.

Local copies proliferate in various states, with out-of-date linkages.
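Why a "top up" of a local copy falls short can be seen by comparing two manifests that map entry identifiers to a last-revised date. This is a hypothetical sketch: the manifest layout, the identifiers and dates, and the function `diff_manifests` are all invented for the example and do not describe any actual MSD or PDB distribution mechanism.

```python
def diff_manifests(local, remote):
    """Compare {entry_id: revision_date} manifests of a local copy and the source."""
    added = sorted(remote.keys() - local.keys())      # new entries
    removed = sorted(local.keys() - remote.keys())    # withdrawn or replaced entries
    changed = sorted(
        entry for entry in local.keys() & remote.keys()
        if local[entry] != remote[entry]              # revised since the last copy
    )
    return added, changed, removed

# Made-up identifiers and dates.
local = {"entry_a": "2004-01-10", "entry_b": "2004-03-02", "entry_c": "2003-07-19"}
remote = {"entry_a": "2005-06-01", "entry_b": "2004-03-02", "entry_d": "2005-10-20"}

added, changed, removed = diff_manifests(local, remote)
```

A top-up fetches only `added`. The revised `entry_a` and the withdrawn `entry_c` stay stale in the local copy, along with any links that point to them.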