Structural databases and their classification by Abdul Qahar


Presented by Abdul Qahar (A Q)

Buner Campus

Edited, prepared and shared by Abdul Qahar

Structural databases and their classification

Basic concept about databases

1. What is a database?

A database is a collection of data which can be used:
• alone, or
• combined / related to other data

to provide answers to the user’s question.

Data types

• Primary data: sequences of DNA or amino acids (e.g. DMPVERILEALAVE…), stored in primary databases
• Secondary data: secondary protein structure (e.g., alpha-helices, beta-strands) and “motifs” (regular expressions, blocks, profiles, fingerprints), stored in secondary databases
• Tertiary data: tertiary protein structure (domains, folding units, atomic co-ordinates), stored in tertiary databases
• Interaction data: binary protein-protein interactions / networks, pathways and functional networks, stored in interaction databases
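Secondary databases store motifs in forms such as regular expressions. The sketch below converts a simple PROSITE-style pattern into a Python regular expression and scans a sequence with it. It handles only an assumed subset of the pattern syntax (single residues, [..] alternatives, {..} exclusions, x for any residue, and (n) or (n,m) repeats); anchors and other features of real PROSITE patterns are not covered. The function names and the pattern N-{P}-[ST]-{P} are illustrative choices.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Convert a simplified PROSITE-style pattern into a Python regex string.

    Supported elements (an assumed subset of the real syntax):
      A               a single residue
      [ST]            any one of the listed residues
      {P}             any residue except the listed ones
      x               any residue
      x(2) / x(2,4)   a fixed or ranged repeat of the element
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the element from an optional "(n)" or "(n,m)" repeat count.
        m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"
        else:
            regex = core  # single residue or [..] class: same meaning in regex syntax
        if high:
            regex += "{" + low + "," + high + "}"
        elif low:
            regex += "{" + low + "}"
        parts.append(regex)
    return "".join(parts)


def find_motif(sequence: str, pattern: str):
    """Return (0-based start, matched text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


print(prosite_to_regex("N-{P}-[ST]-{P}"))            # N[^P][ST][^P]
print(find_motif("MKNGSAVNPTL", "N-{P}-[ST]-{P}"))   # [(2, 'NGSA')]
```

The second N in the example sequence is followed by P, so the exclusion {P} rejects it and only one match is reported.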

Primary biological databases

• Nucleic acid databases: EMBL, GenBank, DDBJ (DNA Data Bank of Japan)
• Protein databases: PIR, MIPS, SWISS-PROT, TrEMBL, NRL-3D

Nucleotide Databases

• EMBL: Nucleotide sequence database
• Ensembl: Automatic annotation of eukaryotic genomes
• Genome Server: Overview of completed genomes at EBI
• Genome-MOT: Genome monitoring table
• EMBL-Align: Multiple sequence alignment database

Sequence data = strings of letters

• Nucleotides (bases): Adenine (A), Cytosine (C), Guanine (G), Thymine (T)
• Triplet codons are translated, via the genetic code, into the 20 amino acids (A, L, V, S, etc.)

Three-dimensional protein structure = atomic coordinates in 3D space

• The amino acid chain reaches its 3D structure through protein folding
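The path from nucleotides through triplet codons to amino acids can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard genetic code (NCBI translation table 1) and a coding sequence that is in frame from its first base; the function name `translate` and the example sequence are illustrative, not part of the original material.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), listed for codons in TCAG order:
# TTT, TTC, TTA, TTG, TCT, ... GGG. '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, to_stop: bool = True) -> str:
    """Translate a DNA coding sequence codon by codon, starting at position 0.

    Codons containing unknown bases become 'X'; a trailing incomplete codon
    is ignored. With to_stop=True, translation ends at the first stop codon.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG"))  # MAIVMGR
```

Building the table from a 64-letter string keeps the genetic code in one place; passing `to_stop=False` keeps reading past stop codons and marks them with `*`.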

EMBL/GenBank/DDBJ

• These three databases contain essentially the same information, with minor differences in format and syntax.

• They serve as archives containing all sequences (single genes, ESTs, complete genomes, etc.) derived from:
– Genome projects and sequencing centers
– Individual scientists
– Patent offices (e.g. USPTO, EPO)

• Non-confidential data are exchanged among them daily.

Databases related to Genomics

• Contain information on genes, gene location (mapping), gene nomenclature and links to sequence databases.

• Exist for most organisms important for life science research.

• Examples: MIM, GDB (human), MGD (mouse), FlyBase (Drosophila), SGD (yeast), MaizeDB (maize), SubtiList (B. subtilis), etc.

Swiss-Prot

• Annotated protein sequence database, established in 1986 and maintained collaboratively since 1987 by the Department of Medical Biochemistry of the University of Geneva and the EMBL Data Library (now the EBI)

• Complete, curated, non-redundant and highly cross-referenced (with 34 other databases)

• Available from a variety of servers and through sequence analysis software tools

• More than 8,000 different species

• The first 20 species represent about 42% of all sequences in the database

• More than 129,000 entries, containing about 4.7 × 10^7 amino acids
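A statistic such as "the first 20 species represent about 42% of all sequences" is a simple count over entries. The sketch below assumes UniProt-style FASTA headers, in which the organism name follows `OS=` and precedes `OX=`. The header lines are hypothetical placeholders, not real Swiss-Prot entries, and `species_share` is a name chosen for this example.

```python
import re
from collections import Counter

# Organism name in a UniProt-style FASTA header: "... OS=Homo sapiens OX=9606 ..."
OS_FIELD = re.compile(r"OS=(.+?) OX=")


def species_share(headers, top_n):
    """Fraction of entries that belong to the top_n most frequent species."""
    counts = Counter()
    for header in headers:
        match = OS_FIELD.search(header)
        if match:
            counts[match.group(1)] += 1
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(top_n))
    return top / total


# Placeholder headers for illustration only (not real Swiss-Prot entries).
headers = [
    ">sp|TEST01|TEST1_HUMAN Example protein 1 OS=Homo sapiens OX=9606 PE=1 SV=1",
    ">sp|TEST02|TEST2_HUMAN Example protein 2 OS=Homo sapiens OX=9606 PE=1 SV=1",
    ">sp|TEST03|TEST3_MOUSE Example protein 3 OS=Mus musculus OX=10090 PE=1 SV=1",
    ">sp|TEST04|TEST4_YEAST Example protein 4 OS=Saccharomyces cerevisiae OX=4932 PE=1 SV=1",
]
print(species_share(headers, 1))  # 0.5
print(species_share(headers, 2))  # 0.75
```

Run over a full release, `species_share(headers, 20)` would give the kind of figure quoted above.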

PDB: Protein Data Bank

• Holds 3D structures (atomic coordinates) of biological macromolecules (proteins, RNA, DNA).

• All data are available to the public.

• Obtained by X-ray crystallography (84%) or NMR spectroscopy (16%).

• Submitted by biologists and biochemists from around the world.
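PDB entries store atomic coordinates as text. In the legacy PDB file format, ATOM and HETATM records use fixed column positions. The sketch below reads those columns and computes an interatomic distance. It assumes well-formed legacy-format lines (entries are also distributed as mmCIF, which this does not parse); the `Atom` class, the function names and the two sample lines are hand-written for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atoms(lines):
    """Parse ATOM/HETATM records from legacy PDB-format text using fixed columns."""
    atoms = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            atoms.append(Atom(
                serial=int(line[6:11]),        # columns 7-11
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
            ))
    return atoms


def distance(a: Atom, b: Atom) -> float:
    """Euclidean distance between two atoms, in angstroms."""
    return math.dist((a.x, a.y, a.z), (b.x, b.y, b.z))


sample = [
    "ATOM      1  N   MET A   1      11.104   6.134  -6.504  1.00  0.00           N",
    "ATOM      2  CA  MET A   1      12.560   6.133  -6.200  1.00  0.00           C",
]
n, ca = parse_atoms(sample)
print(n.res_name, n.chain, n.res_seq)  # MET A 1
print(round(distance(n, ca), 2))       # 1.49
```

Slicing by column, rather than splitting on whitespace, matters because adjacent fields in this format can run together with no space between them.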

EMBL Nucleotide Sequence Database

• An annotated collection of all publicly available nucleotide and protein sequences

• Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.

• Maintained since 1994 by the EBI, Cambridge.

DDBJ–DNA Data Bank of Japan

• An annotated collection of all publicly available nucleotide and protein sequences

• Started in 1984 at the National Institute of Genetics (NIG) in Mishima.

• Still maintained at this institute, by a team led by Takashi Gojobori.

Why protein structure?

Proteins are fundamental components of all living cells, performing a variety of biological tasks.

Each protein has a particular 3D structure that determines its function.

Protein structure is more conserved than protein sequence, and more closely related to function.

Supersecondary structures

Assemblies of secondary structure elements that recur in many different protein structures, for example:

• Beta hairpin
• Beta-alpha-beta unit
• Helix hairpin
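Because supersecondary structures are defined by the order of secondary structure elements, candidate units can be spotted in a per-residue secondary structure string. This sketch assumes a DSSP-like one-letter string (H = helix, E = strand, anything else = loop) and uses hypothetical thresholds: elements of at least 3 residues, and hairpin loops of at most 5 residues. It only considers element order and loop length. It cannot check whether strands actually pair (antiparallel in a hairpin, parallel in a beta-alpha-beta unit), so its hits are candidates only.

```python
from itertools import groupby


def elements(ss: str, min_len: int = 3):
    """Collapse a per-residue string into (type, start, end) helix/strand elements."""
    result, pos = [], 0
    for symbol, run in groupby(ss):
        length = len(list(run))
        if symbol in "HE" and length >= min_len:
            result.append((symbol, pos, pos + length - 1))
        pos += length
    return result


def find_supersecondary(ss: str, max_loop: int = 5):
    """Return (label, start, end) for candidate supersecondary units (0-based, inclusive)."""
    els = elements(ss)
    found = []
    # Two consecutive elements of the same type joined by a short loop.
    for (t1, s1, e1), (t2, s2, e2) in zip(els, els[1:]):
        if s2 - e1 - 1 <= max_loop:
            if t1 == t2 == "E":
                found.append(("beta hairpin", s1, e2))
            elif t1 == t2 == "H":
                found.append(("helix hairpin", s1, e2))
    # Strand, helix, strand in sequence order.
    for first, middle, last in zip(els, els[1:], els[2:]):
        if (first[0], middle[0], last[0]) == ("E", "H", "E"):
            found.append(("beta-alpha-beta", first[1], last[2]))
    return found


ss = "L" + "E" * 5 + "LL" + "E" * 5 + "LLL" + "H" * 8 + "LLL" + "E" * 5 + "L"
print(find_supersecondary(ss))
# [('beta hairpin', 1, 12), ('beta-alpha-beta', 8, 31)]
```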

Structural Databases

SCOP: Structural Classification of Proteins

Current release: 686 folds; 1,073 superfamilies; 1,827 families, representing 15,979 PDB entries

CATH: Class, Architecture, Topology, Homologous superfamily

Levels in SCOP

1. Class
2. Folds
3. Superfamilies
4. Families

Major classes in SCOP

• Classes
– All alpha proteins
– All beta proteins
– Alpha and beta proteins (a/b)
– Alpha and beta proteins (a+b)
– Multi-domain proteins
– Membrane and cell surface proteins
– Small proteins

Folds

• Each class may be divided into one or more folds.
• Proteins which have the same secondary structure elements arranged in the same order in the protein chain and in three dimensions are classified as having the same fold.

Superfamilies

• Superfamilies are subdivisions of folds.
• A superfamily contains proteins which are thought to be evolutionarily related on the basis of:
– Sequence
– Function
– Special structural features

• Relationships between members of a superfamily may not be readily recognizable from the sequence alone.

Families

• Subdivision of superfamilies
• Contains members whose relationship is readily recognizable from the sequence
• Families are further subdivided into proteins
• Proteins are divided into species
– The same protein may be found in several species
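SCOP labels each classified domain with a dotted code, the sccs (SCOP concise classification string), such as a.1.1.2. The code gives the class letter followed by the fold, superfamily and family numbers. The sketch below splits such a code into those four levels and reports the lowest level two domains share. For example, two domains in the same superfamily but different families are evolutionarily related in the sense described above, even when their sequences alone do not reveal it. The function names are illustrative, and the protein and species levels below family are not part of this code.

```python
LEVELS = ("class", "fold", "superfamily", "family")


def parse_sccs(sccs: str) -> dict:
    """Split a SCOP code such as 'a.1.1.2' into its four levels.

    Each level's identifier includes its parents, so fold 'a.1' lies in class 'a'.
    """
    parts = sccs.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected 4 dot-separated fields, got {sccs!r}")
    return {level: ".".join(parts[:i + 1]) for i, level in enumerate(LEVELS)}


def deepest_shared_level(sccs_a: str, sccs_b: str):
    """Return the lowest SCOP level at which two domains are grouped together, or None."""
    a, b = parse_sccs(sccs_a), parse_sccs(sccs_b)
    shared = None
    for level in LEVELS:
        if a[level] != b[level]:
            break
        shared = level
    return shared


print(parse_sccs("a.1.1.2"))
# {'class': 'a', 'fold': 'a.1', 'superfamily': 'a.1.1', 'family': 'a.1.1.2'}
print(deepest_shared_level("a.1.1.2", "a.1.1.1"))  # superfamily
```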

All alpha: Hemoglobin

All beta: Immunoglobulin (8fab)


Alpha/beta: Triosephosphate isomerase

CATH

• Levels:
1. Class
2. Architecture
– This level is unique to CATH
3. Topology
– Roughly equivalent to the fold (or superfamily) level in SCOP
4. Homologous superfamily
– Roughly equivalent to the superfamily (or family) level in SCOP

Architecture

• Same overall arrangement of secondary structures in 3D, regardless of their connectivity
– Example: the architecture "two-layer beta sheet" contains different folds, each with a distinct number and connectivity of strands
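CATH also labels domains with a dotted code, here four numbers (class.architecture.topology.homologous superfamily), for example 3.40.50.300. Two domains with the same architecture share the first two numbers but can still differ at the topology level, which is the situation in the example above. The sketch assumes this four-field numeric format. The class names are CATH's four top-level classes, and `parse_cath` is a name chosen for this example.

```python
CATH_LEVELS = ("class", "architecture", "topology", "homologous superfamily")

# CATH's four top-level classes.
CATH_CLASSES = {
    "1": "Mainly Alpha",
    "2": "Mainly Beta",
    "3": "Alpha Beta",
    "4": "Few Secondary Structures",
}


def parse_cath(code: str) -> dict:
    """Split a CATH code such as '3.40.50.300' into its four hierarchy levels."""
    parts = code.split(".")
    if len(parts) != len(CATH_LEVELS) or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected four numeric fields, got {code!r}")
    levels = {name: ".".join(parts[:i + 1]) for i, name in enumerate(CATH_LEVELS)}
    levels["class name"] = CATH_CLASSES.get(parts[0], "unknown")
    return levels


info = parse_cath("3.40.50.300")
print(info["architecture"], info["class name"])  # 3.40 Alpha Beta
```

Comparing the `"architecture"` and `"topology"` prefixes of two parsed codes shows whether two domains share an architecture but differ in fold.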

Abdul Qahar Buneri

www.slideshare.net/abdulqahar045
