Amino acids
Jun 14, 2015
Presented by Abdul Qahar (A Q)
Buner Campus
Edited, prepared and shared by Abdul Qahar
Structural databases and their classification
Basic concept about databases
1. What is a database?
A database is a collection of data which can be used:
• alone, or
• combined with / related to other data
to provide answers to the user’s question.
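As a rough illustration of data being used alone or combined with other data, the sketch below uses Python's built-in sqlite3 module. The table names, column names, and record identifiers (SEQ1, STR1, and so on) are hypothetical and do not come from any real database.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sequence (accession TEXT PRIMARY KEY, residues TEXT);
    CREATE TABLE structure (structure_id TEXT PRIMARY KEY,
                            accession TEXT, method TEXT);
    INSERT INTO sequence VALUES ('SEQ1', 'MALVS'), ('SEQ2', 'MKNGSA');
    INSERT INTO structure VALUES ('STR1', 'SEQ1', 'X-ray');
""")

# Data used alone: look up one sequence record.
alone = conn.execute(
    "SELECT residues FROM sequence WHERE accession = 'SEQ2'"
).fetchall()

# Data combined with other data: which sequences have a known structure?
combined = conn.execute("""
    SELECT s.accession, t.structure_id, t.method
    FROM sequence AS s
    JOIN structure AS t ON t.accession = s.accession
""").fetchall()

print(alone)
print(combined)
```

The first query answers a question from one table only; the second relates two tables through a shared accession, which is the "combined / related" case.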
Data types
• primary data: sequence (DNA, amino acid), e.g. DMPVERILEALAVE… → primary database
• secondary data: secondary protein structure (e.g., alpha-helices, beta-strands); “motifs”: regular expressions, blocks, profiles, fingerprints → secondary db
• tertiary data: tertiary protein structure (domains, folding units, atomic co-ordinates) → tertiary db
• interaction data: binary protein-protein interactions/networks, pathways and functional networks → interaction db
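Since motifs can be stored as regular expressions, a minimal sketch of how such a pattern might be searched in a sequence is given below. It assumes the PROSITE pattern notation and uses the N-glycosylation site pattern N-{P}-[ST]-{P} as the example. The helper name prosite_to_regex and the test sequence are hypothetical, and the converter only covers the basic syntax elements.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Convert a basic PROSITE-style pattern into a Python regex.

    Handles: '-' separators, 'x' (any residue), [..] (allowed residues),
    {..} (forbidden residues), (n) / (n,m) repeats, and < > anchors.
    """
    regex = pattern.rstrip(".")
    regex = regex.replace("-", "")
    regex = regex.replace("x", ".")
    # Forbidden residues {P} become a negated character class [^P].
    regex = regex.replace("{", "[^").replace("}", "]")
    # Repeat counts x(2,4) become regex quantifiers .{2,4}.
    regex = regex.replace("(", "{").replace(")", "}")
    regex = regex.replace("<", "^").replace(">", "$")
    return regex

motif = prosite_to_regex("N-{P}-[ST]-{P}")
sequence = "MKNGSAANPTQ"  # made-up sequence for illustration

matches = [(m.start(), m.group()) for m in re.finditer(motif, sequence)]
print(motif)
print(matches)
```

Here the second asparagine is followed by a proline, so only the first site is reported.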
Primary biological databases
Nucleic acid databases
EMBL
GenBank
DDBJ (DNA Data Bank of Japan)
Protein databases
PIR
MIPS
SWISS-PROT
TrEMBL
NRL-3D
Nucleotide Databases
• EMBL: Nucleotide sequence database
• Ensembl: Automatic annotation of eukaryotic genomes
• Genome Server: Overview of completed genomes at EBI
• Genome-MOT: Genome monitoring table
• EMBL-Align: Multiple sequence alignment database
Sequence data = strings of letters
Nucleotides (bases)
Adenine (A)
Cytosine (C)
Guanine (G)
Thymine (T)
triplet codons
genetic code
20 amino acids (A, L, V, S etc.)
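The step from nucleotide letters to amino acid letters via triplet codons can be sketched in a few lines. The example below assumes the standard genetic code and a DNA coding strand read from its first base; the function name translate and the input sequence are made up for illustration.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
# ('*' marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

print(translate("ATGGCTCTGGTTTCTTAA"))  # prints MALVS
```

The 64 possible triplets map onto 20 amino acids plus stop signals, which is why several codons share the same letter in the table.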
Three-dimensional protein structure = atomic coordinates in 3D space
Protein folding
EMBL/GenBank/DDBJ
• These 3 databases contain mainly the same information (with a few differences in format and syntax)
• Serve as archives containing all sequences (single genes, ESTs, complete genomes, etc.) derived from:
– Genome projects and sequencing centers
– Individual scientists
– Patent offices (e.g. USPTO, EPO)
• Non-confidential data are exchanged daily.
Databases related to Genomics
• Contain information on genes, gene location (mapping), gene nomenclature and links to sequence databases;
• Exist for most organisms important for life science research;
• Examples: MIM, GDB (human), MGD (mouse), FlyBase (Drosophila), SGD (yeast), MaizeDB (maize), SubtiList (B. subtilis), etc.
Swiss-Prot
• Annotated protein sequence database established in 1986 and maintained collaboratively since 1987 by the Department of Medical Biochemistry of the University of Geneva and EBI
• Complete, Curated, Non-redundant and cross-referenced with 34 other databases
• Highly cross-referenced
• Available from a variety of servers and through sequence analysis software tools
• More than 8,000 different species
• First 20 species represent about 42% of all sequences in the database
• More than 129,000 entries with 4.7 × 10^7 amino acids
PDB: Protein Data Bank
• Holds 3D models of biological macromolecules (protein, RNA, DNA).
• All data are available to the public.
• Obtained by X-Ray crystallography (84%) or NMR spectroscopy (16%).
• Submitted by biologists and biochemists from around the world.
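To make "3D models" concrete: in the classic PDB text format, each atom is one fixed-column ATOM record carrying its x, y, z coordinates. The sketch below reads those columns from two sample records. The record values, the Atom class, and the parse_atoms helper are hypothetical, and real files contain many more record types than are handled here.

```python
import math
from dataclasses import dataclass

@dataclass
class Atom:
    name: str
    residue: str
    chain: str
    x: float
    y: float
    z: float

# Made-up records laid out in the fixed columns of the PDB ATOM line.
SAMPLE = "\n".join([
    "HEADER    HYPOTHETICAL EXAMPLE",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842",
])

def parse_atoms(pdb_text: str) -> list:
    """Extract name, residue, chain and coordinates from ATOM records."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip every other record type
        atoms.append(Atom(
            name=line[12:16].strip(),     # columns 13-16: atom name
            residue=line[17:20].strip(),  # columns 18-20: residue name
            chain=line[21],               # column 22: chain identifier
            x=float(line[30:38]),         # columns 31-38: x in angstroms
            y=float(line[38:46]),         # columns 39-46: y
            z=float(line[46:54]),         # columns 47-54: z
        ))
    return atoms

atoms = parse_atoms(SAMPLE)
n, ca = atoms
distance = math.dist((n.x, n.y, n.z), (ca.x, ca.y, ca.z))
print(atoms)
print(round(distance, 2))
```

For real work a maintained parser is a safer choice than hand-written slicing; the point here is only that a structure entry is, at its core, a list of atoms with coordinates.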
EMBL Nucleotide Sequence Database
• An annotated collection of all publicly available nucleotide and protein sequences
• Created in 1980 at the European Molecular Biology Laboratory in Heidelberg.
• Maintained since 1994 by EBI, Cambridge.
DDBJ–DNA Data Bank of Japan
• An annotated collection of all publicly available nucleotide and protein sequences
• Started in 1984 at the National Institute of Genetics (NIG) in Mishima.
• Still maintained in this institute by a team led by Takashi Gojobori.
Why Protein Structure?
Proteins are fundamental components of all living cells, performing a variety of biological tasks.
Each protein has a particular 3D structure that determines its function.
Protein structure is more conserved than protein sequence, and more closely related to function.
Supersecondary structures
Assembly of secondary structures which are shared by many structures.
Beta hairpin
Beta-alpha-beta unit
Helix hairpin
Structural Databases
SCOP: Structural Classification of Proteins
Current Release: 686 folds; 1073 superfamilies; 1827 families representing 15,979 PDB entries
CATH: Classification, Architecture, Topology, Homology
Levels in SCOP
1. Class
2. Folds
3. Superfamilies
4. Families
Major classes in SCOP
• Classes
– All alpha proteins
– All beta proteins
– Alpha and beta proteins (a/b)
– Alpha and beta proteins (a+b)
– Multi-domain proteins
– Membrane and cell surface proteins
– Small proteins
Folds
• Each Class may be divided into one or more folds
• Proteins which have the same secondary structure elements arranged in the same order in the protein chain and in three dimensions are classified as having the same fold
Superfamilies
• Superfamilies are subdivisions of folds
• A superfamily contains proteins which are thought to be evolutionarily related due to
– Sequence
– Function
– Special structural features
• Relationships between members of a superfamily may not be readily recognizable from the sequence alone
Families
• Subdivision of superfamilies
• Contains members whose relationship is readily recognizable from the sequence
• Families are further subdivided into Proteins
• Proteins are divided into Species
– The same protein may be found in several species
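The SCOP levels described so far (Class, Fold, Superfamily, Family, Protein, Species) form a tree, which can be sketched as nested dictionaries. The entries below are illustrative labels typed in by hand, not an export from SCOP, and the names ScopEntry and build_tree are hypothetical.

```python
from typing import NamedTuple

class ScopEntry(NamedTuple):
    scop_class: str
    fold: str
    superfamily: str
    family: str
    protein: str
    species: str

LEVELS = ScopEntry._fields

# Illustrative entries only; check labels against SCOP before relying on them.
entries = [
    ScopEntry("All alpha", "Globin-like", "Globin-like", "Globins",
              "Hemoglobin, alpha chain", "Human"),
    ScopEntry("All alpha", "Globin-like", "Globin-like", "Globins",
              "Hemoglobin, alpha chain", "Horse"),
    ScopEntry("Alpha and beta (a/b)", "TIM barrel",
              "Triosephosphate isomerase", "Triosephosphate isomerase",
              "Triosephosphate isomerase", "Chicken"),
]

def build_tree(rows):
    """Nest each entry level by level: class -> fold -> ... -> species."""
    tree = {}
    for row in rows:
        node = tree
        for label in row:
            node = node.setdefault(label, {})
    return tree

def print_tree(node, depth=0):
    """Print the hierarchy with one indentation step per level."""
    for label, children in node.items():
        print("  " * depth + f"{LEVELS[depth]}: {label}")
        print_tree(children, depth + 1)

tree = build_tree(entries)
print_tree(tree)
```

The two hemoglobin rows share every level down to Protein and only split at Species, matching the note that the same protein may be found in several species.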
All alpha: Hemoglobin
All beta: Immunoglobulin (8fab)
Alpha/beta: Triosephosphate isomerase
CATH
• Levels
• Class
• Architecture
– This level is unique to CATH
• Topology
– ~Fold(/superfamily) in SCOP
• Homologous Superfamily
– ~Superfamily(/family) in SCOP
Architecture
• Same overall arrangement of secondary structures
– Example: The architecture “Two layer beta sheet proteins” contains different folds, each with a distinct number and connectivity of strands
Abdul Qahar Buneri
www.slideshare.net/abdulqahar045