Archives and Information Retrieval: Lecture by Ms. AQSAD RASHDA, BIOINFORMATICS

Dec 28, 2015


Beatrix Lee

Archives and Information Retrieval

Lecture by Ms AQSAD RASHDA

BIOINFORMATICS

Database indexing and specification of search terms

• An index is a set of pointers to information in a database
• Search terms
• Entries: discrete, coherent parcels of information
• The information retrieval software
• Keywords
• Logical operators such as AND and NOT
• Follow-up questions
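The idea of an index as a set of pointers, queried with keywords combined by AND and NOT, can be sketched roughly as follows. This is only an illustration: the entry identifiers, the keyword lists, and the function names `build_index` and `search` are hypothetical and do not come from any real retrieval system.

```python
from collections import defaultdict

# Toy entries: identifier -> keywords (all values are made up for illustration).
ENTRIES = {
    "E1": ["trypsin", "inhibitor", "bovine"],
    "E2": ["trypsin", "protease", "human"],
    "E3": ["thioredoxin", "bacterial"],
}

def build_index(entries):
    """Map each keyword to the set of entry identifiers (pointers) that contain it."""
    index = defaultdict(set)
    for entry_id, keywords in entries.items():
        for keyword in keywords:
            index[keyword.lower()].add(entry_id)
    return index

def search(index, all_of=(), none_of=()):
    """Return entries matching every term in all_of (AND) and no term in none_of (NOT)."""
    if not all_of:
        return set()
    # AND: intersect the pointer sets of all required terms.
    hits = set.intersection(*(index.get(t.lower(), set()) for t in all_of))
    # NOT: drop entries that carry any excluded term.
    for term in none_of:
        hits -= index.get(term.lower(), set())
    return hits

index = build_index(ENTRIES)
print(sorted(search(index, all_of=["trypsin"], none_of=["human"])))
```

The lookup never scans the entries themselves; it only follows the pointers stored under each keyword, which is the point of keeping an index.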

UK MRC Human Genome Mapping Project Resource Centre

http://www.hgmp.mrc.ac.uk

Primary data collections related to biological macromolecules include:
• Nucleic acid sequences, including whole-genome projects
• Amino acid sequences of proteins
• Protein and nucleic acid structures
• Small-molecule crystal structures
• Protein functions
• Expression patterns of genes
• Publications

The archives

Nucleic acid sequence databases

• Triple partnership of:
  – National Center for Biotechnology Information (USA)
  – EMBL Data Library (European Bioinformatics Institute, UK)
  – DNA Data Bank of Japan (National Institute of Genetics, Japan)

Entries have a life cycle

Nucleotide sequence databases

• EMBL, GenBank, and DDBJ are the three primary nucleotide sequence databases
• EMBL: www.ebi.ac.uk/embl
• GenBank: www.ncbi.nlm.nih.gov/Genbank
• DDBJ: www.ddbj.nig.ac.jp

The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene

Labelled fields in the entry (figure labels): Organism Classification, Keyword, Description, Date, Identification, Accession, Organism Source, Reference Number, Reference Title, Reference Author, Reference Position, Reference Location, Feature Table Header, Sequence Header

FT: The feature table may indicate regions that:
1. perform or affect function
2. interact with other molecules
3. affect replication
4. are involved in recombination
5. are a repeated unit
6. have secondary or tertiary structure
7. are revised or corrected
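The labelled fields above correspond to the two-letter line codes that begin each line of an EMBL flat-file entry. The sketch below groups the lines of an entry by those codes. The pairing of codes with the slide's labels is an assumption made here, the miniature entry is invented for illustration and is not a real database record, and `group_by_field` is a hypothetical helper name.

```python
# Two-letter EMBL line codes, paired (by assumption) with the labels on the slide.
LINE_CODES = {
    "ID": "Identification",
    "AC": "Accession",
    "DT": "Date",
    "DE": "Description",
    "KW": "Keyword",
    "OS": "Organism Source",
    "OC": "Organism Classification",
    "RN": "Reference Number",
    "RP": "Reference Position",
    "RA": "Reference Author",
    "RT": "Reference Title",
    "RL": "Reference Location",
    "FH": "Feature Table Header",
    "FT": "Feature Table",
    "SQ": "Sequence Header",
}

# Invented miniature entry, for illustration only.
TOY_ENTRY = """\
ID   TOY0001; example entry
AC   X00000;
DE   Hypothetical example gene
KW   example; toy.
OS   Bos taurus (cattle)
FT   CDS             1..12
SQ   Sequence 12 BP;
     atggcctaag ta
//
"""

def group_by_field(text):
    """Collect the content of each recognised line code, in order of appearance."""
    fields = {}
    for line in text.splitlines():
        # The code occupies the first two columns; content starts at column 6.
        code, content = line[:2], line[5:].strip()
        if code in LINE_CODES:
            fields.setdefault(LINE_CODES[code], []).append(content)
    return fields

for label, values in group_by_field(TOY_ENTRY).items():
    print(f"{label}: {values}")
```

Sequence lines and the `//` terminator carry no recognised code, so this sketch skips them; a fuller parser would read the sequence after the SQ line.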

Protein sequence databases

• SWISS-PROT: The Swiss Institute of Bioinformatics collaborates with the EMBL Data Library to provide an annotated database of amino acid sequences called SWISS-PROT
• PIR: Another protein sequence database is produced by PIR International

http://pir.georgetown.edu

PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor

Databases associated with SWISS-PROT
• ENZYME DB and PROSITE
• The ENZYME DB stores the following information about enzymes:
  – EC Number: a numerical identifier assigned by the Enzyme Commission (authorized by the International Union of Biochemistry and Molecular Biology; see http://www.chem.qmw.ac.uk/iubmb/enzyme)
  – Recommended name
  – Alternative names, if any
  – Catalytic activity
  – Cofactors, if any
  – Pointers to SWISS-PROT and other data banks
  – Pointers to diseases associated with enzyme deficiency, if any are known
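The fields listed above can be pictured as one record per enzyme. The sketch below is a hypothetical container, not the actual ENZYME DB file format; `EnzymeRecord` and `parse_ec_number` are names invented here. It relies on the fact that a complete EC number has four dot-separated numeric fields (class, subclass, sub-subclass, serial number).

```python
from dataclasses import dataclass, field

@dataclass
class EnzymeRecord:
    """Hypothetical container whose fields mirror the list on the slide."""
    ec_number: str
    recommended_name: str
    alternative_names: list = field(default_factory=list)
    catalytic_activity: str = ""
    cofactors: list = field(default_factory=list)
    cross_references: list = field(default_factory=list)  # pointers to other data banks
    diseases: list = field(default_factory=list)

def parse_ec_number(ec):
    """Split an EC number into (class, subclass, sub-subclass, serial number)."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    parts = text.split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a complete EC number: {ec!r}")
    return tuple(int(p) for p in parts)

record = EnzymeRecord(ec_number="3.4.21.4", recommended_name="Trypsin")
print(record.recommended_name, parse_ec_number(record.ec_number))
```

Keeping the EC number as a tuple of integers makes it easy to group enzymes by class or subclass by comparing leading elements.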

A Sample Entry in ENZYME DB

The PIR and associated databases

• The PIR maintains several databases about proteins:
1. PIR-PSD: the main protein sequence database
2. iProClass: classification of proteins according to structure and function
3. ASDB: annotation and similarity database; each entry is linked to a list of similar sequences
4. PIR-NREF: a comprehensive non-redundant collection of over 800 000 protein sequences merged from all available sources
5. NRL3D: a database of sequences and annotations of proteins of known structure deposited in the Protein Data Bank
6. ALN: a database of protein sequence alignments
7. RESID: a database of covalent protein structure modifications (recall that important structural features of proteins, such as disulphide bridges, are not inferrable from gene sequences and will not appear in protein sequence databases derived solely by translation of genomic data)
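What "non-redundant" means for a collection merged from several sources can be shown with a small sketch. It assumes that redundancy means identical sequence strings, which is a simplification chosen here and not a description of the PIR's actual procedure; the source names, identifiers, and sequences are invented.

```python
def merge_non_redundant(*sources):
    """Merge {identifier: sequence} dicts, keeping one record per distinct sequence.

    Returns {sequence: [identifiers that share it]}.
    """
    merged = {}
    for source in sources:
        for identifier, sequence in source.items():
            # Normalise case so that identical sequences compare equal.
            merged.setdefault(sequence.upper(), []).append(identifier)
    return merged

# Invented toy data standing in for two source databases.
source_a = {"A:001": "MKTAYIAK", "A:002": "GSHMLEDP"}
source_b = {"B:17": "mktayiak", "B:18": "AAPLLQ"}

nref = merge_non_redundant(source_a, source_b)
print(len(nref), "distinct sequences")
print(nref["MKTAYIAK"])
```

Each distinct sequence is stored once, while the list of identifiers preserves the pointers back to every source entry.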

Databases of structures
• Structure databases archive, annotate, and distribute sets of atomic coordinates
• Protein Data Bank (PDB)
• The information contained includes:
1. What protein is the subject of the entry, and what species it came from
2. Who solved the structure, and references to publications describing the structure determination
3. Experimental details about the structure determination, including information related to the general quality of the result, such as resolution of an X-ray structure determination and stereochemical statistics
4. The amino acid sequence
5. What additional molecules appear in the structure, including cofactors, inhibitors, and water molecules
6. Assignments of secondary structure: helix, sheet
7. Disulphide bridges
8. The atomic coordinates
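As a rough illustration of item 8, atomic coordinates in the legacy PDB text format are stored in fixed-width ATOM records, with x, y, and z in columns 31-38, 39-46, and 47-54. The sketch below builds one such line and reads it back. The atom and its coordinates are invented, and `format_atom` and `parse_atom` are hypothetical helper names; the alignment of the atom name within its four columns is simplified.

```python
def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a fixed-width ATOM line (occupancy 1.00 and B-factor 0.00 are assumed)."""
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain:1s}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}"
    )

def parse_atom(line):
    """Read selected fields from an ATOM record using its column positions."""
    return {
        "serial": int(line[6:11]),          # columns 7-11
        "name": line[12:16].strip(),        # columns 13-16
        "res_name": line[17:20].strip(),    # columns 18-20
        "chain": line[21],                  # column 22
        "res_seq": int(line[22:26]),        # columns 23-26
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }

# Invented atom, for illustration only.
line = format_atom(1, "CA", "SER", "A", 1, 1.5, -2.25, 3.0)
atom = parse_atom(line)
print(line)
print(atom["res_name"], atom["chain"], atom["xyz"])
```

Because the format is column-based, fields are sliced by position; splitting on whitespace would fail whenever adjacent fields have no gap between them.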

Protein Data Bank entry 2TRX: E. coli thioredoxin

Indicators of structure quality
• X-ray crystal structure analysis
• Nuclear Magnetic Resonance

Web Resource: Protein and Nucleic Acid Structures
• Home page of Protein Data Bank: http://www.rcsb.org
• Home page of EBI macromolecular structure database: http://msd.ebi.ac.uk
• Home page of BioMagResBank: http://www.bmrb.wisc.edu
• Searching the Protein Data Bank
• Home page of SCOP (Structural Classification of Proteins): http://scop.mrc-lmb.cam.ac.uk/scop
• List of browsers: http://pdb-browsers.ebi.ac.uk/browse_it.shtml
• OCA: http://oca.ebi.ac.uk/oca-bin/ocamain

Figure labels, hanging drop method (vapour diffusion method): crystal; microscope slide; microscope; 1, dilute protein solution; 2, concentrated salt solution. Many different conditions of 1 and 2 must be tried.

Crystallisation

Slide courtesy of Shoshana Wodak

Diffraction pattern Atomic model

Determination of protein structure

Slide courtesy of Shoshana Wodak

A high-resolution protein structure: 1.5 - 2.0 Å resolution

The resolution problem

Slide courtesy of Shoshana Wodak

Nuclear Magnetic Resonance (NMR)

Source: Branden & Tooze (1991)

Classifications of protein structures
• SCOP: Structural Classification of Proteins
• CATH: Class/Architecture/Topology/Homology
• DALI: Classification of protein domains, based on extraction of similar structures from distance matrices [http://www.ebi.ac.uk/dali/domain]
• CE: A database of structural alignments
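The distance matrices mentioned for DALI can be illustrated in a few lines. The sketch only computes the matrix of pairwise distances between points (for a protein, typically one point per residue); it does not implement the DALI comparison itself. The coordinates are invented and `distance_matrix` is a hypothetical helper name.

```python
import math

def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(a, b) for b in coords] for a in coords]

# Invented coordinates standing in for three residue positions.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]

matrix = distance_matrix(points)
for row in matrix:
    print(" ".join(f"{d:6.2f}" for d in row))
```

A distance matrix does not change when the structure is rotated or translated, which is why it is a convenient representation for comparing structures that have not been superimposed.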

Class

Architecture

Topology

Figure from Shoshana Wodak

CATH - A protein domain classification

• In CATH, protein domains are classified according to a hierarchical tree with 4 levels:
  – Class
  – Architecture
  – Topology
  – Homology
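The four-level tree can be sketched by giving each domain a classification code with one number per level and asking how far down the tree two domains stay together. The dotted four-number code is assumed here for illustration, the example codes are arbitrary, and the function names are hypothetical.

```python
LEVELS = ("Class", "Architecture", "Topology", "Homology")

def shared_levels(code_a, code_b):
    """Count how many levels, from Class downward, two classification codes share."""
    shared = 0
    for a, b in zip(code_a.split("."), code_b.split(".")):
        if a != b:
            break
        shared += 1
    return shared

def deepest_common_level(code_a, code_b):
    """Name the deepest level at which two domains fall in the same branch, if any."""
    n = shared_levels(code_a, code_b)
    return LEVELS[n - 1] if n else None

# Arbitrary illustrative codes, not real classifications.
print(deepest_common_level("1.10.20.30", "1.10.20.99"))
print(deepest_common_level("1.10.20.30", "2.10.20.30"))
```

The comparison stops at the first differing level because, in a tree, agreement at a lower level is meaningless once the branches above it differ.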

Specialized or boutique databases
• VIPER (Virus Particle ExploreR) treats crystal structures of icosahedral viruses
• In the field of immunology:
• IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specializing in immunoglobulins (Ig), T-cell receptors (TcR), and Major Histocompatibility Complex (MHC) molecules of all vertebrate species
• The IMGT server provides common access to all immunogenetics data
• At present it includes two databases: IMGT/LIGM-DB, a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates, with translation for fully annotated sequences; and IMGT/HLA-DB, a database of the human MHC, referred to as HLA (Human Leucocyte Antigens)

Web Resource Databases for Specific Protein Families

• Protein kinases: http://www.sdsc.edu/kinases
• HIV proteases: http://www-fbsc.ncifcrf.gov/HIVdb
• Icosahedral viruses: http://mmtsb.scripps.edu/viper/main.html
• Immunology:
  – IMGT: http://imgt.cines.fr
  – KABAT: http://immuno.bme.nwu.edu
  – MHCPEP: http://wehih.wehi.edu.au/mhcpep
• Collections of links to databases on specific protein families: http://www2.ebi.ac.uk/msd/Links/family.shtml
• KABAT: Database of Sequences of Proteins of Immunological Interest, Northwestern University (USA)
• MHCPEP: Major Histocompatibility Complex Binding Peptides Database, Walter and Eliza Hall Institute (Melbourne, Australia)

Expression and proteomics databases

• Expression databases record measurements of mRNA levels, usually via ESTs (short terminal sequences of cDNA synthesized from mRNA)

Comparisons of expression patterns give clues to:
(1) the function and mechanism of action of gene products
(2) how organisms coordinate their control over metabolic processes in different conditions, for instance yeast under aerobic or anaerobic conditions
(3) the variations in mobilization of genes at different stages of the cell cycle or of the development of an organism
(4) mechanisms of antibiotic resistance in bacteria, and consequent suggestion of targets for drug development
(5) the response to challenge by a parasite
(6) the response to medications of different types and dosages, to guide effective therapy
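One common way to compare expression patterns between two conditions, such as the aerobic and anaerobic case in item (2), is the log2 ratio of the measured levels. The sketch below uses that measure as an assumption; the lecture does not specify a method. The gene names and expression values are invented, and `log2_fold_changes` is a hypothetical helper name.

```python
import math

# Invented expression levels (arbitrary units) for hypothetical genes.
AEROBIC = {"geneA": 120.0, "geneB": 30.0, "geneC": 55.0}
ANAEROBIC = {"geneA": 30.0, "geneB": 240.0, "geneC": 55.0}

def log2_fold_changes(condition_a, condition_b):
    """Return log2(level in b / level in a) for genes measured in both conditions."""
    return {
        gene: math.log2(condition_b[gene] / condition_a[gene])
        for gene in condition_a
        if gene in condition_b
    }

changes = log2_fold_changes(AEROBIC, ANAEROBIC)
for gene, change in sorted(changes.items()):
    print(f"{gene}: {change:+.2f}")
```

On the log2 scale a doubling is +1 and a halving is -1, so increases and decreases of the same magnitude are symmetric around zero.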

Page 2: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Database indexing and specification of search terms

bull An index is a set of pointers to information in a database

bull Search terms

bull Entries discrete coherent parcels of information

bull The information retrieval software

bull Keywords

bull ANDrsquo NOT

bull Follow-up questions

UK MRC Human Genome Mapping Project Resource Centre

httpwwwhgmpmrcacuk

Primary data collections related to biological macromolecules include1048707 Nucleic acid sequences including whole-genome projects1048707 Amino acid sequences of proteins1048707 Protein and nucleic acid structures1048707 Small-molecule crystal structures1048707 Protein functions1048707 Expression patterns of genes1048707 Publications

The archives

Nucleic acid sequence databases

bull Triple partnership of

bull National Center for Biotechnology Information (USA)

bull EMBL Data Library (European Bioinformatics Institute UK)

bull DNA Data Bank of Japan (National Institute of Genetics Japan)

Entries have a life cycle

Nucleotide sequence databases

bull EMBL GenBank and DDBJ are the three primary nucleotide sequence databases

bull EMBL wwwebiacukembl

bull GenBank wwwncbinlmnihgovGenbank

bull DDBJ wwwddbjnigacjp

The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene

Organism Classification

Keyword

Description

Date

Identification

Accession

Organism Source

Reference Number

Reference Title

Reference Author

Reference Position

Reference Location

Feature Table Header

FT The feature table may indicate regions that1 perform or affect function2 interact with other molecules3 affect replication4 are involved in recombination5 are a repeated unit6 have secondary or tertiary structure7 are revised or corrected

Sequence Header

Protein sequence databases

bull SWISS-PROT The Swiss Institute of Bioinformatics collaborates with the EMBL Data Library to provide an annotated database of amino acid sequences called SWISS-PROT

bull PIR Another protein sequence database is produced by The PIR International

httppirgeorgetownedu

PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor

Databases associated with SWISS-PROTbull ENZYME DB and PROSITEbull The ENZYME DB stores the following information about enzymesbull EC Number a numerical identifier assigned by the Enzyme

Commission (authorized by the International Union of Biochemistry and Molecular Biology see

bull httpwwwchemqmwacukiubmbenzyme)bull Recommended namebull Alternative names if anybull Catalytic activitybull Cofactors if anybull Pointers to SWISS-PROT and other data banksbull Pointers to disease associated with enzyme deficiency if any known

A Sample Entry in ENZYME DB

The PIR and associated databases

bull The PIR maintains several databases about proteins1 PIR-PSD the main protein sequence database2 iProClass classification of proteins according to structure and

function3 ASDB annotation and similarity database each entry is linked to a

list of similar sequences4 PR-NREF a comprehensive non-redundant collection of over 800

000 protein sequences merged from all available sources5 NRL3D a database of sequences and annotations of proteins of

known structure deposited in the Protein Data Bank6 ALN a database of protein sequence alignments and7RESID a database of covalent protein structure modifications (recall

that important structural features of proteins such as disulphide bridges are not inferrable from gene sequences and will not appear in protein sequence databases derived solely by translation of genomic data)

Databases of structuresbull Structure databases archive annotate and distribute sets of atomic

coordinatesbull Protein Data Bank (PDB)bull The information contained includes1 What protein is the subject of the entry and what species it came

from2 Who solved the structure and references to publications describing

the structure determination3 Experimental details about the structure determination including

information related to the general quality of the result such as resolution of an X-ray structure determination and stereochemical statistics

4 The amino acid sequence5 What additional molecules appear in the structure including

cofactors inhibitors and water molecules6 Assignments of secondary structure helix sheet7 Disulphide bridges8 The atomic coordinates

Protein data bank entry 2TRX E coli thioredoxin

Indicators of structure qualitybull X-ray crystal structure analysisbull Nuclear Magnetic Resonancebull Web Resource Protein and Nucleic Acid Structuresbull Home page of protein data bankbull httpwwwrcsborgbull Home page of EBI macromolecular structure databasebull httpmsdebiacukbull Home page of BioMagResBankbull httpwwwbmrbwiscedubull Searching the protein data bankbull Home page of SCOP (Structural classification of proteins)bull httpscopmrc-lmbcamacukscopbull List of browsersbull httppdb-browsersebiacukbrowse_itshtmlbull OCAbull httpocaebiacukoca-binocamain

Crystal

Hanging drop method vapour diffusion method

Microscope slide

2-Concentrated salt solution

1-Dilute protein solutionMicroscope

many differentconditions of 1amp2must be tried

many differentconditions of 1amp2must be tried

Crystallisation

Slide courtesy from Shoshana Wodak

Diffraction pattern Atomic model

Determination of protein structure

Slide courtesy from Shoshana Wodak

A high resolution protein structure 15 - 20 Aring resolution

The resolution problem

Slide courtesy from Shoshana Wodak

Nuclear Magnetic Resonance (NMR)

Source Branden amp Tooze (1991)

Classifications of protein structuresbull SCOP Structural Classification of Proteins

bull CATH ClassArchitectureTopologyHomology

bull DALIClassification of protein domains

Based on extraction of similar structures from distance matrices

[httpwwwebiacukdalidomain]

bull CE A database of structural alignments

Class

Architecture

Topology

Figure from Shoshana Wodak

CATH - A protein domain classification

bull In CATH protein domains are classified according to a tree with 4 levels of hierarchicallyndash Classndash Architecturendash Topologyndash Homology

Specialized or boutique databasesbull VIPER (Virus Particle ExploreR) treats crystal structures of

icosahedral viruses

bull In the field of immunologybull IMGT the international ImMunoGeneTics database is a

high-quality integrated database specializing in Immunoglobulins (Ig) T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species

bull The IMGT server provides a common access to all Immunogenetics data

bull At present it includes two databases IMGTLIGM-DB a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates with translation for fully annotated sequences and IMGTHLA-DB a database of the human MHC referred to as HLA (Human Leucocyte Antigens)

Web Resource Databases for Specific Protein Families

bull Protein kinasesbull httpwwwsdscedukinasesbull HIV proteasesbull httpwww-fbscncifcrfgovHIVdbbull Icosahedral virusesbull httpmmtsbscrippseduvipermainhtmlbull Immunologybull IGMT httpimgtcinesfrbull KABAT httpimmunobmenwuedubull MHCPEP httpwehihwehieduaumhcpepbull Collections of links to databases on specific protein familiesbull httpwww2ebiacukmsdLinksfamilyshtmlbull KABAT - Database of Sequences of Proteins of Immunological Interest -

North-Western University (USA)bull MHCPEP - Major Histocompatibility Complex Binding Peptides Database -

Walter and Eliza Hall Institute (Melbourne Australia)

Expression and proteomics databases

bull Expression databases record measurements of mRNA levels usually via ESTs (short terminal sequences of cDNA synthesized from mRNA)

Comparisons of expression patterns give clues to (1) the function and mechanism of action of gene products(2) how organisms coordinate their control over metabolic processes in

different conditions - for instance yeast under aerobic or anaerobic conditions

(3) the variations in mobilization of genes at different stages of the cell cycle or of the development of an organism (4) mechanisms of antibiotic resistance in bacteria and consequent

suggestion of targets for drug development (5) the response to challenge by a parasite (6) the response to medications of different types and dosages to

guide effective therapy

  • Archives and Information Retrieval
  • Database indexing and specification of search terms
  • Nucleic acid sequence databases
  • Nucleotide sequence databases
  • Slide 6
  • Slide 7
  • The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene
  • Slide 9
  • FT The feature table may indicate regions that 1 perform or affect function 2 interact with other molecules 3 affect replication 4 are involved in recombination 5 are a repeated unit 6 have secondary or tertiary structure 7 are revised or corrected
  • Protein sequence databases
  • Slide 12
  • PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor
  • Slide 14
  • Slide 15
  • Databases associated with SWISS-PROT
  • A Sample Entry in ENZYME DB
  • The PIR and associated databases
  • Databases of structures
  • Slide 20
  • Protein data bank entry 2TRX E coli thioredoxin
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Indicators of structure quality
  • Crystallisation
  • Determination of protein structure
  • The resolution problem
  • Nuclear Magnetic Resonance (NMR)
  • Classifications of protein structures
  • CATH - A protein domain classification
  • Specialized or boutique databases
  • Web Resource Databases for Specific Protein Families
  • Expression and proteomics databases
Page 3: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

UK MRC Human Genome Mapping Project Resource Centre

httpwwwhgmpmrcacuk

Primary data collections related to biological macromolecules include1048707 Nucleic acid sequences including whole-genome projects1048707 Amino acid sequences of proteins1048707 Protein and nucleic acid structures1048707 Small-molecule crystal structures1048707 Protein functions1048707 Expression patterns of genes1048707 Publications

The archives

Nucleic acid sequence databases

bull Triple partnership of

bull National Center for Biotechnology Information (USA)

bull EMBL Data Library (European Bioinformatics Institute UK)

bull DNA Data Bank of Japan (National Institute of Genetics Japan)

Entries have a life cycle

Nucleotide sequence databases

bull EMBL GenBank and DDBJ are the three primary nucleotide sequence databases

bull EMBL wwwebiacukembl

bull GenBank wwwncbinlmnihgovGenbank

bull DDBJ wwwddbjnigacjp

The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene

Organism Classification

Keyword

Description

Date

Identification

Accession

Organism Source

Reference Number

Reference Title

Reference Author

Reference Position

Reference Location

Feature Table Header

FT The feature table may indicate regions that1 perform or affect function2 interact with other molecules3 affect replication4 are involved in recombination5 are a repeated unit6 have secondary or tertiary structure7 are revised or corrected

Sequence Header

Protein sequence databases

bull SWISS-PROT The Swiss Institute of Bioinformatics collaborates with the EMBL Data Library to provide an annotated database of amino acid sequences called SWISS-PROT

bull PIR Another protein sequence database is produced by The PIR International

httppirgeorgetownedu

PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor

Databases associated with SWISS-PROTbull ENZYME DB and PROSITEbull The ENZYME DB stores the following information about enzymesbull EC Number a numerical identifier assigned by the Enzyme

Commission (authorized by the International Union of Biochemistry and Molecular Biology see

bull httpwwwchemqmwacukiubmbenzyme)bull Recommended namebull Alternative names if anybull Catalytic activitybull Cofactors if anybull Pointers to SWISS-PROT and other data banksbull Pointers to disease associated with enzyme deficiency if any known

A Sample Entry in ENZYME DB

The PIR and associated databases

bull The PIR maintains several databases about proteins1 PIR-PSD the main protein sequence database2 iProClass classification of proteins according to structure and

function3 ASDB annotation and similarity database each entry is linked to a

list of similar sequences4 PR-NREF a comprehensive non-redundant collection of over 800

000 protein sequences merged from all available sources5 NRL3D a database of sequences and annotations of proteins of

known structure deposited in the Protein Data Bank6 ALN a database of protein sequence alignments and7RESID a database of covalent protein structure modifications (recall

that important structural features of proteins such as disulphide bridges are not inferrable from gene sequences and will not appear in protein sequence databases derived solely by translation of genomic data)

Databases of structuresbull Structure databases archive annotate and distribute sets of atomic

coordinatesbull Protein Data Bank (PDB)bull The information contained includes1 What protein is the subject of the entry and what species it came

from2 Who solved the structure and references to publications describing

the structure determination3 Experimental details about the structure determination including

information related to the general quality of the result such as resolution of an X-ray structure determination and stereochemical statistics

4 The amino acid sequence5 What additional molecules appear in the structure including

cofactors inhibitors and water molecules6 Assignments of secondary structure helix sheet7 Disulphide bridges8 The atomic coordinates

Protein data bank entry 2TRX E coli thioredoxin

Indicators of structure qualitybull X-ray crystal structure analysisbull Nuclear Magnetic Resonancebull Web Resource Protein and Nucleic Acid Structuresbull Home page of protein data bankbull httpwwwrcsborgbull Home page of EBI macromolecular structure databasebull httpmsdebiacukbull Home page of BioMagResBankbull httpwwwbmrbwiscedubull Searching the protein data bankbull Home page of SCOP (Structural classification of proteins)bull httpscopmrc-lmbcamacukscopbull List of browsersbull httppdb-browsersebiacukbrowse_itshtmlbull OCAbull httpocaebiacukoca-binocamain

Crystal

Hanging drop method vapour diffusion method

Microscope slide

2-Concentrated salt solution

1-Dilute protein solutionMicroscope

many differentconditions of 1amp2must be tried

many differentconditions of 1amp2must be tried

Crystallisation

Slide courtesy from Shoshana Wodak

Diffraction pattern Atomic model

Determination of protein structure

Slide courtesy from Shoshana Wodak

A high resolution protein structure 15 - 20 Aring resolution

The resolution problem

Slide courtesy from Shoshana Wodak

Nuclear Magnetic Resonance (NMR)

Source Branden amp Tooze (1991)

Classifications of protein structuresbull SCOP Structural Classification of Proteins

bull CATH ClassArchitectureTopologyHomology

bull DALIClassification of protein domains

Based on extraction of similar structures from distance matrices

[httpwwwebiacukdalidomain]

bull CE A database of structural alignments

Class

Architecture

Topology

Figure from Shoshana Wodak

CATH - A protein domain classification

bull In CATH protein domains are classified according to a tree with 4 levels of hierarchicallyndash Classndash Architecturendash Topologyndash Homology

Specialized or boutique databasesbull VIPER (Virus Particle ExploreR) treats crystal structures of

icosahedral viruses

bull In the field of immunologybull IMGT the international ImMunoGeneTics database is a

high-quality integrated database specializing in Immunoglobulins (Ig) T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species

bull The IMGT server provides a common access to all Immunogenetics data

bull At present it includes two databases IMGTLIGM-DB a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates with translation for fully annotated sequences and IMGTHLA-DB a database of the human MHC referred to as HLA (Human Leucocyte Antigens)

Web Resource Databases for Specific Protein Families

bull Protein kinasesbull httpwwwsdscedukinasesbull HIV proteasesbull httpwww-fbscncifcrfgovHIVdbbull Icosahedral virusesbull httpmmtsbscrippseduvipermainhtmlbull Immunologybull IGMT httpimgtcinesfrbull KABAT httpimmunobmenwuedubull MHCPEP httpwehihwehieduaumhcpepbull Collections of links to databases on specific protein familiesbull httpwww2ebiacukmsdLinksfamilyshtmlbull KABAT - Database of Sequences of Proteins of Immunological Interest -

North-Western University (USA)bull MHCPEP - Major Histocompatibility Complex Binding Peptides Database -

Walter and Eliza Hall Institute (Melbourne Australia)

Expression and proteomics databases

bull Expression databases record measurements of mRNA levels usually via ESTs (short terminal sequences of cDNA synthesized from mRNA)

Comparisons of expression patterns give clues to (1) the function and mechanism of action of gene products(2) how organisms coordinate their control over metabolic processes in

different conditions - for instance yeast under aerobic or anaerobic conditions

(3) the variations in mobilization of genes at different stages of the cell cycle or of the development of an organism (4) mechanisms of antibiotic resistance in bacteria and consequent

suggestion of targets for drug development (5) the response to challenge by a parasite (6) the response to medications of different types and dosages to

guide effective therapy

  • Archives and Information Retrieval
  • Database indexing and specification of search terms
  • Nucleic acid sequence databases
  • Nucleotide sequence databases
  • Slide 6
  • Slide 7
  • The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene
  • Slide 9
  • FT The feature table may indicate regions that 1 perform or affect function 2 interact with other molecules 3 affect replication 4 are involved in recombination 5 are a repeated unit 6 have secondary or tertiary structure 7 are revised or corrected
  • Protein sequence databases
  • Slide 12
  • PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor
  • Slide 14
  • Slide 15
  • Databases associated with SWISS-PROT
  • A Sample Entry in ENZYME DB
  • The PIR and associated databases
  • Databases of structures
  • Slide 20
  • Protein data bank entry 2TRX E coli thioredoxin
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Indicators of structure quality
  • Crystallisation
  • Determination of protein structure
  • The resolution problem
  • Nuclear Magnetic Resonance (NMR)
  • Classifications of protein structures
  • CATH - A protein domain classification
  • Specialized or boutique databases
  • Web Resource Databases for Specific Protein Families
  • Expression and proteomics databases
Page 4: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Nucleic acid sequence databases

bull Triple partnership of

bull National Center for Biotechnology Information (USA)

bull EMBL Data Library (European Bioinformatics Institute UK)

bull DNA Data Bank of Japan (National Institute of Genetics Japan)

Entries have a life cycle

Nucleotide sequence databases

bull EMBL GenBank and DDBJ are the three primary nucleotide sequence databases

bull EMBL wwwebiacukembl

bull GenBank wwwncbinlmnihgovGenbank

bull DDBJ wwwddbjnigacjp

The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene

Organism Classification

Keyword

Description

Date

Identification

Accession

Organism Source

Reference Number

Reference Title

Reference Author

Reference Position

Reference Location

Feature Table Header

FT The feature table may indicate regions that1 perform or affect function2 interact with other molecules3 affect replication4 are involved in recombination5 are a repeated unit6 have secondary or tertiary structure7 are revised or corrected

Sequence Header

Protein sequence databases

bull SWISS-PROT The Swiss Institute of Bioinformatics collaborates with the EMBL Data Library to provide an annotated database of amino acid sequences called SWISS-PROT

bull PIR Another protein sequence database is produced by The PIR International

httppirgeorgetownedu

PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor

Databases associated with SWISS-PROTbull ENZYME DB and PROSITEbull The ENZYME DB stores the following information about enzymesbull EC Number a numerical identifier assigned by the Enzyme

Commission (authorized by the International Union of Biochemistry and Molecular Biology see

bull httpwwwchemqmwacukiubmbenzyme)bull Recommended namebull Alternative names if anybull Catalytic activitybull Cofactors if anybull Pointers to SWISS-PROT and other data banksbull Pointers to disease associated with enzyme deficiency if any known

A Sample Entry in ENZYME DB

The PIR and associated databases

• The PIR maintains several databases about proteins:
1. PIR-PSD: the main protein sequence database
2. iProClass: classification of proteins according to structure and function
3. ASDB: annotation and similarity database; each entry is linked to a list of similar sequences
4. PIR-NREF: a comprehensive non-redundant collection of over 800 000 protein sequences merged from all available sources
5. NRL3D: a database of sequences and annotations of proteins of known structure deposited in the Protein Data Bank
6. ALN: a database of protein sequence alignments
7. RESID: a database of covalent protein structure modifications (recall that important structural features of proteins, such as disulphide bridges, are not inferrable from gene sequences and will not appear in protein sequence databases derived solely by translation of genomic data)

Databases of structures
• Structure databases archive, annotate and distribute sets of atomic coordinates
• Protein Data Bank (PDB)
• The information contained includes:
1. What protein is the subject of the entry, and what species it came from
2. Who solved the structure, and references to publications describing the structure determination
3. Experimental details about the structure determination, including information related to the general quality of the result, such as resolution of an X-ray structure determination and stereochemical statistics
4. The amino acid sequence
5. What additional molecules appear in the structure, including cofactors, inhibitors and water molecules
6. Assignments of secondary structure: helix, sheet
7. Disulphide bridges
8. The atomic coordinates
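In the classic PDB file format, the atomic coordinates are stored in fixed-column ATOM records. The following is a minimal sketch of reading one such record. The helper names are hypothetical, the coordinates are invented for illustration, and the formatter is a simplification that left-justifies the atom name and leaves the alternate-location and insertion-code columns blank.

```python
def format_atom_record(serial, atom, residue, chain, res_seq, x, y, z):
    """Build a simplified ATOM line using the fixed PDB column layout."""
    return (
        f"ATOM  {serial:5d} {atom:<4s} {residue:>3s} {chain:1s}"
        f"{res_seq:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_record(line):
    """Extract selected fields from one ATOM line by column position."""
    return {
        "record": line[0:6].strip(),    # columns 1-6
        "serial": int(line[6:11]),      # columns 7-11
        "atom": line[12:16].strip(),    # columns 13-16
        "residue": line[17:20].strip(), # columns 18-20
        "chain": line[21],              # column 22
        "res_seq": int(line[22:26]),    # columns 23-26
        "x": float(line[30:38]),        # columns 31-38
        "y": float(line[38:46]),        # columns 39-46
        "z": float(line[46:54]),        # columns 47-54
    }


# Hypothetical coordinates, not taken from any real entry.
line = format_atom_record(1, "CA", "SER", "A", 1, 21.389, 25.406, -4.628)
atom = parse_atom_record(line)
print(atom["residue"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters because adjacent numeric fields may touch when values are large or negative.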

Protein Data Bank entry 2TRX: E. coli thioredoxin

Indicators of structure quality
• X-ray crystal structure analysis
• Nuclear Magnetic Resonance
• Web Resource: Protein and Nucleic Acid Structures
• Home page of Protein Data Bank
• http://www.rcsb.org
• Home page of EBI macromolecular structure database
• http://msd.ebi.ac.uk
• Home page of BioMagResBank
• http://www.bmrb.wisc.edu
• Searching the Protein Data Bank
• Home page of SCOP (Structural Classification of Proteins)
• http://scop.mrc-lmb.cam.ac.uk/scop
• List of browsers
• http://pdb-browsers.ebi.ac.uk/browse_it.shtml
• OCA
• http://oca.ebi.ac.uk/oca-bin/ocamain

Crystallisation

Hanging drop method (vapour diffusion method)

Figure labels: Crystal; Microscope slide; 1 - Dilute protein solution; 2 - Concentrated salt solution; Microscope

Many different conditions of 1 & 2 must be tried

Slide courtesy of Shoshana Wodak

Determination of protein structure

Figure labels: Diffraction pattern; Atomic model

Slide courtesy of Shoshana Wodak

The resolution problem

A high-resolution protein structure: 1.5 - 2.0 Å resolution

Slide courtesy of Shoshana Wodak

Nuclear Magnetic Resonance (NMR)

Source: Branden & Tooze (1991)

Classifications of protein structures
• SCOP: Structural Classification of Proteins

• CATH: Class/Architecture/Topology/Homology

• DALI: Classification of protein domains

Based on extraction of similar structures from distance matrices

[http://www.ebi.ac.uk/dali/domain]

• CE: A database of structural alignments
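The distance matrices mentioned for DALI hold the pairwise distances between residues of one structure. The sketch below shows only how such a matrix could be built from a list of coordinates; it is not DALI's comparison algorithm. The function name and the three points are assumptions chosen so the distances are easy to check by hand.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Hypothetical coordinates (in Angstrom) standing in for three C-alpha atoms.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
matrix = distance_matrix(coords)

for row in matrix:
    print(["%.1f" % d for d in row])
```

Because the matrix depends only on internal distances, it does not change when the whole molecule is rotated or translated, which is what makes it convenient for comparing structures.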

CATH - A protein domain classification

Figure labels: Class; Architecture; Topology

Figure from Shoshana Wodak

• In CATH, protein domains are classified according to a tree with four hierarchical levels:
– Class
– Architecture
– Topology
– Homology
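The four-level tree can be illustrated with the dotted numeric codes CATH uses for its nodes, where each additional number descends one level. The sketch below is an illustration under that assumption; the function name `cath_path` and the example code are chosen for the example and are not taken from the lecture.

```python
CATH_LEVELS = ("Class", "Architecture", "Topology", "Homology")


def cath_path(code):
    """Expand a four-part CATH code into its path from Class down to Homology."""
    parts = code.split(".")
    if len(parts) != len(CATH_LEVELS):
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    # Each level is identified by the prefix of the code up to that depth.
    return [
        (level, ".".join(parts[: depth + 1]))
        for depth, level in enumerate(CATH_LEVELS)
    ]


path = cath_path("3.40.50.300")
for level, node in path:
    print(f"{level:<12} {node}")
```

Two domains share a level when their codes agree on the corresponding prefix, so comparing prefixes is enough to find the deepest level they have in common.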

Specialized or boutique databases
• VIPER (Virus Particle ExploreR) treats crystal structures of icosahedral viruses

• In the field of immunology:
• IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specializing in immunoglobulins (Ig), T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species

• The IMGT server provides common access to all immunogenetics data

• At present it includes two databases: IMGT/LIGM-DB, a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates, with translation for fully annotated sequences; and IMGT/HLA-DB, a database of the human MHC, referred to as HLA (Human Leucocyte Antigens)

Web Resource: Databases for Specific Protein Families

• Protein kinases
• http://www.sdsc.edu/kinases
• HIV proteases
• http://www-fbsc.ncifcrf.gov/HIVdb
• Icosahedral viruses
• http://mmtsb.scripps.edu/viper/main.html
• Immunology
• IMGT: http://imgt.cines.fr
• KABAT: http://immuno.bme.nwu.edu
• MHCPEP: http://wehih.wehi.edu.au/mhcpep
• Collections of links to databases on specific protein families
• http://www2.ebi.ac.uk/msd/Links/family.shtml
• KABAT - Database of Sequences of Proteins of Immunological Interest - Northwestern University (USA)
• MHCPEP - Major Histocompatibility Complex Binding Peptides Database - Walter and Eliza Hall Institute (Melbourne, Australia)

Expression and proteomics databases

• Expression databases record measurements of mRNA levels, usually via ESTs (short terminal sequences of cDNA synthesized from mRNA)

Comparisons of expression patterns give clues to:
(1) the function and mechanism of action of gene products
(2) how organisms coordinate their control over metabolic processes in different conditions, for instance yeast under aerobic or anaerobic conditions
(3) the variations in mobilization of genes at different stages of the cell cycle, or of the development of an organism
(4) mechanisms of antibiotic resistance in bacteria, and consequent suggestion of targets for drug development
(5) the response to challenge by a parasite
(6) the response to medications of different types and dosages, to guide effective therapy
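One common way to compare expression patterns between two conditions, such as the aerobic and anaerobic yeast example above, is the log2 ratio of the measured levels. The lecture does not prescribe this measure, so the sketch below is an assumption; the gene names and values are invented and are not measurements.

```python
import math


def log2_fold_change(reference, condition):
    """Log2 ratio condition/reference for genes measured in both sets."""
    changes = {}
    for gene, ref_level in reference.items():
        level = condition.get(gene)
        # Skip genes that are missing or have non-positive levels.
        if level is None or ref_level <= 0 or level <= 0:
            continue
        changes[gene] = math.log2(level / ref_level)
    return changes


# Hypothetical mRNA levels in arbitrary units.
aerobic = {"geneA": 10.0, "geneB": 40.0, "geneC": 8.0, "geneD": 5.0}
anaerobic = {"geneA": 40.0, "geneB": 10.0, "geneC": 8.0}

changes = log2_fold_change(aerobic, anaerobic)
for gene, change in sorted(changes.items()):
    print(gene, round(change, 2))
```

A positive value means the gene is more abundant in the second condition, a negative value means less abundant, and zero means no change.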

Page 8: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

FT: The feature table may indicate regions that
1. perform or affect function
2. interact with other molecules
3. affect replication
4. are involved in recombination
5. are a repeated unit
6. have secondary or tertiary structure
7. are revised or corrected

Sequence Header

Protein sequence databases

• SWISS-PROT: The Swiss Institute of Bioinformatics collaborates with the EMBL Data Library to provide an annotated database of amino acid sequences called SWISS-PROT.

• PIR: Another protein sequence database is produced by the PIR International.

http://pir.georgetown.edu

PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor

Databases associated with SWISS-PROT

• ENZYME DB and PROSITE
• The ENZYME DB stores the following information about enzymes:
  • EC Number: a numerical identifier assigned by the Enzyme Commission (authorized by the International Union of Biochemistry and Molecular Biology; see http://www.chem.qmw.ac.uk/iubmb/enzyme)
  • Recommended name
  • Alternative names, if any
  • Catalytic activity
  • Cofactors, if any
  • Pointers to SWISS-PROT and other data banks
  • Pointers to disease associated with enzyme deficiency, if any known
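As a rough illustration of the fields listed above, the following Python sketch models an entry as a simple record and splits an EC number into its four numeric fields. The class name `EnzymeEntry`, its attribute names and the function `parse_ec_number` are hypothetical and chosen only for this example; they do not reproduce the actual ENZYME file format.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class EnzymeEntry:
    """Hypothetical container for the fields listed on the slide."""
    ec_number: str
    recommended_name: str
    alternative_names: List[str] = field(default_factory=list)
    catalytic_activity: str = ""
    cofactors: List[str] = field(default_factory=list)
    swissprot_pointers: List[str] = field(default_factory=list)
    disease_pointers: List[str] = field(default_factory=list)


def parse_ec_number(ec: str) -> Tuple[int, ...]:
    """Split a complete EC number such as '1.1.1.1' into four integers."""
    parts = ec.strip().split(".")
    if len(parts) != 4 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a complete four-field EC number: {ec!r}")
    return tuple(int(p) for p in parts)


# EC 1.1.1.1 is alcohol dehydrogenase; the other fields are left empty here.
entry = EnzymeEntry(ec_number="1.1.1.1", recommended_name="Alcohol dehydrogenase")
print(parse_ec_number(entry.ec_number))
```

Keeping the EC number as four separate integers makes it easy to group enzymes by class or subclass by comparing leading fields.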

A Sample Entry in ENZYME DB

The PIR and associated databases

• The PIR maintains several databases about proteins:
1. PIR-PSD: the main protein sequence database
2. iProClass: classification of proteins according to structure and function
3. ASDB: annotation and similarity database; each entry is linked to a list of similar sequences
4. PIR-NREF: a comprehensive non-redundant collection of over 800,000 protein sequences merged from all available sources
5. NRL3D: a database of sequences and annotations of proteins of known structure deposited in the Protein Data Bank
6. ALN: a database of protein sequence alignments
7. RESID: a database of covalent protein structure modifications (recall that important structural features of proteins, such as disulphide bridges, are not inferrable from gene sequences and will not appear in protein sequence databases derived solely by translation of genomic data)
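To make the idea of a non-redundant collection merged from several sources concrete, here is a minimal Python sketch. It assumes the simplest possible rule (two records are redundant when their sequences are identical, ignoring case); the slides do not describe the procedure PIR actually uses, and the identifiers and sequences below are made up.

```python
def merge_non_redundant(*sources):
    """Merge (identifier, sequence) records, keeping one entry per distinct sequence.

    Returns a list of (identifiers, sequence) pairs in first-seen order.
    """
    merged = {}  # sequence -> list of identifiers that share it
    for records in sources:
        for identifier, sequence in records:
            merged.setdefault(sequence.upper(), []).append(identifier)
    return [(ids, seq) for seq, ids in merged.items()]


# Hypothetical records from two source databases.
source_a = [("A1", "MKT"), ("A2", "GAV")]
source_b = [("B1", "mkt")]

for ids, seq in merge_non_redundant(source_a, source_b):
    print(seq, ids)
```

Each merged entry keeps the list of source identifiers, so a reader can still trace a sequence back to every database it came from.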

Databases of structures

• Structure databases archive, annotate and distribute sets of atomic coordinates
• Protein Data Bank (PDB)
• The information contained includes:
1. What protein is the subject of the entry, and what species it came from
2. Who solved the structure, and references to publications describing the structure determination
3. Experimental details about the structure determination, including information related to the general quality of the result, such as resolution of an X-ray structure determination and stereochemical statistics
4. The amino acid sequence
5. What additional molecules appear in the structure, including cofactors, inhibitors and water molecules
6. Assignments of secondary structure: helix, sheet
7. Disulphide bridges
8. The atomic coordinates
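The atomic coordinates in the last item are stored, in the legacy fixed-column PDB text format, in ATOM records. The Python sketch below reads a few fields from one such record by column position. The function name `parse_atom_record` is hypothetical, and the sample record is made up for illustration; it is not taken from entry 2TRX.

```python
def parse_atom_record(line):
    """Read a few fields from one fixed-width PDB-format ATOM record."""
    if not line.startswith("ATOM"):
        raise ValueError("not an ATOM record")
    return {
        "atom_name": line[12:16].strip(),     # columns 13-16
        "residue_name": line[17:20].strip(),  # columns 18-20
        "chain_id": line[21],                 # column 22
        "residue_number": int(line[22:26]),   # columns 23-26
        "x": float(line[30:38]),              # columns 31-38
        "y": float(line[38:46]),              # columns 39-46
        "z": float(line[46:54]),              # columns 47-54
    }


# Made-up record, assembled field by field so the column widths are explicit.
SAMPLE_LINE = (
    "ATOM  "      # record name, columns 1-6
    "    1"       # atom serial number, columns 7-11
    " "           # column 12 (blank)
    " N  "        # atom name, columns 13-16
    " "           # alternate location indicator, column 17
    "SER"         # residue name, columns 18-20
    " "           # column 21 (blank)
    "A"           # chain identifier, column 22
    "   1"        # residue sequence number, columns 23-26
    "    "        # insertion code and blanks, columns 27-30
    "  21.389"    # x coordinate, columns 31-38
    "  25.406"    # y coordinate, columns 39-46
    "  -4.628"    # z coordinate, columns 47-54
)

atom = parse_atom_record(SAMPLE_LINE)
print(atom["residue_name"], atom["chain_id"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in this format may touch without a separating space.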

Protein Data Bank entry 2TRX: E. coli thioredoxin

Indicators of structure quality

• X-ray crystal structure analysis
• Nuclear Magnetic Resonance

Web Resource: Protein and Nucleic Acid Structures

• Home page of Protein Data Bank: http://www.rcsb.org
• Home page of EBI macromolecular structure database: http://msd.ebi.ac.uk
• Home page of BioMagResBank: http://www.bmrb.wisc.edu
• Searching the Protein Data Bank
• Home page of SCOP (Structural Classification of Proteins): http://scop.mrc-lmb.cam.ac.uk/scop
• List of browsers: http://pdb-browsers.ebi.ac.uk/browse_it.shtml
• OCA: http://oca.ebi.ac.uk/oca-bin/ocamain

Crystallisation

[Figure: hanging drop (vapour diffusion) method. Labels: crystal; microscope; microscope slide; 1, dilute protein solution; 2, concentrated salt solution. Many different conditions of 1 and 2 must be tried.]

Slide courtesy of Shoshana Wodak

Diffraction pattern; atomic model

Determination of protein structure

Slide courtesy of Shoshana Wodak

A high-resolution protein structure: 1.5 - 2.0 Å resolution

The resolution problem

Slide courtesy of Shoshana Wodak

Nuclear Magnetic Resonance (NMR)

Source: Branden & Tooze (1991)

Classifications of protein structures

• SCOP: Structural Classification of Proteins

• CATH: Class/Architecture/Topology/Homology

• DALI: Classification of protein domains, based on extraction of similar structures from distance matrices [http://www.ebi.ac.uk/dali/domain]

• CE: A database of structural alignments
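The distance matrices mentioned for DALI can be pictured with a small Python sketch: for a list of atomic positions (for example one C-alpha atom per residue), entry (i, j) holds the distance between positions i and j. This only builds a single matrix; how DALI compares matrices between structures is not covered here. The function name and the coordinates are hypothetical.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dx = coords[i][0] - coords[j][0]
            dy = coords[i][1] - coords[j][1]
            dz = coords[i][2] - coords[j][2]
            d = math.sqrt(dx * dx + dy * dy + dz * dz)
            matrix[i][j] = d
            matrix[j][i] = d  # distances are symmetric
    return matrix


# Made-up coordinates chosen so that the distances are whole numbers.
points = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]
for row in distance_matrix(points):
    print(row)
```

Because the matrix depends only on distances within the molecule, it is unchanged when the whole structure is rotated or translated, which is what makes it convenient for comparing structures.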

[Figure: levels of the CATH hierarchy shown: Class, Architecture, Topology. Figure from Shoshana Wodak]

CATH - A protein domain classification

• In CATH, protein domains are classified hierarchically according to a tree with 4 levels:
  - Class
  - Architecture
  - Topology
  - Homology
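The four-level tree can be sketched in Python by treating a classification as a dotted code with one number per level, in the order Class, Architecture, Topology, Homology. The function names and the example codes below are assumptions made for illustration, not data taken from the slides.

```python
LEVELS = ("Class", "Architecture", "Topology", "Homology")


def parse_cath_code(code):
    """Map each of the four levels to the code prefix that identifies it."""
    parts = code.split(".")
    if len(parts) != len(LEVELS):
        raise ValueError(f"expected four dot-separated fields: {code!r}")
    return {
        level: ".".join(parts[: depth + 1])
        for depth, level in enumerate(LEVELS)
    }


def deepest_shared_level(code_a, code_b):
    """Return the name of the deepest level two codes share, or None."""
    shared = None
    for level, a, b in zip(LEVELS, code_a.split("."), code_b.split(".")):
        if a != b:
            break
        shared = level
    return shared


# Illustrative codes only.
print(parse_cath_code("3.40.50.720"))
print(deepest_shared_level("3.40.50.720", "3.40.50.300"))
print(deepest_shared_level("3.40.50.720", "1.10.8.10"))
```

Two domains that agree down to the Topology level share a fold but are placed in different homologous groups; domains that differ at the first field belong to different classes.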

Specialized or boutique databases

• VIPER (Virus Particle ExploreR) treats crystal structures of icosahedral viruses

• In the field of immunology:
• IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specializing in Immunoglobulins (Ig), T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species

• The IMGT server provides a common access to all Immunogenetics data

• At present it includes two databases: IMGT/LIGM-DB, a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates, with translation for fully annotated sequences, and IMGT/HLA-DB, a database of the human MHC, referred to as HLA (Human Leucocyte Antigens)

Web Resource: Databases for Specific Protein Families

• Protein kinases: http://www.sdsc.edu/kinases
• HIV proteases: http://www-fbsc.ncifcrf.gov/HIVdb
• Icosahedral viruses: http://mmtsb.scripps.edu/viper/main.html
• Immunology:
  • IMGT: http://imgt.cines.fr
  • KABAT: http://immuno.bme.nwu.edu
  • MHCPEP: http://wehih.wehi.edu.au/mhcpep
• Collections of links to databases on specific protein families: http://www2.ebi.ac.uk/msd/Links/family.shtml
• KABAT - Database of Sequences of Proteins of Immunological Interest - North-Western University (USA)
• MHCPEP - Major Histocompatibility Complex Binding Peptides Database - Walter and Eliza Hall Institute (Melbourne, Australia)

Expression and proteomics databases

• Expression databases record measurements of mRNA levels, usually via ESTs (short terminal sequences of cDNA synthesized from mRNA)

Comparisons of expression patterns give clues to:
(1) the function and mechanism of action of gene products
(2) how organisms coordinate their control over metabolic processes in different conditions, for instance yeast under aerobic or anaerobic conditions
(3) the variations in mobilization of genes at different stages of the cell cycle or of the development of an organism
(4) mechanisms of antibiotic resistance in bacteria and consequent suggestion of targets for drug development
(5) the response to challenge by a parasite
(6) the response to medications of different types and dosages, to guide effective therapy
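One simple way to compare expression patterns between two conditions, such as the aerobic and anaerobic yeast example above, is a log2 ratio of the measured levels per gene. The slides do not prescribe a particular measure, so the log ratio is an assumption made for this sketch, and the gene names and values are made up.

```python
import math


def log2_ratios(reference, treatment):
    """Return log2(treatment / reference) for genes measured in both conditions.

    Genes with a non-positive level in either condition are skipped,
    because the ratio is undefined for them.
    """
    ratios = {}
    for gene, ref_level in reference.items():
        trt_level = treatment.get(gene)
        if trt_level is None or ref_level <= 0 or trt_level <= 0:
            continue
        ratios[gene] = math.log2(trt_level / ref_level)
    return ratios


# Hypothetical mRNA levels in arbitrary units.
aerobic = {"geneA": 100.0, "geneB": 50.0, "geneC": 10.0}
anaerobic = {"geneA": 25.0, "geneB": 200.0}

for gene, ratio in log2_ratios(aerobic, anaerobic).items():
    print(gene, ratio)
```

A positive value means the gene is more highly expressed in the second condition and a negative value means it is less expressed; a gene measured in only one condition is left out rather than guessed.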




  • Archives and Information Retrieval
  • Database indexing and specification of search terms
  • Nucleic acid sequence databases
  • Nucleotide sequence databases
  • Slide 6
  • Slide 7
  • The EMBL data library entry for the bovine pancreatic trypsin inhibitor gene
  • Slide 9
  • FT The feature table may indicate regions that 1 perform or affect function 2 interact with other molecules 3 affect replication 4 are involved in recombination 5 are a repeated unit 6 have secondary or tertiary structure 7 are revised or corrected
  • Protein sequence databases
  • Slide 12
  • PIR entry for the amino acid sequence of Bovine pancreatic trypsin inhibitor
  • Slide 14
  • Slide 15
  • Databases associated with SWISS-PROT
  • A Sample Entry in ENZYME DB
  • The PIR and associated databases
  • Databases of structures
  • Slide 20
  • Protein data bank entry 2TRX E coli thioredoxin
  • Slide 22
  • Slide 23
  • Slide 24
  • Slide 25
  • Slide 26
  • Indicators of structure quality
  • Crystallisation
  • Determination of protein structure
  • The resolution problem
  • Nuclear Magnetic Resonance (NMR)
  • Classifications of protein structures
  • CATH - A protein domain classification
  • Specialized or boutique databases
  • Web Resource Databases for Specific Protein Families
  • Expression and proteomics databases
Page 12: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Databases associated with SWISS-PROTbull ENZYME DB and PROSITEbull The ENZYME DB stores the following information about enzymesbull EC Number a numerical identifier assigned by the Enzyme

Commission (authorized by the International Union of Biochemistry and Molecular Biology see

bull httpwwwchemqmwacukiubmbenzyme)bull Recommended namebull Alternative names if anybull Catalytic activitybull Cofactors if anybull Pointers to SWISS-PROT and other data banksbull Pointers to disease associated with enzyme deficiency if any known

A Sample Entry in ENZYME DB

The PIR and associated databases

• The PIR maintains several databases about proteins:
  1. PIR-PSD: the main protein sequence database
  2. iProClass: classification of proteins according to structure and function
  3. ASDB: annotation and similarity database; each entry is linked to a list of similar sequences
  4. PIR-NREF: a comprehensive non-redundant collection of over 800 000 protein sequences merged from all available sources
  5. NRL3D: a database of sequences and annotations of proteins of known structure deposited in the Protein Data Bank
  6. ALN: a database of protein sequence alignments
  7. RESID: a database of covalent protein structure modifications (recall that important structural features of proteins, such as disulphide bridges, are not inferrable from gene sequences and will not appear in protein sequence databases derived solely by translation of genomic data)
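
To make "non-redundant collection merged from all available sources" concrete, the sketch below collapses entries with identical sequences and keeps a list of where each sequence was found. This is an assumption about what merging could look like, not PIR's actual procedure; the function name `merge_non_redundant`, the source names and the toy sequences are all hypothetical.

```python
def merge_non_redundant(sources):
    """Merge entries from several sources, one record per distinct sequence.

    sources: dict mapping a source name to a list of (identifier, sequence).
    Returns a dict mapping each distinct sequence to the list of
    'source:identifier' labels under which it was found.
    """
    merged = {}
    for source_name, entries in sources.items():
        for identifier, sequence in entries:
            key = sequence.upper()  # treat case differences as the same sequence
            merged.setdefault(key, []).append(f"{source_name}:{identifier}")
    return merged


# Toy data: two sources that share one sequence.
toy_sources = {
    "sourceA": [("a1", "MKTAYIAK"), ("a2", "GSHMLEDP")],
    "sourceB": [("b1", "mktayiak")],
}
non_redundant = merge_non_redundant(toy_sources)
for sequence, labels in non_redundant.items():
    print(sequence, labels)
```

Real non-redundant collections also have to decide how to treat fragments and near-identical sequences; exact matching is only the simplest case.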

Databases of structures

• Structure databases archive, annotate and distribute sets of atomic coordinates
• Protein Data Bank (PDB)
• The information contained includes:
  1. What protein is the subject of the entry, and what species it came from
  2. Who solved the structure, and references to publications describing the structure determination
  3. Experimental details about the structure determination, including information related to the general quality of the result, such as resolution of an X-ray structure determination and stereochemical statistics
  4. The amino acid sequence
  5. What additional molecules appear in the structure, including cofactors, inhibitors and water molecules
  6. Assignments of secondary structure: helix, sheet
  7. Disulphide bridges
  8. The atomic coordinates
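
The items above correspond to different record types in a PDB-format text file, where each line starts with a record name and the data sit in fixed columns. The sketch below counts record types and reads coordinates from ATOM records. It assumes the classic fixed-column PDB layout; the function names are hypothetical and the sample lines are toy data, not taken from entry 2TRX.

```python
from collections import Counter


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Write a minimal ATOM record using the fixed-column layout."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom(line):
    """Read the main fields of an ATOM record by column position."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def summarize(lines):
    """Count record types and collect parsed ATOM records."""
    counts = Counter(line[:6].strip() for line in lines if line.strip())
    atoms = [parse_atom(line) for line in lines if line.startswith("ATOM  ")]
    return counts, atoms


# Toy file content (made-up values, for illustration only).
toy_lines = [
    "HEADER    TOY EXAMPLE",
    "SSBOND   1 CYS A   32    CYS A   35",
    format_atom(1, " N", "SER", "A", 1, 21.389, 25.406, -4.628),
    format_atom(2, " CA", "SER", "A", 1, 20.100, 24.900, -3.750),
]
record_counts, atom_records = summarize(toy_lines)
print(dict(record_counts))
print(atom_records[0])
```

Reading by column position rather than splitting on whitespace matters because adjacent fields in real files can run together without a separating space.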

Protein Data Bank entry 2TRX: E. coli thioredoxin

Indicators of structure quality

• X-ray crystal structure analysis
• Nuclear Magnetic Resonance

Web Resource: Protein and Nucleic Acid Structures

• Home page of Protein Data Bank: http://www.rcsb.org
• Home page of EBI macromolecular structure database: http://msd.ebi.ac.uk
• Home page of BioMagResBank: http://www.bmrb.wisc.edu
• Searching the Protein Data Bank
• Home page of SCOP (Structural Classification of Proteins): http://scop.mrc-lmb.cam.ac.uk/scop
• List of browsers: http://pdb-browsers.ebi.ac.uk/browse_it.shtml
• OCA: http://oca.ebi.ac.uk/oca-bin/ocamain

Crystallisation

[Figure: hanging drop method (vapour diffusion method). Labels: crystal; microscope slide; 1 - dilute protein solution; 2 - concentrated salt solution; microscope. Many different conditions of 1 & 2 must be tried.]

Slide courtesy of Shoshana Wodak

Determination of protein structure

[Figure: diffraction pattern; atomic model.]

Slide courtesy of Shoshana Wodak

The resolution problem

[Figure: a high-resolution protein structure, 1.5 - 2.0 Å resolution.]

Slide courtesy of Shoshana Wodak

Nuclear Magnetic Resonance (NMR)

Source: Branden & Tooze (1991)

Classifications of protein structures

• SCOP: Structural Classification of Proteins
• CATH: Class/Architecture/Topology/Homology
• DALI: classification of protein domains, based on extraction of similar structures from distance matrices [http://www.ebi.ac.uk/dali/domain]
• CE: a database of structural alignments

CATH - A protein domain classification

[Figure: the Class, Architecture and Topology levels of CATH.]

Figure from Shoshana Wodak

• In CATH, protein domains are classified according to a tree with 4 hierarchical levels:
  – Class
  – Architecture
  – Topology
  – Homology
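
A four-level tree means each domain can be addressed by one number per level, and two domains can be compared by how many leading levels they share. The sketch below illustrates this. The dotted four-number notation is how CATH identifiers are commonly written, but it is not stated on the slide; the names `CathCode`, `parse_cath` and `shared_levels` are hypothetical and the two codes are used only as examples.

```python
from typing import NamedTuple

LEVELS = ("Class", "Architecture", "Topology", "Homology")


class CathCode(NamedTuple):
    class_: int
    architecture: int
    topology: int
    homology: int


def parse_cath(code: str) -> CathCode:
    """Parse a dotted code such as '3.40.50.720' into its four levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated numbers, got {code!r}")
    return CathCode(*(int(p) for p in parts))


def shared_levels(a: CathCode, b: CathCode) -> int:
    """Number of leading levels (0 to 4) on which two domains agree."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


first = parse_cath("3.40.50.720")
second = parse_cath("3.40.50.300")
depth = shared_levels(first, second)
deepest = LEVELS[depth - 1] if depth else "none"
print(f"shared levels: {depth} (deepest shared level: {deepest})")
```

The comparison stops at the first mismatch because lower levels are only meaningful within the same branch of the tree.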

Specialized or boutique databases

• VIPER (Virus Particle ExploreR) treats crystal structures of icosahedral viruses
• In the field of immunology:
• IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specializing in Immunoglobulins (Ig), T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species
• The IMGT server provides common access to all immunogenetics data
• At present it includes two databases: IMGT/LIGM-DB, a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates, with translation for fully annotated sequences; and IMGT/HLA-DB, a database of the human MHC, referred to as HLA (Human Leucocyte Antigens)

Web Resource Databases for Specific Protein Families

• Protein kinases: http://www.sdsc.edu/kinases
• HIV proteases: http://www-fbsc.ncifcrf.gov/HIVdb
• Icosahedral viruses: http://mmtsb.scripps.edu/viper/main.html
• Immunology:
  – IMGT: http://imgt.cines.fr
  – KABAT: http://immuno.bme.nwu.edu
  – MHCPEP: http://wehih.wehi.edu.au/mhcpep
• Collections of links to databases on specific protein families: http://www2.ebi.ac.uk/msd/Links/family.shtml
• KABAT - Database of Sequences of Proteins of Immunological Interest - Northwestern University (USA)
• MHCPEP - Major Histocompatibility Complex Binding Peptides Database - Walter and Eliza Hall Institute (Melbourne, Australia)

Expression and proteomics databases

• Expression databases record measurements of mRNA levels, usually via ESTs (short terminal sequences of cDNA synthesized from mRNA)

Comparisons of expression patterns give clues to:
(1) the function and mechanism of action of gene products
(2) how organisms coordinate their control over metabolic processes in different conditions, for instance yeast under aerobic or anaerobic conditions
(3) the variations in mobilization of genes at different stages of the cell cycle, or of the development of an organism
(4) mechanisms of antibiotic resistance in bacteria, and consequent suggestion of targets for drug development
(5) the response to challenge by a parasite
(6) the response to medications of different types and dosages, to guide effective therapy
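
A comparison of expression patterns between two conditions is often summarised per gene as a log2 ratio of the measured levels. The sketch below shows one simple way to do this, assuming expression is available as counts per gene (for example, numbers of matching ESTs). The function name, the pseudocount and the gene names and counts are all hypothetical; this is not the method of any particular expression database.

```python
import math


def log2_fold_changes(condition_a, condition_b, pseudocount=1.0):
    """Per-gene log2 ratio of condition_b over condition_a.

    Both arguments map gene name -> count. The pseudocount avoids
    division by zero for genes not detected in one condition.
    Only genes present in both mappings are compared.
    """
    genes = sorted(set(condition_a) & set(condition_b))
    return {
        gene: math.log2((condition_b[gene] + pseudocount)
                        / (condition_a[gene] + pseudocount))
        for gene in genes
    }


# Toy counts for the same organism grown under two conditions.
aerobic = {"geneA": 3, "geneB": 7, "geneC": 31}
anaerobic = {"geneA": 15, "geneB": 7, "geneC": 3}

changes = log2_fold_changes(aerobic, anaerobic)
for gene, value in changes.items():
    print(f"{gene}: {value:+.1f}")
```

Positive values indicate higher levels in the second condition and negative values lower levels; a value near zero suggests the gene is not differentially mobilized between the two.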

Page 17: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Indicators of structure quality
• X-ray crystal structure analysis
• Nuclear Magnetic Resonance

Web Resource: Protein and Nucleic Acid Structures
• Home page of Protein Data Bank: http://www.rcsb.org
• Home page of EBI macromolecular structure database: http://msd.ebi.ac.uk
• Home page of BioMagResBank: http://www.bmrb.wisc.edu
• Searching the Protein Data Bank
• Home page of SCOP (Structural Classification of Proteins): http://scop.mrc-lmb.cam.ac.uk/scop
• List of browsers: http://pdb-browsers.ebi.ac.uk/browse_it.shtml
• OCA: http://oca.ebi.ac.uk/oca-bin/ocamain

Page 18: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Crystallisation

Hanging drop method (vapour diffusion method)

[Figure labels: crystal; microscope slide; microscope; 1 - dilute protein solution; 2 - concentrated salt solution]

Many different conditions of 1 & 2 must be tried.

Slide courtesy of Shoshana Wodak
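The need to try many conditions of solutions 1 and 2 can be pictured as a grid screen. The sketch below simply enumerates every combination of protein and salt concentration; the concentration values, units, and the function name `screening_grid` are hypothetical and only meant to show how quickly the number of trials grows.

```python
from itertools import product


def screening_grid(protein_mg_per_ml, salt_molar):
    """List every (protein, salt) combination as one crystallisation trial."""
    return [
        {"protein_mg_per_ml": protein, "salt_molar": salt}
        for protein, salt in product(protein_mg_per_ml, salt_molar)
    ]


# Hypothetical concentrations for solution 1 (protein) and solution 2 (salt).
protein_levels = [5, 10, 20]
salt_levels = [0.5, 1.0, 1.5, 2.0]

trials = screening_grid(protein_levels, salt_levels)
print(len(trials))  # 3 protein levels x 4 salt levels = 12 trials
```

Adding a third variable such as pH or temperature multiplies the number of trials again, which is why crystallisation screens are usually run in multi-well plates.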

Page 19: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Determination of protein structure

[Figure: diffraction pattern and atomic model]

Slide courtesy of Shoshana Wodak

Page 20: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

The resolution problem

A high-resolution protein structure: 1.5 - 2.0 Å resolution

Slide courtesy of Shoshana Wodak

Page 21: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Nuclear Magnetic Resonance (NMR)

Source: Branden & Tooze (1991)

Page 22: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Classifications of protein structures

• SCOP: Structural Classification of Proteins
• CATH: Class/Architecture/Topology/Homology
• DALI: Classification of protein domains, based on extraction of similar structures from distance matrices [http://www.ebi.ac.uk/dali/domain]
• CE: A database of structural alignments
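To make the idea of a distance matrix concrete, the sketch below builds the matrix of pairwise distances between a few points standing in for C-alpha atoms. It shows only the representation on which distance-matrix comparison works, not the DALI comparison procedure itself; the coordinates and the function name `distance_matrix` are hypothetical.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # requires Python 3.8+
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Hypothetical coordinates (in Å) for three residues.
ca_coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (3.0, 4.0, 12.0)]

dm = distance_matrix(ca_coords)
for row in dm:
    print(row)
```

Because the matrix depends only on distances within the molecule, it does not change when the structure is rotated or translated, which is what makes it convenient for comparing structures.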

Page 23: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

CATH - A protein domain classification

[Figure: the Class, Architecture and Topology levels. Figure from Shoshana Wodak]

• In CATH, protein domains are classified according to a tree with 4 hierarchical levels:
  – Class
  – Architecture
  – Topology
  – Homology
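The four-level tree can be sketched as a small data structure. The example below assumes that a domain's position is written as four dot-separated numbers, one per level (Class.Architecture.Topology.Homology); the names `CathCode`, `parse_cath_code` and `path_from_root`, and the example code string, are hypothetical and used only for illustration.

```python
from typing import NamedTuple


class CathCode(NamedTuple):
    """One node at the deepest (Homology) level of the four-level tree."""
    class_: int
    architecture: int
    topology: int
    homology: int


def parse_cath_code(code):
    """Parse a dotted 'C.A.T.H' string into its four levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected 4 levels, got {len(parts)}: {code!r}")
    return CathCode(*(int(part) for part in parts))


def path_from_root(node):
    """Return the nodes visited from the Class level down to Homology."""
    return [tuple(node[:depth]) for depth in range(1, 5)]


node = parse_cath_code("3.40.50.720")  # example code, for illustration only
for level_name, prefix in zip(CathCode._fields, path_from_root(node)):
    print(level_name.rstrip("_"), ".".join(str(n) for n in prefix))
```

Two domains that share a longer prefix of this path sit closer together in the tree, so comparing prefixes gives a quick way to see at which level two classifications diverge.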

Page 24: Archives and Information Retrieval Lecture by; Ms. AQSAD RASHDA BIOINFORMATICS.

Specialized or boutique databases

• VIPER (Virus Particle ExploreR) treats crystal structures of icosahedral viruses.

• In the field of immunology:

• IMGT, the international ImMunoGeneTics database, is a high-quality integrated database specializing in immunoglobulins (Ig), T-cell receptors (TcR) and Major Histocompatibility Complex (MHC) molecules of all vertebrate species.

• The IMGT server provides common access to all immunogenetics data.

• At present it includes two databases: IMGT/LIGM-DB, a comprehensive database of immunoglobulin and T-cell receptor gene sequences from human and other vertebrates, with translation for fully annotated sequences, and IMGT/HLA-DB, a database of the human MHC, referred to as HLA (Human Leucocyte Antigens).

Web Resource Databases for Specific Protein Families

• Protein kinases: http://www.sdsc.edu/kinases

• HIV proteases: http://www-fbsc.ncifcrf.gov/HIVdb

• Icosahedral viruses: http://mmtsb.scripps.edu/viper/main.html

• Immunology:

• IMGT: http://imgt.cines.fr

• KABAT (Database of Sequences of Proteins of Immunological Interest, Northwestern University, USA): http://immuno.bme.nwu.edu

• MHCPEP (Major Histocompatibility Complex Binding Peptides Database, Walter and Eliza Hall Institute, Melbourne, Australia): http://wehih.wehi.edu.au/mhcpep

• Collections of links to databases on specific protein families: http://www2.ebi.ac.uk/msd/Links/family.shtml

Expression and proteomics databases

• Expression databases record measurements of mRNA levels, usually via ESTs (expressed sequence tags: short terminal sequences of cDNA synthesized from mRNA).

Comparisons of expression patterns give clues to:

(1) the function and mechanism of action of gene products

(2) how organisms coordinate their control over metabolic processes in different conditions, for instance yeast under aerobic or anaerobic conditions

(3) the variations in mobilization of genes at different stages of the cell cycle or of the development of an organism

(4) mechanisms of antibiotic resistance in bacteria, and consequent suggestion of targets for drug development

(5) the response to challenge by a parasite

(6) the response to medications of different types and dosages, to guide effective therapy
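
One simple way to picture a comparison of expression patterns is a per-gene log2 ratio between two conditions. The sketch below is illustrative only: the gene names, the counts, the pseudocount and the helper `log2_fold_changes` are all assumptions made for this example, not data or methods taken from the lecture or from any expression database.

```python
import math

# Hypothetical EST counts per gene under two growth conditions
# (made-up values, for illustration only).
aerobic = {"geneA": 120, "geneB": 15, "geneC": 40}
anaerobic = {"geneA": 30, "geneB": 60, "geneC": 40}


def log2_fold_changes(cond_a, cond_b, pseudocount=1.0):
    """Return log2((b + pseudocount) / (a + pseudocount)) per shared gene.

    The pseudocount (an assumption of this sketch) avoids division by
    zero when a gene has no counts in one condition.
    """
    shared = sorted(set(cond_a) & set(cond_b))
    return {
        gene: math.log2((cond_b[gene] + pseudocount) / (cond_a[gene] + pseudocount))
        for gene in shared
    }


changes = log2_fold_changes(aerobic, anaerobic)
for gene, change in changes.items():
    # Positive: higher under anaerobic conditions; negative: lower.
    print(f"{gene}: {change:+.2f}")
```

A positive value suggests the gene is more strongly expressed in the second condition, a negative value the reverse, and a value near zero no change. Real analyses would also need normalization and statistical testing, which this sketch leaves out.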




