Other biological databases
Other biological databases. Biological systems Taxonomic data Literature Protein folding and 3D structure Small molecules Pathways and networks Biological.

Dec 18, 2015

Osborne Preston

Biological systems

Taxonomic data

Literature

Protein folding and 3D structure

Small molecules

Pathways and networks

Biological systems

Protein families and domains

Whole genome data

Sequence data

Ontologies (GO)

Other Biological Databases

• Transcription factor binding sites: TRANSFAC

• Protein structure databases: PDB, SCOP, CATH

• Protein family databases: Pfam, PRINTS, PROSITE etc.

• Chemicals and small molecules: ChEBI

• Gene expression databases: GEO, ArrayExpress

• Metabolic pathways: Reactome, KEGG

• Genome databases: Ensembl, FlyBase, WormBase etc.

• Human genetics-related databases: HapMap, dbSNP

Transcription factor binding sites

• TRANSFAC (database of eukaryotic transcription factors): http://www.gene-regulation.com/pub/databases.html#transfac

• TESS (Transcription Element Search System), for predicting transcription factor binding sites, uses TRANSFAC: http://www.cbi.upenn.edu/tess

• TFSEARCH (for searching transcription factor binding sites): http://www.cbrc.jp/research/db/TFSEARCH.html

Protein structure databases

• Main resource is Protein Data Bank (PDB): http://www.rcsb.org/pdb/

• Contains the spatial coordinates of macromolecule atoms whose 3D structure has been obtained by X-ray or NMR studies

• Proteins represent more than 90% of available structures (others are DNA, RNA, sugars, viruses, protein/DNA complexes…)

• Can search by PDB code
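
The coordinates mentioned above are stored in PDB-format files as fixed-width ATOM/HETATM records. The following is a minimal sketch, not an official parser, of how a few fields can be pulled out of one such record by column position. The function name and the example record are illustrative assumptions; the record is assembled field by field only to make the column widths visible.

```python
def parse_atom_record(line):
    """Extract a few fields from one fixed-width PDB ATOM/HETATM record."""
    if not line.startswith(("ATOM", "HETATM")):
        raise ValueError("not an ATOM/HETATM record")
    return {
        "atom_name": line[12:16].strip(),     # columns 13-16
        "residue": line[17:20].strip(),       # columns 18-20
        "chain": line[21],                    # column 22
        "residue_number": int(line[22:26]),   # columns 23-26
        "x": float(line[30:38]),              # columns 31-38
        "y": float(line[38:46]),              # columns 39-46
        "z": float(line[46:54]),              # columns 47-54
    }


# Made-up record (values are illustrative, not taken from a real entry).
example_line = (
    "ATOM  "      # columns 1-6: record type
    + "    1"     # columns 7-11: atom serial number
    + " "         # column 12
    + " N  "      # columns 13-16: atom name
    + " "         # column 17: alternate location
    + "MET"       # columns 18-20: residue name
    + " "         # column 21
    + "A"         # column 22: chain identifier
    + "   1"      # columns 23-26: residue number
    + "    "      # columns 27-30
    + "  27.340"  # columns 31-38: x
    + "  24.430"  # columns 39-46: y
    + "   2.614"  # columns 47-54: z
)

atom = parse_atom_record(example_line)
print(atom["residue"], atom["chain"], atom["x"], atom["y"], atom["z"])
```

Slicing by column rather than splitting on whitespace matters here because adjacent fields in real files can run together without a separating space.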

Searching MSD

• MSD (Macromolecular Structure Database): http://www.ebi.ac.uk/msd

• Search by PDB code

Protein structure-related databases

• Structural family databases based on PDB: SCOP (http://scop.mrc-lmb.cam.ac.uk/scop/) and CATH (http://www.biochem.ucl.ac.uk/bsm/cath/)

• Predicted structures in SWISS-MODEL (http://swissmodel.expasy.org//SWISS-MODEL.html)

Protein family databases

• Databases that produce signatures for identifying protein families or domains

• Used for functional classification of proteins

• E.g. Pfam, PROSITE, PRINTS, SMART, TIGRFAMs etc.

• Integrated into a single resource, InterPro (http://www.ebi.ac.uk/interpro)
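
To make the idea of a "signature" concrete, here is a hedged sketch that converts a pattern written in PROSITE-style syntax into a Python regular expression and scans a sequence with it. The converter handles only the basic syntax elements (`x`, `(n)` / `(n,m)` repeats, `[...]`, `{...}`, `<`, `>`) and is a simplification, not the method used by any of the databases above. The function name, the pattern and the test sequence are illustrative assumptions.

```python
import re


def prosite_to_regex(pattern):
    """Translate a basic PROSITE-style pattern into a Python regex string."""
    regex = pattern.rstrip(".")                           # drop trailing period
    regex = regex.replace("-", "")                        # '-' only separates positions
    regex = regex.replace("{", "[^").replace("}", "]")    # {ABC} = any residue except A, B, C
    regex = regex.replace("(", "{").replace(")", "}")     # x(2,4) = repeat 2 to 4 times
    regex = regex.replace("x", ".")                       # x = any residue
    regex = regex.replace("<", "^").replace(">", "$")     # N- / C-terminal anchors
    return regex


# Illustrative zinc-finger-like pattern in PROSITE-style syntax.
pattern = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H"
regex = prosite_to_regex(pattern)

# Made-up sequence built to satisfy the pattern.
sequence = "MK" + "C" + "AA" + "C" + "AAA" + "L" + "A" * 8 + "H" + "AAA" + "H" + "GG"
match = re.search(regex, sequence)
print(regex, match.start() if match else None)
```

Other signature types, such as the profile hidden Markov models used by Pfam, cannot be expressed as a regular expression; this sketch covers only pattern-style signatures.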

InterProScan sequence search

Stand-alone version available

InterPro text search

Search by keyword, protein accession or InterPro accession

Results for protein accession

Example InterPro entry

Chemicals and small molecules

• Chemical Abstracts: http://www.cas.org/

• ChEBI: http://www.ebi.ac.uk/chebi

• KEGG (part of it includes chemicals): http://www.genome.jp/kegg

• ChemIDplus (chemicals cited in NLM databases): http://chem2.sis.nlm.nih.gov/chemidplus/chemidlite.jsp

• MSD-Chem: ligands and chemicals in MSD

ChEBI example entry

Hierarchy for chemicals

Gene expression databases

• NCBI Gene Expression Omnibus (GEO) http://www.ncbi.nlm.nih.gov/geo/

• ArrayExpress http://www.ebi.ac.uk/arrayexpress/

• Stanford Microarray Database http://genome-www5.stanford.edu/

• Can usually search for experiments or particular expression profiles

GEO search page

Profiles search results

Specific entry and experiment info

ArrayExpress search results

What does the data look like?

• Info on experiment, array used, etc.

• Raw or processed tab-delimited file containing spots and their intensities (Cy3/Cy5 ratios) across different samples

• Files with metadata, e.g. sample info, annotation and coordinates of each spot on the array
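
As a rough illustration of working with such a tab-delimited file, the sketch below reads per-spot channel intensities and computes a log2 ratio for each spot. The column names (`spot_id`, `cy3`, `cy5`), the values and the choice of Cy5 as the numerator are all assumptions made for the example; real files from GEO or ArrayExpress have their own headers, and which channel is the reference depends on the experiment.

```python
import csv
import io
import math

# Hypothetical processed file: one row per spot, one column per channel.
EXAMPLE_FILE = (
    "spot_id\tcy3\tcy5\n"
    "spot_1\t200\t800\n"
    "spot_2\t500\t500\n"
    "spot_3\t1000\t250\n"
)


def log2_ratios(handle):
    """Return {spot_id: log2(Cy5 / Cy3)} from a tab-delimited intensity table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["spot_id"]: math.log2(float(row["cy5"]) / float(row["cy3"]))
        for row in reader
    }


ratios = log2_ratios(io.StringIO(EXAMPLE_FILE))
for spot, value in ratios.items():
    print(spot, value)
```

Working on the log2 scale makes a four-fold increase and a four-fold decrease symmetric around zero (+2 and -2), which is why expression ratios are usually reported this way.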

Proteomics: SWISS-2DPAGE

Enzymes and metabolic pathways

• Contain information describing enzymes, biochemical reactions and metabolic pathways

• ENZYME and BRENDA: nomenclature databases that store information on enzyme names and reactions

• IntEnz: Integrated relational Enzyme database

Enzyme nomenclature

• E.C. (Enzyme Commission) numbers are assigned to enzymes based on the reactions they catalyze

• Hierarchy, high-level groups:
– EC 1: Oxidoreductases
– EC 2: Transferases
– EC 3: Hydrolases
– EC 4: Lyases
– EC 5: Isomerases
– EC 6: Ligases
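
Because the first field of an EC number identifies the high-level group, the hierarchy above can be sketched as a simple lookup. This is a minimal illustration covering only the six groups listed on the slide; the function name is an assumption.

```python
# Top-level EC groups as listed above.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def ec_top_level_class(ec_number):
    """Return the high-level group name for an EC number such as 'EC 3.4.21.4'."""
    text = ec_number.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()          # drop the optional 'EC' prefix
    first_field = text.split(".")[0]     # the first field selects the group
    return EC_CLASSES[int(first_field)]


print(ec_top_level_class("EC 3.4.21.4"))
print(ec_top_level_class("1.1.1.1"))
```

The remaining fields narrow the classification further (subclass, sub-subclass and serial number), so the same split-on-dots approach extends to deeper levels of the hierarchy.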

EC example

Metabolic pathway databases

• PATHGUIDE: lists >200 pathway resources

• KEGG (Kyoto Encyclopedia of Genes and Genomes): http://www.genome.jp/kegg, includes:
– Database of chemicals, genes and networks (metabolic, regulatory etc.)
– Well-curated and quite specific

• EcoCyc (Encyclopedia of E. coli K12 genes and metabolism): http://ecocyc.org (curation of the entire genome)

• Reactome (curated biological pathways): http://www.reactome.org/

• GenMAPP: pathways contributed by users

http://www.genome.ad.jp/kegg

The same pathway can be viewed in different species, which allows comparison

Pathway in Reactome

Example of a pathway in BioCyc

Protein-protein interaction databases

• Protein-protein interaction databases store pairwise interactions or complexes

• A single publication can contribute from 1 to more than 20,000 interactions

• IntAct: http://www.ebi.ac.uk/intact

• DIP (Database of Interacting Proteins): http://dip.doe-mbi.ucla.edu/

• BIND (Biomolecular Interaction Network Database): http://submit.bind.ca:8080/bind/
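
Pairwise interactions of the kind these databases store map naturally onto an undirected graph. The sketch below builds a partner lookup from a list of pairs; the protein identifiers are made-up placeholders, and the function name is an assumption rather than part of any database's API.

```python
from collections import defaultdict


def build_interaction_map(pairs):
    """Map each protein to the set of partners it interacts with."""
    partners = defaultdict(set)
    for protein_a, protein_b in pairs:
        # Interactions are undirected, so record both directions.
        partners[protein_a].add(protein_b)
        partners[protein_b].add(protein_a)
    return partners


# Hypothetical pairwise interactions (identifiers are placeholders).
pairs = [("P1", "P2"), ("P2", "P3"), ("P1", "P4")]
interaction_map = build_interaction_map(pairs)

for protein in sorted(interaction_map):
    print(protein, sorted(interaction_map[protein]))
```

Complexes with more than two members do not fit this pairwise form directly; they are often expanded into pairs or kept as separate member lists, depending on the resource.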

Protein-protein interactions in IntAct

Integrated functional interactions in STRING

Genome browsers

• Integrate sequence & functional data for a genome

• Ensembl (genome browser for major eukaryotic genomes, e.g. human, mouse etc.): http://www.ensembl.org

• UCSC browser: http://genome.ucsc.edu/

• FlyBase (Drosophila genome database): http://www.ebi.ac.uk/flybase

• WormBase (C. elegans): http://www.wormbase.org

• PlasmoDB (Plasmodium, malaria): http://plasmodb.org

• Etc.

Ensembl genome browser

Ensembl gene view 1

Ensembl gene view 2

Gene within context on chromosome

Human genetics databases

• GeneCards: http://www.genecards.org/

• HapMap: http://hapmap.ncbi.nlm.nih.gov/

• OMIM: http://www.ncbi.nlm.nih.gov/omim

• HGDP (Human Genome Diversity Project): http://hagsc.org/hgdp/files.html

Mutation/polymorphism databases

Most of the databases are disease- or gene-centric, e.g. p53

dbSNP: http://www.ncbi.nlm.nih.gov/SNP/

Repository of known mutations (human and other organisms)

Where to find the databases

• Table of addresses for major databases and tools

• Nucleic Acids Research Database issue, published each January

• Nucleic Acids Research Software issue (new)

• Expasy list of tools: http://ca.expasy.org/links.html

Large scale data retrieval

• Programmatic access to many databases

• MySQL access to some

• BioMart access –public and private

• FTP sites –large data downloads
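
For the bulk-download route, a minimal sketch using only the Python standard library is shown below. It streams a file to disk in chunks so that large downloads do not have to fit in memory. The function name and chunk size are assumptions, and the URL in the usage comment is a placeholder, not a real download location; each database documents its own FTP layout.

```python
import shutil
import urllib.request


def download_to_file(url, destination, chunk_size=1024 * 1024):
    """Stream the resource at `url` to `destination` without loading it all into memory."""
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        # Copy in fixed-size chunks (1 MiB by default).
        shutil.copyfileobj(response, out, chunk_size)
    return destination


# Usage (placeholder URL, replace with a real path from the database's FTP site):
# download_to_file("ftp://ftp.example.org/pub/some_release/data.txt.gz", "data.txt.gz")
```

For structured queries over many records, the MySQL or BioMart routes listed above are usually a better fit than downloading whole files.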

Other tutorials

• http://www.ensembl.org/info/website/tutorials/index.html

• http://www.ebi.ac.uk/training/online/

• http://www.ebi.ac.uk/2can/home.html