BIO-TRAC 25 (Proteomics: Principles and Methods)
March 28, 2003
NIH, Bethesda, MD
Zhang-Zhi Hu, M.D.
Bioinformatics Scientist, Protein Information Resource
National Biomedical Research Foundation
Tutorial: Bioinformatics Resources
NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.
Bioinformatics Resources
The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources
2003 update: http://www3.oup.co.uk/nar/database/
Nucleic Acids Research Database Issues (January, annually)
II. Family Classification Methods
Multiple Sequence Alignment and Phylogenetic Analysis
ClustalW Multiple Sequence Alignment
Alignment Editor & Phylogenetic Trees
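As a rough illustration of how a multiple sequence alignment feeds into phylogenetic analysis, the sketch below computes pairwise p-distances (the fraction of differing residues over the ungapped columns) from an alignment that has already been produced. The three aligned sequences and the names `seqA`, `seqB`, `seqC` are invented for illustration, and this is not the distance measure of any particular tool. Distance-based tree methods take a matrix of this general kind as input.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Fraction of differing residues over columns where neither row has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(alignment: dict) -> dict:
    """Pairwise p-distances for every pair of aligned sequences."""
    return {
        (n1, n2): p_distance(alignment[n1], alignment[n2])
        for n1, n2 in combinations(alignment, 2)
    }


# Hypothetical toy alignment (not taken from any database).
toy_alignment = {
    "seqA": "MKT-LLV",
    "seqB": "MKTALLV",
    "seqC": "MRTALIV",
}

for pair, dist in distance_matrix(toy_alignment).items():
    print(pair, round(dist, 3))
```

Gapped columns are simply skipped here; real tools offer several gap treatments and corrected distance measures.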
Based on Family Information
PROSITE Pattern Search
Motif and Profile Search
Hidden Markov Models (HMMs)
IV. Protein Family Databases
Whole Proteins
PIR: Superfamilies and Families
COG (Clusters of Orthologous Groups) of Complete Genomes
ProtoNet: Automated Hierarchical Classification of Proteins

Protein Domains
Pfam: Alignments and HMM Models of Protein Domains
SMART: Protein Domain Families

Protein Motifs
PROSITE: Protein Patterns and Profiles
BLOCKS: Protein Sequence Motifs and Alignments
PRINTS: Protein Sequence Motifs and Signatures
Protein Motifs
PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://www.expasy.ch/prosite/)
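To make the idea of a PROSITE pattern concrete, here is a minimal sketch that translates the pattern syntax into a Python regular expression and scans a sequence with it. The function names `prosite_to_regex` and `find_matches` and the test sequence are made up for illustration. The pattern `N-{P}-[ST]-{P}` is the N-glycosylation site pattern. The sketch covers only the common syntax elements (`x`, `[...]`, `{...}`, repetition counts, `<` and `>` anchors) and is not a replacement for the PROSITE scanning tools.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}' into a regex string."""
    pattern = pattern.strip().rstrip(".")  # database entries end with a period
    parts = []
    for element in pattern.split("-"):
        # '<' anchors the pattern at the N-terminus, '>' at the C-terminus.
        prefix = "^" if element.startswith("<") else ""
        suffix = "$" if element.endswith(">") else ""
        element = element.lstrip("<").rstrip(">")
        # Repetition: x(3) means exactly three, x(2,4) means two to four.
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."                          # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"   # any residue except these
        else:
            core = element                      # single residue or [ABC] set
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_matches(pattern: str, sequence: str):
    """Return (1-based start, matched text) for all matches, including overlaps."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# Hypothetical test sequence, not a real protein.
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_matches("N-{P}-[ST]-{P}", "MNGSANKTP"))
```

The lookahead wrapper is a design choice so that overlapping sites are all reported; a plain `finditer` would skip any match that starts inside a previous one.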
Integrated Family Classification
InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, and TIGRFAMs. (http://www.ebi.ac.uk/interpro/search.html)
V. Databases of Protein Functions
Metabolic Pathways, Enzymes, and Compounds
Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
EcoCyc: Encyclopedia of E. coli Genes and Metabolism
MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
WIT: Functional Curation and Metabolic Models
BRENDA: Enzyme Database
UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
Klotho: Collection and Categorization of Biological Compounds
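The EC classification mentioned in the list above assigns each enzyme a four-level number (class, subclass, sub-subclass, serial number). The sketch below, with the hypothetical helpers `parse_ec` and `class_name`, shows one way such a number could be split into its levels. The class names are the six top-level classes in use at the time of this tutorial, and a dash is treated as an unspecified level.

```python
from typing import NamedTuple, Optional

# The six top-level EC classes as of 2003 (later revisions added more).
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


class ECNumber(NamedTuple):
    enzyme_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(text: str) -> ECNumber:
    """Parse strings like 'EC 1.1.1.1' or '3.4.-.-' into their four levels."""
    text = text.strip()
    if text.upper().startswith("EC"):
        text = text[2:].lstrip(": ")
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError("an EC number has exactly four levels: " + text)
    levels = [None if f == "-" else int(f) for f in fields]
    if levels[0] is None:
        raise ValueError("the top-level class must be specified")
    return ECNumber(*levels)


def class_name(ec: ECNumber) -> str:
    """Name of the top-level class, or a placeholder if it is not in the table."""
    return EC_CLASSES.get(ec.enzyme_class, "unknown class")


ec = parse_ec("EC 1.1.1.1")  # alcohol dehydrogenase
print(ec, class_name(ec))
```

Partial numbers such as `3.4.-.-` are common in annotation when only the broad reaction type is known, which is why the lower levels are optional here.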
Cellular Regulation and Gene Networks
EpoDB: Genes Expressed during Human Erythropoiesis
BIND: Descriptions of interactions, molecular complexes and pathways
DIP: Catalogs experimentally determined interactions between proteins
RegulonDB: Escherichia coli Pathways and Regulation
KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
VI. Databases of Protein Structures
Protein Structure and Classification
PDB: Structure Determined by X-ray Crystallography and NMR
CATH: Hierarchical Classification of Protein Domain Structures
SCOP: Familial and Structural Protein Relationships
FSSP: Protein Fold Family Database

Protein Sequence-Structure Relationship
PIR-NRL3D: Protein Sequence-Structure Database
PIR-RESID: Protein Structure/Post-Translational Modifications
HSSP: Families and Alignments of Structurally-Conserved
Protein Structural Classification
CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)
Protein Structural Classification
(http://scop.mrc-lmb.cam.ac.uk/scop/)
The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.
GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)
PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences
Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes
Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes
Expression Profiling databases: GNF (http://expression.gnf.org/cgi-bin/index.cgi, human and mouse transcriptome), SMD (http://genome-www5.stanford.edu/MicroArray/SMD/, Stanford microarray data analysis), EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html, managing, storing and analyzing microarray data)
VII. 2D-Gel Image Databases
(http://www-lecb.ncifcrf.gov/2dwgDB)
(http://gelbank.anl.gov/2dgels/index.asp)(2D-gel of human ventricle proteins)
VIII. Proteome Analysis
(http://www.ebi.ac.uk/proteome)
Expression Profiling
Human and Mouse Transcriptome
(http://expression.gnf.org/cgi-bin/index.cgi)
(http://genome-www.stanford.edu/serum/)
Lab: Visit selected websites and analyze some protein sequences of your own choice. The list of bioinformatics resources covered in this tutorial is available at: http://pir.georgetown.edu/~huz/bioinfo_resource.html
Try some of the following sequences for analysis:
1) Well-characterized proteins: PIR:A26366(CYP17), JS0747(Sp1)
2) Less-characterized proteins: PIR:A59000(MATER), TrEMBL:Q9QY16(GRTH)
3) Hypothetical proteins: PIR:T12515, T00338, T47130; SWISS-PROT:Q9BWT7
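When working through the lab, it can help to keep track of which database each identifier belongs to before pasting it into a search form. The following sketch is only an illustration: the helper `parse_identifiers` and its regular expression are assumptions, not part of any of the listed resources. It splits the identifier lists above into (database, accession, name) entries and carries the database prefix forward to accessions that are listed without one.

```python
import re

# An optional "DB:" prefix, an accession, and an optional "(name)" suffix.
TOKEN = re.compile(
    r"(?:(?P<db>[A-Za-z][A-Za-z-]*):)?"
    r"(?P<acc>[A-Z][A-Z0-9]+)"
    r"(?:\((?P<name>[^)]+)\))?"
)


def parse_identifiers(text: str):
    """Return (database, accession, name) tuples from a lab identifier list."""
    entries = []
    current_db = None
    for m in TOKEN.finditer(text):
        if m.group("db"):
            current_db = m.group("db")  # the prefix applies until the next one
        entries.append((current_db, m.group("acc"), m.group("name")))
    return entries


lab_sets = {
    "well characterized": "PIR:A26366(CYP17), JS0747(Sp1)",
    "less characterized": "PIR:A59000(MATER) TrEMBL:Q9QY16(GRTH)",
    "hypothetical": "PIR:T12515, T00338 , T47130 SWISS-PROT:Q9BWT7",
}

for label, text in lab_sets.items():
    print(label, parse_identifiers(text))
```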