Bioinformatics 93
Lecture 1 – Introduction to Bioinformatics
Petrus Tang, Ph.D. (鄧致剛)
Graduate Institute of Basic Medical Sciences and Bioinformatics Center, Chang Gung University
[email protected] EXT: 5136
19th September 2003
http://pastime.cgu.edu.tw/petang/index.htm
Teaching assistants: 葉元鳴 (ext. 5122)
方怡凱 (ext. 5690)
Bioinformatics: A Practical Guide to the Analysis of Genes & Proteins
432 pages (2001) Wiley-Liss; ISBN: 0471383910
Contents
• Bioinformatics and the Internet
• The NCBI Data Model
• The GenBank Sequence Database
• Structure Databases
• Genomic Mapping and Mapping Databases
• Information Retrieval from Biological Databases
• Sequence Alignment and Database Searches
• Multiple Sequence Alignment
• Predictive Methods using DNA Sequences
• Predictive Methods using Protein Sequences
• Expressed Sequence Tags
• Sequence Assembly and Finishing Methods
• Phylogenetic Analysis
• Comparative Genome Analysis
• Using Perl to Facilitate Biological Analysis
The answer to this question depends on whether you are talking to:
• a computer scientist who 'does' biology, or
• a molecular biologist who 'does' computing.
Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied to gene-based drug discovery and development.
Genetic Sequence Data Bank Dec 15 2003, Release 139.0
30,968,418 loci, 36,553,368,485 bases, from 30,968,418 reported sequences
Homo sapiens: 7,152,768 sequences 10,307,972,332 bases
EST: 5,399,835 sequences
Recent years have seen explosive growth in biological data. Large sequencing projects are producing ever-increasing quantities of nucleotide sequence, and the contents of nucleotide databases are doubling in size approximately every 14 months. The latest release of GenBank (Release 139.0) holds more than 36 billion bases. Sequence data is not the only thing growing rapidly: the number of characterized genes from many organisms, and the number of protein structures, doubles about every two years. To cope with this quantity of data, a new scientific discipline has emerged: bioinformatics, also called biocomputing or computational biology.
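The doubling-time claim can be turned into a rough projection. The following is a minimal sketch, assuming idealized exponential growth with a constant 14-month doubling time; the function name `projected_bases` is hypothetical, and the starting value is the Release 139.0 base count quoted in this lecture. Real growth is not this regular, so the output is an order-of-magnitude illustration only.

```python
def projected_bases(current_bases: int, months_ahead: float,
                    doubling_months: float = 14.0) -> float:
    """Project database size assuming a constant doubling time (an idealization)."""
    return current_bases * 2 ** (months_ahead / doubling_months)


# Base count of GenBank Release 139.0, as quoted in this lecture.
RELEASE_139_BASES = 36_553_368_485

if __name__ == "__main__":
    for months in (14, 28, 42):
        size = projected_bases(RELEASE_139_BASES, months)
        print(f"after {months:2d} months: about {size / 1e9:.0f} billion bases")
```

Changing `doubling_months` shows how sensitive the projection is to the assumed growth rate.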
EMBL: European Molecular Biology Laboratory
EBI: European Bioinformatics Institute
DDBJ: DNA Data Bank of Japan
CIB: Center for Information Biology and DNA Data Bank of Japan
NIG: National Institute of Genetics
NCBI: National Center for Biotechnology Information
NLM: National Library of Medicine
Protein Databases
Protein Information Resource (PIR)
http://pir.georgetown.edu/
In 1988, the Protein Information Resource (PIR) established a cooperative effort with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID) to produce the PIR-International Protein Sequence Database (PIR-PSD), a comprehensive, non-redundant, expertly annotated, fully classified and extensively cross-referenced protein sequence database in the public domain. The PIR-PSD, PIR-NREF, iProClass and other PIR auxiliary databases integrate sequence, functional, and structural information to support genomics and proteomics research.
The PIR-PSD current release, 71.04 (March 1, 2002), contains 283,153 entries.
The SWISS-PROT Protein Knowledgebase is an annotated protein sequence database established in 1986. It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).
ExPASy Molecular Biology Server
http://tw.expasy.org
The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.
Protein Data Bank
http://www.rcsb.org
The Protein Data Bank (PDB) is operated by three members of the Research Collaboratory for Structural Bioinformatics (RCSB): Rutgers, The State University of New Jersey; the San Diego Supercomputer Center at the University of California, San Diego; and the National Institute of Standards and Technology. The PDB is supported by funds from the National Science Foundation, the Department of Energy, and two units of the National Institutes of Health: the National Institute of General Medical Sciences and the National Library of Medicine.
Chang Gung Bioinformatics Center (CGBC)
Chang Gung Bioinformatics Center core hardware architecture

Sun Fire 6800 Server (Append 8 CPUs, 8 GB Memory)
• Four system domains
• 4 x system board
• 16 x 750MHz UltraSPARC III CPU
• 32GB memory
• 6 x power supply, 4 x fan tray
• 2 x system controller
• 4 x PCI assembly with 8 slots
• 4 x 10/100 Fast Ethernet interface
• 4 x media tray (DVD, DDS-4 tape drive)
• 4 x 2 x 18GB HDD (OS mirror protection)
• 4 x PCI-bus FC-AL network adapter

Sun StorEdge T3 FC-AL Disk Array (Append 1152GB Capacity)
• 2 x 9 x 18GB FC-AL HDD
• 2 x 9 x 36GB FC-AL HDD
• 2 x 9 x 72GB FC-AL HDD
• RAID 5 formatted capacity: 2016GB
• 6 x Fibre Channel RAID controller
• 6 x 258MB cache memory

Sun L20 Tape Library
• 1 x DLT8000, 20 slots
• Backup capacity: 800/1600GB
• Solstice Backup software

SAN Switch
Fibre Channel to SCSI Router
Local Area Network (Fast Ethernet Switch)

PC P4
• 1.5GHz P4 CPU
• 2GB RAM, 80GB IDE HDD

Sun Blade 1000
• 750MHz UltraSPARC III CPU
• 512MB RAM, 18GB SCSI HDD

Workstation counts shown on the slide: x 20, x 6
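The disk-array figures can be cross-checked with a little arithmetic. The sketch below assumes (this is not stated on the slide) that each tray of nine drives forms one RAID 5 group, which gives up one drive's worth of space to parity; the names `TRAYS`, `raw_gb`, and `raid5_usable_gb` are illustrative. Under that assumption the totals reproduce the 2016GB formatted capacity, and the two 72GB trays alone account for the 1152GB listed as appended.

```python
# (number of trays, drives per tray, drive size in GB), taken from the disk array list
TRAYS = [(2, 9, 18), (2, 9, 36), (2, 9, 72)]


def raw_gb(trays):
    """Total raw capacity across all trays."""
    return sum(n * drives * size for n, drives, size in trays)


def raid5_usable_gb(trays):
    """Usable capacity if every tray is one RAID 5 group (an assumption):
    one drive's worth of space per group holds parity."""
    return sum(n * (drives - 1) * size for n, drives, size in trays)


if __name__ == "__main__":
    print("raw:", raw_gb(TRAYS), "GB")
    print("RAID 5 usable:", raid5_usable_gb(TRAYS), "GB")
    print("72 GB trays only:", raid5_usable_gb([(2, 9, 72)]), "GB")
```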
Steps to Identify a Gene
• Gene-Search
• Protein-Search
• Annotation
[Figure: multiple sequence alignment of translated CK1 protein sequences, alignment columns 1 to 542. Rows: 01B1 (final), CK1-1_full, CK1-2_full, CK1 (Plasmodium falciparum), CK1 (Schizosaccharomyces pombe), CK1 (Homo sapiens), CK1 (Mus musculus), CK1.1 (Trypanosoma cruzi), CK1.2 (Trypanosoma cruzi), and a consensus row.]
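A consensus row such as the one in the CK1 alignment can be illustrated with a simple majority rule. This is a minimal sketch, not the method of the alignment program that produced the figure (the source does not name it): the function name `majority_consensus` and the more-than-half threshold are assumptions. The input is a twelve-column slice of the CK1 alignment, one string per sequence.

```python
from collections import Counter

# Twelve aligned columns from the CK1 alignment, one row per sequence.
ALIGNED = {
    "01B1 (final)": "TMELLGDSLEKL",
    "CK1-1_full": "VIDLLGKSLEEH",
    "CK1-2_full": "AMELLGKSLEDL",
    "CK1 (Plasmodium falciparum)": "VLDLLGPSLEDL",
    "CK1 (Schizosaccharomyces pombe)": "VMDLLGPSLEDL",
    "CK1 (Homo sapiens)": "VMELLGPSLEDL",
    "CK1 (Mus musculus)": "VMELLGPSLEDL",
    "CK1.1 (Trypanosoma cruzi)": "VMDLLGPSLEDL",
    "CK1.2 (Trypanosoma cruzi)": "VMDLLGPSLEDL",
}


def majority_consensus(rows, gap="-", unknown=" "):
    """Return one character per column: the most common residue if it is not
    a gap and occurs in more than half of the rows, otherwise `unknown`."""
    rows = list(rows)
    if len({len(r) for r in rows}) != 1:
        raise ValueError("all aligned rows must have the same length")
    out = []
    for column in zip(*rows):
        residue, count = Counter(column).most_common(1)[0]
        keep = residue != gap and count * 2 > len(rows)
        out.append(residue if keep else unknown)
    return "".join(out)


if __name__ == "__main__":
    print(majority_consensus(ALIGNED.values()))
```

For this slice the majority rule gives VMDLLGPSLEDL, which agrees with the slide's consensus over the same columns. Columns with no clear majority come out blank.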
MEDICAL INFORMATICS
• Disease prediction and diagnosis
• Discovery of new genes
• Gene evolution
• Global function and its network regulatory systems
• Drug design and the structure of biological macromolecules
Focuses in Bioinformatics
• Perturbation: Environment, Medication, Genetic Engineering
• Dynamic Response: Gene Expression, Protein Expression
• BioChip
• Database: Genotype/Phenotype
• Symbolic Algorithms/Computing
• Analysis
• Biology: Molecular Biology, Biochemistry, Genetics
• Virtual Cell
• Genome Sequencing
Goals Leading Toward Predictive Biology
• Gene Sequence Data
• Gene Identification
• Protein Circuit & Regulatory Network Discovery
• Biosimulation
• Structure Prediction

[Figure: protein circuit diagram. Node labels: IL-3, IL-3R, IGF1, IGF1R, IRS1, RAS, PI3-K, AKT/PKB, BAD, Bcl-XL, FAS-L, FAS, FADD/MORT, FLICE, ICE, CPP32, mitogen, Cyclin D1, pRb, E2F, Cyclin E, P53, P21, P16, P27, Cdk4, P107, C-Myc, Bin-1, Max, Mad, Cdc25A, Cdk2, Cdk2p. Outcomes: apoptosis, cell proliferation.]
20th Century Biology: Reductionistic Approach (genome sequencing, DNA arrays, proteomics)
21st Century Biology: Integrative Approach (bioinformatics, systems science, modeling & simulation)
Reconstructing Cellular Functions
Hallmarks of Cancer
D. Hanahan and R. A. Weinberg. Cell, 100(1):57–70, 2000 (review).
Platform for Systems Biology
• Complex cellular samples: body fluids, tissue
• Dynamics, i.e. environmental + time
• Gene index, protein index, metabolite index
• Quantitative comparisons
• Targets, biomarkers
• Objective is to link gene response, protein activity, and metabolite dynamics to disease and interventions
BioSystematics™
SYSTEMS BIOLOGY
• Genomics
• Transcriptomics
• Proteomics
• Metabolomics
• Functional Proteomics/Genomics
Q. As a biologist, what skills do I need to make the transition to bioinformatics?
The fact is that many of the jobs currently available involve the design and implementation of programs and systems for the storage, management and analysis of vast amounts of DNA sequence data. Such positions require in-depth programming and relational database skills which very few biologists possess, and so it is largely the computational specialists who are filling these roles. This is not to say the computer-savvy biologist doesn't play an important role. As the bioinformatics field matures there will be a huge demand for outreach to the biological community, as well as a need for individuals with the in-depth biological background necessary to sift through gigabases of genomic sequence in search of novel targets. It is in these areas that biologists with the necessary computational skills will find their niche.
A.
• Molecular biology packages (GCG, BLAST, etc.)
• Web and programming skills, including HTML, Perl, Java and C++
• Familiarity with a variety of operating systems (especially UNIX)
• Relational database skills such as SQL, Sybase or Oracle
• Statistics
• Structural biology and modeling
• Mathematical optimization
• Computer graphics theory and linear algebra
You will need to be able to readily pick up, use and understand the tools and databases designed by computer programmers, and to communicate biological science requirements to core computer scientists.