Bioinformatics 93 Lecture 1 – Introduction to Bioinformatics. Petrus Tang, Ph.D. (鄧致剛), Graduate Institute of Basic Medical Sciences and Bioinformatics.


Petrus Tang, Ph.D. (鄧致剛)
Graduate Institute of Basic Medical Sciences and Bioinformatics Center, Chang Gung University
Email: [email protected]
Ext: 5136

19th September 2003

http://pastime.cgu.edu.tw/petang/index.htm

Teaching assistants: 葉元鳴 (ext. 5122), 方怡凱 (ext. 5690)

Bioinformatics: A Practical Guide to the Analysis of Genes & Proteins

432 pages (2001) Wiley-Liss; ISBN: 0471383910

Contents

Bioinformatics and the Internet
The NCBI Data Model
The GenBank Sequence Database
Structure Databases
Genomic Mapping and Mapping Databases
Information Retrieval from Biological Databases
Sequence Alignment and Database Searches
Multiple Sequence Alignment
Predictive Methods using DNA Sequences
Predictive Methods using Protein Sequences
Expressed Sequence Tags
Sequence Assembly and Finishing Methods
Phylogenetic Analysis
Comparative Genome Analysis
Using Perl to Facilitate Biological Analysis

http://www.biosino.org/

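Dr. Petrus Tang's (鄧致剛) teaching website (http://pastime.cgu.edu.tw/petang/index.htm)

POST system of Prof. 楊永正, National Yang-Ming University (http://binfo.ym.edu.tw/post/)

Bioinformatics teaching website of Prof. 呂平江, National Tsing Hua University (http://www.life.nthu.edu.tw/~lslpc/bioinfo.html)

User guide for the Macromolecular Sequence Analysis Service, National Health Research Institutes (http://bioinfo.nhri.org.tw/)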

http://blast.ym.edu.tw/indexEasy.php

http://dblab8.csie.ncu.edu.tw/home.htm


http://pastime.cgu.edu.tw/%E9%84%A7%E8%87%B4%E5%89%9B/index.htm


http://gcg.nhri.org.tw/gcg/gcguse.htm

http://www.bioweb.com.tw/index.asp

http://blast.ym.edu.tw/index2.php


http://www.bio-engine.com/


WHAT IS BIOINFORMATICS?


The answer to this question depends on whether you are talking to a computer scientist who 'does' biology or a molecular biologist who 'does' computing.

Bioinformatics is the application of computer technology to the management of biological information. Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied to gene-based drug discovery and development.

Contributing disciplines: Biology, Information Technology, Physics, Chemistry, Mathematics

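Bioinformatics combines techniques from biology, computer science and informatics and applies them to the processing of biochemical data, turning tedious, meaningless data into meaningful, valuable information.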

What is Bioinformatics?

Figure: gene structure. Promoter, 5' UTR, exon 1, exon 2, ..., exon n-1, exon n, 3' UTR; the protein coding sequence lies between the two UTRs.

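Gene prediction: codon usage (single exon)

Figure: coding potential plotted for reading frames 1, 2 and 3; the plot distinguishes coding from non-coding regions and marks the correct start and the coding sequence.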
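The slide scores codon usage in each of the three forward reading frames. As a much simpler stand-in (not the codon-usage method itself), the sketch below scans the same three frames for open reading frames that run from an ATG to the next in-frame stop codon. The function name `find_orfs` and its parameters are hypothetical and chosen only for illustration.

```python
# Minimal ORF scan over the three forward reading frames.
# Assumption: an ORF is an ATG followed by an in-frame stop codon;
# this is a simplification, not a codon-usage coding-potential score.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=1):
    """Return (frame, start, end) tuples; frame is 1-3, start/end are
    0-based with end exclusive, and end includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                # number of codons before the stop
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame + 1, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

A real gene finder would replace the ATG-to-stop rule with a per-frame score built from a codon usage table for the organism in question.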

Gene prediction: codon usage (multiple exons)

Figure: coding potential plotted for reading frames 1, 2 and 3, distinguishing coding from non-coding regions, with splice sites marked.

Exons: 208..295, 1029..1349, 1500..1688, 2686..2934, 3326..3444, 3573..3680, 4135..4309, 4708..4846, 4993..5096, 7301..7389, 7860..8013, 8124..8405, 8553..8713, 9089..9225, 13841..14244
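To make the exon list above concrete, the following sketch reads the coordinates as 1-based, inclusive start..end pairs (an assumption; the slide does not state the convention) and derives exon lengths, the intron spans between them, and a spliced sequence. The helper names are hypothetical.

```python
# Exon coordinates from the slide, read as 1-based inclusive (start, end).
EXONS = [
    (208, 295), (1029, 1349), (1500, 1688), (2686, 2934), (3326, 3444),
    (3573, 3680), (4135, 4309), (4708, 4846), (4993, 5096), (7301, 7389),
    (7860, 8013), (8124, 8405), (8553, 8713), (9089, 9225), (13841, 14244),
]


def exon_lengths(exons):
    """Length of each exon in bases (inclusive coordinates)."""
    return [end - start + 1 for start, end in exons]


def intron_spans(exons):
    """Intervals lying between consecutive exons, also 1-based inclusive."""
    return [(exons[i][1] + 1, exons[i + 1][0] - 1)
            for i in range(len(exons) - 1)]


def splice(genomic, exons):
    """Concatenate the exon segments of a genomic sequence string."""
    return "".join(genomic[start - 1:end] for start, end in exons)


if __name__ == "__main__":
    print(len(EXONS), "exons,", sum(exon_lengths(EXONS)), "exonic bases")
    print("first intron:", intron_spans(EXONS)[0])
```

The genomic sequence itself is not given on the slide, so `splice` is shown only on whatever sequence the reader supplies.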

Functional Assignment using Gene Ontology (Drosophila, 13,601 genes)

Unknown                      48%
Enzyme                       18%
Hypothetical                 11%
Nucleic Acid Binding          8%
Signal Transduction           4%
Transporter                   4%
Structural Protein            2%
Ligand Binding or Carrier     2%
Cell Adhesion                 1%
Motor Protein                 1%
Chaperone                     1%

Figure: Gene Number in the Human Genome. Number of genes (axis from 10 K to 50 K) plotted against confidence (ticks 4, 3, 2, 1); labels: Known genes, Otto.

Information Driven
Experiments
Hypothesis

Experiment Driven
Experiments
Hypothesis
Results

THE COMPONENTS OF BIOINFORMATICS

TECHNOLOGY

DATABASE

ALGORITHM

COMPUTING POWER

ANALYSIS TOOLS

DNA (Genome) → RNA (Transcriptome) → protein (Proteome) → phenotype

DNA Sequencing: MegaBACE 1000

96 DNA samples sequenced in 2 hrs, approximately 600-800 readable bp per sample.

About 1,000,000 bp in 24 hrs.
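The daily figure follows from the per-run numbers. The sketch below reproduces the arithmetic under two assumptions that the slide does not state: runs are scheduled back to back with no downtime, and every one of the 96 lanes yields a readable sequence. The function name is hypothetical.

```python
def daily_throughput(lanes=96, read_length=700, run_hours=2.0,
                     hours_per_day=24.0):
    """Bases read per day, assuming back-to-back runs and no failed lanes."""
    runs_per_day = int(hours_per_day // run_hours)
    return lanes * read_length * runs_per_day


if __name__ == "__main__":
    low = daily_throughput(read_length=600)
    high = daily_throughput(read_length=800)
    print(f"{low:,} to {high:,} bases per day")
```

With 12 runs per day the range is roughly 0.7 to 0.9 million bases, which is consistent with the rounded figure of about 1,000,000 bp in 24 hrs.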

Microarray: 10,000 clones per slide

Proteomics

Two-dimensional electrophoresis gels: differences characteristic of the individual starting states are recognized by comparing two protein patterns.

MALDI-MS peptide mass fingerprinting: identification of proteins separated by 2D electrophoresis.

6,000 protein spots per gel

http://www.voeding.tno.nl/PS.cfm?PNR=pharma092A



3D Modeling

DNA: Genome Projects
RNA: Microarray, ESTs, SAGE
protein: 2D Electrophoresis, Protein Modeling, Protein-Protein Interaction
phenotype

Genetic Sequence Data Bank Dec 15 2003, Release 139.0

30,968,418 loci, 36,553,368,485 bases, from 30,968,418 reported sequences

Homo sapiens: 7,152,768 sequences 10,307,972,332 bases

EST: 5,399,835 sequences

Recent years have seen an explosive growth in biological data. Large sequencing projects are producing increasing quantities of nucleotide sequences, and the contents of nucleotide databases are doubling in size approximately every 14 months. The latest release of GenBank (Release 139) contains more than 36 billion bases. Not only is the volume of sequence data increasing rapidly; the number of characterized genes from many organisms and the number of protein structures also double about every two years. To cope with this great quantity of data, a new scientific discipline has emerged: bioinformatics, also called biocomputing or computational biology.

Entries        Bases   Species
7152768  10307972332   Homo sapiens
5440187   6293834876   Mus musculus
 843877   5553994470   Rattus norvegicus
 569066   1186984387   Danio rerio
1700546   1060366537   Zea mays
 266693    714394030   Oryza sativa (japonica c
 361257    697671246   Drosophila melanogaster
 887467    508114342   Canis familiaris
 596909    503066134   Gallus gallus
 589739    424174855   Arabidopsis thaliana
 595964    403835636   Brassica oleracea
 650813    396135104   Bos taurus
 175582    385807223   Pan troglodytes
  25432    337349532   Macaca mulatta
 553918    303123150   Triticum aestivum
 499249    294204103   Ciona intestinalis
 250141    237196637   Medicago truncatula
 222595    233083018   Caenorhabditis elegans
 364210    232049749   Xenopus laevis
 324322    211144144   Zea mays subsp. mays
 297857    210279451   Silurana tropicalis
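The doubling time quoted above implies exponential growth. As a rough illustration, the sketch below projects database size from a starting value and a doubling time. It assumes the 14-month doubling rate stays constant, which is an extrapolation rather than anything the slides claim; the function names are hypothetical.

```python
def growth_factor(months, doubling_months=14.0):
    """Multiplicative growth after `months`, given a fixed doubling time."""
    return 2.0 ** (months / doubling_months)


def projected_size(current_bases, months, doubling_months=14.0):
    """Projected base count, assuming the doubling time never changes."""
    return current_bases * growth_factor(months, doubling_months)


if __name__ == "__main__":
    release_139_bases = 36_553_368_485  # total bases reported for Release 139
    for years in (1, 2, 5):
        size = projected_size(release_139_bases, 12 * years)
        print(f"after {years} year(s): about {size:.3e} bases")
```

The same two functions cover the two-year doubling quoted for characterized genes and protein structures by passing `doubling_months=24`.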






The International Nucleotide Sequence Database Collaboration

EMBL: European Bioinformatics Institute (EBI), http://www.ebi.ac.uk
GenBank: National Center for Biotechnology Information (NCBI), http://www.ncbi.nlm.nih.gov/
DDBJ: National Institute of Genetics (NIG), http://www.ddbj.nig.ac.jp/

ExPASy: Expert Protein Analysis System, http://tw.expasy.org

GenBank/EMBL/DDBJ: International Nucleotide Sequence Database

IAM: International Advisory Meeting
ICM: International Collaborative Meeting

EMBL: European Molecular Biology Laboratory
EBI: European Bioinformatics Institute

DDBJ: DNA Data Bank of Japan
CIB: Center for Information Biology and DNA Data Bank of Japan
NIG: National Institute of Genetics

NCBI: National Center for Biotechnology Information
NLM: National Library of Medicine

Protein Databases

Protein Information Resource (PIR), http://pir.georgetown.edu/

In 1988, the Protein Information Resource (PIR) established a cooperative effort with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID) to produce the PIR-International Protein Sequence Database (PIR-PSD), a comprehensive, non-redundant, expertly annotated, fully classified and extensively cross-referenced protein sequence database in the public domain. The PIR-PSD, PIR-NREF, iProClass and other PIR auxiliary databases integrate sequence, functional and structural information to support genomics and proteomics research. The PIR-PSD, Current Release 71.04 (March 01, 2002), contains 283,153 entries.

SWISS-PROT, http://www.ebi.ac.uk/swissprot/

The SWISS-PROT Protein Knowledgebase is an annotated protein sequence database established in 1986. It is maintained collaboratively by the Swiss Institute of Bioinformatics (SIB) and the European Bioinformatics Institute (EBI).

http://www.mips.biochem.mpg.de/

http://pir.georgetown.edu/pirwww/aboutpir/collaborate.html

http://www.isb-sib.ch/

Protein Databases

ExPASy Molecular Biology Server, http://tw.expasy.org

The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.

Protein Data Bank, http://www.rcsb.org

The Protein Data Bank (PDB) is operated by three members of the Research Collaboratory for Structural Bioinformatics (RCSB): Rutgers, The State University of New Jersey; the San Diego Supercomputer Center at the University of California, San Diego; and the National Institute of Standards and Technology. The PDB is supported by funds from the National Science Foundation, the Department of Energy, and two units of the National Institutes of Health: the National Institute of General Medical Sciences and the National Library of Medicine.

http://www.expasy.ch/proteomics_def.html

http://www.rcsb.org/index.html

http://tw.expasy.org/people/personal/nguex/ExPASy_LogoPage.html

The Cancer Genome Anatomy Project (CGAP): http://cgap.nci.nih.gov/

Metabolic & Signalling Pathways

Biocarta: http://biocarta.com

Kyoto Encyclopedia of Genes and Genomes (KEGG): http://www.genome.ad.jp/kegg/






BIOINFORMATICS ANALYSIS TOOLS

Commercial ($$): Vector NTI suite, Omiga, DNAsis

Free: Staden Package, EMBOSS, BLAST, FASTA

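Online analysis tools

National Health Research Institutes (NHRI) Macromolecular Sequence Analysis Service

http://bioinfo.nhri.org.tw/

GCG: nucleic acid or protein sequence analysis in command mode on a Unix system. (telnet://bioinfo.nhri.org.tw)

SeqWeb: connect to SeqWeb to analyze nucleic acid or protein sequences in a web browser. (http://apdb1.nhri.org.tw:8003/)

GenWeb, a fast Smith-Waterman sequence search system: connect directly to GenWeb to run fast nucleic acid or protein sequence searches in a web browser. Purpose-built hardware accelerates the searches; Smith-Waterman and FrameSearch searches are available. (http://sw.nhri.org.tw/cgi-bin/genweb/bin/login.cgi)

ExPASy (Expert Protein Analysis System): connect to ExPASy to analyze protein sequences in a web browser. (http://tw.expasy.org)

EMBOSS: analyze nucleic acid or protein sequences in a web browser. (http://emboss.nhri.org.tw/)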
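The GenWeb service above accelerates Smith-Waterman searches in hardware. For readers who have not met the algorithm, here is a minimal software sketch of Smith-Waterman local alignment. It assumes a simple match/mismatch score and a linear gap penalty; the scoring values and the function name are illustrative choices, not the parameters used by GenWeb.

```python
def smith_waterman(a, b, match=3, mismatch=-3, gap=-2):
    """Local alignment of strings a and b with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b)."""
    rows, cols = len(a) + 1, len(b) + 1
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


if __name__ == "__main__":
    score, top, bottom = smith_waterman("TGTTACGG", "GGTTGACTA")
    print(score)
    print(top)
    print(bottom)
```

The table fill costs time proportional to the product of the two sequence lengths, which is why database-scale searches of this kind benefit from dedicated hardware or from heuristics such as BLAST and FASTA.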






Chang Gung Bioinformatics Center (CGBC): core hardware architecture

Sun StorEdge T3 FC-AL Disk Array (Append 1152GB Capacity)
• 2 x 9 x 18GB FC-AL HDD
• 2 x 9 x 36GB FC-AL HDD
• 2 x 9 x 72GB FC-AL HDD
• RAID 5 formatted capacity: 2016GB
• 6 x Fibre Channel RAID Controller
• 6 x 258MB Cache Memory

Local Area Network (Fast Ethernet Switch)

Sun Fire 6800 Server (Append 8 CPUs, 8 GB Memory)
• Four System Domain
• 4 x system board
• 16 x 750MHz Ultra SPARC III CPU
• 32GB Memory
• 6 x Power Supply, 4 x Fan tray
• 2 x System Controller
• 4 x PCI Assembly with 8 slots
• 4 x 10/100 FastEthernet Interface
• 4 Media tray (DVD, DDS-4 tape drive)
• 4 x 2 x 18GB HDD (OS mirror protection)
• 4 x PCI-Bus FC-AL Network Adapter

Sun L20 Tape Library
• 1 x DLT8000, 20 slots
• Backup Capacity: 800/1600GB
• Solstice Backup software

SAN Switch

Fibre Channel to SCSI Router

PC P4 (x 20)
• 1.5GHz P4 CPU
• 2GB RAM, 80GB IDE HDD

Sun Blade 1000 (x 6)
• 750MHz Ultra SPARC III CPU
• 512MB RAM, 18GB SCSI HDD
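The disk array figures are internally consistent if each nine-disk tray is treated as its own RAID 5 group that gives up one disk's worth of space to parity. That grouping is an assumption made for this check, not something the slide states; the names below are hypothetical.

```python
# (number of trays, disks per tray, GB per disk), as listed on the slide
TRAYS = [(2, 9, 18), (2, 9, 36), (2, 9, 72)]


def raw_capacity_gb(trays):
    """Total capacity before any RAID overhead."""
    return sum(n * disks * size for n, disks, size in trays)


def raid5_usable_gb(trays):
    """Usable capacity if every tray is one RAID 5 group
    (one disk per group is consumed by parity)."""
    return sum(n * (disks - 1) * size for n, disks, size in trays)


if __name__ == "__main__":
    print("raw:", raw_capacity_gb(TRAYS), "GB")
    print("RAID 5 usable:", raid5_usable_gb(TRAYS), "GB")
    print("72GB trays alone:", raid5_usable_gb([(2, 9, 72)]), "GB")
```

Under this assumption the usable total matches the 2016GB on the slide, and the two 72GB trays alone account for the 1152GB described as appended capacity.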

Steps to Identify a Gene: Gene-Search, Protein-Search, Annotation

   -2 …AGATGCGAAAAA TCTACGGCAA TTACATTACG CAGAAGCGTC TCGGTTCAGG AAGTTTCGGA GAGGTTTGGG AAGCTGTCAG TCATTCGACC GGACAAAAGG
  101 TTGCTCTCAA ATTAGAGCCC CGAAACTCTA GTGTTCCACA ATTATTTTTC GAAGCCAAGC TATACTCAAT GTTTCAGGCT TCAAAATCCA CAAATAATAG
  201 TGTAGAACCA TGCAACAACA TTCCAGTTGT TTATGCGACT GGTCAAACAG AGACAACTAA CTACATGGCC ATGGAATTAC TTGGCAAGTC TCTGGAAGAT
  301 TTAGTTTCAT CGGTCCCTAG ATTTTCCCAA AAGACAATAT TAATGCTTGC CGGACAAATG ATTTCCTGTG TTGAATTCGT TCACAAACAT AATTTTATTC
  401 ACCGCGACAT CAAGCCAGAT AATTTTGCGA TGGGAGTCAG TGAGAACTCA AACAAAATTT ATATTATCGA TTTTGGACTT TCCAAGAAGT ACATTGACCA
  501 AAATAATCGT CATATTAGAA ATTGCACAGG AAAATCACTT ACCGGAACCG CAAGATATTC ATCAATTAAT GCGCTCGAAG GAAAGGAACA GTCTATAAGA
  601 GATGACATGG AATCTTTGGT ATATGTCTGG GTTTATTTAC TTCATGGACG TCTTCCTTGG ATGAGCTTAC CTACAACAGG CCGCAAGAAG TATGAGGCCA
  701 TTTTAATGAA GAAGAGATCA ACGAAACCCG AAGAATTATG TTTAGGACTT AATAGTTTCT TTGTAAACTA CTTAATAGCA GTTCGCTCAT TGAAATTTGA
  801 AGAAGAACCA AATTACGCGA TGTACAGGAA AATGATATAC GACGCAATGA TTGCTGATCA AATTCCTTTT GATTATCGCT ATGATTGGGT CAAAACGAGA
  901 ATTGTTCGCC CACAACGTGA AAACCAATCA CAGTTGTCCG AACGTCAAGA AGGAAAATGT CCAAACTCAG CTGAGTTTGA TGGTTTCTCC TCCATCAAAG
 1001 GATATTCTTC GCACAGACAA GTACAAAGCC CCGTTTCATC TAGAGATGTC ATTAAGAACA GTAGTTCAAG TCCATCAAAG GATATTTTGC AATCATCAAC
 1101 CCTTGATGAA TCATCTCAAG ATAAAAAGCC AATCAAAGCT GTCGAATCGA ATCAGAAACC ATATACACCG CCACGTACAA TTAATACTAC CGAAACAAGA
 1201 ATGAGATCAA AGACTACAAT CAATACTGCA AGAACAACAG CAAAGAACTC TTCGGCAGTT AAGAAAGAAT CGTCAGCAAC AAGGACTGTT AAGAAAGAAA
 1301 CACATCCTGC AACTACAAAA ACAACAAAAA CTGTAAATAG ACAATTGAAC TCTTCTACAA CGAAACCGGC AACTACGAGC TCTCACAAAG ACTCAGAACC
 1401 GGCTTCATCA AGACGTACAT CAACTCTACG TTCAAGTCGC CGCCAAAATG ACGGAATTCG CCCTGCAAAG GAAAGAACTG CGCTTTTCAC AGCTACAGCC
 1501 AGTAAGCCTC CGGTATCTTA CCGTACTGGA ATGCTTCCGA AATGGATGAT GGCTCCTCTC ACATCTCGTC GCTGAAATAT ATTTTTTATA TTATTTATTT
 1601 TTTTCTTTTT CTATCTGTAT ATTAAATGTA TTTCTATATT ATTAAAAAAA

Full length ORF of TvEST-14G2
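Going from the nucleotide ORF above to its protein is a codon-by-codon lookup. The sketch below translates DNA with the standard genetic code; treating the standard code as appropriate for this organism is an assumption here, and the `translate` helper is a hypothetical name. The example input is the first 30 bases of the ORF, starting at the ATG at position 1.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, to_stop=True):
    """Translate DNA in frame 1; unknown codons become 'X'.

    With to_stop=True translation ends at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE.get(dna[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    orf_start = "ATGCGAAAAATCTACGGCAATTACATTACG"  # bases 1-30 of the ORF
    print(translate(orf_start))
```

The ten residues produced agree with the start of the TvEST-14G2 protein sequence shown later in the lecture.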

Multiple alignment of translated sequences, block starting at column 1 (name, start residue, aligned sequence):

01B1(final)                     (1)   ----------MKVGERIGGGSYGNIFYAYNTANKKELALKIESEKTKRSQIFNEYRALKCLAGY----------VGIPKVYFETCYGNQNAF
CK1-1_full                      (1)   --MEEICGGEYQIIKKIGQGSFGKIYIIKQVKTGLLFAAKLENSDAPIPQLLFESRLYQIMSGS----------TNVPRLHAHSFDSRYNTI
CK1-2_full                      (1)   ---MRKIYGNYITQKRLGSGSFGEVWEAVSHSTGQKVALKLEPRNSSVPQLFFEAKLYSMFQASKSTNNSVEPCNNIPVVYATGQTETTNYM
CK1 (Plasmodium falciparum)     (1)   --MEIRVANKYALGKKLGSGSFGDIYVAKDIVTMEEFAVKLESTRSKHPQLLYESKLYKILGGG----------IGVPKVYWYGIEGDFTIM
CK1 (Schizosaccharomyces pombe) (1)   MALDLRIGNKYRIGRKIGSGSFGDIYLGTNVVSGEEVAIKLESTRAKHPQLEYEYRVYRILSGG----------VGIPFVRWFGVECDYNAM
CK1 (Homo sapiens)              (1)   --MELRVGNKYRLGRKIGSGSFGDIYLGANIASGEEVAIKLECVKTKHPQLHIESKFYKMMQGG----------VGIPSIKWCGAEGDYNVM
CK1 (Mus musculus)              (1)   --MELRVGNKYRLGRKIGSGSFGDIYLGANIASGEEVAIKLECVKTKHPQLHIESKFYKMMQGG----------VGIPSIKWCGAEGDYNVM
CK1.1 (Trypanosoma cruzi)       (1)   --MNLMIANRYCISQKIGAGSFGEIFRGTNMQTGETVAIKLEQAKTRHPQLAFEARFYRILNAGGGV-------VGIPNILFYGVEGEFNVM
CK1.2 (Trypanosoma cruzi)       (1)   MSLELRVGNRFRLGQKIGAGSFGEIFRGTNIQTGETVAIKLEQAKTRHPQLALEARFYRILNAGGGV-------VGIPNILFYGVEGEFNVM
Consensus                       (1)   MELRVGNKYRLGKKIGSGSFGDIYLG NI TGEEVAIKLE KTKHPQL FESR YKILQGG VGIP I W G EGDYNVM

Multiple alignment of translated sequences, block starting at column 93 (name, start residue, aligned sequence):

01B1(final)                     (73)  TMELLGDSLEKLFERCGRKFSLKTVLMLADQMIKCVQYIHTKSFIHRDIKPENFTIGT
CK1-1_full                      (81)  VIDLLGKSLEEHLNKVNRRMSLKTVLMLVDQMITAVEFFHSKNYIHRDIKPDNFVMGV
CK1-2_full                      (90)  AMELLGKSLEDLVSSVP-RFSQKTILMLAGQMISCVEFVHKHNFIHRDIKPDNFAMGV
CK1 (Plasmodium falciparum)     (81)  VLDLLGPSLEDLFTLCNRKFSLKTVRMTADQMLNRIEYVHSKNFIHRDIKPDNFLIGR
CK1 (Schizosaccharomyces pombe) (83)  VMDLLGPSLEDLFNFCNRKFSLKTVLLLADQLISRIEFIHSKSFLHRDIKPDNFLMGI
CK1 (Homo sapiens)              (81)  VMELLGPSLEDLFNFCSRKFSLKTVLLLADQMISRIEYIHSKNFIHRDVKPDNFLMGL
CK1 (Mus musculus)              (81)  VMELLGPSLEDLFNFCSRKFSLKTVLLLADQMISRIEYIHSKNFIHRDVKPDNFLMGL
CK1.1 (Trypanosoma cruzi)       (84)  VMDLLGPSLEDLFSFCGRKLSLKTTLMLAEQMIARIEFVHSKSVIHRDMKPDNFLMGT
CK1.2 (Trypanosoma cruzi)       (86)  VMDLLGPSLEDLFSFCDRKLSLKTTLMLAEQMIARIEFVHSKSVIHRDMKPDNFLMGT
Consensus                       (93)  VMDLLGPSLEDLF FC RKFSLKTVLMLADQMISRIEFIHSKNFIHRDIKPDNFLMGL

Multiple alignment of translated sequences, block starting at column 151 (name, start residue, aligned sequence):

01B1(final)                     (131) GPNSNVIYIIDFGLAKRYINGQTLTHIPYREGRSFTGTTRYGSINDHLDIEQSRRDDMESLAYTLIYFLKGFLPWHGCKRETFQ--------
CK1-1_full                      (139) NQNSNKLYIIDYGLAKKYRDVNTHEHIPYIEGKSLTGTARYASINALLGCEQSRRDDMEAIGYVIVYLLKGHLPWMGIDGATNQERYRRIAE
CK1-2_full                      (147) SENSNKIYIIDFGLSKKYIDQ-NNRHIRNCTGKSLTGTARYSSINALEGKEQSIRDDMESLVYVWVYLLHGRLPWMSLPTTGRK-KYEAILM
CK1 (Plasmodium falciparum)     (139) GKKVTLIHIIDFGLAKKYRDSRSHTSYPYKEGKNLTGTARYASINTHLGIEQSRRDDIEALGYVLMYFLRGSLPWQGLKAISKKDKYDKIME
CK1 (Schizosaccharomyces pombe) (141) GKRGNQVNIIDFGLAKKYRDHKTHLHIPYRENKNLTGTARYASINTHLGIEQSRRDDLESLGYVLVYFCRGSLPWQGLKATTKKQKYEKIME
CK1 (Homo sapiens)              (139) GKKGNLVYIIDFGLAKKYRDARTHQHIPYRENKNLTGTARYASINTHLGIEQSRRDDLESLGYVLMYFNLGSLPWQGLKAATKRQKYERISE
CK1 (Mus musculus)              (139) GKKGNLVYIIDFGLAKKYRDARTHQHIPYRENKNLTGTARYASINTHLGIEQSRRDDLESLGYVLMYFNLGSLPWQGLKAATKRQKYERISE
CK1.1 (Trypanosoma cruzi)       (142) GKKGHHVYVVDFGLAKKYRDPRTHQHIPYKEGKSLTGTARYCSINTHLGIEQSRRDDLEGIGYILMYFLRGSLPWQGLPAATKQEKYVAIAK
CK1.2 (Trypanosoma cruzi)       (144) GKKGHHVYVVDFGLAKKYRDPRTHQHIPYKEGKSLTGTARYCSINTHLGIEQSRRDDLEGIGYILMYFLRGSLPWQGLKAHTKQEKYSRISE
Consensus                       (151) GKKGN VYIIDFGLAKKYRD RTH HIPYREGKSLTGTARYASINTHLGIEQSRRDDLESLGYVLMYFLRGSLPWQGLKA TKK KYERISE

Multiple alignment of translated sequences, block starting at column 243 (name, start residue, aligned sequence):

01B1(final)                     (215) IKLSTSVEELCEGLPVEFSIFLQDMRKLDFEEEPNYSKYLQLFRSLFLNSGFVYDDVYD
CK1-1_full                      (231) CKRDTPLEKLCEGLPSEIITYIRKVRSLRFTERLHYASYRRLFRGLFRAMQFTFDYIYD
CK1-2_full                      (237) KKRSTKPEELCLGLNSFFVNYLIAVRSLKFEEEPNYAMYRKMIYDAMIADQIPFDYRYD
CK1 (Plasmodium falciparum)     (231) KKISTSVEVLCRNASFEFVTYLNYCRSLRFEDRPDYTYLRRLLKDLFIREGFTYDFLFD
CK1 (Schizosaccharomyces pombe) (233) KKISTPTEVLCRGFPQEFSIYLNYTRSLRFDDKPDYAYLRKLFRDLFCRQSYEFDYMFD
CK1 (Homo sapiens)              (231) KKMSTPIEVLCKGYPSEFSTYLNFCRSLRFDDKPDYSYLRQLFRNLFHRQGFSYDYVFD
CK1 (Mus musculus)              (231) KKMSTPIEVLCKGYPSEFSTYLNFCRSLRFDDKPDYSYLRQLFRNLFHRQGFSYDYVFD
CK1.1 (Trypanosoma cruzi)       (234) CKMSLSLETLCKGFPAEFAAYLNYTRGLRFEDKPDYSYLKRLFRELFIREGYHVDYVFD
CK1.2 (Trypanosoma cruzi)       (236) RKQTTPVETLCKGFPAEFAAYLNYIRSLRFEDKPDYSYLKRLFRELFIREGYHVDYVFD
Consensus                       (243) KKMSTPVE LCKGFPSEFS YLNY RSLRFEDKPDYSYLRRLFRDLFIR GF YDYVFD

Multiple alignment of translated sequences, block starting at column 301 (name, start residue, aligned sequence):

01B1(final)                     (273) DWTLLPEEPPRPHFKQDVFNSKISN---------DDSSDSIIKTKQPHREKSAGTSRLSLISLPTQNVLAQSGIFLTK------------KP
CK1-1_full                      (289) DWSPRKDNDVPPVRYTRRKGQMP-----------------VNERRPSIEAVFSGERRRRSEENMRTIDFENEEIPEPK------------KP
CK1-2_full                      (295) DWVKTRIVRPQRENQSQLSERQEGKCPNSAEFDGFSSIKGYSSHRQVQSPVSSRDVIKNSSSSPSKDILQSSTLDESSQDKKPIKAVESNQK
CK1 (Plasmodium falciparum)     (289) DWT---------CVYASEKDKKK-----------------MLENKNRFDQTADQEGRDQRNN------------------------------
CK1 (Schizosaccharomyces pombe) (291) DWTLKRKTQQDQQH---------------------------QQQLQQQLSATPQAINPP-PERSSFRNYQKQNFDEKG------------GD
CK1 (Homo sapiens)              (289) DWNMLKFGAARNPEDVDRERREH-----------------EREERMGQLRGSATRALPPGPPTGATANRLRSAAEPVA------------ST
CK1 (Mus musculus)              (289) DWNMLKFGAARNPEDVDRERREH-----------------EREERMGQLRGSATRALPPGPPTGATANRLRSAAEPVA------------ST
CK1.1 (Trypanosoma cruzi)       (292) DWTLKRIHESLQDE-----EKEL-----------------SNN-------------------------------------------------
CK1.2 (Trypanosoma cruzi)       (294) DWTLKRIHENLKAEGSG--QQEQ-----------------KQQQQQQRERGDVEQA------------------------------------
Consensus                       (301) DWTL R R RQ SA

Multiple alignment of translated sequences, block starting at column 393 (name, start residue, aligned sequence):

01B1(final)                     (344) PKRFSLETNQTLLSLFNK-SVNDYF-G-ILFLI-GFIFLSGKYGIVGKKKKKKKKKK-
CK1-1_full                      (352) VEVKQIELSSSSSQDKPKTKPNYMREIDAILNRVKPIQTPKIVSHLPPPPIEELPKKL
CK1-2_full                      (387) PYTPPRTINTTETRMRSKTTINTARTTAKNSSAVKKESSATRTVKKETHPATTKTTKT
CK1 (Plasmodium falciparum)     (325) ----------------------------------------------------------
CK1 (Schizosaccharomyces pombe) (343) INTTVPVINDPSATGAQYINRPN-----------------------------------
CK1 (Homo sapiens)              (352) PASRIQPAGNTSPRAISRVDRERKVSMRLHRGAPANVSSSDLTGRQEVSRIPASQTSV
CK1 (Mus musculus)              (352) PASRIQQTGNTSPRAISRADRERKVSMRLHRGAPANVSSSDLTGRQEVSRLAASQTSV
CK1.1 (Trypanosoma cruzi)       (313) ----------------------------------------------------------
CK1.2 (Trypanosoma cruzi)       (331) ----------------------------------------------------------
Consensus                       (393) T K

Multiple alignment of translated sequences, block starting at column 451 (name, start residue, aligned sequence):

01B1(final)                     (397) ---------------------------------------------------------------------------------
CK1-1_full                      (410) RKEEEKTHHHRKLSGHRTHHHESKRVVKKEKTKVEEEEEIIPKRFTKRKELEMPSDDEPLTSVDEFLIRRGLMKPRKPKI-
CK1-2_full                      (445) VNRQLNSSTTKPATTSSHKDSEPASSRRTSTLRSSRRQNDGIRPAKERTALFTATASKPPVSYRTGMLPKWMMAPLTSRR-
CK1 (Plasmodium falciparum)     (325) ---------------------------------------------------------------------------------
CK1 (Schizosaccharomyces pombe) (366) ---------------------------------------------------------------------------------
CK1 (Homo sapiens)              (410) PFDHLGK--------------------------------------------------------------------------
CK1 (Mus musculus)              (410) PFDHLGK--------------------------------------------------------------------------
CK1.1 (Trypanosoma cruzi)       (313) ---------------------------------------------------------------------------------
CK1.2 (Trypanosoma cruzi)       (331) ---------------------------------------------------------------------------------
Consensus                       (451)

Amino Acid Sequence Comparison

Sequences compared: 01B1, 04E12, 14G2, PFCK, Yeast, Human, Mouse, TcCK1.1, TcCK1.2

Figure legend: kinesin homology domain; casein kinase 1 specific motifs

PFCK: Plasmodium casein kinase 1
TcCK1.1: Trypanosoma cruzi casein kinase 1.1
TcCK1.2: Trypanosoma cruzi casein kinase 1.2

Similarity of Various CK1s from Different Species

                    04E12     14G2     01B1  TcCK1.1  TcCK1.2     PFCK    Yeast    Mouse    Human
TvEST-04E12           100       32       32       34       34       34       37       37       37
TvEST-14G2                     100       24       24       23       24       24       26       25
TvEST-01B1                              100       47       47       48       48       38       38
T. cruzi CK1.1                                   100       23       73       24       61       61
T. cruzi CK1.2                                            100       74       70       63       63
PFCK                                                               100       69       62       62
Yeast CK1                                                                   100       69       67
Mouse CK1                                                                            100       99
Human CK1                                                                                     100
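The slide does not say how the similarity values were computed. One common definition, used here purely as an assumption, is percent identity: the share of aligned columns in which both sequences carry the same residue, counting only columns where neither sequence has a gap. The function name is hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free aligned columns with identical residues.

    Both inputs must come from the same alignment (equal length)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


if __name__ == "__main__":
    # Two rows taken from the alignment block that starts at column 93.
    human = "VMELLGPSLEDLFNFCSRKFSLKTVLLLADQMISRIEYIHSKNFIHRDVKPDNFLMGL"
    mouse = "VMELLGPSLEDLFNFCSRKFSLKTVLLLADQMISRIEYIHSKNFIHRDVKPDNFLMGL"
    print(percent_identity(human, mouse))
```

Other conventions divide by the full alignment length or by the shorter sequence, and similarity scores may also credit conservative substitutions, so values from this sketch need not match the table.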

3-D Structure of TvEST-14G2 and other CK1s

TvEST-14G2

  1 MRKIYGNYIT QKRLGSGSFG EVWEAVSHST GQKVALKLEP RNSSVPQLFF
 51 EAKLYSMFQA SKSTNNSVEP CNNIPVVYAT GQTETTNYMA MELLGKSLED
101 LVSSVPRFSQ KTILMLAGQM ISCVEFVHKH NFIHRDIKPD NFAMGVSENS
151 NKIYIIDFGL SKKYIDQNNR HIRNCTGKSL TGTARYSSIN ALEGKEQSIR
201 DDMESLVYVW VYLLHGRLPW MSLPTTGRKK YEAILMKKRS TKPEELCLGL
251 NSFFVNYLIA VRSLKFEEEP NYAMYRKMIY DAMIADQIPF DYRYDWVKTR
301 IVRPQRENQS QLSERQEGKC PNSAEFDGFS SIKGYSSHRQ VQSPVSSRDV
351 IKNSSSSPSK DILQSSTLDE SSQDKKPIKA VESNQKPYTP PRTINTTETR
401 MRSKTTINTA RTTAKNSSAV KKESSATRTV KKETHPATTK TTKTVNRQLN
451 SSTTKPATTS SHKDSEPASS RRTSTLRSSR RQNDGIRPAK ERTALFTATA
501 SKPPVSYRTG MLPKWMMAPL TSRR

Structures shown: TcCK1.2, TcCK1.1, Human CK1-δ, PfCK1, Mouse CK1, Yeast CK1

GENOMICS

GENE EXPRESSION ANALYSIS

PROTEOMICS

BIOINFORMATICS

MEDICAL INFORMATICS

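Disease prediction and diagnosis; discovery of new genes; gene evolution; overall function and its network regulatory systems

Drug design and the structure of biological macromolecules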

Focuses in Bioinformatics

Perturbation: Environment, Medication, Genetic Engineering
Dynamic Response: Gene Expression, Protein Expression
BioChip
DataBase: Genotype/Phenotype
Symbolic Algorithms/Computing
Analysis
Biology: Molecular Biology, Biochemistry, Genetics
Virtual Cell
Genome Sequencing

Goals Leading Toward Predictive Biology

Gene Sequence Data
Gene Identification
Structure Prediction
Protein Circuit & Regulatory Network Discovery
Biosimulation

Figure: signaling network linking mitogen signals to apoptosis and cell proliferation. Nodes: IL-3, IL-3R, IGF1, IGF1R, IRS1, RAS, PI3-K, AKT/PKB, BAD, Bcl-XL, FAS-L, FAS, FADD/MORT, FLICE, ICE, CPP32, Cyclin D1, pRb, E2F, Cyclin E, P53, P21, P16, P27, Cdk4, P107, C-Myc, Bin-1, Max, Mad, Cdc25A, Cdk2.

Reconstructing Cellular Functions

20th Century Biology: Reductionistic Approach (Genome Sequencing, DNA arrays, proteomics)

21st Century Biology: Integrative Approach (Bioinformatics, Systems Science, modeling & simulation)

Hallmarks of Cancer

D. Hanahan and R. A. Weinberg. Cell, 100(1):57–70 (Review), 2000.

Platform for Systems Biology

Complex cellular samples: body fluids, tissue

Dynamics: i.e. environmental + time

Levels measured: Gene, Protein, Metabolite

• Objective is to link gene response, protein activity and metabolite dynamics to disease and interventions

Quantitative Comparisons

Targets, Biomarkers

BioSystematics(TM): gene index, protein index, metabolite index

Genomics, Transcriptomics, Proteomics, Functional Proteomics/Genomics, Metabolomics

SYSTEMS BIOLOGY

Q. As a biologist, what skills do I need to make the transition to bioinformatics?

The fact is that many of the jobs currently available involve the design and implementation of programs and systems for the storage, management and analysis of vast amounts of DNA sequence data. Such positions require in-depth programming and relational database skills which very few biologists possess, and so it is largely the computational specialists who are filling these roles. This is not to say that the computer-savvy biologist doesn't play an important role. As the bioinformatics field matures there will be a huge demand for outreach to the biological community, as well as a need for individuals with the in-depth biological background necessary to sift through gigabases of genomic sequence in search of novel targets. It will be in these areas that biologists with the necessary computational skills will find their niche.

A. Molecular biology packages (GCG, BLAST, etc.); web and programming skills, including HTML, Perl, Java and C++; familiarity with a variety of operating systems (especially UNIX); relational database skills such as SQL, Sybase or Oracle; statistics; structural biology and modeling; mathematical optimization; computer graphics theory; and linear algebra. You will need to be able to readily pick up, use and understand the tools and databases designed by computer programmers, and to communicate biological science requirements to core computer scientists.
