Protein Folding Recognition with Committee Machine
Mika Takata

Jan 18, 2018

Paula Norris

Background  Computation + biology + chemical + medicine + ・・・・ = significantly important  Structure Classification Of Protein database  Fold level class : remote homology  Better recognition, better Tertiary structure prediction All alpha SCOP All beta a/ba+b Globin- like Cytoch- rome c Cupre- doxins (TIM)- barrel β- grasp class Fold ・・・・・ ・・・・
Welcome message from author
This document is posted to help you gain knowledge. Please leave a comment to let me know what you think about it! Share it to your friends and learn new things together.
Transcript

Outline
- Background
- System Outline
- Experiment
- Experimental Result
- Reference

Background
- Computation + biology + chemistry + medicine + ... = significantly important
- Structural Classification of Proteins (SCOP) database
- Fold-level classes: remote homology
- Better recognition gives better tertiary structure prediction
[Figure: SCOP hierarchy. Class level: All alpha, All beta, a/b, a+b. Fold level: Globin-like, Cytochrome c, Cupredoxins, (TIM)-barrel, β-grasp, ...]

1. Chemical approaching parameter (i)
  i. 6 types of chemical features
  ii. String windows (N-grams)
  iii. Protein molecular weight value
  iv. Protein sequence length value

1. Chemical approaching parameter (ii): global parameters
- Symbol C: frequencies of the 20 amino acid symbols in a protein sequence
- Symbols S, H, V, P, Z: 3-dim composition, 3-dim transition, 15-dim distribution

1. Chemical approaching parameter (iii)
- Protein molecular weight value: the sum of the molecular weights of the amino acids, used as a feature
- Protein sequence length value: the length of the sequence, used as a feature

2. Feature parameter based on sliding window
- N-gram: proteomic fragment similarity
- String length = 2
- Example sequence: NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA

3. Feature parameter based on HMM
[Fig. 1: Feature parameter flow based on HMM (blocks: training data, test data, model)]

[Figure: System overview in two steps (Step 1, Step 2). Blocks: models for C, S, H, V, P, Z, Seq-Length, Mol-Weight, Spectrum Kernel, and HMM; Committee SVM_1 ... Committee SVM_27; decision.]

Evaluation measurement
- Accuracy Q_i measures how correctly the proteins in class i are recognized.
- The number of data in each class varies.

Experiment
Parameters:
  i. Chemical approaching parameter
  ii. Feature parameter based on sliding window kernel (string length = 2 and 3)
  iii. Feature parameter based on HMM
Classification methods:
  i. Independent SVM
  ii. Committee SVM array
Multi-class recognition approaches:
  i. One-vs-others
  ii. All-vs-all
Data set:
- Training data: 341, test data: 353 (total: 694)
- Cross-validation: 10 times

Result (1): Independent SVM, Model I
Result (2): CM (Committee Machine), Model I
Result (3): CM, Model II
Result (4): Model I & II
Result (5): Model I & III
Result (6): Model I & II & III

Conclusion
- Using all models of the Committee Machine improves recognition.
- The spectrum kernel works when used with a string length of 2.
- Advantages:
  - Takes advantage of sporadic data (e.g., chemical-based and HMM features)
  - Reduces computational cost

Reference (i)
1. Takata, M., Matsuyama, Y.: Protein Folding Classification by Committee SVM Array. Lecture Notes in Computer Science, No. 5507
2. Matsuyama, Y., Kawasaki, K., Hotta, T., Mizutani, Takata, M., Ishida, A.: Eukaryotic transcription start site recognition involving non-promoter model. Intelligent Systems for Molecular Biology, Toronto (2008) L05
3. Matsuyama, Y., Ishihara, Y., Ito, Y., Hotta, T., Kawasaki, K., Hasegawa, T., Takata, M.: Promoter recognition involving motif detection: Studies on E. coli and human genes. Intelligent Systems for Molecular Biology, Vienna (2007) H
4. Dubchak, I., Muchnik, I., Holbrook, S.R., Kim, S-H.: Prediction of protein folding class using global description of amino acid sequence. Proc. Natl. Acad. Sci. USA 92 (1995) 8700
5. Dubchak, I., Muchnik, I., Mayor, C., Dralyuk, I., Kim, S-H.: Recognition of a Protein Fold in the Context of the SCOP Classification. Proteins: Structure, Function, and Genetics 35 (1999) 401-407

Reference (ii)
1. Ding, C.H.Q., Dubchak, I.: Multi-class protein fold recognition using support vector machines and neural networks. Bioinfo. 17 (2001) 349
2. Mount, D.W.: Bioinformatics. Cold Spring Harbor Laboratory Press (2001)
3. Murzin, A.G., Brenner, S.E., Hubbard, T., Chothia, C.: SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536
4. Leslie, C., Eskin, E., Noble, W.S.: The Spectrum kernel: A string kernel for SVM protein classification. Pacific Symposium on Biocomputing 7 (2002) 566
5. Tabrez, M., Shamim, A., Anwaruddin, M., Nagarajaram, H.A.: Support vector machine-based classification of protein folds using the structural properties of amino acid residues and amino acid residue pairs. Bioinfo. 23 (2007) 3320
6. Lodhi, H., Saunders, C., Shawe-Taylor, J., Watkins, C.: Text classification using string kernels. J. of Machine Learning Research 2 (2002) 419-444
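The global parameters on the "Chemical approaching parameter (ii)" slide follow the composition/transition/distribution (CTD) scheme of Dubchak et al. The Python below is a minimal sketch, not the author's code. It computes C (20 amino acid frequencies) and the 3 + 3 + 15 = 21 CTD values for one attribute. The three-group hydrophobicity split in `HYDROPHOBICITY` is an assumed grouping, and the rounding rule for the 25/50/75 % positions is one convention among several. S, V, P and Z would each need their own grouping, and S also needs a secondary-structure prediction.

```python
import math
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumed three-group hydrophobicity split: 0 = polar, 1 = neutral, 2 = hydrophobic.
HYDROPHOBICITY: Dict[str, int] = {
    aa: group
    for group, members in enumerate(("RKEDQN", "GASTPHY", "CLVIMFW"))
    for aa in members
}


def composition_c(seq: str) -> List[float]:
    """Global parameter C: relative frequency of each of the 20 amino acids."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def ctd(seq: str, groups: Dict[str, int]) -> List[float]:
    """Composition (3) + transition (3) + distribution (15) = 21 values."""
    labels = [groups[aa] for aa in seq]  # KeyError on non-standard residues
    n = len(labels)

    # Composition: fraction of residues that fall in each group.
    composition = [labels.count(g) / n for g in range(3)]

    # Transition: fraction of adjacent residue pairs that switch between two groups.
    steps = list(zip(labels, labels[1:]))
    transition = [
        sum(1 for a, b in steps if {a, b} == {x, y}) / max(len(steps), 1)
        for x, y in ((0, 1), (0, 2), (1, 2))
    ]

    # Distribution: relative chain position of the first, 25 %, 50 %, 75 % and
    # 100 % occurrence of each group.
    distribution: List[float] = []
    for g in range(3):
        positions = [i + 1 for i, lab in enumerate(labels) if lab == g]
        if not positions:
            distribution.extend([0.0] * 5)
            continue
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            k = max(1, math.ceil(frac * len(positions)))  # 1-based occurrence index
            distribution.append(positions[k - 1] / n)

    return composition + transition + distribution
```

Concatenating `composition_c(seq)` with `ctd(seq, groups)` for all five attributes would give 20 + 5 × 21 = 125 values per protein.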
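The "Chemical approaching parameter (iii)" slide defines the molecular weight as the sum of the amino acid weights. The slides do not say whether the water lost in each peptide bond is subtracted, so the sketch below treats that as a flag (`subtract_water`, a hypothetical parameter). The weights are standard average masses of the free amino acids. Sequence length is used directly as the second scalar feature.

```python
from typing import Dict, List

# Average molecular weights (Da) of the free amino acids.
FREE_AA_WEIGHT: Dict[str, float] = {
    "A": 89.09, "R": 174.20, "N": 132.12, "D": 133.10, "C": 121.16,
    "E": 147.13, "Q": 146.15, "G": 75.07, "H": 155.16, "I": 131.17,
    "L": 131.17, "K": 146.19, "M": 149.21, "F": 165.19, "P": 115.13,
    "S": 105.09, "T": 119.12, "W": 204.23, "Y": 181.19, "V": 117.15,
}
WATER = 18.015  # released once per peptide bond


def molecular_weight(seq: str, subtract_water: bool = True) -> float:
    """Sum of amino acid weights, optionally minus one water per peptide bond."""
    total = sum(FREE_AA_WEIGHT[aa] for aa in seq)
    if subtract_water and len(seq) > 1:
        total -= WATER * (len(seq) - 1)
    return total


def size_features(seq: str) -> List[float]:
    """Two scalar features: molecular weight and sequence length."""
    return [molecular_weight(seq), float(len(seq))]
```

Because these two values are on a much larger scale than the frequency features, some per-feature scaling would normally be applied before an SVM sees them.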
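The sliding-window (N-gram) feature and the spectrum kernel of Leslie et al. (listed in the references) can be illustrated in a few lines. The sketch counts every length-k substring seen by a window moving one residue at a time, then defines the kernel as the inner product of two such count vectors. With k = 2 the explicit vector has 20² = 400 entries; k = 3 gives 8000. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def spectrum(seq: str, k: int = 2) -> Counter:
    """Slide a window of length k along seq and count every k-mer it sees."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_vector(seq: str, k: int = 2) -> List[int]:
    """Explicit feature vector indexed by all 20**k possible k-mers."""
    counts = spectrum(seq, k)
    return [counts["".join(p)] for p in product(AMINO_ACIDS, repeat=k)]


def spectrum_kernel(s: str, t: str, k: int = 2) -> int:
    """K(s, t): inner product of the two k-mer count vectors."""
    a, b = spectrum(s, k), spectrum(t, k)
    return sum(count * b[kmer] for kmer, count in a.items())
```

For the slide's example sequence NSDWTNNETRHAIVILIIIIIMLRHGKIPYWCMIPFAA (38 residues), the window produces 37 dimers, and the run of five I residues contributes the dimer "II" four times.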
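The "Feature parameter based on HMM" slide shows only the flow from training and test data to a model. The sketch below makes a hypothetical design concrete, and it is not the author's stated method. It assumes one trained HMM per fold class, scores a test sequence under each model with the scaled forward algorithm, and uses the length-normalised log-likelihoods as features. Model training (e.g. Baum-Welch) is not shown, and the toy parameters in the usage are made up.

```python
import math
from typing import Dict, List, Sequence, Tuple

# An HMM as (start probabilities, transition matrix, per-state emission tables).
HMM = Tuple[Sequence[float], Sequence[Sequence[float]], Sequence[Dict[str, float]]]


def forward_log_likelihood(seq: str, model: HMM) -> float:
    """log P(seq | model) computed by the scaled forward algorithm."""
    start, trans, emit = model
    states = range(len(start))
    alpha = [start[s] * emit[s].get(seq[0], 0.0) for s in states]
    log_like = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [
                sum(alpha[r] * trans[r][s] for r in states) * emit[s].get(seq[t], 0.0)
                for s in states
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        log_like += math.log(scale)
    return log_like


def hmm_features(seq: str, fold_models: List[HMM]) -> List[float]:
    """One length-normalised log-likelihood per (hypothetical) fold model."""
    return [forward_log_likelihood(seq, m) / len(seq) for m in fold_models]
```

Usage with a made-up two-state model: `forward_log_likelihood("AC", ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [{"A": 1.0}, {"C": 1.0}]))` equals log(0.5 × 0.1) = log(0.05).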
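The formula on the "Evaluation measurement" slide did not survive extraction. A common definition in this line of work (e.g. Ding & Dubchak) is Q_i = p_i / n_i, where p_i is the number of correctly recognized test proteins of class i and n_i is the number of test proteins of class i. The overall accuracy is Q = Σ p_i / N. Treat this as an assumption about the slide's definition. The sketch computes both.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def per_class_accuracy(y_true: Sequence[Hashable],
                       y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Q_i = (correctly recognized proteins of class i) / (proteins of class i)."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: correct[c] / n for c, n in totals.items()}


def overall_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Q = (all correct predictions) / (all test proteins)."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

For true labels `["globin", "globin", "globin", "tim"]` and predictions `["globin", "globin", "tim", "tim"]`, Q_globin = 2/3 and Q_tim = 1.0, while Q = 0.75. Because class sizes vary, the overall Q and the mean of the Q_i can differ noticeably.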
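The two multi-class recognition approaches in the experiment differ mainly in how many binary machines they train and how those machines are combined. The sketch below shows the usual decision rules. One-vs-others uses K machines and takes the largest score. All-vs-all uses K(K-1)/2 pairwise machines combined by max-wins voting. The value K = 27 is inferred from the Committee SVM_1 ... SVM_27 labels in the system figure and is an assumption. `winner` is a hypothetical stand-in for a trained pairwise SVM.

```python
from itertools import combinations
from typing import Callable, Sequence, Tuple


def one_vs_others(scores: Sequence[float]) -> int:
    """Machine i separates class i from the rest; pick the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def all_vs_all(winner: Callable[[int, int], int], n_classes: int) -> int:
    """Every pairwise machine casts one vote; the class with most votes wins."""
    votes = [0] * n_classes
    for i, j in combinations(range(n_classes), 2):
        votes[winner(i, j)] += 1
    return max(range(n_classes), key=lambda c: votes[c])


def machines_needed(n_classes: int) -> Tuple[int, int]:
    """(one-vs-others count, all-vs-all count) of binary machines."""
    return n_classes, n_classes * (n_classes - 1) // 2
```

With 27 classes, one-vs-others trains 27 machines and all-vs-all trains 351, each on a smaller pairwise subset of the data.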
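The slides do not state how the Committee Machine combines its members (the per-feature models in the system figure). As a hedged illustration only, the sketch below uses a simple weighted average of each member's per-class decision values followed by an argmax. Majority voting or a trained combiner would be equally plausible alternatives. The member scores and weights in the test are made up.

```python
from typing import List, Sequence


def committee_scores(member_scores: Sequence[Sequence[float]],
                     weights: Sequence[float] = ()) -> List[float]:
    """Weighted average of per-class decision values over the committee members."""
    w = list(weights) or [1.0] * len(member_scores)  # equal weights by default
    total = sum(w)
    n_classes = len(member_scores[0])
    return [sum(wi * m[c] for wi, m in zip(w, member_scores)) / total
            for c in range(n_classes)]


def committee_decision(member_scores: Sequence[Sequence[float]],
                       weights: Sequence[float] = ()) -> int:
    """Index of the class with the highest combined score."""
    combined = committee_scores(member_scores, weights)
    return max(range(len(combined)), key=lambda c: combined[c])
```

When members disagree, the weights decide the outcome. In the test below, an equal-weight committee picks class 0, but up-weighting the first member flips the decision to class 1.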