LSM3241: Bioinformatics and Biocomputing
Lecture 3: Machine learning method for protein function prediction
Prof. Chen Yu Zong
Tel: 6516-6877
Email: [email protected]
http://bidd.nus.edu.sg
Room 07-24, level 7, SOC1, National University of Singapore
Protein Function and Functional Family
Proteins of similar functional characteristics can be grouped into a family.
Functional Classification of Proteins by SVM
• A protein is classified as either belonging (+) or not belonging (-) to a functional family.
• By screening a protein against the SVMs of all families, its function can be identified (example: SVMProt).
[Figure: a protein is screened against the Family-1, Family-2, and Family-3 SVMs. The outputs are -, -, and +, so the protein belongs to Family-3.]
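The screening scheme above can be sketched in a few lines of Python. This is only an illustration: the linear scoring functions, their weights, and the two-element feature vector are hypothetical toy values, not the classifiers or features used by SVMProt.

```python
from typing import Callable, Dict, List, Sequence

# A family classifier maps a feature vector to a signed score:
# positive means "belongs to the family" (+), otherwise "does not belong" (-).
Classifier = Callable[[Sequence[float]], float]


def make_linear_classifier(weights: Sequence[float], bias: float) -> Classifier:
    """Build a toy linear decision function (stand-in for a trained SVM)."""
    def score(features: Sequence[float]) -> float:
        return sum(w * x for w, x in zip(weights, features)) + bias
    return score


def screen_families(features: Sequence[float],
                    classifiers: Dict[str, Classifier]) -> List[str]:
    """Return every family whose classifier gives a positive score."""
    return [name for name, clf in classifiers.items() if clf(features) > 0]


# Hypothetical weights, chosen only so that the example mirrors the slide.
family_classifiers = {
    "Family-1": make_linear_classifier([1.0, -1.0], -0.5),
    "Family-2": make_linear_classifier([-1.0, 0.5], -0.5),
    "Family-3": make_linear_classifier([0.5, 1.0], -0.6),
}

protein_features = [0.2, 0.9]  # hypothetical feature vector of one protein
predicted = screen_families(protein_features, family_classifiers)
print(predicted)
```

Because each family has its own independent two-class classifier, a protein may be assigned to one family, to several, or to none.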
What is SVM?
• Support vector machines (SVMs) are a machine learning method, also described as learning by examples or statistical learning, that classifies objects into one of two classes.
Advantages of SVM:
• Accommodates diversity among class members: a family may contain proteins that differ widely from one another.
• Uses sequence-derived physico-chemical features as the basis for classification.
• Suitable for functional classification of novel proteins (distantly-related proteins, homologous proteins of different functions).
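To make "learning by examples" concrete, the following is a minimal sketch of a linear two-class SVM trained by subgradient descent on the regularized hinge loss. It is an assumption-laden toy: the data points, the function names `train_linear_svm` and `predict`, and the hyperparameters are all illustrative, and no kernel is used. Practical tools rely on dedicated SVM libraries rather than code like this.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(samples: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     lam: float = 0.01,
                     lr: float = 0.1,
                     epochs: int = 200) -> Tuple[List[float], float]:
    """Minimize lam/2 * |w|^2 + mean(max(0, 1 - y * (w.x + b))).

    Labels must be +1 (member of the class) or -1 (non-member).
    """
    dim = len(samples[0])
    n = len(samples)
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer
        grad_b = 0.0
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:  # example lies inside the margin: hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy training examples: positive (+1) and negative (-1) class members.
train_x = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0],
           [-2.0, -2.0], [-3.0, -1.0], [-1.0, -3.0]]
train_y = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(train_x, train_y)
print(predict(w, b, [2.5, 2.5]), predict(w, b, [-2.5, -2.5]))
```

The classifier is learned only from labeled examples; nothing about the classes is specified by hand apart from the training labels.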
SVM for Classification of Proteins
How to represent a protein?
• Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties:
– amino acid composition
– hydrophobicity
– normalized Van der Waals volume
– polarity
– polarizability
– charge
– surface tension
– secondary structure
– solvent accessibility
• Three descriptors, composition (C), transition (T), and distribution (D), are used to describe global composition of each of these properties.
Nucleic Acids Res., 31: 3692-3697
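As an illustration of the three descriptors, the sketch below computes composition (C), transition (T), and distribution (D) values for a single property, hydrophobicity. Several things here are assumptions not stated on the slides: the three-class grouping of amino acids (a commonly used one: polar, neutral, hydrophobic), the use of fractions rather than percentages, the rounding rule for the distribution positions, and all function names. It also assumes a sequence of at least two standard residues.

```python
from typing import Dict, List

# Assumed three-class hydrophobicity grouping (not given on the slides):
# 1 = polar, 2 = neutral, 3 = hydrophobic.
HYDROPHOBICITY_GROUPS: Dict[str, str] = {
    "1": "RKEDQN",
    "2": "GASTPHY",
    "3": "CLVIMFW",
}


def encode(sequence: str, groups: Dict[str, str]) -> str:
    """Replace each residue by the label of its property class."""
    lookup = {aa: label for label, aas in groups.items() for aa in aas}
    return "".join(lookup[aa] for aa in sequence.upper())


def composition(encoded: str) -> List[float]:
    """C: fraction of residues in each class."""
    return [encoded.count(c) / len(encoded) for c in "123"]


def transition(encoded: str) -> List[float]:
    """T: frequency of adjacent residue pairs that switch between two classes."""
    pairs = list(zip(encoded, encoded[1:]))
    result = []
    for a, b in (("1", "2"), ("1", "3"), ("2", "3")):
        switches = sum(1 for p in pairs if p in ((a, b), (b, a)))
        result.append(switches / len(pairs))
    return result


def distribution(encoded: str) -> List[float]:
    """D: relative position of the first, 25%, 50%, 75%, and 100% residue of each class."""
    result: List[float] = []
    for c in "123":
        positions = [i + 1 for i, ch in enumerate(encoded) if ch == c]
        if not positions:
            result.extend([0.0] * 5)  # class absent from the sequence
            continue
        result.append(positions[0] / len(encoded))
        for frac in (0.25, 0.5, 0.75, 1.0):
            k = max(1, int(len(positions) * frac))  # assumed rounding rule
            result.append(positions[k - 1] / len(encoded))
    return result


def ctd(sequence: str, groups: Dict[str, str] = HYDROPHOBICITY_GROUPS) -> List[float]:
    """Concatenate C (3 values), T (3 values), and D (15 values)."""
    encoded = encode(sequence, groups)
    return composition(encoded) + transition(encoded) + distribution(encoded)


vector = ctd("RRAACC")  # toy sequence, encoded as "112233"
print(len(vector), vector[:6])
```

Repeating the same calculation for each tabulated property and concatenating the results gives a fixed-length feature vector regardless of sequence length, which is what an SVM requires as input.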
Protein function prediction software SVMProt
Useful for functional prediction of novel proteins, distantly-related proteins, and homologous proteins of different functions.
[Figure: your protein sequence is submitted to a computer loaded with SVMProt, which holds a support vector machine classifier for every protein functional family.]