LSM3241: Bioinformatics and Biocomputing
Lecture 3: Machine learning method for protein function prediction
Prof. Chen Yu Zong
Tel: 6516-6877
Email: csccyz@nus.edu.sg

Jan 12, 2016

Page 1

LSM3241: Bioinformatics and Biocomputing

Lecture 3: Machine learning method for protein function prediction

Prof. Chen Yu Zong

Tel: 6516-6877
Email: csccyz@nus.edu.sg
http://bidd.nus.edu.sg
Room 07-24, level 7, SOC1,
National University of Singapore

Page 2

Protein Function and Functional Family

Proteins of similar functional characteristics can be grouped into a family.

Page 3

Protein Function and Functional Family

Proteins of similar functional characteristics can be grouped into a family.

Page 4

Protein Function and Functional Family

Proteins of similar functional characteristics can be grouped into a family.

Page 5

Functional Classification of Proteins by SVM

• A protein is classified as either belonging (+) or not belonging (-) to a functional family.

• By screening against all families, the function of this protein can be identified (example: SVMProt).

[Diagram: a protein is screened by the Family-1 SVM, the Family-2 SVM, and the Family-3 SVM. The outputs are -, -, and +, so the protein belongs to Family-3.]
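The screening step in the diagram can be sketched as follows. This is a minimal illustration, not the SVMProt implementation: the linear classifiers, their weights, and the helper names `linear_classifier` and `screen_families` are all made up for the example, and a real system would use one trained SVM per family.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a binary classifier maps a feature vector to +1 or -1.
BinaryClassifier = Callable[[Sequence[float]], int]


def linear_classifier(weights: Sequence[float], bias: float) -> BinaryClassifier:
    """Build a toy linear decision rule standing in for a trained family SVM."""
    def classify(x: Sequence[float]) -> int:
        score = sum(w * xi for w, xi in zip(weights, x)) + bias
        return 1 if score >= 0 else -1
    return classify


def screen_families(x: Sequence[float],
                    classifiers: Dict[str, BinaryClassifier]) -> List[str]:
    """Return every family whose classifier labels the protein as a member (+)."""
    return [name for name, clf in classifiers.items() if clf(x) == 1]


# Made-up weights, chosen only so that the toy protein matches Family-3.
classifiers = {
    "Family-1": linear_classifier([1.0, 0.0, 0.0], -2.0),
    "Family-2": linear_classifier([0.0, -1.0, 0.0], -0.5),
    "Family-3": linear_classifier([0.0, 1.0, 1.0], -1.5),
}

protein = [0.0, 1.0, 1.0]  # hypothetical feature vector
print(screen_families(protein, classifiers))
```

Because each family has its own independent yes/no classifier, a protein can in principle be assigned to several families, or to none.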

Page 6

Functional Classification of Proteins by SVM

What is SVM?
• Support vector machines (SVM): a statistical machine learning method that learns from examples and classifies objects into one of two classes.

Advantages of SVM:
• Accommodates diversity of class members (no "racial discrimination").
• Uses sequence-derived physico-chemical features as the basis for classification.
• Suitable for functional classification of novel proteins (distantly-related proteins, homologous proteins of different functions).

Page 7

Machine Learning Method

Inductive learning: example-based learning

[Figure labels: Descriptor; Positive examples; Negative examples]

Page 8

Machine Learning Method

Feature vectors:
A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)

[Figure labels: Descriptor; Feature vector; Positive examples; Negative examples]
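Learning from labeled feature vectors can be illustrated with a small sketch. Two assumptions are made here and are not in the slides: the +1/-1 labels attached to A through F are hypothetical (the transcript does not say which examples are positive), and a simple perceptron is used as a stand-in learner instead of an SVM, only to show the idea of fitting a rule to examples.

```python
from typing import List, Sequence, Tuple

# Feature vectors A-F from the slide. The +1/-1 labels are hypothetical:
# the transcript does not say which examples are positive or negative.
examples: List[Tuple[Tuple[int, int, int], int]] = [
    ((1, 1, 1), +1),  # A
    ((0, 1, 1), +1),  # B
    ((1, 1, 1), +1),  # C
    ((0, 1, 1), +1),  # D
    ((0, 0, 0), -1),  # E
    ((1, 0, 1), -1),  # F
]


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Classify x as +1 or -1 with a linear rule."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


def train_perceptron(data, epochs: int = 100):
    """Perceptron learning (not an SVM): nudge the rule after each mistake."""
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in data:
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * score <= 0:  # misclassified or on the border
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
                mistakes += 1
        if mistakes == 0:  # every example is classified correctly
            break
    return weights, bias


weights, bias = train_perceptron(examples)
print([predict(weights, bias, x) for x, _ in examples])
```

Under these hypothetical labels the two classes differ in the second descriptor, so a linear rule exists and the perceptron finds one after a few passes.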

Page 9

SVM Method

Feature vectors in input space:
A = (1, 1, 1)
B = (0, 1, 1)
C = (1, 1, 1)
D = (0, 1, 1)
E = (0, 0, 0)
F = (1, 0, 1)

[Figure: feature vectors plotted as points in a three-dimensional input space with axes X, Y, and Z; labeled points include A, B, E, and F]

Page 10

SVM Method

Project to a higher-dimensional space

[Figure: protein family members and nonmembers, shown with a border before the projection and with a new border after it]
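A toy example may help show why projecting to a higher-dimensional space is useful. The data points and the mapping x -> (x, x^2) below are invented for illustration and are not taken from the lecture: on the one-dimensional line no single cut-off separates members from nonmembers, but after the mapping a straight horizontal border does.

```python
from typing import List, Sequence, Tuple


def lift(x: float) -> Tuple[float, float]:
    """Map a 1-D point into 2-D by adding its square as a second coordinate."""
    return (x, x * x)


def threshold_separable(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if one cut-off value puts all of a on one side and all of b on the other."""
    return max(a) < min(b) or max(b) < min(a)


# Made-up 1-D data: members sit in the middle, nonmembers on both sides.
members: List[float] = [-0.5, 0.0, 0.5]
nonmembers: List[float] = [-2.0, -1.5, 1.5, 2.0]

separable_1d = threshold_separable(members, nonmembers)

# After lifting, compare only the new coordinate x^2.
lifted_members = [lift(x)[1] for x in members]
lifted_nonmembers = [lift(x)[1] for x in nonmembers]
separable_lifted = threshold_separable(lifted_members, lifted_nonmembers)

print(separable_1d, separable_lifted)
```

The border in the lifted space is a straight line (x^2 equal to a constant), yet it corresponds to two cut-offs in the original space, which is a nonlinear border there.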

Page 11

SVM Method

[Figure: protein family members and nonmembers separated by the new border; the support vectors are marked]

Page 12

SVM Method

[Figure: protein family members and nonmembers separated by the new border; the support vectors are marked]
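For reference, the role of the support vectors can be written out with the standard textbook hard-margin formulation. This is not taken from the slides, and the symbols $w$, $b$, $x_i$, and $y_i$ are introduced here: $x_i$ is the feature vector of example $i$, $y_i \in \{+1, -1\}$ marks it as a member or nonmember, and the border is the hyperplane $w \cdot x + b = 0$.

```latex
\begin{aligned}
&\min_{w,\,b}\ \tfrac{1}{2}\,\lVert w \rVert^{2}
\quad \text{subject to} \quad
y_i \,(w \cdot x_i + b) \ \ge\ 1, \qquad i = 1, \dots, n, \\[4pt]
&\text{margin width} \;=\; \frac{2}{\lVert w \rVert},
\qquad
f(x) \;=\; \operatorname{sign}(w \cdot x + b).
\end{aligned}
```

The support vectors are the training examples for which the constraint holds with equality, $y_i (w \cdot x_i + b) = 1$. They lie closest to the border and alone determine its position.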

Page 13

SVM Method

The border line is nonlinear.

Page 14

SVM Method

Non-linear transformation: use of kernel function
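A kernel function returns the dot product that two vectors would have after a non-linear transformation, without computing the transformation itself. The sketch below checks this for a degree-2 polynomial kernel on two-dimensional inputs and also shows a Gaussian (RBF) kernel. The choice of kernels, the function names, and the `gamma` value are assumptions for illustration; the transcript does not say which kernel the lecture uses.

```python
import math
from typing import Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x: Sequence[float], z: Sequence[float]) -> float:
    """Degree-2 polynomial kernel: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def phi(x: Sequence[float]) -> Tuple[float, float, float]:
    """Explicit 2-D -> 3-D map whose dot product equals poly2_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def rbf_kernel(x: Sequence[float], z: Sequence[float], gamma: float = 0.5) -> float:
    """Gaussian (RBF) kernel: K(x, z) = exp(-gamma * |x - z|^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (1.0, 2.0), (3.0, 4.0)
print(poly2_kernel(x, z), dot(phi(x), phi(z)))  # the two values agree
print(rbf_kernel(x, z))
```

The kernel works in the original two coordinates, while the equivalent explicit map needs three. For the RBF kernel the implicit space is infinite-dimensional, so the kernel is the only practical route.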

Page 15

SVM Method

Non-linear transformation

Page 16

SVM Method

Page 17

SVM Method

Page 18

SVM Method

Page 19

SVM Method

Page 20

SVM for Classification of Proteins

How to represent a protein?

• Each sequence is represented by a specific feature vector assembled from encoded representations of tabulated residue properties:
– amino acid composition
– hydrophobicity
– normalized van der Waals volume
– polarity
– polarizability
– charge
– surface tension
– secondary structure
– solvent accessibility

• Three descriptors, composition (C), transition (T), and distribution (D), are used to describe the global composition of each of these properties.

Nucleic Acids Res., 31: 3692-3697
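The three descriptors can be sketched for a single property, hydrophobicity. Several things here are assumptions not stated in the slides: the three-class grouping of residues (a grouping commonly used with CTD descriptors), the function names, the toy sequence, and the exact rule for picking the 25%, 50%, and 75% positions, for which conventions differ between implementations.

```python
import math
from itertools import combinations
from typing import Dict, List, Set

# Assumed three-class hydrophobicity grouping (not given in the slides).
HYDROPHOBICITY_CLASSES: Dict[int, Set[str]] = {
    1: set("RKEDQN"),   # polar
    2: set("GASTPHY"),  # neutral
    3: set("CLVIMFW"),  # hydrophobic
}
LABELS = (1, 2, 3)


def encode(sequence: str) -> List[int]:
    """Replace each residue by its class label (1, 2, or 3)."""
    lookup = {aa: label for label, aas in HYDROPHOBICITY_CLASSES.items() for aa in aas}
    return [lookup[aa] for aa in sequence.upper()]


def composition(encoded: List[int]) -> List[float]:
    """C: fraction of residues in each class."""
    return [encoded.count(label) / len(encoded) for label in LABELS]


def transition(encoded: List[int]) -> List[float]:
    """T: fraction of adjacent pairs that switch between two given classes."""
    pairs = list(zip(encoded, encoded[1:]))
    return [sum(1 for p in pairs if set(p) == {a, b}) / len(pairs)
            for a, b in combinations(LABELS, 2)]


def distribution(encoded: List[int]) -> List[float]:
    """D: positions (as % of length) of the first, 25%, 50%, 75%, and last
    residue of each class."""
    result: List[float] = []
    for label in LABELS:
        positions = [i + 1 for i, c in enumerate(encoded) if c == label]
        for fraction in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not positions:
                result.append(0.0)
                continue
            k = max(1, math.ceil(fraction * len(positions)))
            result.append(100.0 * positions[k - 1] / len(encoded))
    return result


encoded = encode("RGCRGC")  # toy sequence, encodes to [1, 2, 3, 1, 2, 3]
ctd = composition(encoded) + transition(encoded) + distribution(encoded)
print(len(ctd))  # 3 + 3 + 15 values for this one property
```

Repeating the same three calculations for each property, each with its own residue grouping, yields one block of values per property.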

Page 21

SVM for Classification of Proteins

How to represent a protein?

Page 22

SVM for Classification of Proteins

How to represent a protein?

From protein sequence:

To feature vector:

(C_amino acid composition, T_amino acid composition, D_amino acid composition, C_hydrophobicity, T_hydrophobicity, D_hydrophobicity, …)

Nucleic Acids Res., 31: 3692-3697
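Assembling a feature vector amounts to concatenating the per-property blocks in a fixed order. The sketch below uses two simple blocks only: the fractions of the 20 standard amino acids and a three-class charge composition. The sequence is made up, the function names are hypothetical, and the charge grouping (K and R positive, D and E negative, all others neutral) is an assumption, so this shows the shape of the step and does not reproduce the SVMProt descriptor set.

```python
from typing import List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence: str) -> List[float]:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    seq = sequence.upper()
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def charge_composition(sequence: str) -> List[float]:
    """Fractions of positive, negative, and neutral residues (assumed grouping)."""
    seq = sequence.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    neutral = len(seq) - positive - negative
    return [positive / len(seq), negative / len(seq), neutral / len(seq)]


def assemble_feature_vector(*blocks: Sequence[float]) -> List[float]:
    """Concatenate descriptor blocks, always in the same order."""
    vector: List[float] = []
    for block in blocks:
        vector.extend(block)
    return vector


sequence = "MKVLAAGIVGLLLAQ"  # made-up sequence for illustration
vector = assemble_feature_vector(amino_acid_composition(sequence),
                                 charge_composition(sequence))
print(len(vector))
```

Every protein, whatever its length, ends up as a vector of the same size, which is what allows sequences of different lengths to be compared by one classifier.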

Page 23

Protein function prediction software SVMProt

Useful for functional prediction of novel proteins, distantly-related proteins, homologous proteins of different functions.

Which functional families does your protein belong to?

[Workflow diagram labels: Your protein sequence; two input options (input sequence through internet, or input sequence on local machine); Computer loaded with SVMProt; Send sequence to classifier; Support vector machines classifier for every protein functional family; Identified functional families; Protein functional indications]

http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi

Nucl. Acids Res. 31, 3692-3697 (2003)

Page 24

Protein function prediction software SVMProt

Useful for functional prediction of novel proteins, distantly-related proteins, homologous proteins of different functions.

Protein families covered:

46 enzyme families, 3 receptor families, 4 transporter and channel families, 6 DNA- and RNA-binding families, 8 structural families, 2 regulator/factor families.

SVMProt web-version at: http://jing.cz3.nus.edu.sg/cgi-bin/svmprot.cgi

Nucl. Acids Res. 31, 3692-3697 (2003)

Page 25

Protein function prediction software SVMProt

Nucl. Acids Res. 31, 3692-3697 (2003)

[Screenshot labels: Check covered protein families here; Input sequence here; Check format here]

Page 26

Protein function prediction software SVMProt

Nucl. Acids Res. 31, 3692-3697 (2003)

[Screenshot labels: Probability of correct prediction; Prediction score]

Page 27

Summary of Today’s lecture

• Machine learning method for protein function prediction.

• Use of SVMProt for probing protein function.