CISC667, F05, Lec23, Liao 1 CISC 667 Intro to Bioinformatics (Fall 2005) Support Vector Machines (II) Bioinformatics Applications

Jan 15, 2016


CISC 667 Intro to Bioinformatics (Fall 2005)

Support Vector Machines (II)

Bioinformatics Applications


Combining pairwise similarity with SVMs for protein homology detection

[Figure: workflow in three numbered steps (1, 2, 3). Protein homologs (positive train) and protein non-homologs (negative train) are converted to positive and negative pairwise score vectors, which train a support vector machine. The SVM then performs binary classification on the testing data, a target protein of unknown function.]


Experiment: known protein families

Jaakkola, Diekhans and Haussler 1999


Vectorization
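
The vectorization step can be sketched as follows: each protein is represented by its similarity scores against every training sequence, so the vector has one entry per training protein. This is a minimal sketch, not the lecture's implementation. The function names, the toy sequences, and the match/mismatch/gap scores are assumptions made for illustration; a real pipeline would use a substitution matrix and scores from a standard alignment tool.

```python
from typing import List


def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score with a linear gap penalty.

    The scoring parameters are illustrative assumptions.
    """
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def vectorize(seq: str, train_seqs: List[str]) -> List[int]:
    """Represent seq by its similarity score against every training sequence."""
    return [local_alignment_score(seq, t) for t in train_seqs]


# Toy sequences, not real proteins.
train_seqs = ["MKVLAAGIV", "MKVLSAGIV", "GHHHHHHPW"]
target = "MKVLAAGLV"
print(vectorize(target, train_seqs))
```

The resulting fixed-length vectors, for positive and negative training proteins alike, are what the SVM is trained on.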


A measure of sensitivity and specificity

[Figure: example rankings with ROC = 1, ROC = 0, and ROC = 0.67.]

ROC: the receiver operating characteristic score is the normalized area under a curve that plots true positives as a function of false positives.
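
As a rough illustration of the definition above, the ROC score can be computed from a list of labels sorted by classifier score. This is a minimal sketch: the function name and the example rankings are assumptions, and ties between scores are ignored.

```python
from typing import Sequence


def roc_score(ranked_labels: Sequence[int]) -> float:
    """Normalized area under the true-positives vs. false-positives curve.

    ranked_labels holds 1 for a true positive and 0 for a false positive,
    sorted from the highest classifier score to the lowest.
    """
    n_pos = sum(ranked_labels)
    n_neg = len(ranked_labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative")
    true_pos = 0
    area = 0
    for label in ranked_labels:
        if label == 1:
            true_pos += 1      # the curve steps up
        else:
            area += true_pos   # the curve steps right; add the column height
    return area / (n_pos * n_neg)


print(roc_score([1, 1, 0, 0]))  # all positives ranked first
print(roc_score([0, 0, 1, 1]))  # all negatives ranked first
print(round(roc_score([1, 1, 0, 1]), 2))
```

The first ranking gives 1.0 and the second gives 0.0, the two extremes of the score; the third gives 0.67.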


Performance Comparison (1)


Using Phylogenetic Profiles & SVMs

YAL001C

E-value   Phylogenetic profile
0.122     1
1.064     0
3.589     0
0.008     1
0.692     1
8.49      0
14.79     0
0.584     1
1.567     0
0.324     1
0.002     1
3.456     0
2.135     0
0.142     1
0.001     1
0.112     1
1.274     0
0.234     1
4.562     0
3.934     0
0.489     1
0.002     1
2.421     0
0.112     1
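
The binary profile can be derived from the E-values by thresholding. The slide does not state the cutoff; the value 1.0 used below is an assumption that happens to agree with every row listed for YAL001C. The function name is also hypothetical.

```python
from typing import List

E_VALUE_CUTOFF = 1.0  # assumed cutoff; not stated on the slide


def phylogenetic_profile(e_values: List[float],
                         cutoff: float = E_VALUE_CUTOFF) -> List[int]:
    """Return 1 where the best hit in a genome is significant, else 0."""
    return [1 if e < cutoff else 0 for e in e_values]


# E-values listed for YAL001C, one per genome.
e_values = [0.122, 1.064, 3.589, 0.008, 0.692, 8.49, 14.79, 0.584,
            1.567, 0.324, 0.002, 3.456, 2.135, 0.142, 0.001, 0.112,
            1.274, 0.234, 4.562, 3.934, 0.489, 0.002, 2.421, 0.112]
profile = phylogenetic_profile(e_values)
print(profile)
```

Each gene thus becomes a fixed-length 0/1 vector with one bit per genome, which is the input representation for the SVM.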


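Phylogenetic profiles and evolution patterns

[Figure: a phylogenetic tree whose nodes carry 0/1 labels (an evolution pattern), with the profile x = 1 1 0 1 0 0 0 1 1 at the leaves.]

It is impossible to know for sure whether the gene followed exactly this evolution pattern.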


Tree Kernel (Vert, 2002)

For a phylogenetic profile x and an evolution pattern e:
• p(e) quantifies how “natural” the pattern is
• p(x|e) quantifies how likely the pattern e is the “true history” of the profile x

Tree kernel:

$K_{tree}(x, y) = \sum_e p(e)\, p(x|e)\, p(y|e)$

It can be proved to be a kernel.

Intuition: two profiles get closer in the feature space when they share common evolution patterns with high probability.
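
One way to see why this sum is a valid kernel is to write it as an inner product of explicit feature vectors indexed by evolution patterns. The feature map $\Phi$ below is notation introduced here for illustration, not taken from the slide.

```latex
\Phi(x) = \Big( \sqrt{p(e)}\; p(x \mid e) \Big)_{e}
\quad\Longrightarrow\quad
\langle \Phi(x), \Phi(y) \rangle
  = \sum_e \sqrt{p(e)}\, p(x \mid e)\, \sqrt{p(e)}\, p(y \mid e)
  = \sum_e p(e)\, p(x \mid e)\, p(y \mid e)
  = K_{tree}(x, y)

\sum_{i,j} c_i c_j K_{tree}(x_i, x_j)
  = \sum_e p(e) \Big( \sum_i c_i\, p(x_i \mid e) \Big)^{2} \;\ge\; 0
```

The second line shows positive semidefiniteness directly: since $p(e) \ge 0$, every term of the sum is non-negative for any coefficients $c_i$.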


Tree-Encoded Profile (Narra & Liao, 2004)

[Figure: phylogenetic tree with the profile at the leaves and a score at each internal node.]

Profile: 1 1 0 1 0 0 0 1 1

Internal node scores (post-order traversal): 1 0.33 0.67 0.34 0.5 0.75 0.55
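
The encoding can be sketched as a post-order traversal that scores each internal node and appends the scores to the original profile. This is a minimal sketch under two assumptions: each internal node is scored as the mean of its children (the slide's values appear consistent with this), and the four-leaf topology below is hypothetical, chosen only to keep the example small. The function name and the nested-list tree representation are also illustrative.

```python
from typing import List, Union

# An int is a leaf (an index into the profile); a list holds child subtrees.
Tree = Union[int, list]


def tree_encode(profile: List[float], tree: Tree) -> List[float]:
    """Return the profile extended with one score per internal node."""
    internal_scores: List[float] = []

    def score(node: Tree) -> float:
        if isinstance(node, int):          # leaf: read the profile bit
            return float(profile[node])
        child_scores = [score(child) for child in node]
        value = sum(child_scores) / len(child_scores)  # assumed: mean of children
        internal_scores.append(value)      # recorded after its children: post-order
        return value

    score(tree)
    return [float(bit) for bit in profile] + internal_scores


# Hypothetical topology over four genomes: ((0, 1), (2, 3)).
print(tree_encode([1, 1, 0, 1], [[0, 1], [2, 3]]))
# [1.0, 1.0, 0.0, 1.0, 1.0, 0.5, 0.75]
```

The extended vector carries information about where in the tree the gene is present, which a plain 0/1 profile does not.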


Using Support Vector Machines


Kernel function:

where r = 0.10

Soft margin regularization C = 1.50

Coding scheme: BIN21

$$L(\alpha) = \sum_i \alpha_i - \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \left( K(x_i, x_j) + \frac{\delta_{ij}}{C} \right)$$

Evaluation:

Q3 = (P1+P2+P3)/N

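$$C = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{PP \cdot PN \cdot AP \cdot AN}}$$

where PP and PN are the numbers of predicted positives and negatives, and AP and AN are the numbers of actual positives and negatives.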

SOV: segment overlap accuracy
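
The first two evaluation measures can be sketched directly from their definitions. This is an illustrative sketch: the function names and the toy prediction strings are assumptions, the H/E/C state labels are the usual three-state convention rather than something the slide spells out, and returning 0.0 when the denominator vanishes is a convention chosen here. SOV is not covered.

```python
import math
from typing import Sequence


def q3(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Q3 = (P1 + P2 + P3) / N: fraction of residues predicted correctly."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def matthews(predicted: Sequence[str], actual: Sequence[str], state: str) -> float:
    """Matthews correlation coefficient C for one state against the rest."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == state and a == state for p, a in pairs)
    tn = sum(p != state and a != state for p, a in pairs)
    fp = sum(p == state and a != state for p, a in pairs)
    fn = sum(p != state and a == state for p, a in pairs)
    pp, pn = tp + fp, tn + fn   # predicted positives / negatives
    ap, an = tp + fn, tn + fp   # actual positives / negatives
    denom = math.sqrt(pp * pn * ap * an)
    return (tp * tn - fp * fn) / denom if denom else 0.0


predicted = "HHHECC"  # toy prediction
actual = "HHEECC"     # toy ground truth
print(q3(predicted, actual))
print(matthews(predicted, actual, "H"))
```

Q3 gives a single overall accuracy, while the Matthews coefficient is computed per state and is less sensitive to class imbalance.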


Design tertiary classifiers
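
One common way to build a three-state classifier from binary SVMs is to train one classifier per state against the rest and pick the state with the largest decision value. The slide's own scheme was not recovered, so the sketch below is an assumption about one possible design, not the lecture's method; the function name and the decision values are hypothetical.

```python
from typing import Dict, List


def tertiary_predict(decision_values: Dict[str, float]) -> str:
    """Pick the state whose one-vs-rest classifier is most confident."""
    return max(decision_values, key=decision_values.get)


# Hypothetical decision values from three binary classifiers
# (H vs. rest, E vs. rest, C vs. rest) for three residues.
per_residue: List[Dict[str, float]] = [
    {"H": 0.8, "E": -1.1, "C": -0.2},
    {"H": -0.4, "E": 0.3, "C": 0.1},
    {"H": -0.9, "E": -0.5, "C": 0.6},
]
print("".join(tertiary_predict(d) for d in per_residue))
```

Other combination schemes, such as voting among pairwise classifiers, follow the same pattern of reducing several binary outputs to a single state.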


Nguyen & Rajapakse, Genome Informatics 14: 218-227 (2003)


A two-stage SVM
