
Classification by Support Vector Machines (compdiag.molgen.mpg.de/ngfn/docs/2003/nov/svm.pdf)

May 08, 2020


Classification by Support Vector Machines

Florian Markowetz, Max-Planck-Institute for Molecular Genetics

– Computational Molecular Biology – Berlin

Practical DNA Microarray Analysis 2003


Overview

I Large Margin Classifiers

II The Kernel Trick

III Today's practical session

Florian Markowetz, Classification by SVM, Practical DNA Microarray Analysis 2003 2


Supervised Learning

Calvin, I’m still confused about cats and dogs!

OK, then I will explain it once more ...


Unsupervised Learning

Calvin, I’m still confused about cats and dogs!

Yeah, me too!


Supervised Learning

Training set: a number of expression profiles with known labels, which represent the true population.

Difference to clustering: there you don’t know the labels; you have to find a structure on your own.

Learning/Training: find a decision rule which explains the training set well.

This is the easy part, because we know the labels of the training set!

Generalisation ability: how well does the decision rule learned from the training set generalize to new specimens?

Goal: find a decision rule with high generalisation ability.


Underfitting and Overfitting

[Figure: three decision boundaries for the same positive and negative examples and a new patient marked "?", labeled "too simple", "tradeoff" and "too complex".]


Linear separation of the training set

We start with linear separation and add complexity in a second step by using kernel functions.

A separating hyperplane is defined by
- the normal vector w and
- the offset b:

hyperplane = { x | 〈w, x〉+ b = 0}

〈·, ·〉 is called inner product, scalar product or dot product.

Training: Choose w and b from the labeled examples in the training set.

[Figure: hyperplane 〈w, x〉 + b = 0 with its normal vector w.]
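A minimal sketch in plain Python (no libraries) of how a hyperplane given by w and b splits the space: the value 〈w, x〉 + b is zero exactly on the hyperplane, positive on the side w points to and negative on the other side. The hyperplane x1 + x2 − 1 = 0 and the helper names `inner` and `decision_value` are hypothetical, chosen only for illustration.

```python
def inner(u, v):
    """Inner (dot) product <u, v> = u1*v1 + ... + ug*vg."""
    return sum(ui * vi for ui, vi in zip(u, v))


def decision_value(w, b, x):
    """<w, x> + b: zero on the hyperplane, positive on the side w points to."""
    return inner(w, x) + b


# Hypothetical hyperplane in 2-D: x1 + x2 - 1 = 0, i.e. w = (1, 1), b = -1
w, b = [1.0, 1.0], -1.0
on_plane = decision_value(w, b, [0.5, 0.5])  # 0.0: lies on the hyperplane
above = decision_value(w, b, [2.0, 1.0])     # 2.0: side the normal vector points to
below = decision_value(w, b, [0.0, 0.0])     # -1.0: opposite side
```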


Predict the label of a new point

Prediction: On which side of the hyperplane does the new point lie?

Points in the direction of the normal vector are classified as POSITIVE.

Points in the opposite direction are classified as NEGATIVE.

[Figure: trained hyperplane with normal vector w; new points marked "?" are to be classified.]
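The prediction rule fits in a few lines. A sketch, assuming the hyperplane has already been trained (the values of w and b below are made up): a new point gets the label +1 if 〈w, x〉 + b > 0 and −1 otherwise. Points lying exactly on the hyperplane are sent to −1 here, which is an arbitrary choice.

```python
def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def predict(w, b, x):
    """+1 (POSITIVE) if x lies in the direction of the normal vector w, else -1 (NEGATIVE).
    Points exactly on the hyperplane also get -1."""
    return 1 if inner(w, x) + b > 0 else -1


# Hypothetical trained hyperplane and three new points
w, b = [1.0, 1.0], -1.0
new_points = [[2.0, 2.0], [0.0, 0.0], [1.0, 0.5]]
labels = [predict(w, b, x) for x in new_points]  # [1, -1, 1]
```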


Which hyperplane is the best?

[Figure: four candidate separating hyperplanes, labeled A, B, C and D.]


No sharp knife, but a fat plane

[Figure: a "fat plane" separating samples with negative label from samples with positive label.]


Separate the training set with maximal margin

[Figure: separating hyperplane and margin between samples with negative label and samples with positive label.]
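To make "margin" concrete: for a hyperplane that separates the training set, the distance from the hyperplane to the nearest training point is min_i y_i(〈w, x_i〉 + b)/‖w‖, and the maximal margin hyperplane is the one that makes this number largest. The sketch below compares two separating candidates on a made-up four-point training set; the function name `geometric_margin` is hypothetical.

```python
import math


def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def geometric_margin(w, b, X, y):
    """Smallest signed distance y_i * (<w, x_i> + b) / ||w|| over the training set.
    It is positive only if the hyperplane separates the set correctly."""
    norm_w = math.sqrt(inner(w, w))
    return min(yi * (inner(w, xi) + b) / norm_w for xi, yi in zip(X, y))


# Made-up training set with labels +1 / -1
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]

# Two hypothetical candidate hyperplanes that both separate the set
m_a = geometric_margin([1.0, 0.0], 0.0, X, y)  # line x1 = 0: margin 2.0
m_b = geometric_margin([1.0, 1.0], 0.0, X, y)  # line x1 + x2 = 0: margin sqrt(2)
```

Both candidates classify every training point correctly, but the first one leaves a wider empty band around itself and would be preferred.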


What are Support Vectors?

The points nearest to the separating hyperplane are called Support Vectors.

Only they determine the position of the hyperplane. All other points have no influence!

Mathematically: the normal vector of the hyperplane is a weighted sum of the Support Vectors, w = Σ_i α_i y_i x_i, where the label y_i = ±1 sets the sign and the weight α_i > 0 is found during training.

[Figure: separating hyperplane and margin between samples with negative label and samples with positive label.]
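A sketch of w = Σ_i α_i y_i x_i on a made-up three-point training set. The coefficients α were worked out by hand for this toy problem (a real SVM obtains them from an optimizer). The third point is not a Support Vector, so its α is 0 and it does not influence w at all. The helper `weighted_sum` is hypothetical.

```python
# Toy training set; alpha worked out by hand for this set, not by a solver
X = [[1.0, 0.0], [-1.0, 0.0], [3.0, 0.0]]
y = [1, -1, 1]
alpha = [0.5, 0.5, 0.0]  # third point is not a Support Vector


def weighted_sum(alpha, y, X):
    """Normal vector w = sum_i alpha_i * y_i * x_i."""
    w = [0.0] * len(X[0])
    for a, yi, xi in zip(alpha, y, X):
        for k, xik in enumerate(xi):
            w[k] += a * yi * xik
    return w


w = weighted_sum(alpha, y, X)  # [1.0, 0.0]

# Offset from a Support Vector s, which lies on the margin: y_s * (<w, x_s> + b) = 1
b = y[0] - sum(wk * xk for wk, xk in zip(w, X[0]))  # 0.0

# Functional margins: exactly 1 for the Support Vectors, larger for the other point
margins = [yi * (sum(wk * xk for wk, xk in zip(w, xi)) + b) for xi, yi in zip(X, y)]
```

Moving the third point further to the right leaves w and b unchanged, as long as it stays outside the margin.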


Non-separable training sets

Use linear separation, but admit training errors.

[Figure: separating hyperplane for a non-separable training set.]

Penalty of an error: how far the point lies on the wrong side of its margin boundary, multiplied by the error cost C.
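The penalty can be written with slack variables: a training point (x_i, y_i) gets ξ_i = max(0, 1 − y_i(〈w, x_i〉 + b)), which is zero when the point lies on the correct side of its margin boundary, and training minimizes ½‖w‖² + C Σ_i ξ_i. The sketch below evaluates this standard soft-margin objective for a fixed hyperplane on made-up data; the function names are hypothetical. A correctly classified point inside the margin is also penalized, and a larger C makes the same errors cost more.

```python
def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def slack(w, b, x, y):
    """xi = max(0, 1 - y * (<w, x> + b)): zero on the correct side of the margin boundary."""
    return max(0.0, 1.0 - y * (inner(w, x) + b))


def soft_margin_objective(w, b, X, Y, C):
    """1/2 ||w||^2 + C * sum_i xi_i"""
    return 0.5 * inner(w, w) + C * sum(slack(w, b, x, y) for x, y in zip(X, Y))


w, b = [1.0, 0.0], 0.0                              # hypothetical hyperplane x1 = 0
X = [[2.0, 0.0], [0.5, 0.0], [-0.5, 0.0], [-2.0, 0.0]]
Y = [1, 1, 1, -1]                                   # the point at -0.5 is a training error

slacks = [slack(w, b, x, y) for x, y in zip(X, Y)]  # [0.0, 0.5, 1.5, 0.0]
cost_small_C = soft_margin_objective(w, b, X, Y, C=0.1)   # 0.5 + 0.1 * 2.0
cost_large_C = soft_margin_objective(w, b, X, Y, C=10.0)  # 0.5 + 10 * 2.0
```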


What’s next?

I Large Margin Classifiers

II The Kernel Trick

III Today's practical session


Separation may be easier in higher dimensions

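[Figure: a feature map takes data that are complex to separate in low dimensions to a higher-dimensional space, where a separating hyperplane suffices: complex in low dimensions, simple in higher dimensions.]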
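A one-dimensional illustration with made-up data and a hypothetical feature map φ(x) = (x, x²): the negatives sit around 0 and the positives lie far out on both sides, so no single threshold on the line separates them. After the map, the straight line x² = 2 in the feature space (w = (0, 1), b = −2) does.

```python
# Made-up 1-D data: negatives near 0, positives far out on both sides
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
labels = [1, 1, -1, -1, -1, 1, 1]


def feature_map(x):
    """Hypothetical map phi(x) = (x, x^2) from 1-D to 2-D."""
    return (x, x * x)


# A linear rule in feature space: <w, phi(x)> + b with w = (0, 1), b = -2
w, b = (0.0, 1.0), -2.0


def predict(x):
    z = feature_map(x)
    return 1 if w[0] * z[0] + w[1] * z[1] + b > 0 else -1


predicted = [predict(x) for x in xs]  # matches labels
```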


The kernel trick

Maximal margin hyperplanes in feature space

If classification is easier in a high-dimensional feature space, we would like to build a maximal margin hyperplane there.

The construction depends on inner products ⇒ we will have to evaluate inner products in the feature space.

This can be computationally intractable if the dimensions become too large!

Loophole

Use a kernel function that lives in low dimensions, but behaves like an inner product in high dimensions.
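A small check of the loophole, with made-up 2-D vectors: the kernel K(p, q) = 〈p, q〉² (the polynomial kernel with γ = 1, c0 = 0, d = 2) equals the inner product of the explicit 3-D feature vectors φ(p) = (p1², √2·p1·p2, p2²). The kernel needs one 2-D inner product, while the explicit route builds the 3-D vectors first. For many genes and higher degrees the explicit feature space grows quickly, whereas the kernel still costs one inner product in gene space. The names `phi` and `poly2_kernel` are hypothetical.

```python
import math


def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def phi(p):
    """Explicit feature map for 2-D input: (p1^2, sqrt(2)*p1*p2, p2^2)."""
    p1, p2 = p
    return (p1 * p1, math.sqrt(2) * p1 * p2, p2 * p2)


def poly2_kernel(p, q):
    """K(p, q) = <p, q>^2, evaluated in the original 2-D space."""
    return inner(p, q) ** 2


p, q = (1.0, 2.0), (3.0, 1.0)
explicit = inner(phi(p), phi(q))  # inner product in the 3-D feature space (about 25)
via_kernel = poly2_kernel(p, q)   # 25.0, computed without ever building phi
```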


Kernel functions

Expression profiles $p = (p_1, p_2, \dots, p_g) \in \mathbb{R}^g$ and $q = (q_1, q_2, \dots, q_g) \in \mathbb{R}^g$.

Similarity in gene space: INNER PRODUCT

$\langle p, q \rangle = p_1 q_1 + p_2 q_2 + \dots + p_g q_g$

Similarity in feature space: KERNEL FUNCTION

K(p, q) = polynomial, radial basis, ...


Examples of Kernels

linear: $K(p, q) = \langle p, q \rangle$

polynomial: $K(p, q) = (\gamma \langle p, q \rangle + c_0)^d$

radial basis function: $K(p, q) = \exp(-\gamma \lVert p - q \rVert^2)$
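The three kernels written out in plain Python, as a sketch; the parameter names `gamma`, `c0` and `d` mirror the symbols in the formulas, and the input vectors are arbitrary example values.

```python
import math


def linear_kernel(p, q):
    """K(p, q) = <p, q>"""
    return sum(pi * qi for pi, qi in zip(p, q))


def polynomial_kernel(p, q, gamma, c0, d):
    """K(p, q) = (gamma * <p, q> + c0)^d"""
    return (gamma * linear_kernel(p, q) + c0) ** d


def rbf_kernel(p, q, gamma):
    """K(p, q) = exp(-gamma * ||p - q||^2)"""
    sq_dist = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.exp(-gamma * sq_dist)


p, q = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(p, q)                               # 11.0
k_poly = polynomial_kernel(p, q, gamma=1.0, c0=1.0, d=2)  # (11 + 1)^2 = 144.0
k_rbf = rbf_kernel(p, q, gamma=0.5)                       # exp(-0.5 * 8) = exp(-4)
```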


Why is it a trick?

We do not need to know what the feature space really looks like; we just need the kernel function as a measure of similarity.

This is kind of black magic: we do not know what happens inside the kernel, we just get the output.

Still, we have the geometric interpretation of the maximal margin hyperplane, so SVMs are more transparent than, e.g., Artificial Neural Networks.


The kernel trick: summary

[Figure: separating hyperplane]

Non-linear separation between vectors in gene space using kernel functions
=
Linear separation between vectors in feature space using the inner product


Support Vector Machines

A Support Vector Machine is

a maximal margin hyperplane in feature space

built by using a kernel function in gene space.
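In the dual form, which is what makes the kernel trick possible, a trained SVM labels a new profile x by the sign of f(x) = Σ_i α_i y_i K(x_i, x) + b, so only kernel evaluations between x and the training points are needed. The sketch below uses a made-up three-point problem whose α and b were worked out by hand for the linear kernel, and checks that with K(p, q) = 〈p, q〉 this agrees with 〈w, x〉 + b for w = (1, 0). With a different kernel, α and b would have to be re-optimized; that step is not shown. The name `svm_decision` is hypothetical.

```python
def linear_kernel(p, q):
    return sum(pi * qi for pi, qi in zip(p, q))


def svm_decision(x, X, y, alpha, b, kernel):
    """f(x) = sum_i alpha_i * y_i * K(x_i, x) + b; the predicted label is the sign of f(x)."""
    return sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X)) + b


# Made-up training set; alpha and b worked out by hand for the linear kernel
X = [[1.0, 0.0], [-1.0, 0.0], [3.0, 0.0]]
y = [1, -1, 1]
alpha, b = [0.5, 0.5, 0.0], 0.0

x_new = [2.0, 5.0]
f_kernel = svm_decision(x_new, X, y, alpha, b, linear_kernel)  # 2.0
f_primal = linear_kernel([1.0, 0.0], x_new) + b                # <w, x> + b with w = (1, 0)
```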


Parameters of SVM

Kernel parameters

γ: width of the rbf (a larger γ gives a narrower kernel); coefficient of 〈p, q〉 in the polynomial (= 1)

d: degree of the polynomial

c0: additive constant in the polynomial (= 0)

Error weight

C: influence of training errors
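How γ shapes the rbf kernel, as a quick numeric sketch with two made-up points at distance 1: the similarity exp(−γ‖p − q‖²) falls off faster as γ grows, so each training point influences a smaller neighborhood. The γ values are arbitrary examples, not recommendations.

```python
import math


def rbf_kernel(p, q, gamma):
    """K(p, q) = exp(-gamma * ||p - q||^2)"""
    sq_dist = sum((pi - qi) ** 2 for pi, qi in zip(p, q))
    return math.exp(-gamma * sq_dist)


p, q = [0.0, 0.0], [1.0, 0.0]  # two points at distance 1
similarity = {gamma: rbf_kernel(p, q, gamma) for gamma in (0.1, 1.0, 10.0)}
# roughly {0.1: 0.905, 1.0: 0.368, 10.0: 0.0000454}
```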


SVM@work: low complexity

Figure taken from Scholkopf and Smola, Learning with Kernels, MIT Press 2002, p217


SVM@work: medium complexity

Figure taken from Scholkopf and Smola, Learning with Kernels, MIT Press 2002, p217


SVM@work: high complexity

Figure taken from Scholkopf and Smola, Learning with Kernels, MIT Press 2002, p217


Literature on SVM

• http://www.kernel-machines.org

• Bernhard Scholkopf and Alex Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002. An introduction to and overview of SVMs. A free sample of one third of the chapters (Introduction, Kernels, Loss Functions, Optimization, Learning Theory Part I, and Classification) is available on the book website.

• Vladimir Vapnik. Statistical Learning Theory. Wiley, NY, 1998. The comprehensive treatment of statistical learning theory, including a large amount of material on SVMs.

• Vladimir Vapnik. The Nature of Statistical Learning Theory. Springer, NY, 1995. An overview of statistical learning theory, containing no proofs, but most of the crucial theorems and milestones of learning theory. With a detailed chapter on SVMs for pattern recognition and regression.


What’s next?

I Large Margin Classifiers

II The Kernel Trick

III Today's practical session


Practical session on classification

Learn to classify tumor samples

by Support Vector Machines

and Nearest Shrunken Centroids.


SVM and PAMR

http://cran.r-project.org/

SVMs are part of the R package e1071 (named after the TU Vienna statistics department).

You can also download pamr here. See the authors' webpage for more information: http://www-stat.stanford.edu/~tibs/PAM/


Computational Diagnosis

TASK:

For 3 new patients in your hospital, decide which kind of breast cancer they suffer from (ER+ or ER-) using their expression profiles.

IDEA:

Learn the difference between the cancer types from an archive of 46 expression profiles, which were analyzed and classified by an expert.


Training ... tuning ... testing

TRAINING:
svm.doctor <- svm(data = "46 profiles",
                  labels = "by an expert",
                  kernel = "..", parameters = "..")

TUNING:

Now tune the SVM for good generalization ability (training error, cross-validation error). Select informative genes.

TESTING:
svm.diagnosis <- predict(svm.doctor, new.patients)
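The session itself uses `svm()` from the R package e1071. As a language-neutral sketch of the same train / tune / test loop, the Python below trains a linear soft-margin SVM by plain full-batch subgradient descent (a simple stand-in chosen for brevity, not the solver e1071 uses), picks the error weight C by k-fold cross-validation error, and predicts labels for new points. All data are made up, gene selection is not shown, folds are assigned round-robin rather than at random, and the step size, iteration count and C grid are arbitrary assumptions. Every function name is hypothetical.

```python
def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_svm(X, y, C=1.0, steps=500, eta=0.01):
    """Full-batch subgradient descent on 1/2 ||w||^2 + C * sum_i max(0, 1 - y_i (<w, x_i> + b))."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(steps):
        grad_w, grad_b = list(w), 0.0  # gradient of 1/2 ||w||^2 is w itself
        for xi, yi in zip(X, y):
            if yi * (inner(w, xi) + b) < 1:  # margin violated: hinge term is active
                for k, xik in enumerate(xi):
                    grad_w[k] -= C * yi * xik
                grad_b -= C * yi
        w = [wk - eta * gk for wk, gk in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


def predict(model, X):
    w, b = model
    return [1 if inner(w, x) + b > 0 else -1 for x in X]


def cv_error(X, y, C, k=3):
    """Fraction of points misclassified when each of k round-robin folds is held out once."""
    mistakes = 0
    for fold in range(k):
        test = [i for i in range(len(X)) if i % k == fold]
        train = [i for i in range(len(X)) if i % k != fold]
        model = train_linear_svm([X[i] for i in train], [y[i] for i in train], C)
        predicted = predict(model, [X[i] for i in test])
        mistakes += sum(p != y[i] for p, i in zip(predicted, test))
    return mistakes / len(X)


# TRAINING data: made-up 2-gene profiles with labels "by an expert"
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]

# TUNING: choose C with the smallest cross-validation error (ties go to the first value)
C_grid = [0.5, 1.0, 4.0]
best_C = min(C_grid, key=lambda C: cv_error(X, y, C))
model = train_linear_svm(X, y, C=best_C)

# TESTING: label two new, made-up patients
diagnosis = predict(model, [[4.0, 4.0], [-4.0, -4.0]])  # [1, -1]
```

On real expression data the cross-validation errors would differ between values of C, and that difference is what the tuning step exploits.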
