Support Vector Machines
Page 1: Support Vector Machines

Page 2: Supervised Learning

Labeled Data: e.g., images labeled quail, apple, apple, corn, corn.

Model Class: consider classifiers of the form $y = f(x; w)$.

Features: each example is mapped to a feature vector, e.g., quail $\to (1.1, -0.5, 0, 0, 0.3, \ldots)$, apple $\to (-1, 0, 1.2, -0.4, 0.1, \ldots)$.

Learning: find $w$ that works well on the training data $(x, y)$. Optimization!

Page 3: Linear Classifiers

• A simple and effective family of classifiers: $y = \operatorname{sign}[w \cdot x + b]$
• The training problem: given a set of $n$ training points $(x_i, y_i)$, find the "best fitting" classifier $(w, b)$.

Page 4: Training Linear Classifiers

$y = \operatorname{sign}[w \cdot x + b]$

• How do we find it?
• If there exists a classifier with zero training error, we can find one with the perceptron algorithm.
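As an illustration (not part of the original slides), here is a minimal perceptron sketch in Python/NumPy; the toy data is made up:

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Minimal perceptron: X is (n, d), y in {-1, +1}.
    Returns (w, b) with zero training error if the data are
    linearly separable and max_epochs is large enough."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # misclassified point
                w += yi * xi                   # update toward the correct side
                b += yi
                mistakes += 1
        if mistakes == 0:                      # converged: zero training error
            break
    return w, b

# Toy separable data (made up for illustration)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])
w, b = perceptron(X, y)
print(np.sign(X @ w + b))  # should match y
```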

Page 5: Many Possible Solutions

• If there exists one solution, there exist many.
• Which one should we choose?
• Intuitively: one that is farther away from the points.

Page 6: Maximum Margin Classifier

• For every point $x_i$, denote by $d(x_i, w, b)$ its distance from the hyperplane.
• Margin of a classifier: the shortest distance to the hyperplane, $\min_i d(x_i, w, b)$.
• Goal: find the classifier that maximizes $\min_i d(x_i, w, b)$.

Page 7: ML and Optimization

• We have an optimization problem.
• Namely, we want to find a set of parameters that maximizes some objective function (the margin) subject to some constraints (classifying correctly).
• We need a toolbox for solving such problems.
• In what follows we provide an overview.

Page 8: Unconstrained Optimization

• Use $w$ to denote the optimization variables.
• For example: $\min_{w_1, w_2} (w_1 - 2w_2)^2$
• Generally: $\min_w f(w)$
• Solve by finding all $w$ such that $\frac{\partial f(w)}{\partial w} = 0$.
• These stationary points are the candidates for the global minimum (along with asymptotes).

Page 9: Constrained Minimization

• Suppose we are only interested in variables that satisfy $h(w) = 0$.
• The optimization problem is: $\min f(w)$ s.t. $h(w) = 0$
• The zero-gradient point may not satisfy the constraint.

[Figure: contours of $f$ and the constraint curve $h(w) = 0$ in the $(w_1, w_2)$ plane.]

Page 10: Directional Derivative

• Given a function $f(w)$ and a direction $v$ with $\|v\|_2 = 1$.
• What happens to $f(w + \alpha v)$ if we make a small change $\alpha$ in direction $v$? The rate of change is the directional derivative $\nabla f(w) \cdot v$.
• A direction $v$ along the curve $h(w) = 0$ has zero directional derivative of $h$.
• Thus the gradient $\nabla h(w)$ is orthogonal to the curve.
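A quick numeric sanity check of the directional derivative (my own illustration, with a made-up function):

```python
import numpy as np

f = lambda w: w[0]**2 + 3 * w[1]**2               # example function (made up)
grad_f = lambda w: np.array([2 * w[0], 6 * w[1]])  # its exact gradient

w = np.array([1.0, -2.0])
v = np.array([3.0, 4.0]); v /= np.linalg.norm(v)   # unit-norm direction
eps = 1e-6

fd = (f(w + eps * v) - f(w)) / eps                 # finite-difference slope along v
print(fd, grad_f(w) @ v)                           # both are approximately -8.4
```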

Page 11: Constrained Minimization

• The optimization problem is: $\min f(w)$ s.t. $h(w) = 0$
• Consider $f$ along the curve $\{w : h(w) = 0\}$.
• A vector of movement $v$ along the curve is orthogonal to $\nabla h(w)$.
• The gradient of $f$ along the curve is $\nabla f(w) \cdot v$.
• It is zero for all such $v$ iff $\nabla f(w) = \lambda \nabla h(w)$.

Page 12: Lagrange Multiplier

• The optimum points should satisfy:
  1. $\nabla f(w) = \lambda \nabla h(w)$ for some $\lambda$
  2. $h(w) = 0$ (constraint satisfied)
• Alternative formulation. Define the Lagrangian: $L(w, \lambda) = f(w) + \lambda h(w)$
• The optimum should satisfy:
  1. $\nabla_w L(w, \lambda) = 0$
  2. $\nabla_\lambda L(w, \lambda) = 0$
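To make the recipe concrete, a small worked example of my own (not from the slides): minimize $w_1^2 + w_2^2$ subject to $w_1 + w_2 = 1$ by solving the two stationarity conditions with SymPy.

```python
import sympy as sp

w1, w2, lam = sp.symbols('w1 w2 lam')
f = w1**2 + w2**2          # objective (made-up example)
h = w1 + w2 - 1            # constraint h(w) = 0
L = f + lam * h            # Lagrangian

# Stationarity in w and in lambda recovers the constrained optimum.
sols = sp.solve([sp.diff(L, w1), sp.diff(L, w2), sp.diff(L, lam)],
                [w1, w2, lam], dict=True)
print(sols)  # -> w1 = 1/2, w2 = 1/2, lam = -1
```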

Page 13: Example

• What is the distance between the hyperplane $w \cdot x + b = 0$ and a point $\bar{x}$?
• Solve: $\min_{x : w \cdot x + b = 0} \; 0.5\|x - \bar{x}\|_2^2$
• Lagrangian: $L(x, \lambda) = 0.5\|x - \bar{x}\|_2^2 + \lambda (w \cdot x + b)$
• Stationarity: $\nabla_x L(x, \lambda) = (x - \bar{x}) + \lambda w = 0$, so $x = \bar{x} - \lambda w$.
• Use primal feasibility to solve for $\lambda$: $(\bar{x} - \lambda w) \cdot w + b = 0$, giving $\lambda = \frac{\bar{x} \cdot w + b}{\|w\|^2}$.
• Hence the distance is $\|x - \bar{x}\|_2 = \frac{|\bar{x} \cdot w + b|}{\|w\|}$.
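A numeric check of the distance formula (illustration only; hyperplane and point are made up):

```python
import numpy as np

w = np.array([3.0, 4.0]); b = -5.0      # hyperplane w·x + b = 0 (made up)
x_bar = np.array([4.0, 6.0])            # query point

lam = (x_bar @ w + b) / (w @ w)         # multiplier from primal feasibility
x_star = x_bar - lam * w                # closest point on the hyperplane

print(np.linalg.norm(x_star - x_bar))          # projected distance: 6.2
print(abs(x_bar @ w + b) / np.linalg.norm(w))  # closed-form distance: identical
```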

Page 14: Multiple Constraints

• Solve: $\min f(w)$ s.t. $h_i(w) = 0 \;\; \forall i = 1, \ldots, p$
• Introduce a multiplier per constraint: $\lambda_1, \ldots, \lambda_p$.
• Lagrangian: $L(w, \lambda) = f(w) + \sum_i \lambda_i h_i(w)$
• Optimality conditions:
  1. $\nabla_w L(w, \lambda) = 0$
  2. $\nabla_{\lambda_i} L(w, \lambda) = 0$
• There may be several such points; we need to check which one is the global optimum.

Page 15: Inequality Constraints

• Solve: $\min f(w)$ s.t. $h(w) \le 0$
• We are "stuck" when the only directions that decrease $f$ take us outside the constraint set. Namely:
• Optimality conditions:
  1. $h(w) \le 0$ (constraint satisfied)
  2a. When $h(w) = 0$ (stuck on the boundary): $\nabla f(w) = -\alpha \nabla h(w)$ for some $\alpha \ge 0$
  2b. When $h(w) < 0$ (interior): $\nabla f(w) = 0$

[Figure: in the region $h(w) \le 0$, progress is possible while $-\nabla f(w)$ points into the feasible set; we are stuck once $\nabla f(w) = -\alpha \nabla h(w)$.]

Page 16: Complementary Slackness

• The two cases ($h(w) = 0$ with $\nabla f(w) = -\alpha \nabla h(w)$, and $h(w) < 0$ with $\nabla f(w) = 0$) can be summarized as:
  $\nabla f(w) = -\alpha \nabla h(w)$
  $\alpha \, h(w) = 0$
  $\alpha \ge 0, \;\; h(w) \le 0$
• These are called the Karush-Kuhn-Tucker (KKT) conditions. Always necessary.
• Sufficient for convex optimization.

Page 17: Lagrange Multipliers

• Consider the general problem:
  $\min f(w)$ s.t. $h_i(w) = 0 \;\forall i = 1, \ldots, p$ and $g_i(w) \le 0 \;\forall i = 1, \ldots, m$
• Define the Lagrangian:
  $L(w, \lambda, \alpha) = f(w) + \sum_i \lambda_i h_i(w) + \sum_i \alpha_i g_i(w)$
• The optimum must satisfy:
  $\nabla_w L(w, \lambda, \alpha) = 0$
  $\alpha_i g_i(w) = 0 \;\forall i, \quad \alpha_i \ge 0, \;\; g_i(w) \le 0, \;\; h_i(w) = 0$
• Typically easy if someone hands us $\alpha, \lambda$!

Page 18: Convex Optimization

• A general optimization problem may have many local minima/maxima and saddle points.
• This makes minimization hard (e.g., exponential in the dimension).
• Convex optimization problems are a "nice" subclass. They require:
  • Convex $f(w)$ and $g_i(w)$
  • Linear $h_i(w)$

Page 19: Convex Optimization

• A function is convex if:
  • Its value along any line segment lies below the linear interpolation of its endpoints.
  • Equivalently, it has a non-negative second derivative (or positive semidefinite Hessian).
• Examples: $f(w) = w \cdot x$, $\;\; f(w) = \max[w \cdot x, 0]$, $\;\; f(w) = w^T A w$ with $A \succeq 0$.

Page 20: Convex Optimization

• Nice things:
  • No local optima.
  • KKT conditions are sufficient for global optimality.
  • The multipliers can be found by solving the dual.

Page 21: Convex Duality

• For every convex problem, we can define a dual problem that has the same value.

• Optimization is over the Lagrange multipliers.

• Solution to dual implies solution to primal via KKT

• Dual might be easier to solve.

Page 22: Convex Duality

• Recall the Lagrangian:
  $L(w, \lambda, \alpha) = f(w) + \sum_i \lambda_i h_i(w) + \sum_i \alpha_i g_i(w)$
• Then:
  $\min f(w)$ s.t. $h_i(w) = 0, \; g_i(w) \le 0 \quad = \quad \min_w \max_{\lambda, \alpha \ge 0} L(w, \lambda, \alpha)$
  Why? If any constraint is violated, the inner max can drive $L$ to $+\infty$; if all constraints are satisfied, the max over the multipliers is just $f(w)$.
• Swapping min and max gives:
  $\min_w \max_{\lambda, \alpha \ge 0} L(w, \lambda, \alpha) \;\ge\; \max_{\lambda, \alpha \ge 0} \min_w L(w, \lambda, \alpha)$
• In the convex case it is an equality.

Page 23: Convex Duality

• Define: $g(\lambda, \alpha) = \min_w L(w, \lambda, \alpha)$
• Dual problem: $\max_{\lambda, \alpha \ge 0} g(\lambda, \alpha)$
• It has the same value as the primal problem.
• The resulting $\lambda, \alpha$ are optimal. You can recover the "primal" variables $w$ via the KKT conditions, which is often easy:
  $\nabla_w L(w, \lambda, \alpha) = 0$
  $\alpha_i g_i(w) = 0 \;\forall i, \quad \alpha_i \ge 0, \;\; g_i(w) \le 0, \;\; h_i(w) = 0$

Page 24: Maximum Margin Classifier

• For every point $x_i$, denote by $d(x_i, w, b)$ its distance from the hyperplane.
• Margin of a classifier: the shortest distance to the hyperplane, $\min_i d(x_i, w, b)$.
• Goal: find the classifier that maximizes $\min_i d(x_i, w, b)$.

Page 25: Geometry of Linear Classifiers

$y = \operatorname{sign}[w \cdot x + b]$

• $w$ is the direction orthogonal to the hyperplane $w \cdot x + b = 0$.
• Proof: if $x_1, x_2$ are on the hyperplane, then $w \cdot (x_1 - x_2) = 0$.
• What is $b$? The distance from the origin to the hyperplane is $\frac{|b|}{\|w\|}$.
• More generally, the distance from a point $x$ to the hyperplane is $\frac{|x \cdot w + b|}{\|w\|}$.

Page 26: Max Margin Hyperplane

• Find a hyperplane that maximizes the minimum distance.
• Solve:
  $\max_w \frac{1}{\|w\|} \min_i |w \cdot x_i + b| \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 0$
• Any solution $(w, b)$ can be rescaled to $(cw, cb)$ without affecting the objective or the constraints.
• We can therefore rescale such that $\min_i |w \cdot x_i + b| = 1$:
  $\max_w \|w\|^{-1} \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 0, \;\; \min_i |w \cdot x_i + b| = 1$

Page 27: Max Margin Hyperplane

• We had:
  $\max_w \|w\|^{-1} \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 0, \;\; \min_i |w \cdot x_i + b| = 1$
• Equivalently:
  $\max_w \|w\|^{-1} \quad$ s.t. $\; \min_i y_i (w \cdot x_i + b) = 1$
• We can relax the equality to an inequality $\min_i y_i (w \cdot x_i + b) \ge 1$ (why? at any optimum the inequality is tight, since otherwise we could shrink $w, b$ and improve the objective):
  $\max_w \|w\|^{-1} \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 1$
• Equivalently:
  $\min_w \|w\|^2 \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 1$

Page 28: Support Vector Machines (SVM)

• The SVM classifier is the solution to:
  $\min_w 0.5\|w\|^2 \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 1$
  (The factor 0.5 doesn't affect the optimum.)
• The $x_i$ where this constraint holds with equality are the "support vectors".
• It is a convex optimization problem, specifically a convex quadratic program (quadratic objective and linear constraints).
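Since this is a standard convex QP, a generic solver handles it directly. A minimal sketch using CVXPY on made-up separable data (my own illustration; CVXPY is not referenced in the lecture):

```python
import cvxpy as cp
import numpy as np

# Toy separable data (made up)
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))   # 0.5 * ||w||^2
constraints = [cp.multiply(y, X @ w + b) >= 1]     # margin constraints
cp.Problem(objective, constraints).solve()

print(w.value, b.value)
print(y * (X @ w.value + b.value))  # all >= 1; equality at the support vectors
```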

Page 29: SVM History

• Initial version by Vapnik and Chervonenkis (1963).
• Non-linear version by Boser, Guyon, and Vapnik (1992).
• Much work on generalization theory since (by Bartlett, Shawe-Taylor, Mendelson, Schoelkopf, Smola, and others).
• Many variants for regression, unsupervised learning, etc.

Page 30: Solving SVM

• The SVM classifier is the solution to:
  $\min_w 0.5\|w\|^2 \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 1$
• You can plug this into a solver and get $w, b$.
• Let's use the Lagrangian to understand the solution:
  $L(w, b, \alpha) = 0.5\|w\|^2 + \sum_i \alpha_i \left[1 - y_i (w \cdot x_i + b)\right]$
  $\nabla_w L(w, b, \alpha) = w - \sum_i \alpha_i y_i x_i = 0 \;\;\Rightarrow\;\; w = \sum_i \alpha_i y_i x_i$
  $\nabla_b L(w, b, \alpha) = -\sum_i \alpha_i y_i = 0 \;\;\Rightarrow\;\; \sum_i \alpha_i y_i = 0$

Page 31: The Representer Theorem

• The optimal weight vector is a weighted combination of the data points: $w = \sum_i \alpha_i y_i x_i$
• This will be very important!
• When is $\alpha_i = 0$? (Recall the KKT conditions.)
• Whenever $y_i (w \cdot x_i + b) > 1$, i.e., whenever the constraint is slack.
• $\alpha_i > 0$ only when $y_i (w \cdot x_i + b) = 1$.
• So the optimal weight is a combination only of support vectors!

Page 32: Deriving via the Dual

• How do we find the $\alpha_i$ in $w = \sum_i \alpha_i y_i x_i$, and then $b$?
• Use the dual! Recall
  $L(w, b, \alpha) = 0.5\|w\|^2 + \sum_i \alpha_i \left[1 - y_i (w \cdot x_i + b)\right]$
  $g(\alpha) = \min_{w, b} L(w, b, \alpha)$
• We know the minimizing $w$. Plug it into the Lagrangian:
  $g(\alpha) = 0.5 \Big\| \sum_i \alpha_i y_i x_i \Big\|_2^2 - \sum_i \alpha_i \Big[ y_i \Big( \Big( \sum_j \alpha_j y_j x_j \Big) \cdot x_i + b \Big) - 1 \Big] = \sum_i \alpha_i - 0.5 \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j$
• We constrain $\sum_i \alpha_i y_i = 0$, because otherwise minimizing over $b$ gives $g(\alpha) = -\infty$.

Page 33: The SVM Dual

• The dual problem is:
  $\max_\alpha \sum_i \alpha_i - 0.5 \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \quad$ s.t. $\; \alpha_i \ge 0, \;\; \sum_i \alpha_i y_i = 0$
• The number of variables and constraints is the number of training points.
• It is also a convex quadratic program (why? the quadratic part is concave, since the Gram matrix $[y_i y_j \, x_i \cdot x_j]$ is positive semidefinite, and the constraints are linear).
• Obtaining the primal $w$: $\; w = \sum_i \alpha_i y_i x_i$
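In practice a library typically solves this dual for you. A sketch with scikit-learn's SVC (my own illustration, not from the lecture; a very large C approximates the hard-margin problem):

```python
import numpy as np
from sklearn.svm import SVC

# Toy separable data (made up)
X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)   # large C ~ hard margin

# dual_coef_ holds alpha_i * y_i for the support vectors,
# so w = sum_i alpha_i y_i x_i is recovered as:
w = clf.dual_coef_ @ clf.support_vectors_
print(w, clf.coef_)            # the two should agree
print(clf.support_vectors_)    # only support vectors get alpha_i > 0
```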

Page 34: Finding b

• Recall from the KKT conditions that support vectors ($\alpha_i > 0$) satisfy: $y_i (w \cdot x_i + b) = 1$
• Since we know $w$, we can solve for $b$: for any support vector, $b = y_i - w \cdot x_i$.
• This should give the same value for all support vectors.
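Continuing the scikit-learn sketch from the previous page (the variables `clf`, `w`, `X`, `y` come from that snippet), $b$ can be recovered from any support vector:

```python
# For each support vector x_s with label y_s: y_s (w·x_s + b) = 1,
# hence b = y_s - w·x_s. All support vectors should agree.
sv_labels = y[clf.support_]                       # labels of the support vectors
b_values = sv_labels - clf.support_vectors_ @ w.ravel()
print(b_values, clf.intercept_)                   # all approximately the same value
```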

Page 35: Non-Separable Case

• So far we assumed a separating hyperplane exists.
• If it doesn't, our optimization problem is infeasible.
• For real data, we don't want to make this assumption. Because:
  • Data may be noisy. A linear classifier may still do OK.
  • The data may come from a non-linear rule. Next class!

Page 36: Non-Separable Case

• Ideally, we would like to find the classifier that minimizes the training error. But:
  • It turns out this is NP-hard.
  • How do we incorporate the margin?
• Let's start from the separable case.

Page 37: Non-Separable Case

• Separable case: $\min_w 0.5\|w\|^2 \;$ s.t. $\; y_i (w \cdot x_i + b) \ge 1$
• We need to "relax" the constraints.
• Allow violation by $\xi_i \ge 0$, but "pay" for the violation:
  $\min_{w} 0.5\|w\|^2 + C \sum_i \xi_i \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0$
• $C$ is a constant that determines how much we care about classification errors as opposed to margin.

Page 38: Dual for the Non-Separable Case

• The dual is:
  $\max_\alpha \sum_i \alpha_i - 0.5 \sum_{i,j} \alpha_i \alpha_j y_i y_j \, x_i \cdot x_j \quad$ s.t. $\; 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0$
• The mapping to the primal is as before.

Page 39: Alternative Interpretation

• The primal is:
  $\min_{w} 0.5\|w\|^2 + C \sum_i \xi_i \quad$ s.t. $\; y_i (w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0$
• We can solve for $\xi_i$ explicitly: $\; \xi_i = \max[0, 1 - y_i(w \cdot x_i + b)]$
• The problem becomes:
  $\min_w C \sum_i \max[0, 1 - y_i(w \cdot x_i + b)] + 0.5\|w\|_2^2$
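Since this last form is unconstrained, it can also be minimized directly by subgradient descent on the hinge loss. Below is a batch-subgradient sketch of my own (the step size, epoch count, and data are made-up choices; this is an alternative to the QP/dual route the lecture develops):

```python
import numpy as np

def hinge_svm_sgd(X, y, C=1.0, epochs=200, lr=0.01):
    """Subgradient descent on C * sum_i max(0, 1 - y_i(w·x_i + b)) + 0.5||w||^2."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        active = margins < 1                      # points with nonzero hinge loss
        # Subgradient: w from the regularizer, minus C * sum over active points
        grad_w = w - C * (y[active, None] * X[active]).sum(axis=0)
        grad_b = -C * y[active].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

X = np.array([[2.0, 2.0], [1.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = hinge_svm_sgd(X, y)
print(np.sign(X @ w + b))  # should match y
```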

Page 40: Alternative Interpretation

• The primal is:
  $\min_w C \sum_i \max[0, 1 - y_i \, w \cdot x_i] + 0.5\|w\|_2^2$
• The function $\max[0, 1 - y_i \, w \cdot x_i]$ is called the hinge loss.
• SVM uses the hinge loss as an approximation to the 0-1 loss.

[Figure: the hinge loss and the 0-1 loss plotted against the margin $y_i \, w \cdot x_i$.]

• It upper-bounds the true classification error.
• A convex upper bound!

Page 41: Alternative Interpretation

• The primal is:
  $\min_w C \sum_i \max[0, 1 - y_i \, w \cdot x_i] + 0.5\|w\|_2^2$
  (a bound on the loss, plus regularization)
• This is a very common design pattern.
• Other losses and regularizers can be considered:
  • Logistic loss: $\frac{1}{\ln 2} \ln(1 + e^{-y_i \, w \cdot x_i})$
  • L1 regularization: $\|w\|_1 = \sum_i |w_i|$. Sparsity-inducing.
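For intuition, a quick numeric comparison of the two surrogate losses against the 0-1 loss (my own illustration; the margin values are arbitrary):

```python
import numpy as np

m = np.array([-2.0, -0.5, 0.0, 0.5, 1.0, 2.0])   # margins y_i * (w · x_i)
hinge = np.maximum(0.0, 1.0 - m)
logistic = np.log1p(np.exp(-m)) / np.log(2.0)    # scaled so the loss at margin 0 is 1
zero_one = (m <= 0).astype(float)
print(np.c_[m, zero_one, hinge, logistic])       # both surrogates upper-bound 0-1
```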

Page 42: SVM and Generalization

• Intuitively, choosing a large margin should improve generalization.
• Assume the true distribution and classifier are such that the margin is $\gamma$.
• Expect generalization to behave like $\gamma^{-1}$.
• But $\gamma$ can always be increased by rescaling the data.
• Denote by $R$ the largest norm of $x$.
• Generalization scales with $R\gamma^{-1}$.

Page 43: SVM and Generalization

• Assume the training error is zero.
• It can be shown that the generalization error satisfies (up to some logarithmic factors):
  $\mathrm{error}(w) \le \frac{c_1}{m} \cdot \frac{R^2}{\gamma^2} + \frac{c_2}{m} \log m$
• The VC dimension is replaced by $\frac{R^2}{\gamma^2}$.
• Appeared in "Structural Risk Minimization over Data-Dependent Hierarchies" (1998).

Page 44: Leave-One-Out Bounds

• Another intuition: using few support vectors should lead to good generalization.
• We will show this via the leave-one-out (LOO) error.
• Denote by $S^{-i}$ the training sample without $(x_i, y_i)$.
• Denote by $h_S$ the hypothesis obtained from training on $S$.
• The LOO error is:
  $\hat{R}_{LOO}(S) = \frac{1}{m} \sum_{i=1}^{m} \mathbb{I}\left[h_{S^{-i}}(x_i) \ne y_i\right]$
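Computing $\hat{R}_{LOO}$ is a direct loop: hold out point $i$, retrain, and test on the held-out point. A sketch of my own using scikit-learn (the dataset and settings are made up):

```python
import numpy as np
from sklearn.svm import SVC

def loo_error(X, y, make_clf):
    """Leave-one-out error: retrain with point i held out, test on point i."""
    m = len(y)
    mistakes = 0
    for i in range(m):
        keep = np.arange(m) != i
        h = make_clf().fit(X[keep], y[keep])
        mistakes += h.predict(X[i:i + 1])[0] != y[i]
    return mistakes / m

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2, 1, (20, 2)), rng.normal(-2, 1, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
print(loo_error(X, y, lambda: SVC(kernel='linear', C=1e6)))
```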

Page 45: Leave-One-Out Bounds

• The LOO error is similar in spirit to the generalization error, but we only train on $m - 1$ points.
• Denote by $R(h)$ the generalization error of $h$: $\; R(h) = \mathbb{E}_{(x, y) \sim D} \, \mathbb{I}[h(x) \ne y]$
• Can show:
  $\mathbb{E}_{S_m}\!\left[\hat{R}_{LOO}(S_m)\right] = \mathbb{E}_{S_{m-1}}\!\left[R(h_{S_{m-1}})\right]$
• That is, the LOO error and the generalization error have the same expected value.

Page 46: Leave-One-Out Bounds for SVM

• What is the expected LOO error of SVM (separable case)?
• If a non-support vector is left out, the solution will not change, and the error on that point will be zero.
• Otherwise there might be an error, so:
  $\hat{R}_{LOO}(S_m) \le \frac{N_{SV}(S_m)}{m}$
• Therefore:
  $\mathbb{E}_{S_{m-1}}\!\left[R(h_{S_{m-1}})\right] \le \frac{1}{m} \mathbb{E}_{S_m}\!\left[N_{SV}(S_m)\right]$
• Generalization is related to the number of support vectors.
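An empirical sanity check of $\hat{R}_{LOO}(S_m) \le N_{SV}(S_m)/m$, reusing the `loo_error` helper and the made-up data `X, y` from the sketch two pages back (illustration only):

```python
clf = SVC(kernel='linear', C=1e6).fit(X, y)
n_sv = len(clf.support_)                      # number of support vectors
print(loo_error(X, y, lambda: SVC(kernel='linear', C=1e6)),
      "<=", n_sv / len(y))                    # LOO error <= N_SV / m
```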