
Topics in Algorithms 2007 Ramesh Hariharan. Support Vector Machines.

Transcript
Page 1: Topics in Algorithms 2007 Ramesh Hariharan. Support Vector Machines.

Topics in Algorithms 2007

Ramesh Hariharan

Page 2: Support Vector Machines

Page 3: Machine Learning

How do we learn good separators for 2 classes of points?

The separator could be linear or non-linear.

Maximize the margin of separation.

Page 4: Support Vector Machines: Hyperplane

[Figure: a hyperplane with unit normal w; a point x on it makes angle θ with w; the plane lies at distance -b from the origin.]

Take |w| = 1. For all x on the hyperplane,

w.x = |w||x| cos(θ) = |x| cos(θ) = constant = -b

so the hyperplane is the set w.x + b = 0.
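A minimal numeric sketch of this claim (the normal w, offset b, and test points below are made up, not from the slides): when |w| = 1, the quantity w.x + b is exactly the signed distance from x to the hyperplane w.x + b = 0.

```python
def dot(u, v):
    # Inner product of two equal-length vectors.
    return sum(a * c for a, c in zip(u, v))

w = (0.6, 0.8)   # |w| = sqrt(0.36 + 0.64) = 1
b = -2.0         # points on the plane satisfy w.x = -b = 2

def signed_distance(x):
    # Since |w| = 1, w.x + b is the signed distance from x to the plane.
    return dot(w, x) + b

print(round(signed_distance((1.0, 1.75)), 6))  # 0.0: on the plane
print(round(signed_distance((2.0, 3.0)), 6))   # 1.6: on the positive side
```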

Page 5: Support Vector Machines: Margin of separation

[Figure: separating hyperplane w.x + b = 0 flanked by the parallel planes w.x + b = Δ and w.x + b = -Δ.]

With |w| = 1:

x Є Blue: w.x + b >= Δ
x Є Red: w.x + b <= -Δ

maximize 2Δ over w, b, Δ

Page 6: Support Vector Machines: Eliminate Δ by dividing by Δ

x Є Blue: (w/Δ).x + (b/Δ) >= 1
x Є Red: (w/Δ).x + (b/Δ) <= -1

Set w' = w/Δ and b' = b/Δ; then |w'| = |w|/Δ = 1/Δ.
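A small sketch of this rescaling step, using made-up 2-d points and a margin Δ that the chosen w, b actually achieve; dividing everything by Δ turns margin-Δ constraints into margin-1 constraints, with |w'| = 1/Δ.

```python
blue = [(3.0, 3.0), (2.0, 4.0)]    # class Blue: w.x + b >=  Δ
red  = [(0.0, 0.0), (1.0, -1.0)]   # class Red:  w.x + b <= -Δ

w, b, delta = (0.6, 0.8), -2.0, 0.5   # |w| = 1, margin Δ = 0.5

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def satisfies_margin(w, b, delta):
    # The slide's two constraint families at margin Δ.
    return (all(dot(w, x) + b >= delta for x in blue) and
            all(dot(w, x) + b <= -delta for x in red))

# Divide through by Δ: w' = w/Δ, b' = b/Δ, and the margin becomes 1.
w2 = tuple(c / delta for c in w)
b2 = b / delta
norm_w2 = dot(w2, w2) ** 0.5        # |w'| = |w|/Δ = 1/Δ = 2
```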

Page 7: Support Vector Machines: Perfect Separation Formulation

x Є Blue: w'.x + b' >= 1
x Є Red: w'.x + b' <= -1

Maximizing 2Δ = 2/|w'| is the same as

minimize |w'|/2 over w', b'

or equivalently

minimize (w'.w')/2 over w', b'

Page 8: Support Vector Machines: Formulation allowing for misclassification

Perfect separation:

x Є Blue: w.x + b >= 1
x Є Red: -(w.x + b) >= 1
minimize (w.w)/2 over w, b

Allowing misclassification via slack variables ξi:

xi Є Blue: w.xi + b >= 1 - ξi
xi Є Red: -(w.xi + b) >= 1 - ξi
ξi >= 0
minimize (w.w)/2 + C Σ ξi over w, b, ξi
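A sketch of evaluating this soft-margin objective: for fixed w, b the best slack is ξi = max(0, 1 - yi(w.xi + b)), so the objective can be computed directly. The points, labels, and C below are made up for illustration.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def soft_margin_objective(points, labels, w, b, C):
    # Optimal slack for each point: ξi = max(0, 1 - yi(w.xi + b)).
    slacks = [max(0.0, 1.0 - y * (dot(w, x) + b))
              for x, y in zip(points, labels)]
    return dot(w, w) / 2.0 + C * sum(slacks)

points = [(0.0, 1.0), (0.0, -1.0), (0.0, 0.25)]
labels = [+1, -1, +1]              # yi = +/-1, as on the slide
w, b, C = (0.0, 2.0), 0.0, 10.0

# First two points clear the margin (ξ = 0); the third needs
# ξ = 1 - 2*0.25 = 0.5, so the objective is (w.w)/2 + C*0.5 = 2 + 5.
print(soft_margin_objective(points, labels, w, b, C))  # 7.0
```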

Page 9: Support Vector Machines: Duality

Primal (yi = +/-1 is the class label):

minimize (w.w)/2 + C Σ ξi over w, b, ξi
subject to yi (w.xi + b) + ξi >= 1, ξi >= 0

Dual:

maximize Σ λi - ( Σi Σj λi λj yi yj (xi.xj) )/2 over λi
subject to Σ λi yi = 0, 0 <= λi <= C

Page 10: Support Vector Machines: Duality (Primal, Lagrangian Primal, Lagrangian Dual)

Primal:

min (w.w)/2 + C Σ ξi over w, b, ξi
subject to yi (w.xi + b) + ξi >= 1, ξi >= 0, with yi = +/-1 the class label

Lagrangian Primal:

min over w, b, ξi of max over λi, αi >= 0 of
(w.w)/2 + C Σ ξi - Σi λi (yi (w.xi + b) + ξi - 1) - Σi αi (ξi - 0)

If the Primal is feasible then Primal = Lagrangian Primal.

Page 11: Support Vector Machines: Lagrangian Primal vs Lagrangian Dual

Lagrangian Primal >= Lagrangian Dual

Lagrangian Primal:

min over w, b, ξi of max over λi, αi >= 0 of
(w.w)/2 + C Σ ξi - Σi λi (yi (w.xi + b) + ξi - 1) - Σi αi (ξi - 0)

Lagrangian Dual:

max over λi, αi >= 0 of min over w, b, ξi of
(w.w)/2 + C Σ ξi - Σi λi (yi (w.xi + b) + ξi - 1) - Σi αi (ξi - 0)

Page 12: Support Vector Machines: Lagrangian Primal >= Lagrangian Dual

Proof: Consider a 2d matrix.

LP: find the max in each row, then take the smallest of these values.
LD: find the min in each column, then take the largest of these values.

If LP comes from row i and LD from column j, then LP = max of row i >= entry (i, j) >= min of column j = LD.
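The matrix argument above can be sketched numerically; the 2x2 matrix below is made up, and shows a case where the inequality is strict.

```python
def smallest_row_max(M):
    # LP: take the max in each row, then the smallest of these values.
    return min(max(row) for row in M)

def largest_col_min(M):
    # LD: take the min in each column, then the largest of these values.
    return max(min(col) for col in zip(*M))

M = [[0, 1],
     [1, 0]]
# Every row-max is >= every shared entry >= every column-min,
# so smallest_row_max(M) >= largest_col_min(M) always holds.
print(smallest_row_max(M), largest_col_min(M))  # 1 0 (a strict gap)
```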

Page 13: Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?

Lagrangian Dual:

max over λi, αi >= 0 of min over w, b, ξi of
(w.w)/2 + C Σ ξi - Σi λi (yi (w.xi + b) + ξi - 1) - Σi αi (ξi - 0)

Proof: Consider w*, b*, ξi* optimal for the primal. Find λi, αi >= 0 such that minimizing over w, b, ξi gives w*, b*, ξi*, and

Σi λi (yi (w*.xi + b*) + ξi* - 1) = 0
Σi αi (ξi* - 0) = 0

Page 14: Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?

Proof (contd.): Consider w*, b*, ξi* optimal for the primal. Find λi, αi >= 0 such that

Σi λi (yi (w*.xi + b*) + ξi* - 1) = 0
Σi αi (ξi* - 0) = 0

i.e., ξi* > 0 implies αi = 0, and yi (w*.xi + b*) + ξi* - 1 != 0 implies λi = 0.

Page 15: Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?

Proof (contd.): Consider w*, b*, ξi* optimal for the primal. Find λi, αi >= 0 such that minimizing over w, b, ξi gives w*, b*, ξi*: at w*, b*, ξi*,

∂/∂wj = 0, ∂/∂ξi = 0, ∂/∂b = 0

and the second derivatives should be non-negative everywhere.

Page 16: Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?

Proof (contd.): Consider w*, b*, ξi* optimal for the primal. Find λi, αi >= 0 such that minimizing over w, b, ξi gives w*, b*, ξi*. Setting the first derivatives of the Lagrangian to zero:

w* - Σi λi yi xi = 0
-Σi λi yi = 0
-λi - αi + C = 0

The second derivatives are always non-negative.
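A sketch of the first two stationarity conditions: given candidate multipliers λi (made up here, not solved for), w* must equal Σi λi yi xi and Σi λi yi must vanish; the third condition just fixes αi = C - λi.

```python
def recover_w(lams, ys, xs):
    # Stationarity: w* = Σi λi yi xi.
    dim = len(xs[0])
    return tuple(sum(l * y * x[d] for l, y, x in zip(lams, ys, xs))
                 for d in range(dim))

xs = [(1.0, 0.0), (-1.0, 0.0)]   # one point per class, made up
ys = [+1, -1]
lams = [0.5, 0.5]

print(sum(l * y for l, y in zip(lams, ys)))  # 0.0: Σi λi yi = 0
print(recover_w(lams, ys, xs))               # (1.0, 0.0)
```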

Page 17: Support Vector Machines: Can Lagrangian Primal = Lagrangian Dual?

Proof (contd.): Consider w*, b*, ξi* optimal for the primal. Find λi, αi >= 0 such that

ξi* > 0 implies αi = 0
yi (w*.xi + b*) + ξi* - 1 != 0 implies λi = 0
w* - Σi λi yi xi = 0
-Σi λi yi = 0
-λi - αi + C = 0

Such λi, αi >= 0 always exist!

Page 18: Support Vector Machines: Proof that appropriate Lagrange multipliers always exist

Roll all the primal variables into w and all the Lagrange multipliers into λ:

min f(w) over w, subject to Xw >= y

Lagrangian Dual: max over λ >= 0 of min over w of f(w) - λ(Xw - y)
Lagrangian Primal: min over w of max over λ >= 0 of f(w) - λ(Xw - y)

Page 19: Support Vector Machines: Proof that appropriate Lagrange multipliers always exist

[Figure: the system Xw* >= y, with some rows tight (equal to the corresponding entry of y) and the others slack; λ >= 0 is zero on the slack rows.]

Find λ >= 0, supported only on the tight rows, with λX = Grad(f) at w*.

Claim: This is satisfiable.

Page 20: Support Vector Machines: Proof that appropriate Lagrange multipliers always exist

Find λ >= 0 with λX = Grad(f).

Claim: This is satisfiable, i.e., Grad(f) lies in the cone of non-negative combinations of the row vectors of X.

Page 21: Support Vector Machines: Proof that appropriate Lagrange multipliers always exist

Find λ >= 0 with λX = Grad(f).

Claim: This is satisfiable. If not, Grad(f) lies outside the cone of the row vectors of X, so there is a direction h with Xh >= 0 and Grad(f).h < 0. Then w* + h is feasible and f(w* + h) < f(w*) for small enough h, contradicting the optimality of w*.
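The contradiction can be checked numerically for a toy case; the tight rows, gradients, and direction h below are all made up for illustration.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def improving_direction(X_tight, grad, h, tol=1e-9):
    # h keeps the tight constraints feasible (Xh >= 0) yet decreases f
    # to first order (Grad(f).h < 0).
    feasible = all(dot(row, h) >= -tol for row in X_tight)
    decreases = dot(grad, h) < -tol
    return feasible and decreases

X_tight = [(1.0, 0.0)]           # one tight constraint row

# Grad(f) = (-1, 0) is not λ*(1, 0) for any λ >= 0, and indeed
# h = (1, 0) is an improving direction:
print(improving_direction(X_tight, (-1.0, 0.0), (1.0, 0.0)))  # True

# Grad(f) = (1, 0) IS in the cone, and the same h no longer improves:
print(improving_direction(X_tight, (1.0, 0.0), (1.0, 0.0)))   # False
```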

Page 22: Support Vector Machines: Finally, the Lagrange Dual

Lagrangian Dual:

max over λi, αi >= 0 of min over w, b, ξi of
(w.w)/2 + C Σ ξi - Σi λi (yi (w.xi + b) + ξi - 1) - Σi αi (ξi - 0)

The inner minimization gives

w - Σi λi yi xi = 0
-Σi λi yi = 0
-λi - αi + C = 0

Substituting these back and rewriting in final dual form:

maximize Σ λi - ( Σi Σj λi λj yi yj (xi.xj) )/2 over λi
subject to Σ λi yi = 0, 0 <= λi <= C
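A sketch evaluating this final dual for a made-up feasible λ, checked against a case whose primal optimum is known: two points at (±1, 0) with opposite labels have hard-margin separator w = (1, 0), b = 0, and primal value (w.w)/2 = 0.5.

```python
def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def dual_objective(lams, ys, xs):
    # Σ λi - (1/2) Σi Σj λi λj yi yj (xi.xj)
    quad = sum(lams[i] * lams[j] * ys[i] * ys[j] * dot(xs[i], xs[j])
               for i in range(len(xs)) for j in range(len(xs)))
    return sum(lams) - quad / 2.0

def dual_feasible(lams, ys, C, tol=1e-9):
    # Σ λi yi = 0 and 0 <= λi <= C.
    return (abs(sum(l * y for l, y in zip(lams, ys))) < tol and
            all(0.0 <= l <= C for l in lams))

xs = [(1.0, 0.0), (-1.0, 0.0)]
ys = [+1, -1]
lams = [0.5, 0.5]
C = 1.0

print(dual_feasible(lams, ys, C))    # True
print(dual_objective(lams, ys, xs))  # 0.5, matching the primal optimum
```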

Page 23: Support Vector Machines: Karush-Kuhn-Tucker conditions

Final dual form:

maximize Σ λi - ( Σi Σj λi λj yi yj (xi.xj) )/2 over λi
subject to Σ λi yi = 0, 0 <= λi <= C

At the optimum:

Σi λi (yi (w*.xi + b*) + ξi* - 1) = 0
Σi αi (ξi* - 0) = 0
-λi - αi + C = 0

Hence:

If ξi* > 0 then αi = 0 and λi = C.
If yi (w*.xi + b*) + ξi* - 1 > 0 then λi = 0 and ξi* = 0.
If 0 < λi < C then yi (w*.xi + b*) = 1.
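These KKT cases can be read off the value of λi alone; a minimal sketch, with a made-up C and tolerance:

```python
def kkt_case(lam, C, tol=1e-9):
    # λ = 0: point strictly outside the margin; 0 < λ < C: on the margin
    # (yi(w*.xi + b*) = 1, a support vector); λ = C: margin violator
    # (ξi* > 0 is allowed).
    if lam < tol:
        return "outside margin"
    if lam > C - tol:
        return "margin violator"
    return "on margin (support vector)"

C = 10.0
print([kkt_case(l, C) for l in (0.0, 4.0, 10.0)])
# ['outside margin', 'on margin (support vector)', 'margin violator']
```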