8/16/2019 SSSVM2015
Kernel Method
and Support Vector Machines
Nguyen Duc Dung, Ph.D.
IOIT, VAST
Outline
Reference: books, papers, slides, software
Support vector machines (SVMs): the maximum-margin hyperplane
Kernel method
Implementation: approaches, sequential minimal optimization (SMO)
Open problems
Reference
Book
Cristianini, N., Shawe-Taylor, J., An Introduction to Support Vector Machines, Cambridge University Press, 2000. http://www.support-vector.net/index.html
Bernhard Schölkopf and Alex Smola, Learning with Kernels, MIT Press, Cambridge, MA, 2002.
Paper
C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Knowledge Discovery and Data Mining, 2(2), 1998.
Slide
N. Cristianini, ICML'01 tutorial, 2001.
Software
LibSVM (NTU), SVMlight (joachims.org)
Online resource
http://www.kernel-machines.org/
Classification Problem
How would we classify this data set?
Linear Classifiers
Many lines can act as linear classifiers for this data.
Which one is the best classifier?
SVM Solution
The SVM solution is the linear classifier with the maximum margin (the maximum-margin linear classifier).
Margin of a Linear Function f(x) = w·x + b
Functional margin of an example (x_i, y_i): γ_i = y_i f(x_i) = y_i (w·x_i + b)
Geometric margin: γ_i / ||w||
Margin of f: the minimum geometric margin over the training set
SVM solution: the linear function f with the largest margin
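These definitions can be checked numerically; the weights and points below are made-up illustrative values, not from the slides:

```python
import math

def f(w, b, x):
    # linear function f(x) = w.x + b
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def functional_margin(w, b, x, y):
    # gamma_i = y_i * f(x_i)
    return y * f(w, b, x)

def geometric_margin(w, b, x, y):
    # functional margin normalized by ||w||
    return functional_margin(w, b, x, y) / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -1.0                      # ||w|| = 5
data = [([2.0, 1.0], +1), ([-1.0, 0.0], -1)]

for x, y in data:
    print(functional_margin(w, b, x, y), geometric_margin(w, b, x, y))

# The margin of f is the minimum geometric margin over the training set.
print(min(geometric_margin(w, b, x, y) for x, y in data))  # 0.8
```

Note that scaling (w, b) by a constant scales the functional margin but leaves the geometric margin unchanged, which is why the next slides fix the functional margin to 1.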
A Bound on Expected Risk of a Linear Classifier f = sign(w·x)
With probability at least 1 − δ, δ ∈ (0, 1):
R[f] ≤ R_emp[f] + sqrt( (c/l) ( (R²/γ_f²) ln² l + ln(1/δ) ) )
where R_emp is the training error, l is the training size, γ_f is the margin of f, ||w|| ≤ 1, ||x|| ≤ R, and c is a constant.
Larger margin, smaller bound.
Finding the Maximum-Margin Classifier
Constrain the functional margin: y_i (w·x_i + b) ≥ 1
Minimize the normal vector: min ||w||
Soft and Hard Margin
Hard (maximum) margin:
min_{w,b} (1/2) ||w||²
s.t. y_i (w·x_i + b) ≥ 1, i = 1, ..., l
Soft (maximum) margin:
min_{w,b,ξ} (1/2) ||w||² + C Σ_{i=1..l} ξ_i
s.t. y_i (w·x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., l
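The soft-margin objective is easy to evaluate by hand: each slack ξ_i equals the hinge loss max(0, 1 − y_i(w·x_i + b)). A small sketch with made-up weights and data:

```python
# Evaluate (1/2)||w||^2 + C * sum(xi_i) for a fixed (w, b) on toy 2-D data.
def soft_margin_objective(w, b, C, data):
    reg = 0.5 * sum(wi * wi for wi in w)
    slacks = [max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))
              for x, y in data]
    return reg + C * sum(slacks), slacks

data = [([2.0, 0.0], +1),    # outside the margin band: slack 0
        ([0.5, 0.0], +1),    # inside the margin band: 0 < slack < 1
        ([-2.0, 0.0], -1)]   # outside the margin band: slack 0
w, b, C = [1.0, 0.0], 0.0, 10.0

obj, slacks = soft_margin_objective(w, b, C, data)
print(obj, slacks)  # 5.5 [0.0, 0.5, 0.0]
```

Larger C penalizes margin violations more heavily, pushing the soft-margin solution toward the hard-margin one.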
Lagrangian Optimization
Kuhn-Tucker Theorem
Optimization
Primal problem:
min_{w,b,ξ} (1/2) ||w||² + C Σ_{i=1..l} ξ_i
s.t. y_i (w·x_i + b) ≥ 1 − ξ_i, ξ_i ≥ 0, i = 1, ..., l
Dual problem:
min_α (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j (x_i·x_j) − Σ_{i=1..l} α_i
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l, and Σ_{i=1..l} y_i α_i = 0
Solution: w = Σ_{α_i > 0} y_i α_i x_i
(Linear) Support Vector Machines
Training: quadratic optimization (the dual problem)
min_α (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j (x_i·x_j) − Σ_{i=1..l} α_i
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l, and Σ_{i=1..l} y_i α_i = 0
l variables, l² coefficients
Testing: f(x) = w·x + b
Normal vector of the hyperplane: w = Σ_{α_i > 0} y_i α_i x_i
(x_i, α_i) with α_i > 0 — support vector
Kernel Method
Problem: most datasets are linearly non-separable.
Solution:
Map the input data into a higher-dimensional feature space.
Find the optimal hyperplane in the feature space.
Hyperplane in Feature Space
VC-dimension of a class of functions: the maximum number of points that can be shattered.
The VC-dimension of linear functions in R^d is d + 1.
The dimension of the feature space is high.
Linear functions in feature space therefore have high VC-dimension, i.e. high capacity.
VC Dimension: Example
Gaussian RBF SVMs of sufficiently small width can classify an arbitrarily large number of training points correctly, and thus have infinite VC-dimension.
Linear SVMs
Training: quadratic optimization
min_α (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j (x_i·x_j) − Σ_{i=1..l} α_i
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l, and Σ_{i=1..l} y_i α_i = 0
l variables, l² coefficients
Testing: f(x) = sign( Σ_{α_i > 0} α_i y_i (x_i·x) + b )
Normal vector of the hyperplane: w = Σ_{α_i > 0} α_i y_i x_i
(x_i, α_i) with α_i > 0 — support vector
SVMs work with pairs of data (dot products), not with individual samples.
Non-linear SVMs
Kernel: calculates the dot product between two vectors in feature space, K(x, y) = ⟨Φ(x), Φ(y)⟩.
Training (dual problem):
min_α (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j K(x_i, x_j) − Σ_{i=1..l} α_i
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l, and Σ_{i=1..l} y_i α_i = 0
Testing: f(x) = sign( Σ_{α_i > 0} α_i y_i K(x_i, x) + b )
Normal vector of the hyperplane: w = Σ_{α_i > 0} α_i y_i Φ(x_i)
The maximal margin algorithm works indirectly in feature space via the kernel; the map Φ need not be known explicitly.
Kernel
Linear: K(x, y) = ⟨x, y⟩
Gaussian: K(x, y) = exp(−||x − y||²)
Dimension of feature space: infinite
Polynomial: K(x, y) = (⟨x, y⟩ + 1)^p
Dimension of feature space: C(d + p, p), where d is the input space dimension
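The three kernels can be written directly from their formulas. In this sketch the Gaussian width gamma and the polynomial degree p are exposed as parameters (the slide's Gaussian corresponds to gamma = 1):

```python
import math

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def gaussian_kernel(x, y, gamma=1.0):
    # exp(-gamma * ||x - y||^2); the slide's form corresponds to gamma = 1
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def polynomial_kernel(x, y, p=2):
    return (linear_kernel(x, y) + 1.0) ** p

X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]

# Gram matrix of the Gaussian kernel: symmetric, with K(x, x) = 1.
K = [[gaussian_kernel(a, b) for b in X] for a in X]
assert all(abs(K[i][j] - K[j][i]) < 1e-12 for i in range(3) for j in range(3))
assert all(abs(K[i][i] - 1.0) < 1e-12 for i in range(3))

# Feature-space dimension of the degree-p polynomial kernel: C(d + p, p).
print(math.comb(2 + 3, 3))  # d = 2, p = 3 -> 10
```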
Support Vector Learning
Task: given a set of labeled data T = {(x_i, y_i)}_{i=1,...,l} ⊂ R^d × {−1, +1}, find the decision function.
Training (dual problem):
min_α (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j K(x_i, x_j) − Σ_{i=1..l} α_i
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l, and Σ_{i=1..l} y_i α_i = 0
Time: O(l³), memory: O(l²)
Testing: f(x) = sign( Σ_{α_i > 0} α_i y_i K(x_i, x) + b )
Time: O(N_S)
MNIST Data: SVM vs. Other Methods
Data: 60,000/10,000 training/testing handwritten digit images
Performance:
Method                                            Testing error (%)
linear classifier (1-layer NN)                    12.0
K-nearest-neighbors                                5.0
40 PCA + quadratic classifier                      3.3
SVM, Gaussian kernel                               1.4
2-layer NN, 300 hidden units, mean square error    4.7
Convolutional net LeNet-4                          1.1
(Source: http://yann.lecun.com/)
SVM: Probability Output
SVM solution: f(x) = Σ_{α_i > 0} α_i y_i K(x_i, x) + b
Probability estimation:
p(y = 1 | x) = 1 / (1 + e^{A·f(x) + B})
Maximum likelihood approach:
(A, B) = argmin_{(a,b)} − Σ_{i=1..l} [ t_i log(p_i) + (1 − t_i) log(1 − p_i) ]
where p_i = p(y = 1 | x_i) = 1 / (1 + e^{a·f(x_i) + b}), and
t_i = (N_+ + 1) / (N_+ + 2) if y_i = +1, t_i = 1 / (N_− + 2) if y_i = −1, i = 1, ..., l
(N_+: number of positive examples, N_−: number of negative examples)
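A minimal sketch of this maximum-likelihood fit, assuming plain gradient descent (Platt's original method uses a Newton-type solver) and made-up decision values f(x_i):

```python
import math

def platt_targets(ys):
    # t_i = (N+ + 1)/(N+ + 2) for positives, 1/(N- + 2) for negatives
    n_pos = sum(1 for y in ys if y == 1)
    n_neg = len(ys) - n_pos
    return [(n_pos + 1.0) / (n_pos + 2.0) if y == 1 else 1.0 / (n_neg + 2.0)
            for y in ys]

def fit_sigmoid(fs, ys, lr=0.05, steps=5000):
    # Minimize -sum[t log p + (1-t) log(1-p)] over (A, B),
    # where p = 1/(1 + exp(A*f + B)).
    ts = platt_targets(ys)
    A, B = 0.0, 0.0
    for _ in range(steps):
        gA = gB = 0.0
        for f, t in zip(fs, ts):
            p = 1.0 / (1.0 + math.exp(A * f + B))
            # dL/dA = (t - p) * f, dL/dB = (t - p)  (cross-entropy gradient)
            gA += (t - p) * f
            gB += (t - p)
        A -= lr * gA
        B -= lr * gB
    return A, B

# Toy decision values: positives score high, so the fitted A is negative
# (larger f(x) then yields larger p(y = 1 | x)).
fs = [2.0, 1.0, 0.5, -0.5, -1.0, -2.0]
ys = [1, 1, 1, -1, -1, -1]
A, B = fit_sigmoid(fs, ys)
print(A, B)
assert A < 0
```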
Outline
Reference: books, papers, slides, software
Support vector machines (SVMs): the maximum-margin hyperplane
Kernel method
Implementation: approaches, sequential minimal optimization (SMO)
Open problems
SVM Training
Problem: quadratic programming (QP)
min_α F(α) = (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j K_ij − Σ_{i=1..l} α_i
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l, and Σ_{i=1..l} y_i α_i = 0
Objective function: quadratic w.r.t. α
Number of variables: l
Number of parameters: l²
Constraints: box, linear
Complexity: time O(l³) or O(N_S³ + N_S²·l + N_S·d·l), memory O(l²)
Approaches:
Gradient methods: modified gradient projection (Bottou et al., 94)
Divide-and-conquer: decomposition algorithms (e.g. Osuna et al., 97; Joachims, 99); sequential minimal optimization (SMO) (Platt, 99)
Parallelization: Cascade SVM (Graf et al., 05); parallel mixture of SVMs (Collobert et al., 02)
Approximation: online and active learning (e.g. Bordes et al., 05); Core SVM (Tsang et al., 05, 07)
Combinations of the above methods
Optimality
The Karush-Kuhn-Tucker (KKT) conditions:
y_i f(x_i) ≥ 1 for α_i = 0,
y_i f(x_i) = 1 for 0 < α_i < C,
y_i f(x_i) ≤ 1 for α_i = C,
where f(x_i) = Σ_{j=1..l} y_j α_j K(x_j, x_i) + b.
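These conditions can be verified on a toy problem whose solution is known in closed form (a hypothetical two-point example, not from the slides):

```python
def f(x, alphas, ys, xs, b):
    # f(x) = sum_j y_j alpha_j K(x_j, x) + b, linear kernel K(u, v) = u*v in R^1
    return sum(a * y * (xj * x) for a, y, xj in zip(alphas, ys, xs)) + b

# Closed-form hard-margin solution of the two-point problem
# x1 = +1 (y = +1), x2 = -1 (y = -1):  alpha1 = alpha2 = 0.5, b = 0.
xs, ys = [1.0, -1.0], [1, -1]
alphas, b, C = [0.5, 0.5], 0.0, 10.0

for a, x, y in zip(alphas, xs, ys):
    m = y * f(x, alphas, ys, xs, b)
    if a == 0:
        assert m >= 1 - 1e-9      # non-support vector: outside the margin
    elif a < C:
        assert abs(m - 1) < 1e-9  # free support vector: exactly on the margin
    else:
        assert m <= 1 + 1e-9      # bounded support vector: inside the margin
print("KKT satisfied")
```

Both points are free support vectors here, so both sit exactly on the margin, as the middle condition requires.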
SMO Algorithm
Initialize the solution (all zeros)
While (!StoppingCondition)
  Select two variables {i, j}
  Optimize over {i, j}
EndWhile
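The loop above can be fleshed out into a runnable sketch. The following is a simplified SMO in plain Python with a linear kernel and a naive random choice of the second variable (Platt's paper and the next slides use smarter selection heuristics); the toy dataset is made up for illustration:

```python
import random

def smo_train(xs, ys, C=10.0, tol=1e-3, max_passes=10, max_sweeps=1000):
    def K(u, v):                      # linear kernel; swap in any kernel here
        return sum(a * b for a, b in zip(u, v))

    l = len(xs)
    alphas, b = [0.0] * l, 0.0

    def f(x):
        return sum(alphas[k] * ys[k] * K(xs[k], x) for k in range(l)) + b

    passes = sweeps = 0
    while passes < max_passes and sweeps < max_sweeps:
        sweeps += 1
        changed = 0
        for i in range(l):
            Ei = f(xs[i]) - ys[i]
            # Only touch i if it violates the KKT conditions (within tol).
            if not ((ys[i] * Ei < -tol and alphas[i] < C) or
                    (ys[i] * Ei > tol and alphas[i] > 0)):
                continue
            j = random.choice([k for k in range(l) if k != i])
            Ej = f(xs[j]) - ys[j]
            ai_old, aj_old = alphas[i], alphas[j]
            # Box bounds keeping 0 <= alpha <= C and sum_i y_i alpha_i = 0.
            if ys[i] != ys[j]:
                lo, hi = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
            else:
                lo, hi = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
            eta = K(xs[i], xs[i]) + K(xs[j], xs[j]) - 2.0 * K(xs[i], xs[j])
            if lo >= hi or eta <= 0:
                continue
            # Unconstrained optimum along the feasible direction, then clip.
            alphas[j] = min(hi, max(lo, aj_old + ys[j] * (Ei - Ej) / eta))
            if abs(alphas[j] - aj_old) < 1e-7:
                continue
            alphas[i] += ys[i] * ys[j] * (aj_old - alphas[j])
            # Update b so a free support vector sits exactly on the margin.
            bi = b - Ei - ys[i] * (alphas[i] - ai_old) * K(xs[i], xs[i]) \
                        - ys[j] * (alphas[j] - aj_old) * K(xs[i], xs[j])
            bj = b - Ej - ys[i] * (alphas[i] - ai_old) * K(xs[i], xs[j]) \
                        - ys[j] * (alphas[j] - aj_old) * K(xs[j], xs[j])
            if 0.0 < alphas[i] < C:
                b = bi
            elif 0.0 < alphas[j] < C:
                b = bj
            else:
                b = (bi + bj) / 2.0
            changed += 1
        passes = 0 if changed else passes + 1
    return alphas, b

random.seed(0)
xs = [[2.0, 2.0], [3.0, 1.0], [2.5, 3.0],
      [-2.0, -1.0], [-1.0, -2.0], [-3.0, -2.5]]
ys = [1, 1, 1, -1, -1, -1]
alphas, b = smo_train(xs, ys)

def predict(x):
    s = sum(a * y * sum(u * v for u, v in zip(xi, x))
            for a, y, xi in zip(alphas, ys, xs)) + b
    return 1 if s >= 0 else -1

print([predict(x) for x in xs])
```

On convergence every point satisfies the KKT conditions within tol, so all training points of this separable toy set are classified correctly, and the equality constraint Σ y_i α_i = 0 is preserved by every pairwise update.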
Selection Heuristic and Stopping Condition
Maximum violating pair:
i = argmax { E_k | k ∈ I_up }
j = argmin { E_k | k ∈ I_low }
Maximum gain:
i = argmax { E_k | k ∈ I_up }
j = argmax { (E_i − E_k)² / η_ik | k ∈ I_low, E_k < E_i }, where η_ik = K_ii + K_kk − 2K_ik
where
I_up = { t | α_t < C, y_t = 1 or α_t > 0, y_t = −1 }
I_low = { t | α_t < C, y_t = −1 or α_t > 0, y_t = 1 }
Stopping condition: E_i − E_j ≤ ε (e.g. ε = 10⁻³)
Sequential Minimal Optimization
Training problem:
min_α (1/2) Σ_{i,j=1..l} y_i y_j α_i α_j K(x_i, x_j) − Σ_{i=1..l} α_i
s.t. 0 ≤ α_i ≤ C, i = 1, ..., l, and Σ_{i=1..l} y_i α_i = 0
Functional margin (error): E_i = Σ_{k=1..l} y_k α_k K(x_k, x_i) − y_i
Selection heuristic:
i = argmax { E_k | k ∈ I_up }
j = argmax { (E_i − E_k)² / η_ik | k ∈ I_low }
Updating scheme:
α_i^new = α_i^old + y_i (E_j^old − E_i^old) / η_ij
α_j^new = α_j^old + y_j (E_i^old − E_j^old) / η_ij
where η_ij = K_ii + K_jj − 2K_ij
Stopping condition: E_i − E_j ≤ ε
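The updating scheme can be verified on a two-point toy problem (x1 = +1 with y1 = +1, x2 = −1 with y2 = −1, linear kernel), where a single unclipped step from α = (0, 0) already reaches the known optimum α = (0.5, 0.5):

```python
def K(u, v):
    # linear kernel in R^1
    return u * v

xs, ys = [1.0, -1.0], [1, -1]
a1, a2, b = 0.0, 0.0, 0.0

def f(x):
    return a1 * ys[0] * K(xs[0], x) + a2 * ys[1] * K(xs[1], x) + b

E1 = f(xs[0]) - ys[0]   # = -1
E2 = f(xs[1]) - ys[1]   # = +1
eta = K(xs[0], xs[0]) + K(xs[1], xs[1]) - 2 * K(xs[0], xs[1])  # = 4

# Updating scheme from the slide (unclipped; both results lie in (0, C)).
a1_new = a1 + ys[0] * (E2 - E1) / eta
a2_new = a2 + ys[1] * (E1 - E2) / eta
print(a1_new, a2_new)   # 0.5 0.5
```

Note the new pair still satisfies Σ y_i α_i = 0, which is exactly why SMO must update two variables at a time.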
Support Vector Regression (1)
Training data: S = {(x_i, y_i)}_{i=1,...,l} ⊂ R^N × R
Linear regressor: y = f(x) = w·x + b
ε-insensitive loss function
Support Vector Regression (2)
Optimization: minimizing
Dual problem
Open Problems
Model selection: kernel type, parameter setting
Speed and size: training time O(N_S²·l), space O(N_S·l); testing O(N_S)
Multi-class application: one-versus-rest, one-versus-one
Categorical data
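As a small illustration of the two multi-class schemes: one-versus-rest trains k binary SVMs, one-versus-one trains one per class pair, i.e. k(k−1)/2:

```python
from itertools import combinations

def num_ovr(k):
    # one binary SVM per class, each against all remaining classes
    return k

def num_ovo(k):
    # one binary SVM per unordered pair of classes: k*(k-1)/2
    return len(list(combinations(range(k), 2)))

print(num_ovr(10), num_ovo(10))  # 10 45
```

For the 10-class MNIST task above this means 10 versus 45 binary classifiers, though each one-versus-one classifier trains on a much smaller subset of the data.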