
Support Vector Machines
M.W. Mak

1. Introduction to SVMs
2. Linear SVMs
3. Non-linear SVMs

References:

1. S.Y. Kung, M.W. Mak, and S.H. Lin. Biometric Authentication: A Machine Learning Approach, Prentice Hall, to appear.

2. S.R. Gunn, 1998. Support Vector Machines for Classification and Regression. (http://www.isis.ecs.soton.ac.uk/resources/svminfo/)

3. Bernhard Schölkopf. Statistical learning and kernel methods. MSR-TR 2000-23, Microsoft Research, 2000. (ftp://ftp.research.microsoft.com/pub/tr/tr-2000-23.pdf)

4. For more resources on support vector machines, see http://www.kernel-machines.org/


Introduction

SVMs were developed by Vapnik in 1995 and are becoming popular due to their attractive features and promising performance.

Conventional neural networks are based on empirical risk minimization, where the network weights are determined by minimizing the mean squared error between the actual outputs and the desired outputs.

SVMs are based on the structural risk minimization principle, where parameters are optimized by minimizing an upper bound on the generalization error.

SVMs have been shown to possess better generalization capability than conventional neural networks.


Introduction (Cont.)

Given N labeled empirical data:

$$(\mathbf{x}_1, y_1), \ldots, (\mathbf{x}_N, y_N) \in X \times \{-1, +1\} \qquad (1)$$

where $X$ is the set of input data in $\mathbb{R}^D$ and $y_i$ are the labels.

[Figure: samples of the two classes ($y_i = +1$ and $y_i = -1$) in domain X, with class means $\mathbf{c}_1$ and $\mathbf{c}_2$.]


Introduction (Cont.)

We construct a simple classifier by computing the means of the two classes:

$$\mathbf{c}_1 = \frac{1}{N_1} \sum_{i:\, y_i = +1} \mathbf{x}_i \qquad \text{and} \qquad \mathbf{c}_2 = \frac{1}{N_2} \sum_{i:\, y_i = -1} \mathbf{x}_i$$

where $N_1$ and $N_2$ are the numbers of data points in the classes with positive and negative labels, respectively.

We assign a new point x to the class whose mean is closer to it. To achieve this, we compute the midpoint

$$\mathbf{c} = (\mathbf{c}_1 + \mathbf{c}_2)/2 \qquad (2)$$


Introduction (Cont.)

Then, we determine the class of x by checking whether the vector connecting x and c encloses an angle smaller than π/2 with the vector $\mathbf{w} = \mathbf{c}_1 - \mathbf{c}_2$:

$$\begin{aligned}
y &= \operatorname{sgn}\langle \mathbf{x} - \mathbf{c}, \mathbf{w} \rangle \\
  &= \operatorname{sgn}\langle \mathbf{x} - (\mathbf{c}_1 + \mathbf{c}_2)/2,\; \mathbf{c}_1 - \mathbf{c}_2 \rangle \\
  &= \operatorname{sgn}\left( \langle \mathbf{x}, \mathbf{c}_1 \rangle - \langle \mathbf{x}, \mathbf{c}_2 \rangle + b \right)
\end{aligned}$$

where $b = \tfrac{1}{2}\left( \|\mathbf{c}_2\|^2 - \|\mathbf{c}_1\|^2 \right)$.

[Figure: domain X with class means $\mathbf{c}_1$ and $\mathbf{c}_2$, their midpoint c, and a test point x.]


Introduction (Cont.)

In the special case where b = 0, we have

$$y = \operatorname{sgn}\left( \frac{1}{N_1} \sum_{i:\, y_i = +1} \langle \mathbf{x}, \mathbf{x}_i \rangle \;-\; \frac{1}{N_2} \sum_{i:\, y_i = -1} \langle \mathbf{x}, \mathbf{x}_i \rangle \right) \qquad (3)$$

This means that we use ALL data points $\mathbf{x}_i$, each being weighted equally by $1/N_1$ or $1/N_2$, to define the decision plane.
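To make Eq. 3 concrete, here is a minimal NumPy sketch of this mean-based classifier. The toy data and function names are invented for illustration; they are not from the slides.

```python
import numpy as np

def mean_classifier(X_train, y_train, x):
    """Classify x by the nearer class mean (Eqs. 2-3); labels are in {-1, +1}."""
    c1 = X_train[y_train == +1].mean(axis=0)   # mean of the positive class
    c2 = X_train[y_train == -1].mean(axis=0)   # mean of the negative class
    b = 0.5 * (np.dot(c2, c2) - np.dot(c1, c1))
    return np.sign(np.dot(x, c1) - np.dot(x, c2) + b)

# Toy data: two small clusters
X = np.array([[0., 0.], [0., 1.], [3., 3.], [4., 3.]])
y = np.array([-1, -1, +1, +1])
print(mean_classifier(X, y, np.array([3.5, 2.5])))   # -> 1.0
print(mean_classifier(X, y, np.array([0.5, 0.5])))   # -> -1.0
```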

Introduction (Cont.)

[Figure: the two classes ($y_i = +1$, $y_i = -1$) in domain X separated by the decision plane, with normal vector w.]


Introduction (Cont.)

However, we might want to remove the influence of patterns that are far away from the decision boundary, because their influence is usually small.

We may also select only a few important data points (called support vectors) and weight them differently.

Then, we have a support vector machine.


Introduction (Cont.)

[Figure: a decision plane between the two classes; the support vectors lie on the margin boundaries on either side.]

We aim to find a decision plane that maximizes the margin.


Linear SVMs

Assume that all training data satisfy the constraints:

$$\begin{aligned}
\mathbf{w} \cdot \mathbf{x}_i + b &\ge +1 \quad \text{for } y_i = +1 \\
\mathbf{w} \cdot \mathbf{x}_i + b &\le -1 \quad \text{for } y_i = -1
\end{aligned} \qquad (4)$$

which means

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0 \quad \forall i \qquad (5)$$

Training data points for which the above equality holds lie on hyperplanes parallel to the decision plane.


Linear SVMs (Cont.)

Let $\mathbf{x}_1$ lie on the hyperplane $\mathbf{w} \cdot \mathbf{x} + b = +1$ and $\mathbf{x}_2$ on $\mathbf{w} \cdot \mathbf{x} + b = -1$. Subtracting the two plane equations gives $\mathbf{w} \cdot (\mathbf{x}_1 - \mathbf{x}_2) = 2$, so the margin d is the projection of $(\mathbf{x}_1 - \mathbf{x}_2)$ onto the unit normal $\mathbf{w}/\|\mathbf{w}\|$:

$$d = \frac{\mathbf{w}}{\|\mathbf{w}\|} \cdot (\mathbf{x}_1 - \mathbf{x}_2) = \frac{2}{\|\mathbf{w}\|}$$

Therefore, maximizing the margin is equivalent to minimizing $\|\mathbf{w}\|^2$.

[Figure: the decision plane $\mathbf{w} \cdot \mathbf{x} + b = 0$ and the two margin hyperplanes $\mathbf{w} \cdot \mathbf{x} + b = \pm 1$, separated by margin d.]
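A quick numerical check of d = 2/||w||. The hyperplane values below are assumed for the sketch (they happen to match the 3-point example later in the deck):

```python
import numpy as np

w = np.array([2.0, 2.0])    # assumed weight vector
b = -1.0                    # assumed bias

x1 = np.array([0.5, 0.5])   # lies on w.x + b = +1
x2 = np.array([0.0, 0.0])   # lies on w.x + b = -1

# Project (x1 - x2) onto the unit normal w/||w|| to get the margin d:
d = np.dot(w / np.linalg.norm(w), x1 - x2)
print(d, 2.0 / np.linalg.norm(w))   # both ~0.7071, confirming d = 2/||w||
```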


Linear SVMs (Lagrangian)

We minimize $\|\mathbf{w}\|^2$ subject to the constraints

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \ge 0 \quad \forall i \qquad (6)$$

This can be achieved by introducing Lagrange multipliers $\{\alpha_i \ge 0\}_{i=1}^N$ and a Lagrangian

$$L(\mathbf{w}, b, \alpha) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{i=1}^N \alpha_i \left( y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right) \qquad (7)$$

The Lagrangian has to be minimized with respect to w and b and maximized with respect to $\alpha_i \ge 0$.


Linear SVMs (Lagrangian)

Setting

$$\frac{\partial}{\partial b} L(\mathbf{w}, b, \alpha) = 0 \quad \text{and} \quad \frac{\partial}{\partial \mathbf{w}} L(\mathbf{w}, b, \alpha) = 0,$$

we obtain

$$\sum_{i=1}^N \alpha_i y_i = 0 \quad \text{and} \quad \mathbf{w} = \sum_{i=1}^N \alpha_i y_i \mathbf{x}_i \qquad (8)$$

Patterns for which $\alpha_k > 0$ are called support vectors. These vectors lie on the margin and satisfy

$$y_k(\mathbf{w} \cdot \mathbf{x}_k + b) - 1 = 0, \quad k \in S,$$

where S contains the indexes of the support vectors. Patterns for which $\alpha_k = 0$ are considered to be irrelevant to the classification.


Linear SVMs (Wolfe Dual)

Substituting (8) into (7), we obtain the Wolfe dual:

$$\text{Maximize:} \quad L(\alpha) = \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j)$$
$$\text{subject to} \quad \alpha_i \ge 0,\ i = 1, \ldots, N, \quad \text{and} \quad \sum_{i=1}^N \alpha_i y_i = 0 \qquad (9)$$

The decision hyperplane is thus

$$f(\mathbf{x}) = \operatorname{sgn}(\mathbf{w} \cdot \mathbf{x} + b) = \operatorname{sgn}\left( \sum_{i=1}^N y_i \alpha_i (\mathbf{x} \cdot \mathbf{x}_i) + b \right)$$

where $b = y_k - \mathbf{w} \cdot \mathbf{x}_k$ (with $y_k = \pm 1$) and $\mathbf{x}_k$ is a support vector.
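The slides solve this dual analytically; as a practical illustration, here is a sketch that hands Eq. 9 to a generic constrained optimizer (SciPy's SLSQP). It is only meant for small, separable toy sets; real SVM packages use specialized QP solvers. The function name and structure are ours, not from the slides.

```python
import numpy as np
from scipy.optimize import minimize

def hard_margin_svm(X, y):
    """Solve the Wolfe dual (Eq. 9) for a small, linearly separable data set."""
    N = len(y)
    G = (y[:, None] * X) @ (y[:, None] * X).T      # G_ij = y_i y_j (x_i . x_j)
    obj = lambda a: 0.5 * a @ G @ a - a.sum()      # minimize the negated dual
    cons = ({'type': 'eq', 'fun': lambda a: a @ y},)   # sum_i alpha_i y_i = 0
    res = minimize(obj, np.zeros(N), bounds=[(0, None)] * N, constraints=cons)
    alpha = res.x
    w = (alpha * y) @ X                            # w from Eq. 8
    k = int(np.argmax(alpha))                      # index of a support vector
    b = y[k] - w @ X[k]                            # b = y_k - w.x_k
    return alpha, w, b
```

On the 3-point example that follows, this sketch should recover α = (4, 2, 2), w = [2 2]ᵀ and b = -1.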


Linear SVMs (Example)

Analytical example (3-point problem):

$$\begin{aligned}
\mathbf{x}_1 &= [0.0\ \ 0.0]^T, & y_1 &= -1 \\
\mathbf{x}_2 &= [1.0\ \ 0.0]^T, & y_2 &= +1 \\
\mathbf{x}_3 &= [0.0\ \ 1.0]^T, & y_3 &= +1
\end{aligned}$$

Objective function:

$$\text{Maximize:} \quad L(\alpha) = \sum_{i=1}^3 \alpha_i - \frac{1}{2} \sum_{i=1}^3 \sum_{j=1}^3 \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j)$$
$$\text{subject to} \quad \alpha_i \ge 0,\ i = 1, \ldots, 3, \quad \text{and} \quad \sum_{i=1}^3 \alpha_i y_i = 0$$


Linear SVMs (Example)

We introduce another Lagrange multiplier λ to obtain the Lagrangian

$$F(\alpha, \lambda) = \sum_{i=1}^3 \alpha_i - \frac{1}{2}\left( \alpha_2^2 + \alpha_3^2 \right) - \lambda\left( \alpha_1 - \alpha_2 - \alpha_3 \right)$$

Differentiating F(α, λ) with respect to λ and $\alpha_i$ and setting the results to zero, we obtain

$$\alpha_1 = 4, \quad \alpha_2 = 2, \quad \alpha_3 = 2, \quad \lambda = 1$$


Linear SVMs (Example)

Substituting the Lagrange multipliers into Eq. 8:

$$\mathbf{w} = \sum_{i=1}^3 \alpha_i y_i \mathbf{x}_i = [2\ \ 2]^T, \qquad b = y_1 - \mathbf{w} \cdot \mathbf{x}_1 = -1$$

The decision boundary is

$$\mathbf{w} \cdot \mathbf{x} + b = 0 \;\Longrightarrow\; x_1 + x_2 - 0.5 = 0, \quad \mathbf{x} = [x_1\ \ x_2]^T$$

[Figure: the three points and the decision boundary. Caption: Linear SVM, C=100, #SV=3, acc=100.00%, normW=2.83.]
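A short NumPy check of this analytical solution (the arrays simply restate the slide's data):

```python
import numpy as np

X = np.array([[0., 0.], [1., 0.], [0., 1.]])
y = np.array([-1., 1., 1.])
alpha = np.array([4., 2., 2.])      # multipliers derived on the previous slide

w = (alpha * y) @ X                 # Eq. 8 -> [2. 2.]
b = y[0] - w @ X[0]                 # x_1 is a support vector -> b = -1.0
print(w, b)
print(np.sign(X @ w + b))           # [-1.  1.  1.]: all three points lie on the margin
```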


Linear SVMs (Example)

4-point linearly separable problem:

[Figure: two 4-point problems solved with C=100. Left: Linear SVM, #SV=4, accuracy=100.00% (4 SVs). Right: Linear SVM, #SV=3, accuracy=100.00% (3 SVs).]


Linear SVMs (Non-linearly separable)

Non-linearly separable: patterns that cannot be separated by a linear decision boundary without incurring classification error.

[Figure: a 20-point data set containing data that causes classification error in linear SVMs.]


Linear SVMs (Non-linearly separable)

We introduce a set of slack variables $\xi = \{\xi_1, \xi_2, \ldots, \xi_N\}$ with $\xi_i \ge 0$ such that

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i \quad \forall i$$

The slack variables allow some data to violate the constraints defined for the linearly separable case (Eq. 6):

$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 \quad \forall i$$

Therefore, for some $\xi_k$ where $\xi_k > 0$, we have

$$y_k(\mathbf{w} \cdot \mathbf{x}_k + b) < 1, \quad \text{e.g. } y_k(\mathbf{w} \cdot \mathbf{x}_k + b) = 0.5 \text{ with } \xi_k = 0.8$$
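Note that a point with $y_k(\mathbf{w} \cdot \mathbf{x}_k + b) = 0.5$ is feasible for any $\xi_k \ge 0.5$ (such as the $\xi_k = 0.8$ above), but since the objective penalizes slack, the optimizer drives each $\xi_i$ down to its minimum feasible value. A one-line sketch of that relationship, with invented names:

```python
import numpy as np

def slacks(X, y, w, b):
    """Minimum feasible slack per point: xi_i = max(0, 1 - y_i (w.x_i + b))."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))
```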


Linear SVMs (Non-linearly separable)

E.g. $\xi_{10} = \xi_{19} = 0.667$ because $\mathbf{x}_{10}$ and $\mathbf{x}_{19}$ are inside the margins, i.e. they violate the constraint (Eq. 6).

[Figure: Linear SVM, C=1000.0, #SV=7, acc=95.00%, normW=0.94; points $\mathbf{x}_{10}$ and $\mathbf{x}_{19}$ lie inside the margins.]


Linear SVMs (Non-linearly separable)

For non-separable cases:

$$\text{Minimize:} \quad \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^N \xi_i$$
$$\text{subject to} \quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1 - \xi_i$$

where C is a user-defined penalty parameter that penalizes any violation of the margins.

The Lagrangian becomes

$$L(\mathbf{w}, b, \xi) = \frac{1}{2}\|\mathbf{w}\|^2 + C \sum_{i=1}^N \xi_i - \sum_{i=1}^N \alpha_i \left( y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 + \xi_i \right) - \sum_{i=1}^N \beta_i \xi_i$$

where the multipliers $\beta_i \ge 0$ enforce $\xi_i \ge 0$.


Linear SVMs (Non-linearly separable)

Wolfe dual optimization:

$$\text{Maximize:} \quad L(\alpha) = \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\mathbf{x}_i \cdot \mathbf{x}_j)$$
$$\text{subject to} \quad 0 \le \alpha_i \le C,\ i = 1, \ldots, N, \quad \text{and} \quad \sum_{i=1}^N \alpha_i y_i = 0$$

The output weight vector and bias term are

$$\mathbf{w} = \sum_{i=1}^N \alpha_i y_i \mathbf{x}_i \quad \text{and} \quad b = y_k - \mathbf{w} \cdot \mathbf{x}_k$$

where $y_k = \pm 1$ and $\mathbf{x}_k$ is a support vector.
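In practice this soft-margin problem is solved by an SVM library rather than by hand. A minimal scikit-learn sketch (the toy data are invented; `C` is the same penalty parameter as above):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (20, 2)),      # class -1 cluster
               rng.normal(6.0, 1.0, (20, 2))])     # class +1 cluster
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel='linear', C=10.0).fit(X, y)
print(clf.coef_, clf.intercept_)    # w and b
print(clf.support_)                 # indexes of the support vectors
print(clf.dual_coef_)               # alpha_i * y_i for each support vector
```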

2. Linear SVMs (Types of SVs)

Three types of support vectors (here C = 10):

1. On the margin: $0 < \alpha_i < C$, $\xi_i = 0$, $y_i(\mathbf{w}^T \mathbf{x}_i + b) = 1$; e.g. $\alpha_1 = 0.44, \xi_1 = 0$ and $\alpha_{11} = 2.85, \xi_{11} = 0$.

2. Inside the margin: $\alpha_i = C$, $0 < \xi_i < 2$, $y_i(\mathbf{w}^T \mathbf{x}_i + b) < 1$; e.g. $\alpha_{10} = 10, \xi_{10} = 0.667$.

3. Outside the margin: $\alpha_i = C$, $\xi_i > 2$, $y_i(\mathbf{w}^T \mathbf{x}_i + b) < -1$; e.g. $\alpha_{20} = 10, \xi_{20} = 2.67$.

Non-support vectors have $\alpha_i = 0$ and $\xi_i = 0$, e.g. $\alpha_{17} = \xi_{17} = 0$.

[Figure: Linear SVM, C=10.0, #SV=7, acc=95.00%, normW=0.94.]
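A small helper that labels each point with one of these categories, assuming the $\alpha_i$ and $\xi_i$ values are already available (e.g. from the dual solver sketched earlier); the thresholds simply restate the taxonomy above:

```python
import numpy as np

def sv_types(alpha, xi, C, tol=1e-8):
    """Categorize each training point by its support-vector type."""
    kinds = np.full(len(alpha), 'non-SV (alpha=0)', dtype=object)
    free  = (alpha > tol) & (alpha < C - tol)       # 0 < alpha < C
    bound = alpha >= C - tol                        # alpha = C
    kinds[free] = 'on margin (xi=0)'
    kinds[bound & (xi < 2.0)]  = 'inside margin (0<xi<2)'
    kinds[bound & (xi >= 2.0)] = 'outside margin (xi>2)'
    return kinds
```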

2. Linear SVMs (Types of SVs)

For a non-support vector of Class 1 ($y_i = +1$, $\alpha_i = 0$, $\xi_i = 0$):

$$y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1 \;\Longrightarrow\; \mathbf{w}^T \mathbf{x}_i + b \ge 1$$

For a non-support vector of Class 2 ($y_i = -1$, $\alpha_i = 0$, $\xi_i = 0$):

$$y_i(\mathbf{w}^T \mathbf{x}_i + b) \ge 1 \;\Longrightarrow\; \mathbf{w}^T \mathbf{x}_i + b \le -1$$

For the misclassified point $\mathbf{x}_{20}$ ($y_{20} = -1$, $\alpha_{20} = C$, $\xi_{20} = 2.67$):

$$y_{20}(\mathbf{w}^T \mathbf{x}_{20} + b) = 1 - 2.67 = -1.67 \;\Longrightarrow\; \mathbf{w}^T \mathbf{x}_{20} + b = 1.67$$

[Figure: Linear SVM, C=10.0, #SV=7, acc=95.00%, normW=0.94, annotated with the margin hyperplanes $\mathbf{w}^T \mathbf{x} + b = \pm 1$ and the values $\mathbf{w}^T \mathbf{x} + b = \pm 0.33$ at the points inside the margins.]

2. Linear SVMs (Types of SVs)

Swapping Class 1 and Class 2: the same relations as on the previous slide hold with the signs of w, b, and the annotated values of $\mathbf{w}^T \mathbf{x} + b$ reversed; the decision boundary and the support vectors are unchanged.

[Figure: Linear SVM, C=10.0, #SV=7, acc=95.00%, normW=0.94, with the two class labels swapped.]


2. Linear SVMs (Types of SVs)

Effect of varying C:

[Figure, left: Linear SVM, C=0.1, #SV=10, acc=95.00%, normW=0.57, $\sum_i \xi_i = 5.2$.]
[Figure, right: Linear SVM, C=100.0, #SV=7, acc=95.00%, normW=0.94, $\sum_i \xi_i = 4.0$.]

A small C yields a wider margin (smaller ||w||) at the cost of more support vectors and larger total slack.
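A sketch of this C sweep with scikit-learn, reporting the same statistics as the figure captions (#SV, ||w||, total slack); the toy data are invented, so the numbers will differ from the slides:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.5, (10, 2)),
               rng.normal(5.0, 1.5, (10, 2))])
y = np.array([-1] * 10 + [+1] * 10)

for C in (0.1, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    w, b = clf.coef_.ravel(), clf.intercept_[0]
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))     # slack of each point
    print(f"C={C}: #SV={len(clf.support_)}, "
          f"||w||={np.linalg.norm(w):.2f}, sum_xi={xi.sum():.2f}")
```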


3. Non-linear SVMs

In case the training data X are not linearly separable, we may use a kernel function to map the data from the input space to a feature space where the data become linearly separable.

[Figure: a curved decision boundary between the two classes in the input space (domain X) becomes a linear decision boundary in the feature space induced by $K(\mathbf{x}, \mathbf{x}_i)$.]


3. Non-linear SVMs (Cont.)

The decision function becomes

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^N y_i \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b \right)$$


3. Non-linear SVMs (Cont.)

The decision function becomes

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^N y_i \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b \right)$$

For RBF kernels:

$$K(\mathbf{x}, \mathbf{x}_i) = \exp\left( -\frac{\|\mathbf{x} - \mathbf{x}_i\|^2}{2\sigma^2} \right)$$

For polynomial kernels:

$$K(\mathbf{x}, \mathbf{x}_i) = \left( \mathbf{x} \cdot \mathbf{x}_i + 1 \right)^p, \quad p > 0$$
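Both kernels are one-liners in NumPy (function names are ours):

```python
import numpy as np

def rbf_kernel(x, xi, sigma2):
    """K(x, x_i) = exp(-||x - x_i||^2 / (2 sigma^2))"""
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma2))

def poly_kernel(x, xi, p):
    """K(x, x_i) = (x . x_i + 1)^p"""
    return (np.dot(x, xi) + 1.0) ** p
```

For reference, libraries that parameterize the RBF kernel as exp(-γ||x - x_i||²) use γ = 1/(2σ²), so the 2σ² = 8.0 quoted in the figure captions below corresponds to γ = 0.125.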


3. Non-linear SVMs (Cont.)

The decision function becomes

$$f(\mathbf{x}) = \operatorname{sgn}\left( \sum_{i=1}^N y_i \alpha_i K(\mathbf{x}, \mathbf{x}_i) + b \right)$$

The optimization problem becomes:

$$\text{Maximize:} \quad W(\alpha) = \sum_{i=1}^N \alpha_i - \frac{1}{2} \sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j K(\mathbf{x}_i, \mathbf{x}_j)$$
$$\text{subject to} \quad 0 \le \alpha_i \le C,\ i = 1, \ldots, N, \quad \text{and} \quad \sum_{i=1}^N \alpha_i y_i = 0 \qquad (9)$$
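Once the $\alpha_i$ and b are known, evaluating the kernel decision function needs only the support vectors. A minimal sketch, assuming the RBF kernel from the previous example and invented argument names:

```python
import numpy as np

def rbf_kernel(x, xi, sigma2=4.0):      # 2*sigma^2 = 8, as in the figures below
    return np.exp(-np.sum((x - xi) ** 2) / (2.0 * sigma2))

def svm_decide(x, sv_x, sv_y, sv_alpha, b, kernel=rbf_kernel):
    """f(x) = sgn(sum_i y_i alpha_i K(x, x_i) + b), summed over the SVs only."""
    s = sum(a * yi * kernel(x, xi) for a, yi, xi in zip(sv_alpha, sv_y, sv_x))
    return np.sign(s + b)
```

The sum runs only over the support vectors because all other points have $\alpha_i = 0$.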


3. Non-linear SVMs (Cont.)

The effect of varying C on RBF-SVMs:

[Figure, left: RBF SVM, 2*sigma2=8.0, C=10.0, #SV=9, acc=90.00%, $\sum_i \xi_i = 3.09$.]
[Figure, right: RBF SVM, 2*sigma2=8.0, C=1000.0, #SV=7, acc=100.00%, $\sum_i \xi_i = 0.0$.]


3. Non-linear SVMs (Cont.)

The effect of varying C on polynomial-SVMs:

[Figure, left: Polynomial SVM, degree=2, C=10.0, #SV=7, acc=90.00%, $\sum_i \xi_i = 2.99$.]
[Figure, right: Polynomial SVM, degree=2, C=1000.0, #SV=8, acc=90.00%, $\sum_i \xi_i = 2.97$.]