SSSVM2015

Jul 05, 2018

Transcript
  • 8/16/2019 SSSVM2015

    Kernel Method and Support Vector Machines

    Nguyen Duc Dung, Ph.D.

    IOIT, VAST

  • Outline

    Reference: books, papers, slides, software

    Support vector machines (SVMs): the maximum-margin hyperplane

    Kernel method

    Implementation: approaches, sequential minimal optimization (SMO)

    Open problems

  • Reference

    Books:
    Cristianini, N., Shawe-Taylor, J., An Introduction to Support Vector Machines, Cambridge University Press, 2000. http://www.support-vector.net/index.html
    Schölkopf, B., Smola, A., Learning with Kernels, MIT Press, Cambridge, MA, 2002.

    Paper:
    Burges, C. J. C., A Tutorial on Support Vector Machines for Pattern Recognition, Knowledge Discovery and Data Mining, 2(2), 1998.

    Slides:
    Cristianini, N., ICML'01 tutorial, 2001.

    Software: LibSVM (NTU), SVMlight (joachims.org)

    Online resource: http://www.kernel-machines.org/

  • Classification Problem

    How would we classify this data set?

  • Linear Classifiers

    There are many lines that can serve as linear classifiers.

    Which one is the better classifier?

  • SVM Solution

    The SVM solution is the linear classifier with the maximum margin (the maximum-margin linear classifier).

  • Margin of a Linear Function f(x) = w \cdot x + b

    For a labeled example (x_i, y_i):

    Functional margin: \hat{\gamma}_i = y_i f(x_i) = y_i (w \cdot x_i + b)

    Geometric margin: \gamma_i = y_i f(x_i) / \|w\|

    Margin of the classifier: \gamma = \min_i \gamma_i

    SVM solution: the hyperplane that maximizes the margin \gamma

  • A Bound on Expected Risk of a Linear Classifier f = sign(w \cdot x)

    With probability at least (1 - \delta), \delta \in (0, 1):

    R[f] \le R_{emp}[f] + \sqrt{ \frac{c}{l} \left( \frac{R^2 \Lambda^2}{\gamma_f^2} \ln^2 l + \ln \frac{1}{\delta} \right) }

    where R_{emp} is the training error, l is the training-set size, \gamma_f is the margin, \|w\| \le \Lambda, \|x\| \le R, and c is a constant.

    Larger margin, smaller bound.

  • Finding the Maximum-Margin Classifier

    Constrain the functional margin: y_i (w \cdot x_i + b) \ge 1

    Minimize the norm of w.

  • Soft and Hard Margin

    Hard (maximum) margin:

    \min_{w,b} \frac{1}{2} \|w\|^2
    s.t. y_i (w \cdot x_i + b) \ge 1, \quad i = 1, \ldots, l

    Soft (maximum) margin:

    \min_{w,b,\xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i
    s.t. y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l
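The soft-margin objective above can be sketched numerically. The following is a subgradient-descent illustration of the same objective, not the quadratic-programming solution the slides develop; the toy data, learning rate, and epoch count are assumptions.

```python
import numpy as np

# Minimal sketch: minimize the soft-margin objective
#   0.5*||w||^2 + C * sum(max(0, 1 - y_i*(w.x_i + b)))
# by subgradient descent. Illustrative only; the slides solve
# the same problem exactly via quadratic programming.
def soft_margin_sgd(X, y, C=1.0, lr=0.01, epochs=200):
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                     # points violating the margin
        # subgradient of 0.5*||w||^2 + C*sum(hinge)
        grad_w = w - C * (y[viol, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Toy linearly separable data (an assumption for the demo)
X = np.array([[2.0, 2.0], [1.5, 2.5], [-2.0, -2.0], [-2.5, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = soft_margin_sgd(X, y)
print(np.sign(X @ w + b))   # predictions should match y
```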

  • Lagrangian Optimization

  • Kuhn-Tucker Theorem

  • Optimization

    Primal problem:

    \min_{w,b,\xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i
    s.t. y_i (w \cdot x_i + b) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, l

    Dual problem:

    \min_{\alpha} \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{i=1}^{l} \alpha_i
    s.t. 0 \le \alpha_i \le C, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0

    Solution: w = \sum_{\alpha_i > 0} \alpha_i y_i x_i
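The primal-dual relation w = \sum_{\alpha_i > 0} \alpha_i y_i x_i can be checked on a two-point toy problem whose dual solution is computable by hand; the data and the value \alpha = (0.5, 0.5) are assumptions verified inside the sketch.

```python
import numpy as np

# Toy problem: x1 = (1, 0) with y1 = +1, x2 = (-1, 0) with y2 = -1.
# The dual objective reduces to 2a - 2a^2 with alpha_1 = alpha_2 = a
# (from the constraint sum alpha_i y_i = 0), maximized at a = 0.5.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])

# Primal-dual relation from the slide: w = sum_i alpha_i y_i x_i
w = (alpha * y) @ X                      # gives [1. 0.]

# b from any support vector: y_i (w.x_i + b) = 1
b = y[0] - w @ X[0]                      # gives 0.0

# Both training points sit exactly on the margin:
print(y * (X @ w + b))                   # gives [1. 1.]
```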

  • (Linear) Support Vector Machines

    Training: quadratic optimization, l variables, l^2 coefficients:

    \min_{\alpha} \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{i=1}^{l} \alpha_i
    s.t. 0 \le \alpha_i \le C, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0

    Testing: f(x) = w \cdot x + b

    Normal vector of the hyperplane: w = \sum_{\alpha_i > 0} \alpha_i y_i x_i

    (x_i, \alpha_i) with \alpha_i > 0: a support vector

  • Kernel Method

    Problem: most datasets are linearly non-separable.

    Solution:
    Map input data into a higher-dimensional feature space.
    Find the optimal hyperplane in feature space.

  • Hyperplane in Feature Space

    VC-dimension of a class of functions: the maximum number of points that can be shattered.

    The VC-dimension of linear functions in R^d is d+1.

    The dimension of the feature space is high.

    Linear functions in feature space therefore have high VC-dimension, i.e. high capacity.

  • VC Dimension: Example

    Gaussian RBF SVMs of sufficiently small width can classify an arbitrarily large number of training points correctly, and thus have infinite VC dimension.

  • Linear SVMs

    Training: quadratic optimization, l variables, l^2 coefficients:

    \min_{\alpha} \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j (x_i \cdot x_j) - \sum_{i=1}^{l} \alpha_i
    s.t. 0 \le \alpha_i \le C, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0

    Testing: f(x) = sign\left( \sum_{\alpha_i > 0} \alpha_i y_i (x_i \cdot x) + b \right)

    Normal vector of the hyperplane: w = \sum_{\alpha_i > 0} \alpha_i y_i x_i

    (x_i, \alpha_i) with \alpha_i > 0: a support vector

    SVMs work with pairs of data points (dot products), not with individual samples.

  • Non-linear SVMs

    Kernel: computes the dot product between two vectors in feature space, K(x, y) = \langle \Phi(x), \Phi(y) \rangle

    Training:

    \min_{\alpha} \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i
    s.t. 0 \le \alpha_i \le C, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0

    Testing: f(x) = sign\left( \sum_{\alpha_i > 0} \alpha_i y_i K(x_i, x) + b \right)

    Normal vector of the hyperplane: w = \sum_{\alpha_i > 0} \alpha_i y_i \Phi(x_i)

    The maximal margin algorithm works indirectly in feature space via the kernel; \Phi need not be known explicitly.

  • Kernel

    Linear: K(x, y) = \langle x, y \rangle

    Gaussian: K(x, y) = \exp(-\|x - y\|^2)
    Dimension of feature space: infinite

    Polynomial: K(x, y) = \langle x, y \rangle^p
    Dimension of feature space: \binom{p+d-1}{p}, where d is the input-space dimension
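The kernels above, the polynomial feature-space count, and the kernel decision function can be sketched as follows. The support vectors, \alpha, and b values here are illustrative placeholders, not a trained solution, and the Gaussian width parameter gamma is an addition for usability.

```python
import numpy as np
from math import comb

# The three kernels listed on the slide.
def k_linear(x, y):
    return x @ y

def k_gaussian(x, y, gamma=1.0):   # gamma: added width parameter
    return np.exp(-gamma * np.sum((x - y) ** 2))

def k_poly(x, y, p=2):
    return (x @ y) ** p

# Feature-space dimension of the homogeneous polynomial kernel:
# number of degree-p monomials in d variables = C(p + d - 1, p).
d, p = 2, 2
print(comb(p + d - 1, p))          # 3 monomials: x^2, xy, y^2

# Kernel decision function f(x) = sign(sum_i alpha_i y_i K(x_i, x) + b).
# alpha and b below are illustrative placeholders, not a trained model.
def decide(x, SV, y_sv, alpha, b, kernel):
    s = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y_sv, SV))
    return np.sign(s + b)

SV = np.array([[1.0, 0.0], [-1.0, 0.0]])
y_sv = np.array([1.0, -1.0])
alpha = np.array([0.5, 0.5])
print(decide(np.array([2.0, 1.0]), SV, y_sv, alpha, 0.0, k_linear))  # 1.0
```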

  • Support Vector Learning

    Task: given a set of labeled data

    T = \{(x_i, y_i)\}_{i=1,\ldots,l} \subset R^d \times \{-1, +1\}

    find the decision function.

    Training:
    \min_{\alpha} \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i
    s.t. 0 \le \alpha_i \le C, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0
    Time: O(l^3), memory: O(l^2)

    Testing:
    f(x) = sign\left( \sum_{\alpha_i > 0} \alpha_i y_i K(x_i, x) + b \right)
    Time: O(N_S)

  • MNIST Data: SVM vs. Others

    Handwritten digit data: 60,000/10,000 training/testing examples

    Performance:

    Method                                          | Testing error (%)
    ------------------------------------------------|------------------
    linear classifier (1-layer NN)                  | 12.0
    K-nearest-neighbors                             | 5.0
    40 PCA + quadratic classifier                   | 3.3
    2-layer NN, 300 hidden units, mean square error | 4.7
    SVM, Gaussian kernel                            | 1.4
    Convolutional net LeNet-4                       | 1.1

    (Source: http://yann.lecun.com/)

  • SVM: Probability Output

    SVM solution: f(x) = \sum_{\alpha_i > 0} \alpha_i y_i K(x_i, x) + b

    Probability estimation:

    p(y = 1 \mid x) = \frac{1}{1 + e^{A f(x) + B}}

    Maximum likelihood approach:

    (A, B) = \arg\min_{a,b} F(a, b) = -\sum_{i=1}^{l} \left[ t_i \log(p_i) + (1 - t_i) \log(1 - p_i) \right]

    where p_i = p(y = 1 \mid x_i) = \frac{1}{1 + e^{a f(x_i) + b}}, and

    t_i = \frac{N_+ + 1}{N_+ + 2} if y_i = +1, \quad t_i = \frac{1}{N_- + 2} if y_i = -1, \quad i = 1, \ldots, l

    (N_+: number of positive examples, N_-: number of negative examples)
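The maximum-likelihood fit can be sketched with plain gradient descent. Platt's original method uses a Newton-style optimizer, so this is a simplification; the decision values f and the learning-rate settings are assumptions.

```python
import numpy as np

# Sketch of the sigmoid fit: choose (A, B) minimizing the cross-entropy
#   F(A, B) = -sum_i [ t_i log p_i + (1 - t_i) log(1 - p_i) ],
#   p_i = 1 / (1 + exp(A f_i + B)),
# with the regularized targets t_i from the slide. Plain gradient
# descent stands in for Platt's Newton-style optimizer.
def fit_sigmoid(f, y, lr=0.1, steps=2000):
    n_pos, n_neg = np.sum(y == 1), np.sum(y == -1)
    t = np.where(y == 1, (n_pos + 1.0) / (n_pos + 2.0), 1.0 / (n_neg + 2.0))
    A, B = 0.0, 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(A * f + B))
        g = t - p                    # dF/dz for z = A*f + B
        A -= lr * np.mean(g * f)
        B -= lr * np.mean(g)
    return A, B

f = np.array([2.0, 1.5, 1.0, -1.0, -1.5, -2.0])   # assumed decision values
y = np.array([1, 1, 1, -1, -1, -1])
A, B = fit_sigmoid(f, y)
p = 1.0 / (1.0 + np.exp(A * f + B))
print(A < 0, p[0] > 0.5 > p[-1])    # sigmoid slopes the right way
```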

  • Outline

    Reference: books, papers, slides, software

    Support vector machines (SVMs): the maximum-margin hyperplane

    Kernel method

    Implementation: approaches, sequential minimal optimization (SMO)

    Open problems

  • SVM Training

    Problem: quadratic programming (QP)
    Objective function: quadratic w.r.t. \alpha

    \min_{\alpha} F(\alpha) = \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K_{ij} - \sum_{i=1}^{l} \alpha_i
    s.t. 0 \le \alpha_i \le C, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0

    Number of variables: l; number of parameters: l^2
    Constraints: box, linear

    Complexity:
    Time: O(l^3) or O(N_S^3 + N_S^2 l + N_S d l)
    Memory: O(l^2)

    Approaches:
    Gradient methods: modified gradient projection (Bottou et al., 94)
    Divide-and-conquer: decomposition algorithms (e.g. Osuna et al., 97; Joachims, 99); sequential minimal optimization (SMO) (Platt, 99)
    Parallelization: cascade SVM (Graf et al., 05); parallel mixture of SVMs (Collobert et al., 02)
    Approximation: online and active learning (e.g. Bordes et al., 05); core SVM (Tsang et al., 05, 07)
    Combinations of methods

  • Optimality

    The Karush-Kuhn-Tucker (KKT) conditions:

    \alpha_i = 0 \Rightarrow y_i f(x_i) \ge 1
    0 < \alpha_i < C \Rightarrow y_i f(x_i) = 1
    \alpha_i = C \Rightarrow y_i f(x_i) \le 1

    where f(x) = \sum_{j=1}^{l} \alpha_j y_j K(x_j, x) + b
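The three KKT cases can be checked mechanically for a candidate solution. The margins, \alpha values, and the numerical tolerance tol below are illustrative assumptions.

```python
import numpy as np

# Sketch: check the three KKT cases from the slide for each alpha_i,
# given the margins m_i = y_i * f(x_i). The tolerance `tol` is an
# added numerical-slack parameter, not part of the slide.
def kkt_satisfied(alpha, margins, C, tol=1e-6):
    ok = np.ones_like(alpha, dtype=bool)
    zero = alpha <= tol
    free = (alpha > tol) & (alpha < C - tol)
    bound = alpha >= C - tol
    ok[zero] = margins[zero] >= 1 - tol       # alpha_i = 0  ->  margin >= 1
    ok[free] = np.abs(margins[free] - 1) <= tol   # 0 < alpha_i < C -> margin = 1
    ok[bound] = margins[bound] <= 1 + tol     # alpha_i = C  ->  margin <= 1
    return ok

alpha = np.array([0.0, 0.5, 1.0])      # non-SV, free SV, bound SV
margins = np.array([1.7, 1.0, 0.4])    # consistent with each case
print(kkt_satisfied(alpha, margins, C=1.0))   # all True
```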

  • SMO Algorithm

    Initialize solution (zero)
    While (!StoppingCondition)
        Select two vectors {i, j}
        Optimize on {i, j}
    EndWhile
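A minimal sketch fleshing out this loop for a linear kernel on a tiny separable toy set. Pair selection here is a plain scan with a simplistic second choice, and C, tol, and the data are assumptions; Platt's full SMO adds the selection heuristics of the later slides, plus caching and shrinking.

```python
import numpy as np

# Simplified SMO for the dual problem with a linear kernel.
# Simplifications vs. Platt's algorithm: naive pair selection,
# precomputed kernel matrix, tiny separable toy data.
def smo(X, y, C=10.0, tol=1e-3, max_passes=50):
    l = len(y)
    K = X @ X.T
    alpha = np.zeros(l)
    b = 0.0
    def f(k):                               # current decision value
        return np.sum(alpha * y * K[:, k]) + b
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(l):
            E_i = f(i) - y[i]
            if (y[i] * E_i < -tol and alpha[i] < C) or \
               (y[i] * E_i > tol and alpha[i] > 0):
                j = (i + 1) % l             # simplistic second choice
                E_j = f(j) - y[j]
                eta = K[i, i] + K[j, j] - 2 * K[i, j]
                if eta <= 0:
                    continue
                # box bounds keeping 0 <= alpha <= C and sum(alpha*y) fixed
                if y[i] != y[j]:
                    L, H = max(0, alpha[j] - alpha[i]), min(C, C + alpha[j] - alpha[i])
                else:
                    L, H = max(0, alpha[i] + alpha[j] - C), min(C, alpha[i] + alpha[j])
                if L >= H:
                    continue
                a_i_old, a_j_old = alpha[i], alpha[j]
                alpha[j] = np.clip(a_j_old + y[j] * (E_i - E_j) / eta, L, H)
                alpha[i] = a_i_old + y[i] * y[j] * (a_j_old - alpha[j])
                # recompute b from a free multiplier
                b_i = b - E_i - y[i]*(alpha[i]-a_i_old)*K[i,i] - y[j]*(alpha[j]-a_j_old)*K[i,j]
                b_j = b - E_j - y[i]*(alpha[i]-a_i_old)*K[i,j] - y[j]*(alpha[j]-a_j_old)*K[j,j]
                b = b_i if 0 < alpha[i] < C else (b_j if 0 < alpha[j] < C else (b_i + b_j) / 2)
                if abs(alpha[j] - a_j_old) > 1e-12:
                    changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X
    return w, b, alpha

X = np.array([[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0], [0.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, alpha = smo(X, y)
print(np.round(w, 2), round(b, 2))   # maximum-margin separator for this toy set
```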


  • Selection Heuristic and Stopping Condition

    Maximum violating pair:

    i = \arg\max_{k \in I_{up}} \{ E_k \}, \quad j = \arg\min_{k \in I_{low}} \{ E_k \}

    Maximum gain:

    i = \arg\max_{k \in I_{up}} \{ E_k \}, \quad j = \arg\max_{k \in I_{low}} \{ F(E_i, E_k) \mid E_i > E_k \}

    where

    I_{up} = \{ t \mid \alpha_t < C, y_t = 1 \text{ or } \alpha_t > 0, y_t = -1 \}
    I_{low} = \{ t \mid \alpha_t < C, y_t = -1 \text{ or } \alpha_t > 0, y_t = 1 \}

    Stopping condition: E_i - E_j \le \epsilon \quad (\epsilon = 10^{-3})
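The maximum-violating-pair rule and the stopping test can be sketched directly on given \alpha, y, and E arrays; the toy values below are assumptions.

```python
import numpy as np

# Sketch of maximum-violating-pair selection from the slide:
#   I_up  = {t | (alpha_t < C and y_t = +1) or (alpha_t > 0 and y_t = -1)}
#   I_low = {t | (alpha_t < C and y_t = -1) or (alpha_t > 0 and y_t = +1)}
#   i = argmax_{k in I_up} E_k,  j = argmin_{k in I_low} E_k
def select_pair(alpha, y, E, C, eps=1e-3):
    I_up = ((alpha < C) & (y == 1)) | ((alpha > 0) & (y == -1))
    I_low = ((alpha < C) & (y == -1)) | ((alpha > 0) & (y == 1))
    up = np.where(I_up)[0]
    low = np.where(I_low)[0]
    i = up[np.argmax(E[up])]
    j = low[np.argmin(E[low])]
    if E[i] - E[j] <= eps:       # stopping condition from the slide
        return None              # KKT-optimal within tolerance
    return int(i), int(j)

alpha = np.array([0.0, 0.2, 1.0, 0.0])   # assumed intermediate solution
y = np.array([1, -1, 1, -1])
E = np.array([0.8, -0.3, 0.1, -0.6])
print(select_pair(alpha, y, E, C=1.0))   # picks the most violating pair
```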

  • Sequential Minimal Optimization

    Training problem:

    \min_{\alpha} \frac{1}{2} \sum_{i,j=1}^{l} y_i y_j \alpha_i \alpha_j K(x_i, x_j) - \sum_{i=1}^{l} \alpha_i
    s.t. 0 \le \alpha_i \le C, \; i = 1, \ldots, l, \quad \sum_{i=1}^{l} \alpha_i y_i = 0

    Functional margin:

    E_k = \sum_{i=1}^{l} \alpha_i y_i K(x_i, x_k) - y_k

    Selection heuristic:

    i = \arg\max_{k \in I_{up}} \{ E_k \}, \quad j = \arg\max_{k \in I_{low}} \{ L(\alpha_i, \alpha_k) \mid E_i > E_k \}

    Updating scheme:

    \alpha_i^{new} = \alpha_i^{old} + y_i \frac{E_j^{old} - E_i^{old}}{\eta}, \quad \alpha_j^{new} = \alpha_j^{old} + y_j \frac{E_i^{old} - E_j^{old}}{\eta}, \quad \eta = K_{ii} + K_{jj} - 2 K_{ij}

    Stopping condition: E_i - E_j \le \epsilon

  • Support Vector Regression (1)

    Training data: S = \{(x_i, y_i)\}_{i=1,\ldots,l} \subset R^N \times R

    Linear regressor: y = f(x) = w \cdot x + b

    \epsilon-insensitive loss function
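For reference, the standard form of the \epsilon-insensitive loss named above, and the soft objective it induces:

```latex
% epsilon-insensitive loss used in support vector regression
|y - f(x)|_{\epsilon} = \max\{0,\; |y - f(x)| - \epsilon\}

% corresponding soft training objective
\min_{w,b} \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{l} |y_i - f(x_i)|_{\epsilon}
```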

  • Support Vector Regression (2)

    Optimization: minimizing

    Dual problem

  • Open Problems

    Model selection: kernel type, parameter setting

    Speed and size:
    Training: time O(N_S^2 l), space O(N_S l)
    Testing: O(N_S)

    Multi-class application: one-versus-rest, one-versus-one

    Categorical data
