Page 1 (source slides: …kumar001/dmbook/slides/chap4_svm.pdf)

02/03/2018 — Introduction to Data Mining — Slide 1

Data Mining
Support Vector Machines

Introduction to Data Mining, 2nd Edition
by Tan, Steinbach, Karpatne, Kumar

Slide 2: Support Vector Machines

● Find a linear hyperplane (decision boundary) that will separate the data

Page 2

Slide 3

Support  Vector  Machines

● One possible solution

(Figure: decision boundary B1)

Slide 4

Support  Vector  Machines

● Another  possible  solution

(Figure: decision boundary B2)

Page 3

Slide 5

Support  Vector  Machines

● Other  possible  solutions

(Figure: several other possible decision boundaries)

Slide 6

Support  Vector  Machines

● Which one is better? B1 or B2?
● How do you define better?

(Figure: decision boundaries B1 and B2)

Page 4

Slide 7

Support  Vector  Machines

● Find the hyperplane that maximizes the margin ⇒ B1 is better than B2

(Figure: hyperplanes B1 and B2 with margin boundaries b11, b12 and b21, b22; the margin of B1 is wider)

Slide 8

Support  Vector  Machines

(Figure: hyperplane B1 with margin boundaries b11 and b12)

Decision boundary:  w · x + b = 0
Margin boundaries:  w · x + b = −1  and  w · x + b = +1

f(x) = { 1 if w · x + b ≥ 1;  −1 if w · x + b ≤ −1 }

Margin = 2 / ||w||
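The decision rule and margin width above can be sketched numerically. The parameter values w and b below are hypothetical toy numbers chosen for illustration, not taken from the slides:

```python
import numpy as np

# Hypothetical parameters of a learned hyperplane w . x + b = 0.
w = np.array([0.5, 0.5])
b = 0.0

def predict(x, w, b):
    # In practice a test point is classified by the sign of w . x + b;
    # the +/-1 thresholds on the slide mark the margin boundaries.
    return 1 if np.dot(w, x) + b >= 0 else -1

# Margin width between the two boundary hyperplanes: 2 / ||w||
margin = 2.0 / np.linalg.norm(w)

print(predict(np.array([2.0, 2.0]), w, b))   # point on the +1 side
print(predict(np.array([-2.0, -2.0]), w, b)) # point on the -1 side
print(round(margin, 4))
```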

Page 5

Slide 9

Linear  SVM

● Linear model:

    f(x) = { 1 if w · x + b ≥ 1;  −1 if w · x + b ≤ −1 }

● Learning the model is equivalent to determining the values of w and b
  – How to find w and b from the training data?

Slide 10

Learning  Linear  SVM

● Objective is to maximize:

    Margin = 2 / ||w||

  – Which is equivalent to minimizing:

    L(w) = ||w||² / 2

  – Subject to the following constraints:

    y_i = { 1 if w · x_i + b ≥ 1;  −1 if w · x_i + b ≤ −1 }

    or

    y_i (w · x_i + b) ≥ 1,   i = 1, 2, …, N

  ◆ This is a constrained optimization problem
    – Solve it using the Lagrange multiplier method
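The constrained problem on this slide (minimize ||w||²/2 subject to y_i(w · x_i + b) ≥ 1) can also be solved numerically. A minimal sketch on a hypothetical two-point training set, using SciPy's SLSQP solver as a stand-in for the Lagrange-multiplier derivation in the text:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical toy training set: one point per class.
X = np.array([[1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, -1.0])

def objective(p):
    # p = (w1, w2, b); minimize ||w||^2 / 2
    w = p[:2]
    return 0.5 * np.dot(w, w)

constraints = [
    # y_i (w . x_i + b) - 1 >= 0 for every training point
    {"type": "ineq", "fun": lambda p, i=i: y[i] * (np.dot(p[:2], X[i]) + p[2]) - 1.0}
    for i in range(len(X))
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w_opt, b_opt = res.x[:2], res.x[2]
print(np.round(w_opt, 3), round(b_opt, 3))
print(round(2.0 / np.linalg.norm(w_opt), 3))  # resulting margin 2 / ||w||
```

For these two points the maximum-margin hyperplane is the perpendicular bisector, so the solver should recover w ≈ (0.5, 0.5) and b ≈ 0.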

Page 6

Slide 11

Example  of  Linear  SVM

x1       x2       y    λ
0.3858   0.4687    1   65.5261
0.4871   0.6110   -1   65.5261
0.9218   0.4103   -1   0
0.7382   0.8936   -1   0
0.1763   0.0579    1   0
0.4057   0.3529    1   0
0.9355   0.8132   -1   0
0.2146   0.0099    1   0

Support vectors: the two points with nonzero Lagrange multiplier λ (the first two rows).
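From the dual solution in the table, the primal parameters follow as w = Σ_i λ_i y_i x_i and b = y_s − w · x_s for any support vector x_s. A quick check, using the values from the table above:

```python
import numpy as np

# Training data and Lagrange multipliers from the table above.
X = np.array([[0.3858, 0.4687], [0.4871, 0.6110], [0.9218, 0.4103],
              [0.7382, 0.8936], [0.1763, 0.0579], [0.4057, 0.3529],
              [0.9355, 0.8132], [0.2146, 0.0099]])
y = np.array([1, -1, -1, -1, 1, 1, -1, 1])
lam = np.array([65.5261, 65.5261, 0, 0, 0, 0, 0, 0])

# w = sum_i lambda_i * y_i * x_i  (only support vectors contribute)
w = (lam * y) @ X
# b from a support vector: y_s (w . x_s + b) = 1  =>  b = y_s - w . x_s
b = y[0] - w @ X[0]

# Every training point should be classified correctly by sign(w . x + b)
pred = np.sign(X @ w + b)
print(np.all(pred == y))  # True
```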

Slide 12

Learning  Linear  SVM

● Decision boundary depends only on support vectors
  – If you have a data set with the same support vectors, the decision boundary will not change
  – How to classify using SVM once w and b are found? Given a test record x_i:

    f(x_i) = { 1 if w · x_i + b ≥ 1;  −1 if w · x_i + b ≤ −1 }

Page 7

Slide 13

Support  Vector  Machines

● What if the problem is not linearly separable?

Slide 14

Support  Vector  Machines

● What if the problem is not linearly separable?
  – Introduce slack variables ξ_i

  ◆ Need to minimize:

    L(w) = ||w||² / 2 + C ( Σ_{i=1..N} ξ_i^k )

  ◆ Subject to:

    y_i = { 1 if w · x_i + b ≥ 1 − ξ_i;  −1 if w · x_i + b ≤ −1 + ξ_i }

  ◆ If k is 1 or 2, this leads to the same objective function as linear SVM but with different constraints (see textbook)
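With k = 1 the slack penalty reduces to the hinge loss, since the smallest feasible slack is ξ_i = max(0, 1 − y_i(w · x_i + b)). A sketch of evaluating this soft-margin objective; the data points and parameter values below are hypothetical:

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """L(w) = ||w||^2 / 2 + C * sum_i xi_i with k = 1, where the
    minimal slack for each point is xi_i = max(0, 1 - y_i (w . x_i + b))."""
    slack = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * np.dot(w, w) + C * slack.sum()

# Hypothetical example: the third point sits on the wrong side of the margin.
X = np.array([[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]])
y = np.array([1.0, -1.0, -1.0])
w = np.array([1.0, 0.0])
b = 0.0

# ||w||^2/2 = 0.5, and the violator contributes slack 1.5, so L = 2.0 at C = 1.
print(round(soft_margin_objective(w, b, X, y, C=1.0), 2))  # 2.0
```

Larger C penalizes margin violations more heavily; smaller C favors a wider margin at the cost of more violations.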

Page 8

Slide 15

Support  Vector  Machines

● Find the hyperplane that optimizes both factors (a wide margin and few margin violations)

(Figure: hyperplanes B1 and B2 with margin boundaries b11, b12 and b21, b22)

Slide 16

Nonlinear  Support  Vector  Machines

● What if the decision boundary is not linear?

Page 9

Slide 17

Nonlinear  Support  Vector  Machines

● Trick:  Transform  data  into  higher  dimensional  space

Decision boundary:  w · Φ(x) + b = 0
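As an illustration of such a transform (a standard textbook construction, not necessarily the one shown in the slide's figure), points that can only be separated by a circle in 2-D become linearly separable after mapping each point to its squared coordinates:

```python
import numpy as np

def phi(X):
    # Map (x1, x2) -> (x1^2, x2^2): the circle x1^2 + x2^2 = r^2
    # becomes the straight line z1 + z2 = r^2 in the new space.
    return X ** 2

# Hypothetical data: class +1 inside the unit circle, class -1 outside.
X = np.array([[0.3, 0.2], [-0.4, 0.1], [1.5, 0.5], [-1.2, -1.0]])
y = np.array([1, 1, -1, -1])

Z = phi(X)
# In the transformed space the linear rule z1 + z2 < 1 separates the classes.
pred = np.where(Z.sum(axis=1) < 1.0, 1, -1)
print(np.array_equal(pred, y))  # True
```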

Slide 18

Learning  Nonlinear  SVM

● Optimization problem:

    minimize ||w||² / 2 subject to y_i (w · Φ(x_i) + b) ≥ 1

● Which leads to the same set of equations as before (but involving Φ(x) instead of x)

Page 10

Slide 19

Learning  NonLinear  SVM

● Issues:
  – What type of mapping function Φ should be used?
  – How to do the computation in the high-dimensional space?
    ◆ Most computations involve the dot product Φ(x_i) · Φ(x_j)
    ◆ Curse of dimensionality?

Slide 20

Learning  Nonlinear  SVM

● Kernel Trick:
  – Φ(x_i) · Φ(x_j) = K(x_i, x_j)
  – K(x_i, x_j) is a kernel function (expressed in terms of the coordinates in the original space)
    ◆ Examples:
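One standard example (not taken from the slide's own list): the homogeneous degree-2 polynomial kernel K(x, z) = (x · z)² equals Φ(x) · Φ(z) for the explicit map Φ(x1, x2) = (x1², √2·x1·x2, x2²), which can be checked numerically:

```python
import numpy as np

def phi(v):
    # Explicit feature map for the homogeneous degree-2 polynomial kernel.
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

def K(x, z):
    # Same quantity computed purely in the original 2-D space.
    return np.dot(x, z) ** 2

x = np.array([0.7, -1.3])
z = np.array([2.0, 0.5])
print(np.isclose(np.dot(phi(x), phi(z)), K(x, z)))  # True
```

The kernel evaluates a 3-D dot product at the cost of a 2-D one, which is the point of the trick.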

Page 11

Slide 21

Example  of  Nonlinear  SVM

(Figure: decision boundary of an SVM with a polynomial degree 2 kernel)
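A small sketch of the same idea with scikit-learn; the ring-shaped data set below is hypothetical and does not reproduce the slide's figure:

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical ring data: class +1 near the origin, class -1 further out,
# which no straight line can separate in the original 2-D space.
X = np.array([[0.5, 0.0], [-0.5, 0.0], [0.0, 0.5], [0.0, -0.5],
              [2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
y = np.array([1, 1, 1, 1, -1, -1, -1, -1])

# Degree-2 polynomial kernel; coef0=1 also includes the lower-order terms.
clf = SVC(kernel="poly", degree=2, coef0=1.0, C=10.0)
clf.fit(X, y)
print(clf.score(X, y))  # expect 1.0 on this clearly separable data
```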

Slide 22

Learning  Nonlinear  SVM

● Advantages of using kernel:
  – Don't have to know the mapping function Φ
  – Computing the dot product Φ(x_i) · Φ(x_j) in the original space avoids the curse of dimensionality

● Not all functions can be kernels
  – Must make sure there is a corresponding Φ in some high-dimensional space
  – Mercer's theorem (see textbook)
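A practical consequence of Mercer's condition is that the kernel (Gram) matrix over any finite point set must be symmetric positive semidefinite. A quick numerical check for the inhomogeneous degree-2 polynomial kernel on a few hypothetical points:

```python
import numpy as np

def K(x, z):
    # Inhomogeneous degree-2 polynomial kernel (a valid Mercer kernel).
    return (np.dot(x, z) + 1.0) ** 2

pts = np.array([[0.0, 1.0], [1.0, -1.0], [2.0, 0.5], [-1.5, 0.3]])
G = np.array([[K(a, b) for b in pts] for a in pts])  # Gram matrix

# All eigenvalues of a valid kernel's Gram matrix are >= 0 (up to round-off).
eigvals = np.linalg.eigvalsh(G)
print(bool(np.all(eigvals >= -1e-9)))  # True
```

A function whose Gram matrix had a genuinely negative eigenvalue on some point set could not correspond to any Φ and would not be a valid kernel.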

Page 12

Slide 23

Characteristics  of  SVM

● Since the learning problem is formulated as a convex optimization problem, efficient algorithms are available to find the global minimum of the objective function (many other methods use greedy approaches and find locally optimal solutions)

● Overfitting is addressed by maximizing the margin of the decision boundary, but the user still needs to provide the type of kernel function and cost function

● Difficult to handle missing values
● Robust to noise
● High computational complexity for building the model