Kernel Methods & SVM (Support Vector Machines)

Mahdi Pakdaman Naeini
PhD Candidate, University of Tehran
Senior Researcher, TOSAN Intelligent Data Miners

people.cs.pitt.edu/~pakdaman/tutorials/kernel.pdf

Feb 28, 2019
Transcript
Page 1:

Kernel Methods & SVM (Support Vector Machines)

Mahdi Pakdaman Naeini

PhD Candidate, University of Tehran
Senior Researcher, TOSAN Intelligent Data Miners

Page 2:

Outline

• Motivation
• Introduction to pattern recognition and Machine Learning
• Introduction to Kernels
• Sparse kernel methods (SVM)
• Anomaly detection using kernel methods

Automated Fraud Detection: Kernel Methods & SVM

Page 3:

Motivation

Fraud detection's perspectives:
• Fast recall time of the learner
• Binary-class classification
• One-class classification

Generalization performance of kernel methods:
• Different kinds of information can be used
• Good performance in high-dimensional feature spaces
• Using linear learning typically has nice properties:
  - Unique optimal solutions
  - Fast learning algorithms
  - Better statistical analysis

Page 4:

Introduction

Data can exhibit regularities that may or may not be immediately apparent:
• exact patterns, e.g. motions of planets
• complex patterns, e.g. genes in DNA
• probabilistic patterns, e.g. market research

Detecting patterns makes it possible to understand and/or exploit the regularities to make predictions.

Machine Learning and Pattern Recognition is the study of automatic detection of patterns in data.

Page 5:

Pattern Defining

Generative Models
• Parametric models: Maximum Entropy models, GMMs, ...
• Non-parametric models: histogram-based methods, KNN, Parzen estimates, ...

Discriminative Models
• Linear and non-linear discriminant models: linear regression, neural networks, SVM, ...

Page 6:

Historical perspective

Minsky and Papert highlighted the weakness of linear learners in their book Perceptrons.

Neural networks overcame the problem by gluing together many linear units with non-linear activation functions. This solved the problem of capacity and led to a very impressive extension of the applicability of learning, but it ran into training problems of speed and multiple local minima.

Page 7:

Kernel methods approach

• The kernel methods approach is to stick with linear functions but work in a high-dimensional feature space.
• The expectation is that the feature space has a much higher dimension than the input space.

Page 8:

Form of the functions

• So kernel methods use linear functions in a feature space.
• For regression this could be the function value itself.
• For classification, thresholding of the function value is required.

Page 9:

Example

Consider the mapping

  Φ : R^2 → R^3
  (x1, x2) ↦ (z1, z2, z3) = (x1^2, √2 x1 x2, x2^2)

If we consider a linear equation in this feature space, we actually have an ellipse, i.e. a non-linear shape in the input space.

Page 10:

Examples of Kernels

Polynomial kernel (degree d = 2): k(x, y) = ⟨x, y⟩^2

RBF kernel: k(x, y) = exp(−‖x − y‖^2 / (2σ^2))

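The two kernels above can be sketched in a few lines of numpy; this is a minimal illustration (the function names and test points are my own, not from the slides):

```python
import numpy as np

def polynomial_kernel(x, y, d=2):
    """Homogeneous polynomial kernel k(x, y) = <x, y>^d."""
    return np.dot(x, y) ** d

def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])
print(polynomial_kernel(x, y))   # <x, y>^2 = 5^2 = 25.0
print(rbf_kernel(x, x))          # 1.0: a point is maximally similar to itself
```

Note that the RBF kernel depends only on the distance between the points, which matters for the equivalence remark on Page 55.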

Page 11:

Capacity of feature spaces

The capacity is proportional to the dimension. For example, in 2 dimensions: [figure not recovered from the slide]

Page 12:

Problems of high dimensions

• Computational costs involved in dealing with large vectors → Kernel Function & Kernel Trick
• Capacity may easily become too large and lead to over-fitting: being able to realise every classifier means it is unlikely to generalise well → Large Margin Trick

Page 13:

Different Perspective: Learning & Similarity

• Input / output sets X, Y
• Training set (x1, y1), ..., (xm, ym) ∈ X × Y
• Generalization: given a previously unseen x ∈ X, find a suitable y ∈ Y
• (x, y) should be "similar" to (x1, y1), ..., (xm, ym)
• How to measure similarity?
  - For outputs: a loss function (e.g. for y ∈ {−1, +1}, the zero-one loss)
  - For inputs: a kernel function

Page 14:

Similarity of Inputs

Symmetric function

  k : X × X → R,  (x, x') ↦ k(x, x')

For example, if X = R^n: the canonical dot product.

• If X is not a dot product space, assume that k has a representation as a dot product in a linear space H, i.e. there is a mapping Φ : X → H such that k(x, x') = ⟨Φ(x), Φ(x')⟩.
• In that case, we can think of the patterns as Φ(x) and Φ(x'), and carry out geometric algorithms in the dot product space (feature space) H.

Page 15:

An Example of a Kernel Method

Idea: classify points in feature space according to which of the two class means is closer.

Compute the sign of the dot product between w := c+ − c− and x − c, where c is the midpoint between the class means.

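A minimal numpy sketch of this class-mean classifier, using the plain dot product as the kernel (the toy points are my own invention):

```python
import numpy as np

# Toy data: two classes in R^2 (hypothetical points).
X_pos = np.array([[2.0, 2.0], [3.0, 1.0]])
X_neg = np.array([[-2.0, -1.0], [-1.0, -3.0]])

c_pos, c_neg = X_pos.mean(axis=0), X_neg.mean(axis=0)
w = c_pos - c_neg                 # direction between the class means
c = (c_pos + c_neg) / 2.0         # midpoint between the means

def classify(x):
    # Sign of the dot product between w and (x - c).
    return np.sign(np.dot(w, x - c))

print(classify(np.array([2.5, 1.5])))    # 1.0 (closer to the positive mean)
print(classify(np.array([-2.0, -2.0])))  # -1.0
```

Since only dot products appear, the same rule can be kernelized by replacing each dot product with k(·,·).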

Page 16:

An Example: Cont'd

• Provides a geometric interpretation of Parzen windows.
• The decision function is a hyperplane.

Page 17:

Example: All Degree-2 Monomials

  Φ : R^2 → R^3
  (x1, x2) ↦ (z1, z2, z3) = (x1^2, √2 x1 x2, x2^2)

Page 18:

The Kernel Trick

• The dot product in H can be computed in R^2:

  ⟨Φ(x), Φ(y)⟩ = (x1 y1 + x2 y2)^2 = ⟨x, y⟩^2

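The identity above can be checked numerically: the explicit degree-2 feature map from Page 17 gives the same value as the squared dot product computed directly in R^2 (the test points are my own):

```python
import numpy as np

def phi(x):
    # Degree-2 monomial feature map R^2 -> R^3 from Page 17.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 1.0])

explicit = np.dot(phi(x), phi(y))   # dot product computed in H
kernel   = np.dot(x, y) ** 2        # same value computed in R^2

print(explicit, kernel)             # both 25.0
```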

Page 19:

Kernel Trick: Cont'd

• More generally: ⟨Φ(x), Φ(y)⟩ = ⟨x, y⟩^d
• where Φ maps into the space spanned by all ordered products of d input directions.

Page 20:

Kernel: a more formal definition

A kernel k(x, y) is a similarity measure defined by an implicit mapping f from the original space to a vector space (feature space) such that k(x, y) = f(x)·f(y).

This similarity measure and the mapping can include:
• Invariance or other a priori knowledge
• Simpler structure (a linear representation of the data)
• A possibly infinite-dimensional hypothesis space for learning, while computing k(x, y) remains efficient
• Different kinds of data, e.g. strings, sets, graphs, trees, text, ...

Page 21:

Valid Kernels

The function k(x, y) is a valid kernel if there exists a mapping f into a vector space (with a dot product) such that k can be expressed as k(x, y) = f(x)·f(y).

Theorem: k(x, y) is a valid kernel if k is positive definite and symmetric (a Mercer kernel).

A function is positive definite if

  ∫∫ k(x, y) f(x) f(y) dx dy ≥ 0   for all f ∈ L2

In other words, the Gram matrix K (whose elements are k(xi, xj)) must be positive definite for all choices of points xi, xj from the input space.

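In practice the finite-sample version of this condition is easy to check: build the Gram matrix for a set of points and inspect its eigenvalues. A small numpy sketch (helper names and the sample points are my own):

```python
import numpy as np

def gram_matrix(X, k):
    """Gram matrix G_ij = k(x_i, x_j) for an array of points."""
    m = len(X)
    return np.array([[k(X[i], X[j]) for j in range(m)] for i in range(m)])

rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2))
X = np.random.RandomState(0).randn(5, 2)

G = gram_matrix(X, rbf)
eigvals = np.linalg.eigvalsh(G)     # G is symmetric, so eigvalsh applies
print(np.all(eigvals >= -1e-10))    # True: the RBF Gram matrix is PSD
```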

Page 22:

How to build new kernels

Kernel combinations preserving validity:

  K(x, y) = λ K1(x, y) + (1 − λ) K2(x, y),  0 ≤ λ ≤ 1
  K(x, y) = a K1(x, y),  a > 0
  K(x, y) = K1(x, y) · K2(x, y)
  K(x, y) = f(x) · f(y),  f a real-valued function
  K(x, y) = K3(φ(x), φ(y))
  K(x, y) = x' P y,  P positive definite symmetric
  K(x, y) = K1(x, y) / √(K1(x, x) · K1(y, y))

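A few of these closure rules can be verified empirically on Gram matrices: convex combinations, positive scalings, and elementwise (Schur) products of PSD Gram matrices remain PSD. A sketch with numpy (the `is_psd` helper and the random data are my own):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(6, 3)

def is_psd(G, tol=1e-10):
    # Symmetrize for numerical safety, then check all eigenvalues >= 0.
    return bool(np.all(np.linalg.eigvalsh((G + G.T) / 2) >= -tol))

G1 = X @ X.T                        # linear-kernel Gram matrix (PSD)
G2 = (X @ X.T + 1.0) ** 2           # inhomogeneous polynomial kernel (PSD)

print(is_psd(0.5 * G1 + 0.5 * G2))  # convex combination stays PSD
print(is_psd(3.0 * G1))             # positive scaling stays PSD
print(is_psd(G1 * G2))              # Schur product stays PSD (product kernel)
```

The elementwise product corresponds to the product kernel K1·K2; its validity is the Schur product theorem.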

Page 23:

Introduction to the primal and dual forms of the learning function

Page 24:

Linear Regression

Given training data (points and labels):

  S = {(x1, y1), ..., (xl, yl)},  xi ∈ R^n, yi ∈ R

Construct a linear function that best interpolates the given training set:

  g(x) = ⟨w, x⟩ = w'x = Σ_i w_i x_i

Page 25:

Linear Regression example

[figure: data points y plotted against the fitted line ⟨w, x⟩]

Page 26:

Ridge Regression

The inverse typically does not exist; use the least-norm solution for fixed λ > 0.

Regularized problem:

  min_w L(w, S) = λ‖w‖^2 + ‖y − Xw‖^2

Optimality condition:

  ∂L(w, S)/∂w = 2λw − 2X'y + 2X'Xw = 0

  (X'X + λI_n) w = X'y

Requires O(n^3) operations.

Page 27:

Ridge Regression (cont.)

The inverse always exists for any λ > 0:

  w = (X'X + λI)^{-1} X'y

Alternative representation:

  (X'X + λI) w = X'y
  λw = X'y − X'Xw = X'(y − Xw)
  w = λ^{-1} X'(y − Xw) = X'α,  where α := λ^{-1}(y − Xw)

  λα = y − Xw = y − XX'α
  (XX' + λI) α = y
  α = (G + λI)^{-1} y,  where G := XX'

Solving this l × l equation is O(l^3).

Page 28:

Dual Ridge Regression

To predict at a new point x:

  g(x) = ⟨w, x⟩ = Σ_i α_i ⟨x_i, x⟩ = y'(G + λI)^{-1} z,  where z_i = ⟨x_i, x⟩

Note we need only compute G, the Gram matrix:

  G = XX',  G_ij = ⟨x_i, x_j⟩

Ridge regression requires only inner products between data points.

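The primal and dual solutions above give identical predictions, which can be checked directly. A minimal numpy sketch of both forms (the dimensions and random data are my own):

```python
import numpy as np

rng = np.random.RandomState(0)
l, n, lam = 20, 5, 0.1              # l points, n features, ridge parameter
X = rng.randn(l, n)
y = rng.randn(l)

# Primal: w = (X'X + lam I_n)^{-1} X'y, an n x n solve, O(n^3).
w = np.linalg.solve(X.T @ X + lam * np.eye(n), X.T @ y)

# Dual: alpha = (G + lam I_l)^{-1} y with G = XX', an l x l solve, O(l^3).
G = X @ X.T
alpha = np.linalg.solve(G + lam * np.eye(l), y)

x_new = rng.randn(n)
primal_pred = w @ x_new
dual_pred = alpha @ (X @ x_new)     # sum_i alpha_i <x_i, x_new>

print(np.isclose(primal_pred, dual_pred))  # True: the two forms agree
```

The dual form never touches X'X, only the Gram matrix G, which is what makes the kernel substitution on the later slides possible.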

Page 29:

Efficiency

To compute w in primal ridge regression is O(n^3).
To compute α in dual ridge regression is O(l^3).

To predict a new point x:

  primal: g(x) = ⟨w, x⟩ = Σ_i w_i x_i,  which is O(n)
  dual:   g(x) = Σ_i α_i ⟨x_i, x⟩ = Σ_i α_i Σ_j x_ij x_j,  which is O(nl)

The dual is better if n >> l.

Page 30:

Key ingredient of the dual solution

• Step 1: α = (G + λI)^{-1} y, where G = XX', G_ij = ⟨x_i, x_j⟩
• Step 2: g(x) = Σ_i α_i ⟨x_i, x⟩
• Important observation: both steps only involve inner products between input data points.

Page 31:

Kernel Trick: Summary

• Any learning algorithm that depends only on dot products can benefit from the kernel trick.
• This way, we can apply linear methods to vectorial as well as non-vectorial data.
• Think of kernels as non-linear similarity measures.
• Examples of common kernels: polynomial, RBF (see Page 10).

Page 32:

Kernel Method Application I: Sparse Kernels (SVM)

Page 33:

Sparse Kernel Methods: Support Vector Classifier

Page 34:

Separating Hyperplane

Page 35:

Optimal Separating Hyperplane

Page 36:

Eliminating the scaling freedom

Page 37:

Canonical Optimal Hyperplane

Page 38:

Formulation as an Optimization Problem

Page 39:

Lagrange Function

Page 40:

Derivation of the Dual Problem

Page 41:

The Support Vector Expansion

Page 42:

Dual problem

Page 43:

Example: Gaussian kernel

Page 44:

Non-Separable Data

[figure: non-separable data points]

Page 45:

Soft Margin SVMs

Page 46:

The ν-property

Page 47:

Duals Using Kernels

Page 48:

SVM Training

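The equations on the preceding slides were embedded as images, but the training step can be sketched end to end. Below is a minimal toy trainer for a hard-margin SVM without a bias term, using plain coordinate ascent on the dual (max Σα_i − ½ α'Qα, α ≥ 0, with Q_ij = y_i y_j ⟨x_i, x_j⟩). This is an illustration under simplifying assumptions, not the SMO-style solvers used in practice; the data points are my own:

```python
import numpy as np

# Tiny linearly separable toy set (hypothetical points).
X = np.array([[2.0, 2.0], [1.0, 3.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

K = X @ X.T                      # linear-kernel Gram matrix
Q = np.outer(y, y) * K
alpha = np.zeros(len(y))

# Exact coordinate maximization of the concave dual, clipped at alpha >= 0.
for _ in range(200):
    for i in range(len(y)):
        grad_i = 1.0 - Q[i] @ alpha
        alpha[i] = max(0.0, alpha[i] + grad_i / Q[i, i])

def decision(x):
    # Support vector expansion: f(x) = sum_i alpha_i y_i <x_i, x>.
    return (alpha * y) @ (X @ x)

preds = np.sign([decision(x) for x in X])
print(np.all(preds == y))        # True: training points correctly classified
```

Points with α_i > 0 after convergence are the support vectors; swapping `K` for a kernel Gram matrix gives the non-linear version (Page 47).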

Page 49:

Kernel Method Application II: Anomaly Detection using Kernel Methods

Page 50:

One-Class SVM

• Finding the smallest hypersphere containing most of the training data (Y. Chen et al. 2001)
• Mapping data into feature space and finding the maximum-margin hyperplane separating the origin from the mapped data (Schölkopf et al. 1999)

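To make the hypersphere intuition concrete, here is a deliberately simplified numpy sketch: score each point by its squared feature-space distance to the mean of the training data, using only kernel evaluations. This is not the exact optimization from either Chen et al. 2001 or Schölkopf et al. 1999, just the underlying "far from the bulk of the data in feature space" idea; the data and threshold-free comparison are my own:

```python
import numpy as np

rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2) / 2.0)

rng = np.random.RandomState(0)
X_train = rng.randn(50, 2) * 0.5          # "normal" data near the origin

K = np.array([[rbf(a, b) for b in X_train] for a in X_train])
mean_term = K.mean()                       # (1/m^2) sum_ij k(x_i, x_j)

def anomaly_score(x):
    # ||phi(x) - c||^2 = k(x, x) - (2/m) sum_i k(x, x_i) + mean_term,
    # where c is the mean of the mapped training points.
    kx = np.array([rbf(x, xi) for xi in X_train])
    return rbf(x, x) - 2.0 * kx.mean() + mean_term

print(anomaly_score(np.array([0.0, 0.0])) <
      anomaly_score(np.array([5.0, 5.0])))   # True: the far point scores higher
```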

Page 51:

Chen's Method

Page 52:

Schölkopf's Method: Separating unlabeled data from the origin

Page 53:

ν-Soft Margin Separation

Page 54:

Dual problem

Page 55:

Remark

These two methods are equivalent if the kernel k(x, y) depends only on x − y (Schölkopf et al. 2001).

Page 56:

Conclusion

Kernel methods provide a general-purpose toolkit for pattern analysis.

Advantages:
• Kernels define a flexible interface to the data, enabling the user to encode prior knowledge into a measure of similarity between two items, with the proviso that it must satisfy the PSD property.
• Algorithms well-founded in statistical learning theory enable efficient and effective exploitation of high-dimensional representations, giving good off-training-set performance.
• Subspace methods can often be implemented in kernel-defined feature spaces using dual representations.
• Overall, this gives a generic plug-and-play framework for analysing data, combining different data types, models, tasks, and preprocessing.
• We can accommodate different kinds of data in learning problems using kernel methods.
• Convex optimization problem: a local minimum is the global optimum.

Disadvantages:
• The choice of kernel is somewhat heuristic and depends on the application at hand.
• Risk of encountering overfitting.

Page 57:

"He whose heart is brought to life by love never dies; our permanence is recorded in the ledger of the world." (Hafez)

Page 58:

Thank you!