Transcript
Page 1: Multiple Kernel Learning

Multiple Kernel Learning

Hossein Hajimirsadeghi
School of Computing Science

Simon Fraser University

November 5, 2013

Page 2: Multiple Kernel Learning

Introduction - SVM

Decision function: $f(x) = w \cdot \phi(x) + b$. The separating hyperplane is $w \cdot \phi(x) + b = 0$ and the margin hyperplanes are $w \cdot \phi(x) + b = \pm 1$.

Max margin: maximize $\frac{1}{\|w\|}$, i.e.

$$\min_{w,b} \; \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i \, (w \cdot \phi(x_i) + b) \ge 1 \quad \forall i$$

Page 3: Multiple Kernel Learning

Adding slack variables $\xi_i$ gives the soft-margin SVM:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i \, (w \cdot \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i$$

Page 4: Multiple Kernel Learning

Eliminating the slack variables yields the regularized hinge-loss form:

$$\min_{w,b} \; \underbrace{\frac{1}{2}\|w\|^2}_{\text{Regularizer}} + \; C \sum_i \underbrace{\max\bigl(0, \; 1 - y_i \, (w \cdot \phi(x_i) + b)\bigr)}_{\text{Loss Function } \ell(f(x_i), y_i)}$$
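To make the objective concrete, here is a minimal NumPy sketch that evaluates it for a linear feature map $\phi(x) = x$; the function name and the toy data are illustrative, not from the slides.

```python
import numpy as np

def svm_primal_objective(w, b, X, y, C):
    # 0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i * (w . x_i + b))
    margins = y * (X @ w + b)
    hinge = np.maximum(0.0, 1.0 - margins)
    return 0.5 * np.dot(w, w) + C * hinge.sum()

# Toy usage: at w = 0, b = 0 every sample contributes a hinge loss of 1
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = np.where(rng.normal(size=20) > 0, 1.0, -1.0)
print(svm_primal_objective(np.zeros(3), 0.0, X, y, C=1.0))  # 20.0
```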

Page 5: Multiple Kernel Learning

SVM: Optimization Problem

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i \, (w \cdot \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i$$

Lagrangian (multipliers $\alpha_i, \mu_i \ge 0$):

$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C\sum_i \xi_i - \sum_i \alpha_i \bigl[y_i \, (w \cdot \phi(x_i) + b) - 1 + \xi_i\bigr] - \sum_i \mu_i \xi_i$$

Page 6: Multiple Kernel Learning

SVM: Dual

Primal:

$$\min_{w,b,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i \, (w \cdot \phi(x_i) + b) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i$$

Dual:

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0$$

Page 7: Multiple Kernel Learning

SVM-Dual

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0$$

Resulting classifier:

$$f(x) = w \cdot \phi(x) + b = \sum_i \alpha_i y_i \, \phi(x_i) \cdot \phi(x) + b, \qquad b = y_j - \sum_i \alpha_i y_i \, \phi(x_i) \cdot \phi(x_j) \;\; \text{for a support vector } x_j$$

Both training and prediction only need the inner products $\phi(x_i) \cdot \phi(x_j)$, i.e. $K(x_i, x_j)$.

Page 8: Multiple Kernel Learning

Kernel Methods

Define $K : X \times X \to \mathbb{R}$, called a kernel, such that

$$K(x, y) = \phi(x) \cdot \phi(y)$$

Ideas:

K often interpreted as a similarity measure

Benefits: Efficiency, Flexibility

Example (degree-2 polynomial kernel):

$$K(x, y) = (x_1 y_1 + x_2 y_2 + c)^2 = \phi(x) \cdot \phi(y), \qquad \phi(x) = \bigl(x_1^2, \; x_2^2, \; \sqrt{2}\, x_1 x_2, \; \sqrt{2c}\, x_1, \; \sqrt{2c}\, x_2, \; c\bigr)$$
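A quick numerical check of this identity (a minimal sketch; the choice $c = 1$ and the test points are arbitrary):

```python
import numpy as np

c = 1.0

def kernel(x, y):
    # (x1*y1 + x2*y2 + c)^2
    return (x[0] * y[0] + x[1] * y[1] + c) ** 2

def phi(x):
    # explicit feature map whose inner product reproduces the kernel
    return np.array([x[0] ** 2, x[1] ** 2,
                     np.sqrt(2) * x[0] * x[1],
                     np.sqrt(2 * c) * x[0],
                     np.sqrt(2 * c) * x[1],
                     c])

x, y = np.array([0.3, -1.2]), np.array([2.0, 0.5])
print(kernel(x, y), phi(x) @ phi(y))  # both print 1.0
```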

Page 9: Multiple Kernel Learning

Kernelized SVM

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, \phi(x_i) \cdot \phi(x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0$$

Classifier:

$$f(x) = w \cdot \phi(x) + b = \sum_i \alpha_i y_i \, \phi(x_i) \cdot \phi(x) + b, \qquad b = y_j - \sum_i \alpha_i y_i \, \phi(x_i) \cdot \phi(x_j)$$

Page 10: Multiple Kernel Learning

Kernelized

Replacing every inner product with the kernel:

$$\max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \, K(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_i \alpha_i y_i = 0$$

Classifier:

$$f(x) = w \cdot \phi(x) + b = \sum_i \alpha_i y_i \, K(x_i, x) + b, \qquad b = y_j - \sum_i \alpha_i y_i \, K(x_i, x_j)$$

Page 11: Multiple Kernel Learning

Kernelized SVM

Kernel (Gram) matrix on the training set:

$$K = \begin{pmatrix} K(x_1, x_1) & K(x_1, x_2) & \cdots & K(x_1, x_N) \\ K(x_2, x_1) & K(x_2, x_2) & \cdots & K(x_2, x_N) \\ \vdots & \vdots & \ddots & \vdots \\ K(x_N, x_1) & K(x_N, x_2) & \cdots & K(x_N, x_N) \end{pmatrix}$$

In matrix form, with $Y = \mathrm{diag}(y)$:

$$\max_{\alpha} \; \mathbf{1}^T \alpha - \frac{1}{2}\, \alpha^T Y K Y \alpha \quad \text{subject to} \quad \mathbf{1}^T Y \alpha = 0, \quad 0 \le \alpha \le C$$
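As a reference point, a small scikit-learn sketch that builds the Gram matrix explicitly and trains on it with the precomputed-kernel interface (the RBF kernel and the toy data are placeholders of our own):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.where(X[:, 0] + 0.3 * rng.normal(size=40) > 0, 1, -1)

def rbf(A, B, gamma=0.5):
    # K(a, b) = exp(-gamma * ||a - b||^2), one example kernel choice
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

K = rbf(X, X)                        # N x N kernel matrix
clf = SVC(C=1.0, kernel='precomputed').fit(K, y)
print(clf.predict(rbf(X[:5], X)))    # prediction needs K(test, train)
```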

Page 12: Multiple Kernel Learning

Ideal Kernel Matrix

$$K = y\, y^T, \qquad K(x_i, x_j) = y_i\, y_j = \begin{cases} +1 & y_i = y_j \\ -1 & y_i \ne y_j \end{cases}$$

With this kernel the classifier

$$f(x) = \sum_i \alpha_i y_i \, K(x_i, x) + b$$

becomes

$$f(x) = \sum_i \alpha_i y_i \, (y_i\, y) + b = y \sum_i \alpha_i y_i^2 + b,$$

which takes the sign of the true label $y$.

Page 13: Multiple Kernel Learning

Motivation for MKL

• Success of SVM is dependent on choice of a good kernel:
  – How to choose kernels
    • Kernel function
    • Parameters

• Practical problems involve multiple heterogeneous data sources:
  – How can kernels help to fuse features?
    • Esp. features from different modalities

Page 14: Multiple Kernel Learning

Multiple Kernel Learning

General MKL:

$$K(x_i, x_j) = f\bigl(\{K_m(x_i^m, x_j^m)\}_{m=1}^{P}\bigr)$$

Linear MKL:

$$K(x_i, x_j) = \sum_{m=1}^{P} \eta_m \, K_m(x_i^m, x_j^m)$$
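A minimal sketch of the linear combination above for precomputed base kernel matrices (the function name and toy kernels are our own):

```python
import numpy as np

def linear_mkl_kernel(kernels, eta):
    # K = sum_m eta_m * K_m, with non-negative weights eta_m
    eta = np.asarray(eta, dtype=float)
    assert np.all(eta >= 0), "kernel weights are assumed non-negative"
    return sum(e * Km for e, Km in zip(eta, kernels))

# Example: combine a linear kernel and an RBF kernel on toy data
X = np.random.default_rng(1).normal(size=(10, 4))
K_lin = X @ X.T
K_rbf = np.exp(-0.5 * ((X[:, None] - X[None, :]) ** 2).sum(-1))
K = linear_mkl_kernel([K_lin, K_rbf], eta=[0.3, 0.7])
```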

Page 15: Multiple Kernel Learning

MKL Algorithms

• Fixed Rules
• Heuristic Approaches
• Similarity Optimization
  – Maximizing the similarity to the ideal kernel matrix
• Structural Risk Optimization
  – Minimizing "regularization term" + "error term"

Page 16: Multiple Kernel Learning

Similarity Optimization

• Similarity:
  – kernel alignment
  – Euclidean distance
  – Kullback-Leibler (KL) divergence

Kernel alignment:

$$A(K_1, K_2) = \frac{\langle K_1, K_2 \rangle_F}{\sqrt{\langle K_1, K_1 \rangle_F \, \langle K_2, K_2 \rangle_F}}, \qquad \langle K_1, K_2 \rangle_F = \sum_{i,j} K_1(x_i, x_j)\, K_2(x_i, x_j)$$

Goal: maximize the alignment to the ideal kernel, $A(K, y\, y^T)$.
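A small sketch of the alignment formula, evaluated against the ideal kernel $y y^T$ on toy data (names and data are illustrative):

```python
import numpy as np

def alignment(K1, K2):
    # A(K1, K2) = <K1, K2>_F / sqrt(<K1, K1>_F * <K2, K2>_F)
    num = np.sum(K1 * K2)
    den = np.sqrt(np.sum(K1 * K1) * np.sum(K2 * K2))
    return num / den

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 3))
y = np.where(X[:, 0] > 0, 1.0, -1.0)
K = X @ X.T
print(alignment(K, np.outer(y, y)))  # alignment with the ideal kernel
```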

Page 17: Multiple Kernel Learning

Similarity Optimization

• Lanckriet et al. (2004):

$$\max_{K} \; A(K, y\, y^T) \quad \text{s.t.} \quad \mathrm{tr}(K) = 1, \quad K \succeq 0, \qquad K = \sum_{m=1}^{P} \eta_m K_m$$

Can be converted to a semidefinite programming problem.

Better results: Centered Kernel Alignment, Cortes et al. (2010).

Page 18: Multiple Kernel Learning

Structural Risk Optimization

SVM dual for a parameterized kernel $K(\eta)$:

$$\omega\bigl(K(\eta)\bigr) = \max_{\alpha} \; \mathbf{1}^T \alpha - \frac{1}{2}\, \alpha^T Y K(\eta)\, Y \alpha \quad \text{subject to} \quad \mathbf{1}^T Y \alpha = 0, \quad 0 \le \alpha \le C$$

Learn the kernel parameters by minimizing this value plus a regularizer $r(\eta)$:

$$\min_{\eta} \; \omega\bigl(K(\eta)\bigr) + r(\eta) \quad \text{subject to} \quad K(\eta) \succeq 0$$

Page 19: Multiple Kernel Learning

Structural Risk Optimization

General MKL (Varma et al. 2009):

$$\min_{\eta} \; \omega\bigl(K(\eta)\bigr) + r(\eta) \quad \text{subject to} \quad K(\eta) \succeq 0$$

Coordinate descent algorithm:
1- Fix the kernel parameters $\eta$ and find $\alpha^{*}$ (a standard SVM solve with kernel $K(\eta)$).
2- Fix $\alpha^{*}$ and update $\eta$ by gradient, using

$$\frac{\partial}{\partial \eta_m}\Bigl[\omega\bigl(K(\eta)\bigr) + r(\eta)\Bigr] = \frac{\partial r}{\partial \eta_m} - \frac{1}{2}\, \alpha^{*T}\, Y \, \frac{\partial K(\eta)}{\partial \eta_m}\, Y\, \alpha^{*}$$
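A rough coordinate-descent sketch in this style for linear MKL, using scikit-learn's SVM for step 1. The regularizer $r(\eta) = \frac{\lambda}{2}\|\eta\|^2$, the step size, and the projection onto $\eta \ge 0$ are our own illustrative choices, not taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def gmkl_linear(kernels, y, C=1.0, lam=1.0, lr=0.1, iters=50):
    P = len(kernels)
    eta = np.ones(P) / P
    for _ in range(iters):
        K = sum(e * Km for e, Km in zip(eta, kernels))
        svm = SVC(C=C, kernel='precomputed').fit(K, y)   # step 1: solve for alpha
        beta = np.zeros(len(y))
        beta[svm.support_] = svm.dual_coef_.ravel()       # beta_i = alpha_i * y_i
        # step 2: gradient of r(eta) plus -0.5 * alpha^T Y dK/deta_m Y alpha
        grad = np.array([lam * e - 0.5 * beta @ Km @ beta
                         for e, Km in zip(eta, kernels)])
        eta = np.maximum(eta - lr * grad, 0.0)            # keep weights non-negative
    return eta
```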

Page 20: Multiple Kernel Learning

Structural Risk: Another View

Linear MKL corresponds to concatenating scaled feature maps. With $\eta_m \ge 0$,

$$K_\eta(x_i, x_j) = \sum_{m=1}^{P} \eta_m \, K_m(x_i^m, x_j^m) = \phi_\eta(x_i) \cdot \phi_\eta(x_j), \qquad \phi_\eta(x_i) = \bigl(\sqrt{\eta_1}\, \phi_1(x_i^1), \; \sqrt{\eta_2}\, \phi_2(x_i^2), \; \ldots, \; \sqrt{\eta_P}\, \phi_P(x_i^P)\bigr)$$

Page 21: Multiple Kernel Learning

Structural Risk: Another View

$$f_{w,b,\eta}(x) = w \cdot \phi_\eta(x) + b = [w_1, w_2, \ldots, w_P] \cdot \bigl(\sqrt{\eta_1}\, \phi_1(x^1), \; \ldots, \; \sqrt{\eta_P}\, \phi_P(x^P)\bigr) + b$$

Equivalently, absorbing the scaling into the weights and writing $d_m := \sqrt{\eta_m}$,

$$f_{w,b,d}(x) = \sum_{m=1}^{P} d_m \, w_m \cdot \phi_m(x^m) + b$$

Page 22: Multiple Kernel Learning

Structural Risk: Another View

$$\min_{w,b,d,\xi} \; \frac{1}{2}\sum_{m=1}^{P} d_m \|w_m\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\Bigl(\sum_{m=1}^{P} d_m \, w_m \cdot \phi_m(x_i^m) + b\Bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i$$

Substituting $v_m := d_m w_m$:

$$\min_{v,b,d,\xi} \; \frac{1}{2}\sum_{m=1}^{P} \frac{\|v_m\|^2}{d_m} + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\Bigl(\sum_{m=1}^{P} v_m \cdot \phi_m(x_i^m) + b\Bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i$$

Page 23: Multiple Kernel Learning

Structural Risk Optimization

Simple MKL (Rakotomamonjy et al. 2008):

$$J(\mathbf{d}) = \min_{v,b,\xi} \; \frac{1}{2}\sum_{m=1}^{P} \frac{\|v_m\|^2}{d_m} + C\sum_i \xi_i \quad \text{s.t.} \quad y_i\Bigl(\sum_{m=1}^{P} v_m \cdot \phi_m(x_i^m) + b\Bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i$$

$$\min_{\mathbf{d}} \; J(\mathbf{d}) \quad \text{such that} \quad \sum_{m=1}^{P} d_m = 1, \quad d_m \ge 0$$
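A much-simplified sketch of the SimpleMKL idea: alternate an SVM solve (which evaluates $J(\mathbf{d})$ and its gradient) with an update of $\mathbf{d}$ on the simplex. The paper uses a reduced-gradient method with line search; the clip-and-renormalize step below is a crude stand-in.

```python
import numpy as np
from sklearn.svm import SVC

def simple_mkl_sketch(kernels, y, C=1.0, lr=0.05, iters=50):
    P = len(kernels)
    d = np.ones(P) / P
    for _ in range(iters):
        K = sum(dm * Km for dm, Km in zip(d, kernels))
        svm = SVC(C=C, kernel='precomputed').fit(K, y)
        beta = np.zeros(len(y))
        beta[svm.support_] = svm.dual_coef_.ravel()        # alpha_i * y_i
        grad = np.array([-0.5 * beta @ Km @ beta for Km in kernels])  # dJ/dd_m
        d = np.clip(d - lr * grad, 0.0, None)
        d /= d.sum()                                        # stay on the simplex
    return d
```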

Page 24: Multiple Kernel Learning

Multi-Class SVM

$$f_w(x, y) = w \cdot \phi(x, y) \qquad \text{(e.g. } f_{w,b}(x, y) = w_y \cdot \phi(x) + b_y\text{)}$$

$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad f_w(x_i, y_i) - f_w(x_i, y) \ge \Delta(y_i, y) - \xi_i \quad \forall i, y, \qquad \xi_i \ge 0$$

$$\Delta(y_i, y) = \begin{cases} 0 & y = y_i \\ 1 & y \ne y_i \end{cases}, \qquad l_i = \max_{y}\bigl(\Delta(y_i, y) + f_w(x_i, y)\bigr) - f_w(x_i, y_i)$$
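A minimal sketch of the margin-rescaled loss $l_i$ above, with the scores $f_w(x_i, y)$ stored in a matrix (names and numbers are illustrative):

```python
import numpy as np

def multiclass_hinge(scores, y_true):
    # scores: (n_samples, n_classes) matrix of f_w(x_i, y); y_true: class indices
    n = scores.shape[0]
    delta = np.ones_like(scores)
    delta[np.arange(n), y_true] = 0.0                 # Delta(y_i, y)
    augmented = delta + scores                        # Delta(y_i, y) + f_w(x_i, y)
    return augmented.max(axis=1) - scores[np.arange(n), y_true]

scores = np.array([[2.0, 0.5, -1.0], [0.1, 0.3, 0.2]])
print(multiclass_hinge(scores, np.array([0, 2])))     # -> [0.  1.1]
```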

Page 25: Multiple Kernel Learning

Latent SVM

$$F_w(x, h) = w \cdot \phi(x, h), \qquad f_w(x) = \max_{h} \; w \cdot \phi(x, h)$$

$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad y_i \, f_w(x_i) \ge 1 - \xi_i, \quad \xi_i \ge 0 \quad \forall i$$

(Slide figure: examples $x_1, x_2, \ldots, x_m$ each paired with a latent variable $h_1, h_2, \ldots, h_m$, scored by $\max_h w \cdot \phi(x_i, h)$.)
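A tiny sketch of the scoring rule $f_w(x) = \max_h w \cdot \phi(x, h)$; training then alternates this inference step with standard SVM updates. The joint feature map and the candidate latent set H below are made-up placeholders.

```python
import numpy as np

def latent_score(w, x, phi, H):
    # f_w(x) = max over latent h of w . phi(x, h)
    return max(w @ phi(x, h) for h in H)

# Toy example: h selects which half of the feature vector the input occupies
phi = lambda x, h: np.concatenate([x, np.zeros(2)]) if h == 0 else np.concatenate([np.zeros(2), x])
w = np.array([1.0, -0.5, 0.2, 0.3])
print(latent_score(w, np.array([1.0, 2.0]), phi, H=[0, 1]))  # 0.8
```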

Page 26: Multiple Kernel Learning

Multi-Class Latent SVM

$$F_w(x, h, y) = w \cdot \phi(x, h, y), \qquad f_w(x, y) = \max_{h} \; w \cdot \phi(x, h, y)$$

$$\min_{w,\xi} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i \quad \text{s.t.} \quad f_w(x_i, y_i) - f_w(x_i, y) \ge \Delta(y_i, y) - \xi_i \quad \forall i, y, \qquad \xi_i \ge 0$$

(Slide figure: examples $x_1, \ldots, x_m$ with latent variables $h_1, \ldots, h_m$ and a class label $y$, scored by $\max_h w \cdot \phi(x_i, h, y_i)$.)

Page 27: Multiple Kernel Learning

Latent Kernelized Structural SVM (Wu and Jia 2012)

$$F_w(x, h, y) = w \cdot \phi(x, h, y), \qquad f_w(x, y) = \max_{h} \; w \cdot \phi(x, h, y)$$

$$\min_{w} \; \frac{1}{2}\|w\|^2 + C\sum_i \xi_i, \qquad \xi_i = \max\Bigl(0, \; 1 + \max_{h,\, y \ne y_i} F_w(x_i, h, y) - \max_{h} F_w(x_i, h, y_i)\Bigr)$$

Page 28: Multiple Kernel Learning

Latent Kernelized Structural SVM

$$\min_{w} \; \frac{1}{2}\|w\|^2 + C\sum_i \max\Bigl(0, \; 1 + \max_{h,\, y \ne y_i} F_w(x_i, h, y) - \max_{h} F_w(x_i, h, y_i)\Bigr)$$

Find the dual. The dual variables are $\alpha_{iu}$, with $u \in S$, and the scoring function becomes kernelized:

$$F_w(x, v) = \sum_i \sum_{u \in S} \alpha_{iu} \, K(x_i, u, x, v)$$

Page 29: Multiple Kernel Learning

Latent Kernelized Structural SVM

Inference:

$$f_w(x) = \max_{v \in S} F_w(x, v) = \max_{v \in S} \sum_i \sum_{u \in S} \alpha_{iu} \, K(x_i, u, x, v)$$

NO EFFICIENT EXACT SOLUTION: the maximization no longer decomposes, since terms like $\max_{h_i, h_j} K(x_i, h_i, x_j, h_j)$ couple the latent variables.

Page 30: Multiple Kernel Learning

Latent MKL

Vahdat et al. 2013: a latent version of SimpleMKL.

$$f_w(x) = \max_{h} \; \sum_{m=1}^{P} d_m \, w_m \cdot \phi_m(x, h)$$

$$\min_{v,b,d,\xi} \; \frac{1}{2}\sum_{m=1}^{P} \frac{\|v_m\|^2}{d_m} + C\sum_i \xi_i + \frac{1}{2}\sum_{m=1}^{P} d_m^2 \quad \text{s.t.} \quad y_i\Bigl(\sum_{m=1}^{P} v_m \cdot \phi_m(x_i, h) + b\Bigr) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad d_m \ge 0$$

For positive samples ($y_i = +1$) the constraint is imposed at the inferred latent value $h = h_i^{*}$; for negative samples ($y_i = -1$) it must hold for all $h$.

Coordinate descent learning algorithm:
1- Perform inference for the positive samples.
2- Find the dual and solve the dual optimization problem like SimpleMKL.

Page 31: Multiple Kernel Learning

Some other works

• Hierarchical MKL (Bach 2008)
• Latent Kernel SVM (Yang et al. 2012)
• Deep MKL (Strobl and Visweswaran 2013)

Page 32: Multiple Kernel Learning

References

• Gönen, M., & Alpaydın, E. (2011). Multiple kernel learning algorithms. Journal of Machine Learning Research, 12, 2211-2268.

• Rakotomamonjy, A., Bach, F., Canu, S., & Grandvalet, Y. (2008). SimpleMKL. Journal of Machine Learning Research, 9, 2491-2521.

• Varma, M., & Babu, B. R. (2009). More generality in efficient multiple kernel learning. In Proceedings of the 26th Annual International Conference on Machine Learning (pp. 1065-1072).

• Cortes, C., Mohri, M., & Rostamizadeh, A. (2010). Two-stage learning kernel algorithms. In Proceedings of the 27th International Conference on Machine Learning (ICML-10) (pp. 239-246).

• Lanckriet, G. R., Cristianini, N., Bartlett, P., El Ghaoui, L., & Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 27-72.

• Wu, X., & Jia, Y. (2012). View-invariant action recognition using latent kernelized structural SVM. In Computer Vision - ECCV 2012 (pp. 411-424). Springer Berlin Heidelberg.

• Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2008) (pp. 1-8).

• Yang, W., Wang, Y., Vahdat, A., & Mori, G. (2012). Kernel Latent SVM for visual recognition. In Advances in Neural Information Processing Systems (pp. 818-826).

• Vahdat, A., Cannons, K., Mori, G., Oh, S., & Kim, I. (2013). Compositional models for video event detection: A multiple kernel learning latent variable approach. In IEEE International Conference on Computer Vision (ICCV).

• Cortes, C., Mohri, M., & Rostamizadeh, A. (2011). Learning Kernels. ICML 2011 Tutorial.