Discriminative, Unsupervised, Convex Learning Dale Schuurmans Department of Computing Science University of Alberta MITACS Workshop, August 26, 2005

Jan 09, 2016

Page 1: Discriminative,  Unsupervised,  Convex Learning

Discriminative, Unsupervised, Convex Learning

Dale Schuurmans

Department of Computing Science

University of Alberta

MITACS Workshop, August 26, 2005

Page 2: Discriminative,  Unsupervised,  Convex Learning

2

Current Research Group

PhD Tao Wang reinforcement learning

PhD Ali Ghodsi dimensionality reduction

PhD Dana Wilkinson action-based embedding

PhD Yuhong Guo ensemble learning

PhD Feng Jiao bioinformatics

PhD Jiayuan Huang transduction on graphs

PhD Qin Wang statistical natural language

PhD Adam Milstein robotics, particle filtering

PhD Dan Lizotte optimization, everything

PhD Linli Xu unsupervised SVMs

PDF Li Cheng computer vision

Page 3: Discriminative,  Unsupervised,  Convex Learning

3

Current Research Group

PhD Tao Wang reinforcement learning

PhD Dana Wilkinson action-based embedding

PhD Feng Jiao bioinformatics

PhD Qin Wang statistical natural language

PhD Dan Lizotte optimization, everything

PDF Li Cheng computer vision

Page 4: Discriminative,  Unsupervised,  Convex Learning

4

Today I will talk about: One Current Research Direction

Learning Sequence Classifiers (HMMs)

Discriminative Unsupervised Convex

EM?

Page 5: Discriminative,  Unsupervised,  Convex Learning

5

Outline

Unsupervised SVMs

Discriminative, unsupervised, convex HMMs

Tao, Dana, Feng, Qin, Dan, Li

Page 6: Discriminative,  Unsupervised,  Convex Learning

6

Page 7: Discriminative,  Unsupervised,  Convex Learning

Unsupervised Support Vector Machines

Joint work with

Linli Xu

Page 8: Discriminative,  Unsupervised,  Convex Learning

8

Main Idea

Unsupervised SVMs (and semi-supervised SVMs)

Harder computational problem than SVMs

Convex relaxation – Semidefinite program (Polynomial time)

Page 9: Discriminative,  Unsupervised,  Convex Learning

9

Background: Two-class SVM

Supervised classification learning

Labeled data → linear discriminant $w^\top x + b = 0$

Classification rule: $y = \mathrm{sgn}(w^\top x + b)$

Some better than others?

Page 10: Discriminative,  Unsupervised,  Convex Learning

10

Maximum Margin Linear Discriminant

Choose a linear discriminant to maximize

$$\min_{x_i, y_i}\ \mathrm{dist}\big(x_i, y_i, \text{Plane } w^\top x + b = 0\big)$$
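A minimal sketch of why some separating planes are better than others, assuming 2-D points and reading the distance as the signed quantity $y_i (w^\top x_i + b) / \|w\|$. The data, the two candidate planes, and the function names are made up for illustration.

```python
import math

def predict(w, b, x):
    """Classification rule: sgn(w . x + b), mapped to +1 / -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1

def margin(w, b, points, labels):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over the data.

    Positive only if every point is on the correct side of the plane.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
labels = [-1, -1, 1, 1]

# Both planes separate the data; the one halfway between the classes
# (first coordinate = 2) leaves more room than the one at 1.5.
margin_centered = margin((1.0, 0.0), -2.0, points, labels)
margin_offset = margin((1.0, 0.0), -1.5, points, labels)
```

The maximum margin discriminant is the plane for which this smallest distance is as large as possible.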

Page 11: Discriminative,  Unsupervised,  Convex Learning

11

Unsupervised Learning

Given unlabeled data,

how to infer classifications?

Organize objects into groups — clustering

Page 12: Discriminative,  Unsupervised,  Convex Learning

12

Idea: Maximum Margin Clustering

Given unlabeled data,

find maximum margin separating hyperplane

Clusters the data

Constraint: class balance: bound difference in sizes between classes
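The idea can be illustrated by brute force in one dimension, where every linearly separable labeling is a threshold split of the sorted points. This is only a toy sketch: the function name, the data, and the `max_imbalance` parameter (a bound on the difference in class sizes) are assumptions for illustration, not the method of the talk.

```python
def max_margin_split_1d(values, max_imbalance):
    """Brute-force maximum margin clustering for 1-D points.

    Tries every threshold split of the sorted points and keeps the one with
    the widest gap. Returns (threshold, margin), or None if no split meets
    the balance bound.
    """
    xs = sorted(values)
    n = len(xs)
    best = None
    for k in range(1, n):                 # k points on the left, n - k on the right
        if abs(k - (n - k)) > max_imbalance:
            continue                      # violates the class balance constraint
        gap = xs[k] - xs[k - 1]
        if best is None or gap / 2 > best[1]:
            best = ((xs[k] + xs[k - 1]) / 2, gap / 2)
    return best

data = [0.0, 0.2, 0.4, 3.0, 3.1, 3.3, 10.0]
balanced = max_margin_split_1d(data, max_imbalance=1)
unbalanced = max_margin_split_1d(data, max_imbalance=len(data))
```

Without the balance bound, the widest gap simply isolates the outlier at 10.0; with it, the split falls between the two natural groups.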

Page 13: Discriminative,  Unsupervised,  Convex Learning

13

Challenge

Find label assignment that results in a large margin

Hard

Convex relaxation – based on semidefinite programming

Page 14: Discriminative,  Unsupervised,  Convex Learning

14

How to Derive Unsupervised SVM?

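Two-class case:

1. Start with Supervised Algorithm

Given vector of assignments, y, solve

$$\gamma^{*-2} = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top,\ yy^\top\rangle \quad \text{subject to } 0 \le \lambda \le 1$$

($\gamma^{*-2}$: inverse squared margin)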
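For a fixed labeling y this is a box-constrained concave quadratic program, so even plain projected gradient ascent solves it. The sketch below is an assumption-laden illustration, not the solver used in the talk: it takes the objective to be $\lambda^\top e - \tfrac{1}{2}\lambda^\top (K \circ yy^\top) \lambda$ with no equality constraint (which corresponds to a discriminant without an offset b), and the step size and iteration count are arbitrary choices.

```python
def inverse_squared_margin(K, y, steps=5000):
    """Approximate max_l  l.e - 0.5 * <K o l l^T, y y^T>  s.t. 0 <= l <= 1
    by projected gradient ascent. K is a kernel (Gram) matrix as nested lists.
    """
    n = len(y)
    # G = K o (y y^T), so the quadratic term is 0.5 * l^T G l.
    G = [[K[i][j] * y[i] * y[j] for j in range(n)] for i in range(n)]
    step = 1.0 / sum(G[i][i] for i in range(n))   # trace(G) bounds the top eigenvalue
    lam = [0.0] * n
    for _ in range(steps):
        grad = [1.0 - sum(G[i][j] * lam[j] for j in range(n)) for i in range(n)]
        # Gradient step followed by projection back onto the box [0, 1]^n.
        lam = [min(1.0, max(0.0, lam[i] + step * grad[i])) for i in range(n)]
    quad = sum(G[i][j] * lam[i] * lam[j] for i in range(n) for j in range(n))
    return sum(lam) - 0.5 * quad, lam

# Two 1-D points x = -1 and x = 2 with a linear kernel K_ij = x_i * x_j.
xs = [-1.0, 2.0]
K = [[a * b for b in xs] for a in xs]
value, lam = inverse_squared_margin(K, y=[-1, 1])
```

With this scaling the optimum equals $\tfrac{1}{2}\|w\|^2$ of the offset-free maximum margin discriminant; here w = 1, so the value is about 0.5, attained with only the point nearest the origin active.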

Page 15: Discriminative,  Unsupervised,  Convex Learning

15

How to Derive Unsupervised SVM?

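2. Think of $\gamma^{*-2}$ as a function of y

If given y, would then solve

$$\gamma^{*-2}(y) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top,\ yy^\top\rangle \quad \text{subject to } 0 \le \lambda \le 1$$

($\gamma^{*-2}$: inverse squared margin)

Goal: Choose y to minimize inverse squared margin

Problem: not a convex function of y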

Page 16: Discriminative,  Unsupervised,  Convex Learning

16

How to Derive Unsupervised SVM?

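3. Re-express problem with indicators comparing y labels

If given y, would then solve

$$\gamma^{*-2}(y) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top,\ yy^\top\rangle \quad \text{subject to } 0 \le \lambda \le 1$$

($\gamma^{*-2}$: inverse squared margin)

New variables: An equivalence relation matrix

$$M = yy^\top, \qquad M_{ij} = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$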

Page 17: Discriminative,  Unsupervised,  Convex Learning

17

How to Derive Unsupervised SVM?

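3. Re-express problem with indicators comparing y labels

If given M, would then solve

$$\gamma^{*-2}(M) = \max_{\lambda}\ \lambda^\top e - \tfrac{1}{2}\langle K \circ \lambda\lambda^\top,\ M\rangle \quad \text{subject to } 0 \le \lambda \le 1$$

($\gamma^{*-2}$: inverse squared margin)

New variables: An equivalence relation matrix

$$M = yy^\top, \qquad M_{ij} = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$

Maximum of linear functions is convex

Note: convex function of M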
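The convexity note rests on the fact that, for any fixed λ, the quadratic term $\langle K \circ \lambda\lambda^\top, M\rangle$ is linear in M, so the maximum over λ is a maximum of linear functions of M. A small sketch with made-up numbers (the kernel matrix, λ, and helper names are illustrative assumptions):

```python
def equivalence_matrix(y):
    """M = y y^T for labels in {-1, +1}: M[i][j] is +1 if y_i == y_j, else -1."""
    return [[yi * yj for yj in y] for yi in y]

def quadratic_term(K, lam, M):
    """<K o lam lam^T, M> = sum_ij K_ij * lam_i * lam_j * M_ij (linear in M)."""
    n = len(lam)
    return sum(K[i][j] * lam[i] * lam[j] * M[i][j]
               for i in range(n) for j in range(n))

y = [1, 1, -1, -1]
M = equivalence_matrix(y)
M_flipped = equivalence_matrix([-v for v in y])   # swapping the class names

K = [[2.0, 1.0, 0.0, 0.5],
     [1.0, 2.0, 0.5, 0.0],
     [0.0, 0.5, 2.0, 1.0],
     [0.5, 0.0, 1.0, 2.0]]
lam = [0.3, 0.1, 0.4, 0.2]

# Linearity in M: the term at the midpoint of two matrices is the midpoint
# of the two values, for any fixed lam.
M_other = equivalence_matrix([1, -1, 1, -1])
M_mid = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(M, M_other)]
```

Note also that M is unchanged when the two class names are swapped, so working with M removes the label-symmetry present in y.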

Page 18: Discriminative,  Unsupervised,  Convex Learning

18

How to Derive Unsupervised SVM?

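4. Get constrained optimization problem

Solve for M

$$\min_{M}\ \gamma^{*-2}(M) \quad \text{subject to} \quad 0 \le \lambda \le 1, \quad M \in \{-1,+1\}^{n \times n}, \quad M = yy^\top, \quad -\epsilon e \le Me \le \epsilon e$$

$M \in \{-1,+1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$, $\mathrm{diag}(M) = e$

Not convex!

Class balance: $-\epsilon e \le Me \le \epsilon e$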

Page 19: Discriminative,  Unsupervised,  Convex Learning

19

How to Derive Unsupervised SVM?

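4. Get constrained optimization problem

Solve for M

$$\min_{M}\ \gamma^{*-2}(M) \quad \text{subject to} \quad 0 \le \lambda \le 1, \quad M \in \{-1,+1\}^{n \times n}, \quad M \succeq 0, \quad \mathrm{diag}(M) = e, \quad -\epsilon e \le Me \le \epsilon e$$

$M \in \{-1,+1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$, $\mathrm{diag}(M) = e$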
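The characterization can be checked exhaustively for n = 3. The sketch below enumerates every symmetric {-1, +1} matrix with unit diagonal and compares two tests: positive semidefiniteness (via the principal-minor criterion) and membership in the set of matrices of the form $yy^\top$. It is a sanity check under these small-case assumptions, not a proof.

```python
from itertools import product

def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def is_psd_3x3(m):
    """A symmetric matrix is PSD iff all principal minors are non-negative."""
    pairs = [(0, 1), (0, 2), (1, 2)]
    order1 = all(m[i][i] >= 0 for i in range(3))
    order2 = all(m[i][i] * m[j][j] - m[i][j] * m[j][i] >= 0 for i, j in pairs)
    return order1 and order2 and det3(m) >= 0

# Every matrix y y^T with y in {-1, +1}^3, stored as tuples for set membership.
label_matrices = {
    tuple(tuple(yi * yj for yj in y) for yi in y)
    for y in product([-1, 1], repeat=3)
}

# Every symmetric {-1, +1} matrix with unit diagonal, i.e. diag(M) = e.
agree = True
for a, b, c in product([-1, 1], repeat=3):
    m = ((1, a, b), (a, 1, c), (b, c, 1))
    agree = agree and (is_psd_3x3(m) == (m in label_matrices))
```

Of the eight candidate matrices, exactly the four that come from a two-class labeling pass the semidefinite test.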

Page 20: Discriminative,  Unsupervised,  Convex Learning

20

How to Derive Unsupervised SVM?

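5. Relax indicator variables to obtain a convex optimization problem

Solve for M

$$\min_{M}\ \gamma^{*-2}(M) \quad \text{subject to} \quad 0 \le \lambda \le 1, \quad M \in [-1,+1]^{n \times n}, \quad M \succeq 0, \quad \mathrm{diag}(M) = e, \quad -\epsilon e \le Me \le \epsilon e$$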

Page 21: Discriminative,  Unsupervised,  Convex Learning

21

How to Derive Unsupervised SVM?

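5. Relax indicator variables to obtain a convex optimization problem

Solve for M

$$\min_{M}\ \gamma^{*-2}(M) \quad \text{subject to} \quad 0 \le \lambda \le 1, \quad M \in [-1,+1]^{n \times n}, \quad M \succeq 0, \quad \mathrm{diag}(M) = e, \quad -\epsilon e \le Me \le \epsilon e$$

Semidefinite program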

Page 22: Discriminative,  Unsupervised,  Convex Learning

22

Multi-class Unsupervised SVM?

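1. Start with Supervised Algorithm

Given vector of assignments, y, solve

$$\max_{\Lambda}\ \sum_{i,r} \lambda_{ir}\, 1_{(y_i \ne r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij}\, (1_{y_i} - \lambda_i)^\top (1_{y_j} - \lambda_j) \quad \text{subject to } \lambda_i \ge 0,\ \lambda_i^\top e = 1\ \ \forall i$$

(Crammer & Singer 01)

Margin loss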

Page 23: Discriminative,  Unsupervised,  Convex Learning

23

Multi-class Unsupervised SVM?

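2. Think of $\omega$ as a function of y

If given y, would then solve

$$\omega(y) = \max_{\Lambda}\ \sum_{i,r} \lambda_{ir}\, 1_{(y_i \ne r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij}\, (1_{y_i} - \lambda_i)^\top (1_{y_j} - \lambda_j) \quad \text{subject to } \lambda_i \ge 0,\ \lambda_i^\top e = 1\ \ \forall i$$

(Crammer & Singer 01)

($\omega$: margin loss)

Goal: Choose y to minimize margin loss

Problem: not a convex function of y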

Page 24: Discriminative,  Unsupervised,  Convex Learning

24

Multi-class Unsupervised SVM?

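3. Re-express problem with indicators comparing y labels

If given y, would then solve

$$\omega(y) = \max_{\Lambda}\ \sum_{i,r} \lambda_{ir}\, 1_{(y_i \ne r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij}\, (1_{y_i} - \lambda_i)^\top (1_{y_j} - \lambda_j) \quad \text{subject to } \lambda_i \ge 0,\ \lambda_i^\top e = 1\ \ \forall i$$

(Crammer & Singer 01)

($\omega$: margin loss)

New variables: M & D

$$M_{ij} = 1_{(y_i = y_j)}, \qquad D_{ir} = 1_{(y_i = r)}, \qquad M = DD^\top$$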
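A short sketch of the two indicator matrices, assuming labels are integers in `range(k)`; the helper names are illustrative. It builds D (example-to-class indicators) and M (same-class indicators) from a label vector and confirms that M equals $DD^\top$.

```python
def indicator_matrices(y, k):
    """Return (M, D) with D[i][r] = 1 if y_i == r and M[i][j] = 1 if y_i == y_j.

    Labels are assumed to be integers in range(k).
    """
    n = len(y)
    D = [[1 if y[i] == r else 0 for r in range(k)] for i in range(n)]
    M = [[1 if y[i] == y[j] else 0 for j in range(n)] for i in range(n)]
    return M, D

def times_transpose(D):
    """Compute D D^T for a matrix given as nested lists."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in D] for row_i in D]

y = [0, 2, 1, 2, 0]
M, D = indicator_matrices(y, k=3)
```

Unlike the two-class case, M here takes values in {0, 1} rather than {-1, +1}.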

Page 25: Discriminative,  Unsupervised,  Convex Learning

25

Multi-class Unsupervised SVM?

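3. Re-express problem with indicators comparing y labels

If given M and D, would then solve

$$\omega(M, D) = \max_{\Lambda}\ Q(\Lambda, M, D) \quad \text{subject to } \Lambda \ge 0,\ \Lambda e = e$$

$$\text{where } Q(\Lambda, M, D) = n - \langle D, \Lambda\rangle - \frac{1}{2\beta}\langle K, M\rangle + \frac{1}{\beta}\langle KD, \Lambda\rangle - \frac{1}{2\beta}\langle \Lambda\Lambda^\top, K\rangle$$

New variables: M & D

$$M_{ij} = 1_{(y_i = y_j)}, \qquad D_{ir} = 1_{(y_i = r)}, \qquad M = DD^\top$$

Margin loss: convex function of M & D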
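As a numerical cross-check, the sketch below evaluates the multi-class dual objective two ways: in matrix form, $n - \langle D, \Lambda\rangle - \tfrac{1}{2\beta}\langle K, M\rangle + \tfrac{1}{\beta}\langle KD, \Lambda\rangle - \tfrac{1}{2\beta}\langle \Lambda\Lambda^\top, K\rangle$, and directly from the labels. The two agree whenever K is symmetric and each row of Λ sums to 1. The kernel matrix, Λ, and function names are made-up illustrations, and the formula is a reading of the slide rather than a verified transcription.

```python
def inner(A, B):
    """Matrix inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def Q(Lam, M, D, K, beta=1.0):
    """n - <D,Lam> - <K,M>/(2 beta) + <K D,Lam>/beta - <Lam Lam^T,K>/(2 beta)."""
    n = len(D)
    return (n - inner(D, Lam)
            - inner(K, M) / (2 * beta)
            + inner(matmul(K, D), Lam) / beta
            - inner(matmul(Lam, transpose(Lam)), K) / (2 * beta))

def Q_direct(Lam, y, K, beta=1.0):
    """Same quantity written with the labels y instead of M and D."""
    n, k = len(Lam), len(Lam[0])
    d = [[1.0 if y[i] == r else 0.0 for r in range(k)] for i in range(n)]
    linear = sum(Lam[i][r] for i in range(n) for r in range(k) if y[i] != r)
    quad = sum(
        K[i][j] * sum((d[i][r] - Lam[i][r]) * (d[j][r] - Lam[j][r]) for r in range(k))
        for i in range(n) for j in range(n)
    )
    return linear - quad / (2 * beta)

y = [0, 1, 1]
D = [[1.0 if yi == r else 0.0 for r in range(2)] for yi in y]
M = [[1.0 if a == b else 0.0 for b in y] for a in y]
K = [[1.0, 0.2, 0.1], [0.2, 1.0, 0.5], [0.1, 0.5, 1.0]]
Lam = [[0.7, 0.3], [0.4, 0.6], [0.5, 0.5]]   # non-negative rows summing to 1

matrix_value = Q(Lam, M, D, K, beta=2.0)
direct_value = Q_direct(Lam, y, K, beta=2.0)
```

For fixed Λ the matrix form is linear in M and D, which is what makes the maximum over Λ a convex function of the pair.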

Page 26: Discriminative,  Unsupervised,  Convex Learning

26

Multi-class Unsupervised SVM?

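4. Get constrained optimization problem

Solve for M and D

$$\min_{M, D}\ \omega(M, D) \quad \text{subject to} \quad M = DD^\top, \quad \mathrm{diag}(M) = e, \quad M \in \{0,1\}^{n \times n}, \quad D \in \{0,1\}^{n \times k}, \quad \left(\tfrac{1}{k} - \epsilon\right) n e \le Me \le \left(\tfrac{1}{k} + \epsilon\right) n e$$

Class balance: $\left(\tfrac{1}{k} - \epsilon\right) n e \le Me \le \left(\tfrac{1}{k} + \epsilon\right) n e$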
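A small sketch of the class balance check, assuming the constraint reads $(1/k - \epsilon)\, n \le (Me)_i \le (1/k + \epsilon)\, n$ for every example i. For a 0/1 same-class matrix, the row sum $(Me)_i$ is the size of the class containing example i. Function names and data are illustrative.

```python
def satisfies_class_balance(M, k, eps):
    """Check (1/k - eps) * n <= (M e)_i <= (1/k + eps) * n for every row i.

    For an indicator matrix M, the row sum (M e)_i is the size of the class
    containing example i, so this bounds every class size around n / k.
    """
    n = len(M)
    low, high = (1.0 / k - eps) * n, (1.0 / k + eps) * n
    return all(low <= sum(row) <= high for row in M)

def same_class_matrix(y):
    return [[1 if a == b else 0 for b in y] for a in y]

even = same_class_matrix([0, 0, 1, 1, 2, 2])
skewed = same_class_matrix([0, 0, 0, 0, 1, 2])

even_ok = satisfies_class_balance(even, k=3, eps=0.1)
skewed_ok = satisfies_class_balance(skewed, k=3, eps=0.1)
```

The constraint is linear in M, so it survives the relaxation unchanged.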

Page 27: Discriminative,  Unsupervised,  Convex Learning

27

Multi-class Unsupervised SVM?

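5. Relax indicator variables to obtain a convex optimization problem

Solve for M and D

$$\min_{M, D}\ \omega(M, D) \quad \text{subject to} \quad M = DD^\top, \quad \mathrm{diag}(M) = e, \quad M \in \{0,1\}^{n \times n}, \quad D \in \{0,1\}^{n \times k}, \quad \left(\tfrac{1}{k} - \epsilon\right) n e \le Me \le \left(\tfrac{1}{k} + \epsilon\right) n e$$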

Page 28: Discriminative,  Unsupervised,  Convex Learning

28

Multi-class Unsupervised SVM?

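5. Relax indicator variables to obtain a convex optimization problem

Solve for M and D

$$\min_{M, D}\ \omega(M, D) \quad \text{subject to} \quad M \succeq DD^\top, \quad \mathrm{diag}(M) = e, \quad 0 \le M \le 1, \quad 0 \le D \le 1, \quad \left(\tfrac{1}{k} - \epsilon\right) n e \le Me \le \left(\tfrac{1}{k} + \epsilon\right) n e$$

Semidefinite program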

Page 29: Discriminative,  Unsupervised,  Convex Learning

29

Experimental Results

[Figure panels: SemiDef, Spectral Clustering, Kmeans]

Page 30: Discriminative,  Unsupervised,  Convex Learning

30

Experimental Results

Page 31: Discriminative,  Unsupervised,  Convex Learning

31

Percentage of misclassification errors

Experimental Results

Digit dataset

Page 32: Discriminative,  Unsupervised,  Convex Learning

32

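Extension to Semi-Supervised Algorithm

Matrix M: examples $1, \dots, t$ are labeled (clamped); examples $t+1, \dots, n$ are unlabeled

Labeled (clamped) block: $M_{ij} = y_i y_j$ for $i, j \in \{1, \dots, t\}$

Unlabeled rows: $\sum_{j=1}^{t} M_{ij} \ge 2 - t$ for $i \in \{t+1, \dots, n\}$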
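A minimal sketch of the clamping step, assuming the labeled examples come first and their labels are in {-1, +1}. Entries that involve an unlabeled example are left as `None` here to mark them as free variables for the optimizer; that placeholder and the function name are illustrative choices, not part of the talk.

```python
def clamp_labeled_block(labels, n):
    """Build an n x n matrix whose top-left t x t block is fixed to y_i * y_j.

    labels holds the t known labels in {-1, +1}; entries involving an
    unlabeled example are left as None, to be chosen by the optimizer.
    """
    t = len(labels)
    M = [[None] * n for _ in range(n)]
    for i in range(t):
        for j in range(t):
            M[i][j] = labels[i] * labels[j]
    return M

M = clamp_labeled_block([1, -1, 1], n=5)
```

In a solver these clamped entries would become equality constraints on M, added alongside the constraints of the unsupervised problem.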

Page 33: Discriminative,  Unsupervised,  Convex Learning

33

Experimental Results

Percentage of misclassification errors

Face dataset

Page 34: Discriminative,  Unsupervised,  Convex Learning

34

Experimental Results

Page 35: Discriminative,  Unsupervised,  Convex Learning

35

Page 36: Discriminative,  Unsupervised,  Convex Learning

Discriminative, Unsupervised, Convex HMMs

Joint work with Linli Xu

With help from Li Cheng and Tao Wang

Page 37: Discriminative,  Unsupervised,  Convex Learning

37

Hidden Markov Model

Joint probability model: $P(\mathbf{x}, \mathbf{y})$

Viterbi classifier: $\arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})$

“hidden” state: $y_1, y_2, y_3$

observations: $x_1, x_2, x_3$

Must coordinate local classifiers $y_i = f(x_i)$
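A compact sketch of the Viterbi classifier for a discrete HMM. The two states, the three observation symbols, and all probabilities are made-up toy values; the dictionaries `start_p`, `trans_p`, and `emit_p` are an assumed parameter layout.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (best_path, joint_probability) maximizing P(x, y) over y.

    Since P(y | x) = P(x, y) / P(x) and P(x) does not depend on y, the same
    path also maximizes P(y | x).
    """
    # best[s] = (probability of the best path ending in state s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for x in obs[1:]:
        new_best = {}
        for s in states:
            prob, path = max(
                ((best[p][0] * trans_p[p][s], best[p][1]) for p in states),
                key=lambda item: item[0],
            )
            new_best[s] = (prob * emit_p[s][x], path + [s])
        best = new_best
    prob, path = max(best.values(), key=lambda item: item[0])
    return path, prob

states = ["H", "C"]
start_p = {"H": 0.6, "C": 0.4}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.1, 2: 0.4, 3: 0.5}, "C": {1: 0.7, 2: 0.2, 3: 0.1}}

path, prob = viterbi([3, 1, 3], states, start_p, trans_p, emit_p)
```

The middle label is decided jointly with its neighbors through the transition probabilities, which is the coordination between local classifiers that the slide refers to.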

Page 38: Discriminative,  Unsupervised,  Convex Learning

38

HMM Training: Supervised

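Given $(\mathbf{x}_1, \mathbf{y}_1), (\mathbf{x}_2, \mathbf{y}_2), \dots, (\mathbf{x}_n, \mathbf{y}_n)$

Maximum likelihood: $\max \prod_{i=1}^{n} P(\mathbf{x}_i, \mathbf{y}_i) = \max \prod_{i=1}^{n} P(\mathbf{y}_i \mid \mathbf{x}_i)\, P(\mathbf{x}_i)$

Models input distribution

Conditional likelihood: $\max \prod_{i=1}^{n} P(\mathbf{y}_i \mid \mathbf{x}_i)$

Discriminative (CRFs)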
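The split of the joint likelihood into a conditional part and an input part can be checked numerically on a toy HMM. In the sketch below the marginal P(x) is computed by brute-force summation over all label sequences, which is only feasible for very short sequences; the model parameters and data are made up.

```python
from itertools import product
import math

def joint_prob(xs, ys, start_p, trans_p, emit_p):
    """P(x, y) for one sequence under an HMM."""
    p = start_p[ys[0]] * emit_p[ys[0]][xs[0]]
    for t in range(1, len(xs)):
        p *= trans_p[ys[t - 1]][ys[t]] * emit_p[ys[t]][xs[t]]
    return p

def marginal_prob(xs, states, start_p, trans_p, emit_p):
    """P(x) = sum over all label sequences y of P(x, y) (brute force)."""
    return sum(joint_prob(xs, ys, start_p, trans_p, emit_p)
               for ys in product(states, repeat=len(xs)))

def log_likelihoods(data, states, start_p, trans_p, emit_p):
    """Return (joint, conditional, marginal) log-likelihoods of labeled data."""
    joint = cond = marg = 0.0
    for xs, ys in data:
        pxy = joint_prob(xs, ys, start_p, trans_p, emit_p)
        px = marginal_prob(xs, states, start_p, trans_p, emit_p)
        joint += math.log(pxy)
        cond += math.log(pxy / px)
        marg += math.log(px)
    return joint, cond, marg

states = ["H", "C"]
start_p = {"H": 0.6, "C": 0.4}
trans_p = {"H": {"H": 0.7, "C": 0.3}, "C": {"H": 0.4, "C": 0.6}}
emit_p = {"H": {1: 0.1, 2: 0.4, 3: 0.5}, "C": {1: 0.7, 2: 0.2, 3: 0.1}}
data = [((3, 1, 3), ("H", "C", "H")), ((1, 1), ("C", "C"))]

joint, cond, marg = log_likelihoods(data, states, start_p, trans_p, emit_p)
```

The joint log-likelihood is the sum of the conditional term, which is what a discriminative learner targets, and the marginal term, which only models the inputs.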

Page 39: Discriminative,  Unsupervised,  Convex Learning

39

HMM Training: Unsupervised

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

Now what?

EM!

Marginal likelihood: $\max \prod_{i=1}^{n} P(\mathbf{x}_i)$

Exactly the part we don’t care about

Page 40: Discriminative,  Unsupervised,  Convex Learning

40

HMM Training: Unsupervised

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The problem with EM:

Not convex

Wrong objective

Too popular

Doesn’t work

Page 41: Discriminative,  Unsupervised,  Convex Learning

41

HMM Training: Unsupervised

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The dream:

Convex training

Discriminative training: $P(\mathbf{y} \mid \mathbf{x})$

When will someone invent unsupervised CRFs?

Page 42: Discriminative,  Unsupervised,  Convex Learning

42

HMM Training: Unsupervised

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The question: How to learn $P(\mathbf{y} \mid \mathbf{x})$ effectively without seeing any y’s?

Page 43: Discriminative,  Unsupervised,  Convex Learning

43

HMM Training: Unsupervised

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The question: How to learn $P(\mathbf{y} \mid \mathbf{x})$ effectively without seeing any y’s?

The answer: That’s what we already did!

Unsupervised SVMs

Page 44: Discriminative,  Unsupervised,  Convex Learning

44

HMM Training: Unsupervised

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The plan:

               single y      sequence y
supervised     SVM           M3N
unsupervised   unsup SVM     ?

Page 45: Discriminative,  Unsupervised,  Convex Learning

45

M3N: Max Margin Markov Nets

Relational SVMs

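Supervised training: Given $(\mathbf{x}_1, \mathbf{y}_1), (\mathbf{x}_2, \mathbf{y}_2), \dots, (\mathbf{x}_n, \mathbf{y}_n)$

Solve factored QP

Labels: $y_1, y_2, y_3$

Observations: $x_1, x_2, x_3$

$f(\mathbf{x}, y_1 y_2)$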

Page 46: Discriminative,  Unsupervised,  Convex Learning

46

Unsupervised M3Ns Strategy

Start with supervised M3N QP

Re-express y-labels in local M, D equivalence relations

Impose class-balance

Relax non-convex constraints

Then solve a really big SDP

But still polynomial size

Page 47: Discriminative,  Unsupervised,  Convex Learning

47

Unsupervised M3Ns

SDP

Page 48: Discriminative,  Unsupervised,  Convex Learning

48

Some Initial Results

Synthetic HMM

Protein secondary structure prediction

Page 49: Discriminative,  Unsupervised,  Convex Learning

49

Page 50: Discriminative,  Unsupervised,  Convex Learning

50

Current Research Group

PhD Tao Wang reinforcement learning

PhD Dana Wilkinson action-based embedding

PhD Feng Jiao bioinformatics

PhD Qin Wang statistical natural language

PhD Dan Lizotte optimization, everything

PDF Li Cheng computer vision

Page 51: Discriminative,  Unsupervised,  Convex Learning

51

Brief Research Background

Sequential PAC Learning

Linear Classifiers: Boosting, SVMs

Metric-Based Model Selection

Greedy Importance Sampling

Adversarial Optimization & Search

Large Markov Decision Processes