Discriminative, Unsupervised, Convex Learning


Dale Schuurmans

Department of Computing Science

University of Alberta

MITACS Workshop, August 26, 2005

Current Research Group

PhD Tao Wang reinforcement learning

PhD Ali Ghodsi dimensionality reduction

PhD Dana Wilkinson action-based embedding

PhD Yuhong Guo ensemble learning

PhD Feng Jiao bioinformatics

PhD Jiayuan Huang transduction on graphs

PhD Qin Wang statistical natural language

PhD Adam Milstein robotics, particle filtering

PhD Dan Lizotte optimization, everything

PhD Linli Xu unsupervised SVMs

PDF Li Cheng computer vision


Today I will talk about: One Current Research Direction

Learning Sequence Classifiers (HMMs)

Discriminative Unsupervised Convex

EM?


Outline

Unsupervised SVMs

Discriminative, unsupervised, convex HMMs

Tao, Dana, Feng, Qin, Dan, Li

Unsupervised Support Vector Machines

Joint work with Linli Xu

Main Idea

Unsupervised SVMs (and semi-supervised SVMs)

Harder computational problem than SVMs

Convex relaxation: semidefinite program (polynomial time)

Background: Two-class SVM

Supervised classification learning

Labeled data → linear discriminant: $w^\top x + b = 0$

Classification rule: $y = \operatorname{sgn}(w^\top x + b)$

Some better than others?
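The classification rule on this slide can be sketched in a few lines of NumPy. The weight vector, offset, and data points below are hypothetical values chosen only for illustration, and sending a score of exactly zero to +1 is an arbitrary convention.

```python
import numpy as np

def classify(X, w, b):
    """Return labels in {-1, +1} from sgn(w.x + b); a score of exactly 0 goes to +1."""
    scores = X @ w + b
    return np.where(scores >= 0, 1, -1)

# Hypothetical 2-D example: the discriminant is the line x1 + x2 - 1 = 0.
w = np.array([1.0, 1.0])
b = -1.0
X = np.array([[2.0, 2.0], [0.0, 0.0], [1.0, 0.5]])
labels = classify(X, w, b)
```

Points on the positive side of the line receive +1 and the rest receive -1; many different (w, b) pairs classify the same data correctly, which is the sense of "some better than others".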

Maximum Margin Linear Discriminant

Choose a linear discriminant to maximize

$$\min_{x_i, y_i} \operatorname{dist}\bigl(x_i, y_i, \text{Plane } w^\top x + b = 0\bigr)$$
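As a rough illustration of the quantity being maximized, the sketch below computes the smallest signed distance from any labeled point to a given hyperplane. The data and the two candidate planes are made up for the example; the function name is hypothetical.

```python
import numpy as np

def geometric_margin(X, y, w, b):
    """Smallest signed distance y_i (w.x_i + b) / ||w|| over the data set."""
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

X = np.array([[2.0, 0.0], [3.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
m1 = geometric_margin(X, y, np.array([1.0, 0.0]), -0.5)  # plane x1 = 0.5
m2 = geometric_margin(X, y, np.array([1.0, 0.0]), -1.5)  # plane x1 = 1.5
```

Both planes separate the two classes, but the first sits midway between them and has the larger margin, so it is the one a maximum margin criterion prefers.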

Unsupervised Learning

Given unlabeled data, how to infer classifications?

Organize objects into groups: clustering

Idea: Maximum Margin Clustering

Given unlabeled data, find maximum margin separating hyperplane

Clusters the data

Constraint: class balance (bound the difference in sizes between classes)


Challenge

Find label assignment that results in a large margin

Hard

Convex relaxation – based on semidefinite programming

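How to Derive Unsupervised SVM?

Two-class case:

1. Start with Supervised Algorithm

Given vector of assignments, y, solve

$$\gamma^{*-2} = \max_{\lambda} \; \lambda^\top e - \tfrac{1}{2} \left\langle K \circ \lambda\lambda^\top, \; yy^\top \right\rangle \quad \text{subject to } 0 \le \lambda \le 1$$

Inv. sq. margin: $\gamma^{*-2}$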
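To make the supervised step concrete, here is a minimal sketch that approximates the optimum of the box-constrained dual, assuming it has the form "maximize λ·e − ½⟨K∘λλᵀ, yyᵀ⟩ over 0 ≤ λ ≤ 1" with no bias term. Projected gradient ascent is used only because it fits in a few lines; it is not claimed to be the solver used in the talk, and the function name and data are hypothetical.

```python
import numpy as np

def inv_sq_margin(K, y, iters=2000):
    """Approximate max_lam lam.e - 0.5 <K o lam lam^T, y y^T> over 0 <= lam <= 1
    by projected gradient ascent (no bias term, as assumed in the lead-in)."""
    G = K * np.outer(y, y)                              # K o yy^T, elementwise product
    step = 1.0 / max(np.linalg.eigvalsh(G)[-1], 1e-12)  # 1 / largest eigenvalue of G
    lam = np.zeros(len(y))
    for _ in range(iters):
        # gradient of the objective is e - G lam; clip projects back onto the box
        lam = np.clip(lam + step * (1.0 - G @ lam), 0.0, 1.0)
    return lam.sum() - 0.5 * lam @ G @ lam, lam

X = np.array([[-2.0], [-1.0], [1.0], [2.0]])   # hypothetical 1-D data
K = X @ X.T                                     # linear kernel
y = np.array([-1, -1, 1, 1])
value, lam = inv_sq_margin(K, y)
```

Only the two points nearest the boundary end up with nonzero multipliers, which matches the usual support-vector picture.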

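2. Think of $\gamma^{*-2}$ as a function of y

If given y, would then solve

$$\gamma^{*-2}(y) = \max_{\lambda} \; \lambda^\top e - \tfrac{1}{2} \left\langle K \circ \lambda\lambda^\top, \; yy^\top \right\rangle \quad \text{subject to } 0 \le \lambda \le 1$$

Goal: Choose y to minimize inverse squared margin $\gamma^{*-2}(y)$

Problem: not a convex function of y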
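One way to see why choosing y is hard is to do it by brute force. The sketch below enumerates every labeling of a tiny data set, keeps only the balanced ones, and picks the labeling with the smallest dual value. It assumes the same bias-free dual as above and a hypothetical `max_imbalance` parameter for class balance; the cost is exponential in the number of points, which is exactly what the convex relaxation is meant to avoid.

```python
import itertools
import numpy as np

def dual_value(K, y, iters=500):
    """Projected gradient ascent on lam.e - 0.5 lam^T (K o yy^T) lam, 0 <= lam <= 1."""
    G = K * np.outer(y, y)
    step = 1.0 / max(np.linalg.eigvalsh(G)[-1], 1e-12)
    lam = np.zeros(len(y))
    for _ in range(iters):
        lam = np.clip(lam + step * (1.0 - G @ lam), 0.0, 1.0)
    return lam.sum() - 0.5 * lam @ G @ lam

def best_labeling(K, max_imbalance=0):
    """Try all 2^n labelings; return (value, y) with the smallest dual value."""
    best = None
    for bits in itertools.product([-1, 1], repeat=K.shape[0]):
        y = np.array(bits)
        if abs(y.sum()) > max_imbalance:        # enforce class balance
            continue
        val = dual_value(K, y)
        if best is None or val < best[0] - 1e-9:
            best = (val, y)
    return best

X = np.array([[-3.0], [-2.0], [2.0], [3.0]])    # hypothetical 1-D data
best_val, best_y = best_labeling(X @ X.T)
```

The labeling y and its negation describe the same clustering and give the same value, so the search simply keeps whichever it meets first.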

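3. Re-express problem with indicators comparing y labels

$$\gamma^{*-2}(y) = \max_{\lambda} \; \lambda^\top e - \tfrac{1}{2} \left\langle K \circ \lambda\lambda^\top, \; yy^\top \right\rangle \quad \text{subject to } 0 \le \lambda \le 1$$

New variables: An equivalence relation matrix

$$M = yy^\top, \qquad M_{ij} = \begin{cases} 1 & \text{if } y_i = y_j \\ -1 & \text{if } y_i \ne y_j \end{cases}$$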
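The change of variables from y to M can be checked directly. The snippet below builds M as the outer product of a hypothetical label vector with itself and confirms a few properties that make M a convenient replacement for y.

```python
import numpy as np

y = np.array([1, -1, 1, 1, -1])         # hypothetical labels
M = np.outer(y, y)                      # M_ij = y_i * y_j

same = y[:, None] == y[None, :]         # True where y_i == y_j
matches_definition = np.array_equal(M, np.where(same, 1, -1))

is_psd = np.linalg.eigvalsh(M.astype(float)).min() > -1e-9   # M is positive semidefinite
unit_diag = bool(np.all(np.diag(M) == 1))                     # diag(M) = e
flip_invariant = np.array_equal(M, np.outer(-y, -y))          # y and -y give the same M
```

The last check shows that M removes the arbitrary sign of the labeling: it records only which points share a class.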

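If given M, would then solve

$$\gamma^{*-2}(M) = \max_{\lambda} \; \lambda^\top e - \tfrac{1}{2} \left\langle K \circ \lambda\lambda^\top, \; M \right\rangle \quad \text{subject to } 0 \le \lambda \le 1$$

New variables: An equivalence relation matrix $M = yy^\top$

Note: the inverse squared margin is a convex function of M

Maximum of linear functions is convex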
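The convexity claim can be spelled out in two lines. This is a short worked argument, assuming the objective for fixed λ has the form $g_\lambda(M) = \lambda^\top e - \tfrac12 \langle K \circ \lambda\lambda^\top, M \rangle$; the name $g_\lambda$ is introduced here only for the derivation.

```latex
% For each fixed \lambda, g_\lambda is affine in M, so for \theta \in [0,1]:
\begin{align*}
\gamma^{*-2}\bigl(\theta M_1 + (1-\theta) M_2\bigr)
  &= \max_{0 \le \lambda \le 1} \bigl[\theta\, g_\lambda(M_1) + (1-\theta)\, g_\lambda(M_2)\bigr] \\
  &\le \theta \max_{0 \le \lambda \le 1} g_\lambda(M_1)
     + (1-\theta) \max_{0 \le \lambda \le 1} g_\lambda(M_2) \\
  &= \theta\, \gamma^{*-2}(M_1) + (1-\theta)\, \gamma^{*-2}(M_2).
\end{align*}
```

The inequality holds because a single λ has to serve both terms on the first line, while each term gets its own maximizer on the second.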

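4. Get constrained optimization problem

Solve for M

$$\begin{aligned} \min_{M} \;\; & \gamma^{*-2}(M) \\ \text{subject to} \;\; & 0 \le \lambda \le 1 \\ & M \in \{-1,+1\}^{n \times n} && \text{(Not convex!)} \\ & M = yy^\top \\ & -\epsilon e \le M e \le \epsilon e && \text{(Class balance)} \end{aligned}$$

$M \in \{-1,+1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$, $\operatorname{diag}(M) = e$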
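A small sketch of the class-balance constraint, assuming it reads −εe ≤ Me ≤ εe for a tolerance ε (the symbol ε and the function name are assumptions). For M = yyᵀ each row sum equals y_i times the sum of all labels, so the constraint bounds the difference in class sizes.

```python
import numpy as np

def is_balanced(M, eps):
    """Check -eps*e <= M e <= eps*e, i.e. every row sum of M lies in [-eps, eps]."""
    row_sums = M @ np.ones(M.shape[0])
    return bool(np.all(np.abs(row_sums) <= eps))

y_even = np.array([1, 1, -1, -1])       # two points per class
y_skew = np.array([1, 1, 1, -1])        # three against one
M_even = np.outer(y_even, y_even)
M_skew = np.outer(y_skew, y_skew)

even_ok = is_balanced(M_even, eps=1)    # row sums are all 0
skew_ok = is_balanced(M_skew, eps=1)    # row sums are +-2, outside [-1, 1]
skew_loose = is_balanced(M_skew, eps=2) # a looser tolerance admits the skewed split
```

Without some such constraint, the trivial labeling that puts every point in one class would give an unbounded margin.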

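Solve for M

$$\begin{aligned} \min_{M} \;\; & \gamma^{*-2}(M) \\ \text{subject to} \;\; & 0 \le \lambda \le 1 \\ & M \in \{-1,+1\}^{n \times n} \\ & M \succeq 0, \;\; \operatorname{diag}(M) = e \\ & -\epsilon e \le M e \le \epsilon e \end{aligned}$$

$M \in \{-1,+1\}^{n \times n}$ encodes an equivalence relation iff $M \succeq 0$, $\operatorname{diag}(M) = e$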

5. Relax indicator variables to obtain a convex optimization problem

Solve for M

$$\begin{aligned} \min_{M} \;\; & \gamma^{*-2}(M) \\ \text{subject to} \;\; & 0 \le \lambda \le 1 \\ & M \succeq 0, \;\; \operatorname{diag}(M) = e \\ & M \in [-1, 1]^{n \times n} \\ & -\epsilon e \le M e \le \epsilon e \end{aligned}$$

Semidefinite program

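Multi-class Unsupervised SVM?

1. Start with Supervised Algorithm

Given vector of assignments, y, solve

$$\omega = \max_{\Lambda} \; \sum_{i,r} \lambda_{ir} 1_{(y_i \ne r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \left( 1_{(y_i = y_j)} - 2\lambda_{i y_j} + \lambda_i^\top \lambda_j \right) \quad \text{subject to } \lambda_i \ge 0, \;\; \lambda_i^\top e = 1 \;\; \forall i$$

(Crammer & Singer 01)

Margin loss: $\omega$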

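2. Think of margin loss as a function of y

$$\omega(y) = \max_{\Lambda} \; \sum_{i,r} \lambda_{ir} 1_{(y_i \ne r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \left( 1_{(y_i = y_j)} - 2\lambda_{i y_j} + \lambda_i^\top \lambda_j \right) \quad \text{subject to } \lambda_i \ge 0, \;\; \lambda_i^\top e = 1 \;\; \forall i$$

Goal: Choose y to minimize margin loss $\omega(y)$

Problem: not a convex function of y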

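$$\omega(y) = \max_{\Lambda} \; \sum_{i,r} \lambda_{ir} 1_{(y_i \ne r)} - \frac{1}{2\beta} \sum_{i,j} K_{ij} \left( 1_{(y_i = y_j)} - 2\lambda_{i y_j} + \lambda_i^\top \lambda_j \right) \quad \text{subject to } \lambda_i \ge 0, \;\; \lambda_i^\top e = 1 \;\; \forall i$$

New variables: M & D

$$M_{ij} = 1_{(y_i = y_j)}, \qquad D_{ir} = 1_{(y_i = r)}, \qquad M = DD^\top$$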
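The two indicator matrices are easy to build from a label vector. The sketch below uses hypothetical labels in {0, ..., k-1} and a hypothetical helper name; D is the one-hot class indicator and M records which pairs of points share a class.

```python
import numpy as np

def indicators(y, k):
    """D[i, r] = 1 if y_i == r else 0;  M[i, j] = 1 if y_i == y_j else 0."""
    D = np.eye(k, dtype=int)[y]     # one-hot rows, shape (n, k)
    return D, D @ D.T               # M = D D^T, shape (n, n)

y = np.array([0, 2, 1, 2])          # hypothetical labels for n = 4 points, k = 3 classes
D, M = indicators(y, 3)
```

Here M has ones on the diagonal and at positions (1, 3) and (3, 1), since only the second and fourth points share a class.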

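If given M and D, would then solve

$$\omega(M, D) = \max_{\Lambda} \; Q(\Lambda, M, D) \quad \text{subject to } \Lambda \ge 0, \;\; \Lambda e = e$$

$$\text{where } Q(\Lambda, M, D) = n - \langle D, \Lambda \rangle - \frac{1}{2\beta} \langle K, M \rangle + \frac{1}{\beta} \langle KD, \Lambda \rangle - \frac{1}{2\beta} \langle \Lambda\Lambda^\top, K \rangle$$

New variables: M & D

$$M_{ij} = 1_{(y_i = y_j)}, \qquad D_{ir} = 1_{(y_i = r)}, \qquad M = DD^\top$$

Margin loss is a convex function of M & D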
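The following numerical check is a sketch under a stated assumption: that Q has the form n − ⟨D,Λ⟩ − ⟨K,M⟩/(2β) + ⟨KD,Λ⟩/β − ⟨ΛΛᵀ,K⟩/(2β), where the scale β and this exact arrangement are a reading of the slide rather than something it states legibly. It verifies that this matrix form agrees with the same quantity written using explicit indicators on y, for any Λ whose rows sum to one. All function names and data are hypothetical.

```python
import numpy as np

def Q(Lam, M, D, K, beta=1.0):
    """Matrix form: n - <D,Lam> - <K,M>/(2b) + <KD,Lam>/b - <Lam Lam^T,K>/(2b)."""
    n = K.shape[0]
    return (n - np.sum(D * Lam) - np.sum(K * M) / (2 * beta)
            + np.sum((K @ D) * Lam) / beta
            - np.sum((Lam @ Lam.T) * K) / (2 * beta))

def Q_from_labels(Lam, y, K, beta=1.0):
    """Same quantity written with explicit indicator terms on the labels y."""
    n, k = Lam.shape
    loss = np.sum(Lam * (np.arange(k)[None, :] != y[:, None]))  # sum_{i,r} lam_ir 1(y_i != r)
    same = (y[:, None] == y[None, :]).astype(float)             # 1(y_i = y_j)
    cross = Lam[:, y]                                           # cross[i, j] = lam_{i, y_j}
    return loss - np.sum(K * (same - 2 * cross + Lam @ Lam.T)) / (2 * beta)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))                # hypothetical data
K = X @ X.T                                # linear kernel (symmetric)
y = np.array([0, 2, 1, 2, 0])
D = np.eye(3)[y]
M = D @ D.T
Lam = rng.random((5, 3))
Lam /= Lam.sum(axis=1, keepdims=True)      # rows sum to one, so Lam e = e
gap = abs(Q(Lam, M, D, K, beta=0.5) - Q_from_labels(Lam, y, K, beta=0.5))
```

In the matrix form, y enters only through M and D, and both appear linearly, which is what makes the maximum over Λ a convex function of (M, D).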

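Solve for M and D

$$\begin{aligned} \min_{M, D} \;\; & \omega(M, D) \\ \text{subject to} \;\; & M = DD^\top, \;\; \operatorname{diag}(M) = e \\ & M \in \{0,1\}^{n \times n}, \;\; D \in \{0,1\}^{n \times k} \\ & \left(\tfrac{1}{k} - \epsilon\right) n e \le M e \le \left(\tfrac{1}{k} + \epsilon\right) n e && \text{(Class balance)} \end{aligned}$$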

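Solve for M and D

$$\begin{aligned} \min_{M, D} \;\; & \omega(M, D) \\ \text{subject to} \;\; & M \succeq DD^\top, \;\; \operatorname{diag}(M) = e \\ & 0 \le M \le 1, \;\; 0 \le D \le 1 \\ & \left(\tfrac{1}{k} - \epsilon\right) n e \le M e \le \left(\tfrac{1}{k} + \epsilon\right) n e \end{aligned}$$

Semidefinite program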

Experimental Results

[Figure: clustering results compared for SemiDef, Spectral Clustering, and Kmeans]


Experimental Results


Percentage of misclassification errors


Digit dataset

Extension to Semi-Supervised Algorithm

Matrix $M$:

[Figure: $M$ split into a labeled (clamped) block over indices $1, \dots, t$, where $M_{ij} = y_i y_j$, and an unlabeled block over indices $i \in \{t+1, \dots, n\}$]
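A minimal sketch of the clamping idea, assuming the first t points are the labeled ones and that clamping means fixing M_ij = y_i y_j whenever both i and j are labeled. The helper name and the example labels are hypothetical; entries involving unlabeled points are left for the optimizer.

```python
import numpy as np

def clamped_entries(y_labeled, n):
    """Return (mask, values): mask[i, j] is True where M_ij is fixed to y_i * y_j
    because both i and j are among the first t (labeled) points."""
    t = len(y_labeled)
    mask = np.zeros((n, n), dtype=bool)
    mask[:t, :t] = True
    values = np.zeros((n, n), dtype=int)
    values[:t, :t] = np.outer(y_labeled, y_labeled)
    return mask, values

y_lab = np.array([1, -1, 1])            # hypothetical labels for t = 3 points
mask, values = clamped_entries(y_lab, n=5)
```

In a solver, the masked entries would become equality constraints on M while the remaining entries stay free variables.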



Percentage of misclassification errors

Face dataset


Discriminative, Unsupervised, Convex HMMs

Joint work with Linli Xu

With help from Li Cheng and Tao Wang

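Hidden Markov Model

Joint probability model: $P(\mathbf{x}\mathbf{y})$

Viterbi classifier: $\arg\max_{\mathbf{y}} P(\mathbf{y} \mid \mathbf{x})$

[Figure: chain of “hidden” states $y_1, y_2, y_3$ above observations $x_1, x_2, x_3$]

Must coordinate local classifiers $y_i = f(x_i)$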
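The Viterbi classifier can be sketched as a short dynamic program. The two-state model below (sticky transitions, noisy emissions) is a hypothetical example, and the parameter names `log_pi`, `log_A`, `log_B` are chosen here for illustration.

```python
import numpy as np

def viterbi(log_pi, log_A, log_B, obs):
    """Most probable state sequence for an HMM.
    log_pi[s]: log initial prob; log_A[s, s2]: log transition; log_B[s, o]: log emission."""
    T, S = len(obs), len(log_pi)
    score = np.empty((T, S))
    back = np.zeros((T, S), dtype=int)
    score[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        cand = score[t - 1][:, None] + log_A          # cand[s, s2]: best path into s, then s -> s2
        back[t] = np.argmax(cand, axis=0)             # best predecessor of each state s2
        score[t] = cand[back[t], np.arange(S)] + log_B[:, obs[t]]
    path = [int(np.argmax(score[-1]))]
    for t in range(T - 1, 0, -1):                     # follow back-pointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]

log_pi = np.log(np.array([0.5, 0.5]))
log_A = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))    # states tend to persist
log_B = np.log(np.array([[0.9, 0.1], [0.1, 0.9]]))    # state s mostly emits symbol s
switch = viterbi(log_pi, log_A, log_B, [0, 0, 1, 1, 1])
outlier = viterbi(log_pi, log_A, log_B, [0, 0, 1, 0, 0])
```

In the second call a purely local classifier would label the third position as state 1, but the joint decoding keeps it in state 0 because one stray observation does not justify two transitions. This is the coordination between local classifiers that the slide refers to.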

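HMM Training: Supervised

Given $\mathbf{x}_1\mathbf{y}_1, \mathbf{x}_2\mathbf{y}_2, \dots, \mathbf{x}_n\mathbf{y}_n$

Maximum likelihood: $\max \prod_{i=1}^{n} P(\mathbf{x}_i \mathbf{y}_i) = \max \prod_{i=1}^{n} P(\mathbf{y}_i \mid \mathbf{x}_i) \, P(\mathbf{x}_i)$

The factor $P(\mathbf{x}_i)$ models the input distribution

Conditional likelihood: $\max \prod_{i=1}^{n} P(\mathbf{y}_i \mid \mathbf{x}_i)$

Discriminative (CRFs)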

HMM Training: Unsupervised

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

Now what?

EM!

Marginal likelihood: $\max \prod_{i=1}^{n} P(\mathbf{x}_i)$

Exactly the part we don’t care about
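For reference, the marginal likelihood of one observation sequence can be computed with the standard forward recursion. The sketch below compares it against a brute-force sum over all hidden paths on a hypothetical two-state model; the parameter values are made up.

```python
import itertools
import numpy as np

def marginal_likelihood(pi, A, B, obs):
    """P(x) = sum over all hidden paths y of P(x, y), via the forward recursion."""
    alpha = pi * B[:, obs[0]]
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
    return float(alpha.sum())

def brute_force(pi, A, B, obs):
    """Same quantity by explicit enumeration of every hidden state sequence."""
    total = 0.0
    for path in itertools.product(range(len(pi)), repeat=len(obs)):
        p = pi[path[0]] * B[path[0], obs[0]]
        for t in range(1, len(obs)):
            p *= A[path[t - 1], path[t]] * B[path[t], obs[t]]
        total += p
    return total

pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 1, 0]
fast = marginal_likelihood(pi, A, B, obs)
slow = brute_force(pi, A, B, obs)
```

Both functions score only how well the model explains the observations; the hidden labels are summed out entirely.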

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The problem with EM:

Not convex

Wrong objective

Too popular

Doesn’t work

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The dream:

Convex training

Discriminative training

$P(\mathbf{y} \mid \mathbf{x})$

When will someone invent unsupervised CRFs?



Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The question: How to learn effectively without seeing any y’s?

$P(\mathbf{y} \mid \mathbf{x})$

The answer: That’s what we already did!

Unsupervised SVMs

Given only $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$

The plan:

                single $y$      sequence $\mathbf{y}$
supervised      SVM             M3N
unsupervised    unsup SVM       ?

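M3N: Max Margin Markov Nets

Relational SVMs

Supervised training: Given $\mathbf{x}_1\mathbf{y}_1, \mathbf{x}_2\mathbf{y}_2, \dots, \mathbf{x}_n\mathbf{y}_n$

Solve factored QP

[Figure: chain of states $y_1, y_2, y_3$ above observations $x_1, x_2, x_3$, with local function $f(\mathbf{x}, y_1 y_2)$]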

Unsupervised M3Ns

Strategy

Start with supervised M3N QP

Re-express y-labels in local M, D equivalence relations

Impose class-balance

Relax non-convex constraints

Then solve a really big SDP

But still polynomial size


Unsupervised M3Ns

SDP

Some Initial Results

Synthetic HMM

Protein Secondary Structure prediction


Brief Research Background

Sequential PAC Learning

Linear Classifiers: Boosting, SVMs

Metric-Based Model Selection

Greedy Importance Sampling

Adversarial Optimization & Search

Large Markov Decision Processes
