Page 1

Learning Structural SVMs with Latent Variables

Chun-Nam John Yu, Thorsten Joachims

Presenter: Jacob Kahn (jacokahn)

October 19, 2017

Chun-Nam John Yu, Thorsten Joachims Structural SVMs with Latent Variables October 19, 2017 1 / 21

Page 2

Overview

1 Motivating Problem

2 Structured SVM

3 Structured SVM with Latent Variables

4 Optimization

5 Experiments and Applications

Page 3

Motivating Problem: Noun Phrase Coreferencing

Task: determine which noun phrases in a piece of text refer to the same entity.

Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book.

Correlation clustering: the objective function maximizes the sum of pairwise similarities.

Page 4

Motivating Problem: Noun Phrase Coreferencing

Christopher Robin is alive and well. He lives in England. He is the same person that you read about in the book, Winnie the Pooh. As a boy, Chris lived in a pretty home called Cotchfield Farm. When Chris was three years old, his father wrote a poem about him. The poem was printed in a magazine for others to read. Mr. Robin then wrote a book.

For a cluster of size $k$, there are $O(k^2)$ links, the vast majority of which carry very weak signals.

It is difficult to determine transitive coreference without searching through the entire text.

Page 5

Motivating Problem: Noun Phrase Coreferencing

Instead, model coreference as an agglomeration problem:

Input: $x$, containing $n$ noun phrases and pairwise features $x_{ij}$ between the $i$-th and $j$-th noun phrases.

Output: $y$, a partition of the $n$ phrases into coreferent clusters. Here, $\mathcal{Y}$ is the set of non-contradictory pairwise clusterings.

To choose which links are strong, introduce a latent variable $h$: a spanning forest of strong coreference links that is consistent with $y$.
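Since $h$ is a spanning forest over the noun phrases, the clustering $y$ it induces is just the forest's set of connected components. A minimal sketch, under the assumed representation that phrases are integers $0 \dots n-1$ and $h$ is a list of strong-link edges:

```python
# Sketch: recovering the clustering y from a latent spanning forest h.
# Assumed representation: noun phrases are integers 0..n-1, h is an edge list.

def clusters_from_forest(n, h):
    """Union-find over the forest edges; returns a partition of 0..n-1."""
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    for i, j in h:
        parent[find(i)] = find(j)

    groups = {}
    for v in range(n):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())

# Forest {0-1, 1-2} plus the singleton 3 yields clusters [[0, 1, 2], [3]].
print(clusters_from_forest(4, [(0, 1), (1, 2)]))
```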

Page 6

Structured SVM (SSVM)

Given examples $D = \{(x_i, y_i)\}_{i=1}^{n}$, with $x_i \in \mathcal{X}$. The following applies margin rescaling (Tsochantaridis et al., 2004) to give a convex upper bound on the training loss.

Optimization Problem

$$\min_{w, \xi} \;\; \frac{1}{2} w^T w + C \sum_{i} \xi_i$$

such that for $1 \le i \le n$, $\forall y \in \mathcal{Y}$:

$$w^T \Phi(x_i, y_i) - w^T \Phi(x_i, y) \;\ge\; \Delta(y_i, y) - \xi_i$$

$\Phi(x, y)$: joint feature vector of input $x$ and output $y$
$\xi_i \ge 0$: slack, penalizes constraint violations; the loss being minimized
$\Delta(y_i, y)$: controls the margin between an incorrect prediction $y$ and the correct label $y_i$
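To make the constraint concrete, here is a small sketch (the toy label set, feature map, and loss below are illustrative assumptions, not from the slides) that computes the tightest feasible slack $\xi_i$ by brute-force enumeration over a finite $\mathcal{Y}$:

```python
import numpy as np

# For one example, the tightest feasible slack under margin rescaling is
#   xi_i = max(0, max_y [ w.phi(x_i, y) + delta(y_i, y) ] - w.phi(x_i, y_i)).

def min_slack(w, x, y_true, Y, phi, delta):
    score_true = w @ phi(x, y_true)
    violations = [w @ phi(x, y) + delta(y_true, y) - score_true for y in Y]
    return max(0.0, max(violations))

# Toy example: 3 labels, indicator-style features (assumed setup).
Y = [0, 1, 2]
phi = lambda x, y: np.eye(3)[y] * x
delta = lambda y1, y2: 0.0 if y1 == y2 else 1.0
w = np.array([1.0, 0.2, 0.1])
print(min_slack(w, 2.0, 0, Y, phi, delta))  # -> 0.0: no constraint is violated
```

With a weaker weight vector (e.g. halving the first component), the best wrong label comes within the required margin and the slack becomes positive.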

Page 7

Extending the Structured SVM to Latent Variables

Sometimes, $(x, y) \in \mathcal{X} \times \mathcal{Y}$ is not sufficient to characterize the input-output relationship; it may also depend on a set of latent variables, which are typically unobserved.

How do we enable the structural SVM to handle latent variables?

Notation: let $h$ be a particular variable in a set of latent variables $\mathcal{H}$; $h$ describes some structure-determining, unobserved factor.

Things to consider:

Feature representation, loss function

Training objective that is non-convex

Inference techniques and problems

Page 8

Prediction Rules for a Latent Structural SVM

Extend the joint feature map $\Phi(x, y)$ to $\Phi(x, y, h)$. The feature vector now captures a relation between the input, the output, and the latent variable.

We must now perform joint inference over $y$ and $h$, and the prediction rule $f_w(x)$ becomes:

New Argmax Prediction Rule

$$f_w(x) = (\hat{y}, \hat{h}) = \operatorname*{argmax}_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \left[ w \cdot \Phi(x, y, h) \right]$$
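For intuition, the joint argmax can be sketched by brute force over small toy spaces (the feature map and spaces below are assumptions; real structured problems need specialized search rather than enumeration):

```python
import itertools
import numpy as np

# Latent prediction rule by exhaustive enumeration over a small Y x H.
def predict(w, x, Y, H, phi):
    return max(itertools.product(Y, H), key=lambda yh: w @ phi(x, yh[0], yh[1]))

# Hypothetical joint feature map over scalar x and binary y, h.
phi = lambda x, y, h: np.array([x * y, x * h, y * h], dtype=float)
w = np.array([1.0, -0.5, 0.25])
print(predict(w, 2.0, Y=[0, 1], H=[0, 1], phi=phi))  # -> (1, 0)
```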

Page 9

Latent Structural SVM Formulation

Optimization Problem for Latent Structural SVM

$$\min_{w, \xi} \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i$$

such that for $1 \le i \le n$, $\forall y \in \mathcal{Y}$:

$$\max_{h \in \mathcal{H}} \left[ w \cdot \Phi(x_i, y_i, h) \right] - \max_{h \in \mathcal{H}} \left[ w \cdot \Phi(x_i, y, h) \right] \;\ge\; \Delta(y_i, y, h) - \xi_i$$

$\Phi(x, y, h)$: feature vector from input $x$, output $y$, and latent variable $h$
$\Delta(y_i, y, h)$: margin; assumed not to depend on the latent variable $h$
$\xi_i \ge 0$: slack, penalizes violations, and now upper-bounds the loss

If no latent variable is present, the model degenerates to the standard structural SVM.

Page 10

Prediction Loss with the Addition of Latent Variables

Bound on the constraint loss in the structural SVM (without latent variables):

$$\Delta(y_i, f_w(x_i)) \;\le\; \underbrace{\max_{y \in \mathcal{Y}} \left[ w \cdot \Phi(x_i, y) + \Delta(y_i, y) \right]}_{\text{convex}} \; \underbrace{- \; w \cdot \Phi(x_i, y_i)}_{\text{linear}} \;=\; \xi_i$$

We now need to take the maximum over all latent variables $h \in \mathcal{H}$.

Bound on the constraint loss in the latent structural SVM:

$$\Delta(y_i, f_w(x_i)) \;\le\; \underbrace{\max_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \left[ w \cdot \Phi(x_i, y, h) + \Delta(y_i, y, h) \right]}_{\text{convex}} \; \underbrace{- \; \max_{h \in \mathcal{H}} \left[ w \cdot \Phi(x_i, y_i, h) \right]}_{\text{concave}} \;=\; \xi_i$$
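As a sanity check, the latent bound can be verified numerically on random toy problems (the spaces, feature map, and loss below are illustrative assumptions):

```python
import itertools
import numpy as np

# Spot-check that the latent slack
#   xi_i = max_{y,h}[w.phi(x,y,h) + delta(y_i,y,h)] - max_h w.phi(x,y_i,h)
# upper-bounds the prediction loss delta(y_i, f_w(x)).
rng = np.random.default_rng(0)
Y, H = [0, 1, 2], [0, 1]
phi = lambda x, y, h: np.array([x * y, x * h, (y + 1) * (h + 1)], float)
delta = lambda yt, y, h: 0.0 if yt == y else 1.0  # independent of h, as assumed

for _ in range(100):
    w = rng.normal(size=3)
    x, y_true = rng.normal(), int(rng.integers(0, 3))
    y_pred, h_pred = max(itertools.product(Y, H),
                         key=lambda yh: w @ phi(x, *yh))
    xi = (max(w @ phi(x, y, h) + delta(y_true, y, h)
              for y, h in itertools.product(Y, H))
          - max(w @ phi(x, y_true, h) for h in H))
    assert delta(y_true, y_pred, h_pred) <= xi + 1e-9
print("bound holds on 100 random draws")
```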

Page 11

Latent Structural SVM Objective Formulation

Substituting the slack variables back into the objective leaves a concave term, as we must compute a maximum over $\mathcal{H}$:

Objective function with latent variable, unconstrained formulation

$$\min_w \;\; \underbrace{\frac{1}{2} w^T w + C \sum_{i=1}^{n} \max_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \left[ w \cdot \Phi(x_i, y, h) + \Delta(y_i, y, h) \right]}_{\text{convex}} \;\; \underbrace{- \; C \sum_{i=1}^{n} \max_{h \in \mathcal{H}} \left[ w \cdot \Phi(x_i, y_i, h) \right]}_{\text{concave}}$$

Page 12

The CCCP Algorithm for Non-Convex Objectives

We have a term with convex and concave parts. How to proceed?

Concave-Convex optimization procedure (Yuille and Rangarajan ’03)

Algorithm:

1 Decompose the objective into a convex and concave part.

2 Upper bound the concave part with a hyperplane.

3 Minimize the resulting convex sum.

4 Iterate on the above until convergence.
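The steps above can be illustrated on a one-dimensional toy objective (an assumed example, not from the paper): minimize $J(w) = w^2 - 2\sqrt{w^2 + 1}$, whose first term is convex and whose second term is concave, with minimizer $w = 0$.

```python
import math

# CCCP on J(w) = w**2 - 2*sqrt(w**2 + 1).
# Each iteration upper-bounds the concave part by its tangent hyperplane at
# the current iterate, then minimizes the resulting convex sum in closed form.

def cccp(w, iters=2000):
    for _ in range(iters):
        slope = -2.0 * w / math.sqrt(w * w + 1.0)  # gradient of concave part
        # minimize w'**2 + slope * w'  ->  w' = -slope / 2
        w = -slope / 2.0
    return w

w_star = cccp(2.0)
print(w_star)  # approaches the true minimizer w = 0
```

CCCP guarantees a monotonically decreasing objective, but only convergence to a stationary point in general; on this toy problem that point happens to be the global minimum.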

Page 13

The CCCP Algorithm for Non-Convex Objectives

The Concave-Convex Algorithm:

1 Decompose the objective into a convex and a concave part.

2 Upper-bound the concave part with a hyperplane.

3 Minimize the resulting convex sum; iterate until convergence is reached.

Page 14

Applying CCCP to the Objective

We can think of computing the upper-bounding hyperplane in the CCCP algorithm as finding the latent variable that best explains the input-output pair $(x_i, y_i)$. This is equivalent to computing the upper-bounding hyperplane for the concave problem of selecting the best $h \in \mathcal{H}$.

Let $h_i^*$ be that best latent variable from $\mathcal{H}$, defined as:

"Completing" the latent variables

$$h_i^* = \operatorname*{argmax}_{h \in \mathcal{H}} \; w \cdot \Phi(x_i, y_i, h)$$

Page 15

Applying CCCP to the Objective

Now we've converted the concave latent-variable selection problem into a linear term, and we have a final, convex objective:

Latent structural SVM objective with upper bounding hyperplane

$$\min_w \;\; \underbrace{\frac{1}{2} w^T w + C \sum_{i=1}^{n} \max_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \left[ w \cdot \Phi(x_i, y, h) + \Delta(y_i, y, h) \right]}_{\text{convex}} \;\; \underbrace{- \; C \sum_{i=1}^{n} w \cdot \Phi(x_i, y_i, h_i^*)}_{\text{linear}}$$

From here, we can apply cutting-plane algorithms, just as for any structural SVM.

Page 16

Latent Structural SVM Summary

Final Optimization Problem

$$\min_{w, \xi} \;\; \frac{1}{2} w^T w + C \sum_{i=1}^{n} \xi_i$$

such that for $1 \le i \le n$, $\forall y \in \mathcal{Y}$:

$$\max_{h \in \mathcal{H}} \left[ w \cdot \Phi(x_i, y_i, h) \right] - \max_{h \in \mathcal{H}} \left[ w \cdot \Phi(x_i, y, h) \right] \;\ge\; \Delta(y_i, y, h) - \xi_i$$

Three primary inference problems overall:

Prediction: $\operatorname*{argmax}_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \; w \cdot \Phi(x_i, y, h)$

Loss-augmented inference: $\operatorname*{argmax}_{(y,h) \in \mathcal{Y} \times \mathcal{H}} \left[ w \cdot \Phi(x_i, y, h) + \Delta(y_i, y, h) \right]$

Latent variable completion: $\operatorname*{argmax}_{h \in \mathcal{H}} \; w \cdot \Phi(x_i, y_i, h)$

Page 17

Noun Phrase Coreferencing with Clustering

We can determine a clustering $y$ for an input $x$ with a maximum-spanning-tree algorithm (Kruskal's algorithm), where the weight of an edge $(i, j)$ is $w \cdot x_{ij}$.

Clustering score with latent spanning forest

$$w \cdot \Phi(x, y, h) = \sum_{(i,j) \in h} w \cdot x_{ij}$$

Only consider edges (i , j) that are in the latent spanning forest.

Output the clustering defined by the forest h as y (prediction).
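A minimal sketch of this inference step, under the assumed representation that the edge weights $w \cdot x_{ij}$ are precomputed and that only positive-weight ("strong") links are kept, which is what makes the result a forest rather than a single tree:

```python
# Greedy Kruskal over edges sorted by decreasing weight w . x_ij.
# Assumed toy setup: n noun phrases 0..n-1, edge_weights maps (i, j) -> weight.

def max_spanning_forest(n, edge_weights):
    parent = list(range(n))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    forest = []
    for (i, j), wgt in sorted(edge_weights.items(), key=lambda e: -e[1]):
        if wgt <= 0:
            break  # keep only strong links, so h stays a forest
        ri, rj = find(i), find(j)
        if ri != rj:  # adding the edge keeps h acyclic
            parent[ri] = rj
            forest.append((i, j))
    return forest

edges = {(0, 1): 2.0, (1, 2): 0.5, (0, 2): -1.0, (2, 3): -0.3}
print(max_spanning_forest(4, edges))  # -> [(0, 1), (1, 2)]; phrase 3 is a singleton
```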

Page 18

Noun Phrase Coreferencing with Clustering - Loss

Loss function

$$\Delta(y, \hat{y}, \hat{h}) = n(y) - k(y) - \sum_{(i,j) \in \hat{h}} l(y, (i, j))$$

$n(y)$: number of vertices in the correct clustering $y$
$k(y)$: number of clusters in the correct clustering $y$
$l(y, (i, j))$: $1$ if $i$ and $j$ are in the same cluster of $y$, else $-1$

This works well, since we can back out $h$ and compute loss-augmented inference with Kruskal's algorithm. We can also use Kruskal's algorithm to complete $h$ (i.e., to choose the optimal $h \in \mathcal{H}$).
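A worked check of the loss on a tiny assumed example: the correct clustering has clusters {0, 1, 2} and {3}, so $n(y) = 4$ and $k(y) = 2$, and a consistent spanning forest has $n(y) - k(y) = 2$ edges, each contributing $l = +1$.

```python
# Loss delta(y, y_hat, h_hat) = n(y) - k(y) - sum of l(y, (i, j)) over h_hat.
# Assumed representation: y as a list of clusters, h_hat as an edge list.

def loss(y_clusters, h_edges):
    label = {v: c for c, cluster in enumerate(y_clusters) for v in cluster}
    n = len(label)          # n(y): number of vertices
    k = len(y_clusters)     # k(y): number of clusters
    l = lambda i, j: 1 if label[i] == label[j] else -1
    return n - k - sum(l(i, j) for i, j in h_edges)

y = [[0, 1, 2], [3]]
print(loss(y, [(0, 1), (1, 2)]))  # -> 0: forest consistent with y
print(loss(y, [(0, 1), (2, 3)]))  # -> 2: one wrong link, 2 - (1 - 1) = 2
```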

Page 19

Noun Phrase Coreferencing with Clustering - Results

Start with the spanning forest initialized as a linear chain (chronological order); the algorithm then updates the weights.

Modifications to the penalty for incorrect cluster links were required: it had to be decreased significantly, since mistakes were over-penalized.

Overall performance improved once the penalty was decreased.

Page 20

References

Chun-Nam John Yu, Thorsten Joachims

Learning Structural SVMs with Latent Variables

ICML, 2009

Tsochantaridis et al.

Support Vector Machine Learning for Interdependent and Structured Output Spaces

ICML, 2004

A. L. Yuille and Anand Rangarajan

The Concave-Convex Procedure (CCCP)

NIPS, 2002

Kai-Wei Chang, Vivek Srikumar, and Dan Roth

Multi-core Structural SVM Training

ECML, 2013

Ming-Wei Chang

Structured Prediction with Indirect Supervision

UIUC, 2011

Page 21

Questions?
