Learning Structural SVMs with Latent Variables

Xionghao Liu

Feb 21, 2016
Transcript
Page 1: Learning Structural SVMs with Latent Variables

Learning Structural SVMs with Latent Variables

Xionghao Liu

Page 2: Learning Structural SVMs with Latent Variables

Annotation Mismatch

Action Classification: input x, annotation y, latent h

[Figure: example input x with latent h marked; y = "jumping"]

Desired output during test time is y

Mismatch between desired and available annotations

Exact value of the latent variable is not "important"

Page 3: Learning Structural SVMs with Latent Variables

Outline – Annotation Mismatch

• Latent SVM

• Optimization

• Practice

• Extensions

Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009

Page 4: Learning Structural SVMs with Latent Variables

Weakly Supervised Data

Input x

Output y ∈ {-1,+1}

Hidden h

[Figure: example input x with output y = +1 and hidden h]

Page 5: Learning Structural SVMs with Latent Variables

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector Ψ(x,y,h)

Page 6: Learning Structural SVMs with Latent Variables

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector: Ψ(x,+1,h) = [Φ(x,h); 0]

Page 7: Learning Structural SVMs with Latent Variables

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector: Ψ(x,-1,h) = [0; Φ(x,h)]
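A minimal NumPy sketch of this stacking construction (the helper name joint_feature and the fixed feature length dim are illustrative assumptions, not from the slides):

```python
import numpy as np

def joint_feature(phi_xh, y, dim):
    """Joint feature vector Psi(x, y, h) for binary y in {-1, +1}.

    Places Phi(x, h) in the block selected by the label and zeros
    the other block:
        Psi(x, +1, h) = [Phi(x, h); 0]
        Psi(x, -1, h) = [0; Phi(x, h)]
    phi_xh is the already-evaluated feature Phi(x, h) of length dim.
    """
    psi = np.zeros(2 * dim)
    if y == +1:
        psi[:dim] = phi_xh
    else:
        psi[dim:] = phi_xh
    return psi
```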

Page 8: Learning Structural SVMs with Latent Variables

Weakly Supervised Classification

Feature Φ(x,h)

Joint Feature Vector Ψ(x,y,h)

Score f : Ψ(x,y,h) → (-∞, +∞)

Optimize the score over all possible y and h

Page 9: Learning Structural SVMs with Latent Variables

Latent SVM

Parameters w

Scoring function: wTΨ(x,y,h)

Prediction: y(w),h(w) = argmaxy,h wTΨ(x,y,h)
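As a sketch, the prediction rule can be implemented by enumeration when the label and latent spaces are small; finite labels and latent_space are illustrative assumptions, and joint_feature is the helper sketched above:

```python
def predict(w, x, labels, latent_space, phi):
    """Prediction: y(w), h(w) = argmax over (y, h) of w . Psi(x, y, h).

    phi(x, h) evaluates Phi(x, h); labels and latent_space enumerate
    the candidate outputs and latent values.
    """
    best_pair, best_score = None, float("-inf")
    for h in latent_space:
        phi_xh = phi(x, h)          # compute Phi(x, h) once per h
        for y in labels:
            score = w @ joint_feature(phi_xh, y, len(phi_xh))
            if score > best_score:
                best_pair, best_score = (y, h), score
    return best_pair
```

For structured outputs the double loop is replaced by a problem-specific inference routine; the brute-force form is only for illustration.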

Page 10: Learning Structural SVMs with Latent Variables

Learning Latent SVM

Training data {(xi,yi), i = 1,2,…,n}

Empirical risk minimization: minw Σi Δ(yi, yi(w))

No restriction on the loss function Δ

Annotation mismatch: the loss depends only on the annotation y, not on h
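Using the prediction rule sketched above, the objective can be written directly; delta is a user-supplied loss Δ(yi, y), e.g. 0-1 loss (a sketch, not the author's code):

```python
def empirical_risk(w, data, labels, latent_space, phi, delta):
    """Empirical risk: sum over i of Delta(y_i, y_i(w)).

    data is a list of (x_i, y_i) pairs. The latent value h_i(w) returned
    by predict is discarded: the loss compares annotations only.
    """
    risk = 0.0
    for x_i, y_i in data:
        y_pred, _ = predict(w, x_i, labels, latent_space, phi)
        risk += delta(y_i, y_pred)
    return risk
```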

Page 11: Learning Structural SVMs with Latent Variables

Learning Latent SVM

Empirical risk minimization: minw Σi Δ(yi, yi(w))

Non-convex

Parameters cannot be regularized

Find a regularization-sensitive upper bound

Page 12: Learning Structural SVMs with Latent Variables

Learning Latent SVM

Add and subtract the score of the prediction:

Δ(yi, yi(w)) = Δ(yi, yi(w)) + wTΨ(xi,yi(w),hi(w)) - wTΨ(xi,yi(w),hi(w))

Page 13: Learning Structural SVMs with Latent Variables

Learning Latent SVM

Δ(yi, yi(w)) ≤ Δ(yi, yi(w)) + wTΨ(xi,yi(w),hi(w)) - maxhi wTΨ(xi,yi,hi)

since y(w),h(w) = argmaxy,h wTΨ(x,y,h) implies wTΨ(xi,yi(w),hi(w)) ≥ maxhi wTΨ(xi,yi,hi)

Page 14: Learning Structural SVMs with Latent Variables

Learning Latent SVM

minw ||w||2 + C Σi ξi

s.t. maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

Parameters can be regularized

Is this also convex?

Page 15: Learning Structural SVMs with Latent Variables

Learning Latent SVM

minw ||w||2 + C Σi ξi

s.t. maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

Convex - Convex

Difference of convex (DC) program
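The first max in the constraint is a loss-augmented inference problem. A sketch by enumeration, reusing joint_feature from above (finite labels and latent_space remain illustrative assumptions):

```python
def loss_augmented_inference(w, x_i, y_i, labels, latent_space, phi, delta):
    """Solve max over (y, h) of [w . Psi(x_i, y, h) + Delta(y_i, y)]."""
    best_pair, best_score = None, float("-inf")
    for h in latent_space:
        phi_xh = phi(x_i, h)
        for y in labels:
            score = w @ joint_feature(phi_xh, y, len(phi_xh)) + delta(y_i, y)
            if score > best_score:
                best_pair, best_score = (y, h), score
    return best_pair
```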

Page 16: Learning Structural SVMs with Latent Variables

Recap

Scoring function: wTΨ(x,y,h)

Prediction: y(w),h(w) = argmaxy,h wTΨ(x,y,h)

Learning:

minw ||w||2 + C Σi ξi

s.t. wTΨ(xi,y,h) + Δ(yi,y) - maxhi wTΨ(xi,yi,hi) ≤ ξi, for all y, h

Page 17: Learning Structural SVMs with Latent Variables

Outline – Annotation Mismatch

• Latent SVM

• Optimization

• Practice

• Extensions

Page 18: Learning Structural SVMs with Latent Variables

Learning Latent SVM

minw ||w||2 + C Σi ξi

s.t. maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } - maxhi wTΨ(xi,yi,hi) ≤ ξi

Difference of convex (DC) program

Page 19: Learning Structural SVMs with Latent Variables

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } + ( - maxhi wTΨ(xi,yi,hi) )

convex part + concave part

Linear upper-bound of the concave part

Page 20: Learning Structural SVMs with Latent Variables

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } + ( - maxhi wTΨ(xi,yi,hi) )

Optimize the convex upper bound

Page 21: Learning Structural SVMs with Latent Variables

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } + ( - maxhi wTΨ(xi,yi,hi) )

Linear upper-bound of the concave part

Page 22: Learning Structural SVMs with Latent Variables

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } + ( - maxhi wTΨ(xi,yi,hi) )

Until convergence

Page 23: Learning Structural SVMs with Latent Variables

Concave-Convex Procedure

maxy,h { wTΨ(xi,y,h) + Δ(yi,y) } + ( - maxhi wTΨ(xi,yi,hi) )

Linear upper bound?

Page 24: Learning Structural SVMs with Latent Variables

Linear Upper Bound

Current estimate = wt

hi* = argmaxhi wtTΨ(xi,yi,hi)

-wTΨ(xi,yi,hi*) ≥ - maxhi wTΨ(xi,yi,hi)

So -wTΨ(xi,yi,hi*) is a linear upper bound on the concave term, and it is tight at w = wt.
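In code, constructing this bound just means imputing hi* with the current estimate (a sketch under the same enumeration assumptions, reusing joint_feature):

```python
def impute_latent(w_t, x_i, y_i, latent_space, phi):
    """h_i* = argmax over h of w_t . Psi(x_i, y_i, h).

    The resulting -w . Psi(x_i, y_i, h_i*) is linear in w, upper-bounds
    the concave term -max_h w . Psi(x_i, y_i, h), and touches it at w_t.
    """
    def score(h):
        phi_xh = phi(x_i, h)
        return w_t @ joint_feature(phi_xh, y_i, len(phi_xh))
    return max(latent_space, key=score)
```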

Page 25: Learning Structural SVMs with Latent Variables

CCCP for Latent SVM

Start with an initial estimate w0

Update hi* = argmaxhi∈H wtTΨ(xi,yi,hi)

Update wt+1 as the ε-optimal solution of

minw ||w||2 + C Σi ξi

s.t. wTΨ(xi,yi,hi*) - wTΨ(xi,y,h) ≥ Δ(yi,y) - ξi

Repeat until convergence
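Putting the pieces together, a compact sketch of the CCCP loop, reusing the helpers (and numpy import) from the sketches above. The ε-optimal convex solve of the slide is replaced here by plain subgradient descent as an illustrative stand-in (cutting-plane or dual solvers are used in practice); step sizes and iteration counts are arbitrary assumptions:

```python
def cccp_latent_svm(data, labels, latent_space, phi, delta,
                    C=1.0, outer_iters=20, inner_iters=200, lr=1e-3):
    """CCCP for latent SVM: alternate latent imputation and a convex solve."""
    dim = len(phi(data[0][0], latent_space[0]))
    w = np.zeros(2 * dim)
    for _ in range(outer_iters):
        # Step 1: impute h_i* = argmax_h w_t . Psi(x_i, y_i, h)
        h_star = [impute_latent(w, x, y, latent_space, phi) for x, y in data]
        # Step 2: approximately solve the convex upper bound
        #   min ||w||^2 + C sum_i max(0,
        #       max_{y,h} [w.Psi(x_i,y,h) + Delta(y_i,y)] - w.Psi(x_i,y_i,h_i*))
        for _ in range(inner_iters):
            grad = 2.0 * w
            for (x, y), h_s in zip(data, h_star):
                psi_true = joint_feature(phi(x, h_s), y, dim)
                y_hat, h_hat = loss_augmented_inference(
                    w, x, y, labels, latent_space, phi, delta)
                psi_hat = joint_feature(phi(x, h_hat), y_hat, dim)
                # hinge: only violated constraints contribute a subgradient
                if w @ psi_hat + delta(y, y_hat) - w @ psi_true > 0:
                    grad += C * (psi_hat - psi_true)
            w -= lr * grad
    return w
```

Because each outer iteration minimizes a convex upper bound that touches the true objective at wt, the objective value is non-increasing, and CCCP converges to a local (not global) optimum.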

Page 26: Learning Structural SVMs with Latent Variables

Thanks & QA