Max-Margin Latent Variable Models M. Pawan Kumar

Feb 23, 2016

Transcript
Page 1: Max-Margin Latent Variable Models

Max-Margin Latent Variable Models

M. Pawan Kumar

Page 2: Max-Margin Latent Variable Models

Max-Margin Latent Variable Models

M. Pawan Kumar

Daphne Koller, Ben Packer

Kevin Miller, Rafi Witten, Tim Tang, Danny Goodman, Haithem Turki, Dan Preston, Dan Selsam, Andrej Karpathy

Page 3: Max-Margin Latent Variable Models

Computer Vision Data

[Plot: Log (Size) against Information]

Segmentation: ~ 2000

Page 4: Max-Margin Latent Variable Models

Computer Vision Data

[Plot: Log (Size) against Information]

Segmentation: ~ 2000

Bounding Box: ~ 12000

Page 5: Max-Margin Latent Variable Models

Computer Vision Data

[Plot: Log (Size) against Information]

Segmentation: ~ 2000

Bounding Box: ~ 12000

Image-Level (“Car”, “Chair”): > 14 M

Page 6: Max-Margin Latent Variable Models

Computer Vision Data

[Plot: Log (Size) against Information]

Segmentation: ~ 2000

Bounding Box: ~ 12000

Image-Level: > 14 M

Noisy Label: > 6 B

Learn with missing information (latent variables)

Page 7: Max-Margin Latent Variable Models

Outline

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Page 8: Max-Margin Latent Variable Models

Annotation Mismatch

Learn to classify an image

Image x

Annotation a = “Deer”

Latent variable h

Mismatch between desired and available annotations

Exact value of latent variable is not “important”

Page 9: Max-Margin Latent Variable Models

Annotation Mismatch

Learn to classify a DNA sequence

Sequence x

Annotation a ∈ {+1, -1}

Latent Variables h

Mismatch between desired and possible annotations

Exact value of latent variable is not “important”

Page 10: Max-Margin Latent Variable Models

Output Mismatch

Learn to segment an image

Image x, Output y

Page 11: Max-Margin Latent Variable Models

Output Mismatch

Learn to segment an image

Bird

(x, a) (a, h)

Page 12: Max-Margin Latent Variable Models

Output Mismatch

Learn to segment an image

Mismatch between desired output and available annotations

Exact value of latent variable is important

(x, a) (a, h)

Cow

Page 13: Max-Margin Latent Variable Models

Output Mismatch

Learn to classify actions

(x, y)

Page 14: Max-Margin Latent Variable Models

Output Mismatch

Learn to classify actions

+ “jumping”

x, h_a = +1, h_b

Page 15: Max-Margin Latent Variable Models

Output Mismatch

Learn to classify actions

+ “jumping”

x, h_a = -1, h_b

Mismatch between desired output and available annotations

Exact value of latent variable is important

Page 16: Max-Margin Latent Variable Models

Outline

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Page 17: Max-Margin Latent Variable Models

Latent SVM

Andrews et al, 2001; Smola et al, 2005; Felzenszwalb et al, 2008; Yu and Joachims, 2009

Image x

Annotation a = “Deer”

Latent variable h

Features $\Psi(x,a,h)$

Parameters $w$

Score $w^\top \Psi(x,a,h)$

$(a(w), h(w)) = \arg\max_{a,h} w^\top \Psi(x,a,h)$

Page 18: Max-Margin Latent Variable Models

Parameter Learning

Score of Best Completion of Ground-Truth

>

Score of All Other Outputs

Page 19: Max-Margin Latent Variable Models

Parameter Learning

$\max_h w^\top \Psi(x_i,a_i,h) > w^\top \Psi(x_i,a,h)$

Page 20: Max-Margin Latent Variable Models

Parameter Learning

Annotation Mismatch

$\min \|w\|^2 + C \sum_i \xi_i$

$\max_h w^\top \Psi(x_i,a_i,h) \ge w^\top \Psi(x_i,a,h) + \Delta(a_i,a) - \xi_i$

Page 21: Max-Margin Latent Variable Models

Optimization

Update $h_i^* = \arg\max_h w^\top \Psi(x_i,a_i,h)$

Update $w$ by solving a convex problem

$\min \|w\|^2 + C \sum_i \xi_i$

$w^\top \Psi(x_i,a_i,h_i^*) - w^\top \Psi(x_i,a,h) \ge \Delta(a_i,a) - \xi_i$

Repeat until convergence
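The two alternating updates on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the speaker's implementation: the feature map `joint_feature` (the latent variable picks one row of x, the annotation picks a class block), the 0/1 loss, and the plain subgradient solver for the convex step are all hypothetical choices, and the regulariser is written as 0.5·||w||² for a simpler gradient.

```python
import numpy as np

def joint_feature(x, a, h, n_classes):
    """Hypothetical Psi(x, a, h): row h of x placed in the block of class a."""
    d = x.shape[1]
    psi = np.zeros(n_classes * d)
    psi[a * d:(a + 1) * d] = x[h]
    return psi

def impute(w, x, a, n_classes):
    """Latent update: h* = argmax_h w^T Psi(x, a, h) for the given annotation a."""
    scores = [w @ joint_feature(x, a, h, n_classes) for h in range(x.shape[0])]
    return int(np.argmax(scores))

def loss_augmented(w, x, a_true, n_classes):
    """argmax over (a, h) of w^T Psi(x, a, h) + Delta(a_true, a), with 0/1 Delta."""
    best, best_val = None, -np.inf
    for a in range(n_classes):
        for h in range(x.shape[0]):
            val = w @ joint_feature(x, a, h, n_classes) + float(a != a_true)
            if val > best_val:
                best, best_val = (a, h), val
    return best, best_val

def latent_svm(X, A, n_classes, C=1.0, outer=10, inner=200, lr=0.01, seed=0):
    """Alternate latent imputation with an (assumed) subgradient solve for w."""
    rng = np.random.default_rng(seed)
    d = X[0].shape[1]
    w = 0.01 * rng.standard_normal(n_classes * d)
    for _ in range(outer):
        # Step 1: fix w, impute the latent variable of every training sample.
        H = [impute(w, x, a, n_classes) for x, a in zip(X, A)]
        # Step 2: fix the imputed h_i*, approximately solve the convex problem in w.
        for _ in range(inner):
            grad = w.copy()  # gradient of 0.5 * ||w||^2
            for x, a, h in zip(X, A, H):
                (a_hat, h_hat), val = loss_augmented(w, x, a, n_classes)
                gt = joint_feature(x, a, h, n_classes)
                if val - w @ gt > 0:  # margin constraint violated: slack is active
                    grad += C * (joint_feature(x, a_hat, h_hat, n_classes) - gt)
            w -= lr * grad
    return w
```

With the imputed values fixed, the inner problem is a standard structural SVM, so any convex solver could replace the subgradient loop used here.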

Page 22: Max-Margin Latent Variable Models

Outline

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Page 23: Max-Margin Latent Variable Models

Self-Paced Learning

Kumar, Packer and Koller, NIPS 2010

1 + 1 = 2

1/3 + 1/6 = 1/2

e^{iπ} + 1 = 0

Math is for losers !!

FAILURE … BAD LOCAL MINIMUM

Page 24: Max-Margin Latent Variable Models

Self-Paced Learning

Kumar, Packer and Koller, NIPS 2010

1 + 1 = 2

1/3 + 1/6 = 1/2

e^{iπ} + 1 = 0

Euler was a Genius!!

SUCCESS … GOOD LOCAL MINIMUM

Page 25: Max-Margin Latent Variable Models

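Optimization

Update $h_i^* = \arg\max_h w^\top \Psi(x_i,a_i,h)$

Update $w$ by solving a convex problem

$\min \|w\|^2 + C \sum_i v_i \xi_i - \lambda \sum_i v_i$

$v_i \in \{0,1\}$

$w^\top \Psi(x_i,a_i,h_i^*) - w^\top \Psi(x_i,a,h) \ge \Delta(a_i,a) - \xi_i$

Repeat until convergence, with $\lambda \leftarrow \lambda\mu$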
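For fixed w, each indicator v_i appears in the self-paced objective only through the term v_i (C ξ_i - λ), so the minimising choice is v_i = 1 exactly when C ξ_i < λ. The sketch below shows that selection rule and the annealing of λ. The function names and the assumption μ > 1 (so that more samples are admitted over time) are illustrative, not taken from the slides.

```python
import numpy as np

def select_easy(slacks, C, lam):
    """Closed-form v for fixed w: v_i = 1 iff C * xi_i < lam."""
    return (C * np.asarray(slacks, dtype=float) < lam).astype(int)

def anneal_schedule(lam0, mu, n_iters):
    """Values of lambda after repeated updates lambda <- lambda * mu."""
    return [lam0 * mu ** t for t in range(n_iters)]

# Usage: as lambda grows, samples with larger slack ("harder" ones) are included.
slacks = [0.1, 0.5, 2.0]
for lam in anneal_schedule(lam0=1.0, mu=3.0, n_iters=2):
    print(lam, select_easy(slacks, C=1.0, lam=lam))
```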

Page 26: Max-Margin Latent Variable Models

Image Classification

271 images, 6 classes

90/10 train/test split

5 folds

Mammals Dataset

Page 27: Max-Margin Latent Variable Models

Image Classification

Kumar, Packer and Koller, NIPS 2010

[Bar charts: Objective and Test Error, each comparing CCCP and SPL]

HOG-Based Model. Dalal and Triggs, 2005

Page 28: Max-Margin Latent Variable Models

Image Classification

~ 5000 images

50/50 train/test split

5 folds

PASCAL VOC 2007 Dataset

Car vs. Not-Car

Page 29: Max-Margin Latent Variable Models

Image Classification

Witten, Miller, Kumar, Packer and Koller, In Preparation

[Plot: Objective]

HOG + Dense SIFT + Dense Color SIFT

SPL+ – Different features choose different “easy” samples

Page 30: Max-Margin Latent Variable Models

Image Classification

Witten, Miller, Kumar, Packer and Koller, In Preparation

[Plot: Mean Average Precision]

HOG + Dense SIFT + Dense Color SIFT

SPL+ – Different features choose different “easy” samples

Page 31: Max-Margin Latent Variable Models

Motif Finding

~ 40,000 sequences

50/50 train/test split

5 folds

UniProbe Dataset

Binding vs. Not-Binding

Page 32: Max-Margin Latent Variable Models

Motif Finding

Kumar, Packer and Koller, NIPS 2010

[Bar charts: Objective and Test Error, each comparing CCCP and SPL]

Motif + Markov Background Model. Yu and Joachims, 2009

Page 33: Max-Margin Latent Variable Models

Semantic Segmentation

Stanford Background: Train - 572 images, Validation - 53 images, Test - 90 images

+

VOC Segmentation 2009: Train - 1274 images, Validation - 225 images, Test - 750 images

Page 34: Max-Margin Latent Variable Models

Semantic Segmentation

VOC Detection 2009: Train - 1564 images (Bounding Box Data)

+

ImageNet: Train - 1000 images (Image-Level Data)

Page 35: Max-Margin Latent Variable Models

Semantic Segmentation

Kumar, Turki, Preston and Koller, ICCV 2011

[Bar charts: VOC Overlap and SBD Overlap, each comparing SUP, CCCP and SPL]

Region-based Model. Gould, Fulton and Koller, 2009

SUP – Supervised Learning (Segmentation Data Only)

Page 36: Max-Margin Latent Variable Models

Action Classification

PASCAL VOC 2011

Train – 3000 instances (Bounding Box Data)

+

Train - 10000 images (Noisy Data)

Test – 3000 instances

Page 37: Max-Margin Latent Variable Models

Action Classification

Packer, Kumar, Tang and Koller, In Preparation

[Bar chart: Mean Average Precision for SUP, CCCP and SPL]

Poselet-based Model. Maji, Bourdev and Malik, 2011

Page 38: Max-Margin Latent Variable Models

Self-Paced Multiple Kernel Learning

Kumar, Packer and Koller, In Preparation

1 + 1 = 2

1/3 + 1/6 = 1/2

e^{iπ} + 1 = 0

Integers

Rational Numbers

Imaginary Numbers

USE A FIXED MODEL

Page 39: Max-Margin Latent Variable Models

Kumar, Packer and Koller, In Preparation

1 + 1 = 2

1/3 + 1/6 = 1/2

e^{iπ} + 1 = 0

Integers

Rational Numbers

Imaginary Numbers

ADAPT THE MODEL COMPLEXITY

Self-Paced Multiple Kernel Learning

Page 40: Max-Margin Latent Variable Models

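Optimization

Update $h_i^* = \arg\max_h w^\top \Psi(x_i,a_i,h)$

Update $w$ and $c$ by solving a convex problem

$\min \|w\|^2 + C \sum_i v_i \xi_i - \lambda \sum_i v_i$

$v_i \in \{0,1\}$

$w^\top \Psi(x_i,a_i,h_i^*) - w^\top \Psi(x_i,a,h) \ge \Delta(a_i,a) - \xi_i$

$K_{ij} = \Psi(x_i,a_i,h_i)^\top \Psi(x_j,a_j,h_j)$, $K = \sum_k c_k K_k$

Repeat until convergence, with $\lambda \leftarrow \lambda\mu$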
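The kernel combination K = Σ_k c_k K_k on this slide can be illustrated with a short sketch. It assumes each base kernel K_k is the Gram matrix of one feature type evaluated at the current imputed latent values, and that the weights c_k are non-negative so the combination stays positive semidefinite. The helper names are hypothetical.

```python
import numpy as np

def gram(features):
    """Gram matrix K_ij = Psi_i^T Psi_j for row-stacked joint feature vectors."""
    F = np.asarray(features, dtype=float)
    return F @ F.T

def combine_kernels(kernels, c):
    """Weighted combination K = sum_k c_k K_k with non-negative weights c_k."""
    c = np.asarray(c, dtype=float)
    if np.any(c < 0):
        raise ValueError("kernel weights are assumed non-negative")
    return sum(ck * Kk for ck, Kk in zip(c, kernels))

# Usage: two feature types for the same two samples.
K1 = gram([[1.0, 0.0], [0.0, 1.0]])
K2 = gram([[1.0], [2.0]])
K = combine_kernels([K1, K2], c=[0.5, 2.0])
```

Setting a weight c_k to zero switches a feature type off, which is one way to read "adapt the model complexity" on the preceding slides.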

Page 41: Max-Margin Latent Variable Models

Image Classification

271 images, 6 classes

90/10 train/test split

5 folds

Mammals Dataset

Page 42: Max-Margin Latent Variable Models

Image Classification

Kumar, Packer and Koller, In Preparation

[Bar charts: Objective and Test Error, each comparing FIXED and SPMKL]

HOG-Based Model. Dalal and Triggs, 2005

Page 43: Max-Margin Latent Variable Models

Motif Finding

~ 40,000 sequences

50/50 train/test split

5 folds

UniProbe Dataset

Binding vs. Not-Binding

Page 44: Max-Margin Latent Variable Models

Motif Finding

Kumar, Packer and Koller, NIPS 2010

[Bar charts: Objective and Test Error, each comparing FIXED and SPMKL]

Motif + Markov Background Model. Yu and Joachims, 2009

Page 45: Max-Margin Latent Variable Models

Outline

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Page 46: Max-Margin Latent Variable Models

MAP Inference

$\Pr(a,h|x) = \exp(w^\top \Psi(x,a,h)) / Z(x)$

$\Pr(a_1,h|x)$:

0.00 0.00 0.25
0.00 0.25 0.00
0.00 0.00 0.25

Page 47: Max-Margin Latent Variable Models

MAP Inference

$\Pr(a,h|x) = \exp(w^\top \Psi(x,a,h)) / Z(x)$

$\min_{a,h} -\log \Pr(a,h|x)$

$\Pr(a_1,h|x)$:

0.00 0.00 0.25
0.00 0.25 0.00
0.00 0.00 0.25

$\Pr(a_2,h|x)$:

0.00 0.00 0.01
0.00 0.24 0.00
0.00 0.00 0.00

Value of latent variable?
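As a small illustration of MAP inference, the sketch below turns a table of joint scores into Pr(a,h|x) and picks the most probable pair. It assumes a finite set of annotations and latent values, with `scores[a, h]` standing for the score of the pair (a, h). The function names are hypothetical.

```python
import numpy as np

def joint_distribution(scores):
    """Pr(a, h | x) = exp(score[a, h]) / Z(x), computed stably."""
    s = np.asarray(scores, dtype=float)
    e = np.exp(s - s.max())  # subtracting the max leaves the ratio unchanged
    return e / e.sum()

def map_inference(scores):
    """(a, h) minimising -log Pr(a, h | x), i.e. maximising the score."""
    s = np.asarray(scores, dtype=float)
    a, h = np.unravel_index(np.argmax(s), s.shape)
    return int(a), int(h)

# Usage with two annotations and two latent values.
scores = [[0.0, 1.0], [2.0, 0.0]]
P = joint_distribution(scores)
a_map, h_map = map_inference(scores)
```

Because Z(x) is shared by every pair, the MAP estimate never needs the normaliser. It also ignores how the probability mass for an annotation is spread over h, which is the issue the slide raises.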

Page 48: Max-Margin Latent Variable Models

Min-Entropy Inference

$\min_a -\log \Pr(a|x) + H_\alpha(\Pr(h|a,x))$

$= \min_a H_\alpha(Q(a; x, w))$

$Q(a; x, w)$ = Set of all $\{\Pr(a,h|x)\}$

Rényi entropy of generalized distribution
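A numerical sketch may help here. It assumes the usual definition of the Rényi entropy of a generalized (unnormalised) distribution, H_α(Q) = (1/(1-α)) log(Σ_h p_h^α / Σ_h p_h) with p_h = Pr(a,h|x), together with its limits at α = 1 and α = ∞. The function name is hypothetical, and the probabilities are the non-zero entries of the MAP inference example.

```python
import numpy as np

def renyi_generalized(p, alpha):
    """Renyi entropy of a generalized distribution p (entries sum to at most 1)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]  # zero entries contribute nothing
    if np.isinf(alpha):
        return -np.log(p.max())  # limit alpha -> infinity
    if np.isclose(alpha, 1.0):
        return -(p * np.log(p)).sum() / p.sum()  # Shannon-type limit
    return np.log((p ** alpha).sum() / p.sum()) / (1.0 - alpha)

# Non-zero entries of Pr(a1, h | x) and Pr(a2, h | x) from the MAP example.
q_a1 = [0.25, 0.25, 0.25]
q_a2 = [0.01, 0.24]

# Decomposition check for a2 with alpha = 2:
# H_alpha(Q(a)) = -log Pr(a | x) + H_alpha(Pr(h | a, x)).
marginal = sum(q_a2)
conditional = [v / marginal for v in q_a2]
lhs = renyi_generalized(q_a2, 2.0)
rhs = -np.log(marginal) + renyi_generalized(conditional, 2.0)
```

With these numbers the entropy of a1 is lower than that of a2 at both α = 1 and α = ∞, so min-entropy inference selects a1 in this example.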

Page 49: Max-Margin Latent Variable Models

Max-Margin Min-Entropy Models

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

$\min \|w\|^2 + C \sum_i \xi_i$

$H_\alpha(Q(a; x_i, w)) - H_\alpha(Q(a_i; x_i, w)) \ge \Delta(a_i, a) - \xi_i$

$\xi_i \ge 0$

Like latent SVM, minimizes $\Delta(a_i, a_i(w))$

In fact, when $\alpha = \infty$...

Page 50: Max-Margin Latent Variable Models

Max-Margin Min-Entropy Models

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

$\min \|w\|^2 + C \sum_i \xi_i$

$\max_h w^\top \Psi(x_i,a_i,h) - \max_h w^\top \Psi(x_i,a,h) \ge \Delta(a_i, a) - \xi_i$

$\xi_i \ge 0$

Like latent SVM, minimizes $\Delta(a_i, a_i(w))$

In fact, when $\alpha = \infty$... Latent SVM
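The reduction to latent SVM at α = ∞ can be checked in two lines. The derivation below is a sketch that assumes the log-linear form of Pr(a,h|x) from the MAP inference slides and the α → ∞ limit of the Rényi entropy of a generalized distribution, which is the negative log of its largest entry.

```latex
\begin{align*}
H_\infty(Q(a; x, w))
  &= -\log \max_h \Pr(a,h|x)
   = \log Z(x) - \max_h w^\top \Psi(x,a,h), \\
H_\infty(Q(a; x_i, w)) - H_\infty(Q(a_i; x_i, w))
  &= \max_h w^\top \Psi(x_i,a_i,h) - \max_h w^\top \Psi(x_i,a,h).
\end{align*}
```

The log-partition term cancels in the difference, leaving the latent SVM margin between the best completion of the ground-truth annotation and the best completion of any other annotation.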

Page 51: Max-Margin Latent Variable Models

Image Classification

271 images, 6 classes

90/10 train/test split

5 folds

Mammals Dataset

Page 52: Max-Margin Latent Variable Models

Image Classification

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

HOG-Based Model. Dalal and Triggs, 2005

Page 53: Max-Margin Latent Variable Models

Image Classification

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

HOG-Based Model. Dalal and Triggs, 2005

Page 54: Max-Margin Latent Variable Models

Image Classification

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

HOG-Based Model. Dalal and Triggs, 2005

Page 55: Max-Margin Latent Variable Models

Motif Finding

~ 40,000 sequences

50/50 train/test split

5 folds

UniProbe Dataset

Binding vs. Not-Binding

Page 56: Max-Margin Latent Variable Models

Motif Finding

Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012

Motif + Markov Background Model. Yu and Joachims, 2009

Page 57: Max-Margin Latent Variable Models

Outline

• Two Types of Problems

• Latent SVM (Background)

• Self-Paced Learning

• Max-Margin Min-Entropy Models

• Discussion

Page 58: Max-Margin Latent Variable Models

Very Large Datasets

• Initialize parameters using supervised data

• Impute latent variables (inference)

• Select easy samples (very efficient)

• Update parameters using incremental SVM

• Refine efficiently with proximal regularization

Page 59: Max-Margin Latent Variable Models

Output Mismatch

$\sum_h \Pr_\theta(h|a,x)\,\Delta(a,h,a(w),h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over w and θ

Page 60: Max-Margin Latent Variable Models

Output Mismatch

$\sum_h \Pr_\theta(h|a,x)\,\Delta(a,h,a(w),h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over w

[Plot: $\Pr_\theta(h,a|x)$ over $(a_1,h)$ and $(a_2,h)$]

Page 61: Max-Margin Latent Variable Models

Output Mismatch

$\sum_h \Pr_\theta(h|a,x)\,\Delta(a,h,a(w),h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over w

[Plot: $\Pr_\theta(h,a|x)$ over $(a_1,h)$ and $(a_2,h)$]

Page 62: Max-Margin Latent Variable Models

Output Mismatch

$\sum_h \Pr_\theta(h|a,x)\,\Delta(a,h,a(w),h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over θ

[Plot: $\Pr_\theta(h,a|x)$ over $(a_1,h)$ and $(a_2,h)$]

Page 63: Max-Margin Latent Variable Models

Output Mismatch

$\sum_h \Pr_\theta(h|a,x)\,\Delta(a,h,a(w),h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over θ

[Plot: $\Pr_\theta(h,a|x)$ over $(a_1,h)$ and $(a_2,h)$]

Page 64: Max-Margin Latent Variable Models

Output Mismatch

$\sum_h \Pr_\theta(h|a,x)\,\Delta(a,h,a(w),h(w)) + A(\theta)$

C. R. Rao’s Relative Quadratic Entropy

Minimize over θ

[Plot: $\Pr_\theta(h,a|x)$ over $(a_1,h)$ and $(a_2,h)$]

Page 65: Max-Margin Latent Variable Models

Questions?