Visual Parsing with Weak Supervision Jia Xu Department of Computer Sciences University of Wisconsin-Madison 2015-07-30
pages.cs.wisc.edu/~jiaxu/pub/defense-slides-jia-xu.pdf

Visual Parsing with Weak Supervision

Jia Xu

Department of Computer Sciences
University of Wisconsin-Madison

2015-07-30

Introduction Object Segmentation Scene Parsing Video Parsing Discussion

Research Goal

Teach Computer to See at/beyond Human Level

Interpret/summarize/organize visual data on the Internet
Help the disabled population (e.g., the blind)

Visual Parsing

Fundamental Task
Semantically parse every pixel in images and videos
First step towards high-level applications

Self-driving Car, Unmanned Aerial Vehicle, Wearable Glasses

Visual Parsing

Fundamental Task

Turning Visual Data Into Knowledge

Every day: > 3.5 million, > 300 million, > 150,000 hours

Never Ending Language Learning (Mitchell et al., 2009)
Never Ending Image Learner (Chen et al., 2013)

Challenges

Modern Image Dataset

[Figure: annotation types ordered by information content, plotted against log(dataset size)]
Noisy Label: > 6 Billion
Image-Level: > 14 Million
Bounding Box: ∼ 1 Million
Segmentation: ∼ 5000

Far fewer segmentations are annotated for videos!

Motivation

Bottleneck of Fully Supervised Methods
Full annotation is expensive to collect and limited in size

Why Weakly Supervised Learning
Weak supervision is easier to obtain: e.g., gaze
Large datasets with side/weak annotations are readily available: metadata, tags, text
Visual data presents the physical world: shape, geometry, context

My Thesis Research

How can we utilize weakly labeled data effectively for the visual parsing task?
When a human comes into the visual parsing loop, how can we minimize user effort while still achieving satisfactory parsing results?

Roadmap

Chapter  Parsing Task         Weak Supervision                                  Publication
Ch. 2    Object Segmentation  User Indication                                   CVPR 2013
Ch. 3    Scene Parsing        Image-level Tags                                  CVPR 2014
Ch. 4    Scene Parsing        Image-level Tags, Bounding Boxes, Partial Labels  CVPR 2015a
Ch. 5    Video Segmentation   Side Knowledge                                    ICCV 2013
Ch. 6    Video Summarization  Human Gaze                                        CVPR 2015b

Object Segmentation

Main Challenges
1. Semantic gap: what is an object?
2. Ambiguity of user intention: which object do you want?

Interactive Object Segmentation

A few user scribbles can make segmentation much easier!

Related work

Region-based: Graphcut (Boykov and Jolly, 2001), Grabcut (Rother et al., 2004), Random Walks (Grady, 2006), Geodesic Shortest Path (Bai and Sapiro, 2009), Geodesic Star Convexity (Gulshan et al., 2010)
Edge-based: Intelligent Scissors (Mortensen and Barrett, 1998), LabelMe (Russell et al., 2008)

[Figure panels: GraphCut, GrabCut, Intelligent Scissors, LabelMe]

Our Ideas (EulerSeg)

Objective
Model topological constraints while concurrently finding one or more minimum-energy closed contours which satisfy:

Foreground seeds must be “inside”
Background seeds must be “outside”

[X., Collins, Singh, CVPR 2013]

Our Ideas (EulerSeg)

Main Advantages
1. Basic primitives are edgelets
   (Little dependence on # of pixels)
2. Dense strokes not needed to learn appearance model.
   Results do NOT vary with seed location
   (Interaction constraints are completely geometric in form)
3. Incorporating connectedness priors and specifying # of closures are easy (Euler characteristic)

Graph Representation

x: face indicator vector
y: edge indicator vector
z: vertex indicator vector
w: indicator vector for foreground boundary edges. Internal edges ($y_i \neq w_i = 0$) are black, while boundary edges ($y_i = w_i = 1$) are red

Discrete Calculus

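[Figure: cell orientation for a vertex, an edge, and a face, with coherent and anti-coherent examples]

Vertex-edge Incidence Matrix: $A_1 = A$, $A_2 = A_1 ./ D$

$$A_{v_k, e_{ij}} = \begin{cases} 1 & k \in \{i, j\} \\ 0 & \text{otherwise} \end{cases}$$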

[Grady and Polimeni, 2010]

Discrete Calculus

Edge-face Incidence Matrix: $C_1 = C$, $C_2 = |C|$

$$C_{e,f} = \begin{cases} +1 & e \text{ is incident to } f \text{ and coherently oriented} \\ -1 & e \text{ is incident to } f \text{ and anti-coherently oriented} \\ 0 & \text{otherwise} \end{cases}$$

[Grady and Polimeni, 2010]

An Example

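[Figure: example complex with faces f1 to f3, edges e1 to e7, and vertices v1 to v5]

$$C = \begin{bmatrix} 1 & 0 & 0 \\ -1 & 0 & 0 \\ 1 & -1 & 0 \\ 0 & 1 & 0 \\ 0 & -1 & 1 \\ 0 & 0 & -1 \\ 0 & 0 & 1 \end{bmatrix}, \quad x = \begin{bmatrix} 1 \\ 1 \\ 0 \end{bmatrix}, \quad b = Cx = \begin{bmatrix} 1 \\ -1 \\ 0 \\ 1 \\ -1 \\ 0 \\ 0 \end{bmatrix}$$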
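The product b = Cx in this example can be checked numerically. The following is a minimal sketch in plain Python; the helper name `matvec` is illustrative and does not come from the slides. It also takes the elementwise absolute value of b, which the problem formulation writes as w = |C1 x|.

```python
# Edge-face incidence matrix C from the example (7 edges x 3 faces).
C = [
    [ 1,  0,  0],
    [-1,  0,  0],
    [ 1, -1,  0],
    [ 0,  1,  0],
    [ 0, -1,  1],
    [ 0,  0, -1],
    [ 0,  0,  1],
]
x = [1, 1, 0]  # the first two faces are selected


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector (hypothetical helper)."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


b = matvec(C, x)           # signed boundary of the selected faces
w = [abs(v) for v in b]    # unsigned boundary-edge indicator
print(b)
print(w)
```

The third row of C touches both selected faces with opposite signs, so its entry in b cancels to zero: an edge shared by two selected faces is internal, not part of the boundary.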

Euler Characteristic

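[Figure: example complex with faces f1 to f3, edges e1 to e7, and vertices v1 to v5]

Number of faces ($\mathbf{1}^T x$): 2
Number of nodes ($\mathbf{1}^T z$): 4
Number of edges ($\mathbf{1}^T y$): 5
Number of connected components ($\mathbf{1}^T x + \mathbf{1}^T z - \mathbf{1}^T y$): 1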
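These counts can be reproduced from the example matrix. The sketch below is an illustration under stated assumptions: it reuses C and x from the example, recovers the edge indicator y from the constraint 2y = w + C2 x with C2 = |C| (as in the problem formulation), and takes the node count of 4 directly from the slide, because the vertex-edge incidence matrix of the example is not given in the text.

```python
# C and x are copied from the worked example; num_nodes is taken from the slide.
C = [
    [ 1,  0,  0],
    [-1,  0,  0],
    [ 1, -1,  0],
    [ 0,  1,  0],
    [ 0, -1,  1],
    [ 0,  0, -1],
    [ 0,  0,  1],
]
x = [1, 1, 0]

# w = |C x|: boundary edges of the selected faces.
w = [abs(sum(c * v for c, v in zip(row, x))) for row in C]
# |C| x: how many selected faces each edge touches.
touch = [sum(abs(c) * v for c, v in zip(row, x)) for row in C]
# 2y = w + |C| x, so y marks every edge that touches a selected face.
y = [(wi + ti) // 2 for wi, ti in zip(w, touch)]

num_faces = sum(x)
num_edges = sum(y)
num_nodes = 4  # assumption: read off the slide, not recomputed here
euler = num_faces + num_nodes - num_edges
print(num_faces, num_edges, euler)
```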

Problem Formulation

Optimization Model

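$$\begin{aligned} \min_{w,x,y,z} \quad & f(w) \\ \text{s.t.} \quad & w = |C_1 x|, \quad 2y = w + C_2 x, \\ & A_2 y \le z \le A_1 y, \quad \mathbf{1}^T x + \mathbf{1}^T z - \mathbf{1}^T y = n, \\ & x_1 \le x \le \mathbf{1} - x_0, \quad w_i, x_j, y_k, z_l \in \{0, 1\}. \end{aligned}$$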

Ratio Objective

[Figure panels: Input, Solution 1, Solution 2, Solution 3]

                 Solution 1  Solution 2  Solution 3
N^T w            38.48       164.77      389.61
D^T w            52          288         865
N^T w / D^T w    0.7400      0.5721      0.4504

Problem Formulation

Optimization Model

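$$\begin{aligned} \min_{w,x,y,z} \quad & \frac{N^T w}{D^T w} \\ \text{s.t.} \quad & w = |C_1 x|, \quad 2y = w + C_2 x, \\ & A_2 y \le z \le A_1 y, \quad \mathbf{1}^T x + \mathbf{1}^T z - \mathbf{1}^T y = n, \\ & x_1 \le x \le \mathbf{1} - x_0, \quad w_i, x_j, y_k, z_l \in \{0, 1\}. \end{aligned}$$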

Minimizing a Ratio Cost

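Solved by minimizing

$$\psi(t, w) = (N - tD)^T w$$

over feasible $w$ for a sequence of chosen values of $t$

With an initial finite bounding interval $[t_l, t_u]$

Pick $t_0 = \frac{t_l + t_u}{2}$, and let

$$\hat{w} = \arg\min_w \psi(t_0, w)$$

$\psi(t_0, \hat{w}) = 0$: $N^T \hat{w} / D^T \hat{w} = t_0$, terminate with solution $t_0$
$\psi(t_0, \hat{w}) < 0$: $N^T \hat{w} / D^T \hat{w} < t_0$, $t_u \leftarrow N^T \hat{w} / D^T \hat{w}$
$\psi(t_0, \hat{w}) > 0$: $N^T \hat{w} / D^T \hat{w} > t_0$, $t_l \leftarrow t_0$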
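The interval update can be sketched on a toy feasible set. This is an illustration only: each feasible w is represented by a hypothetical pair (N^T w, D^T w), the inner minimization is done by enumeration rather than by the solver used in the thesis, and the function name `minimize_ratio`, the tolerance, and the initial bounds are assumptions. The three pairs are the numerator and denominator values from the Ratio Objective slide.

```python
def minimize_ratio(candidates, tol=1e-9, max_iter=200):
    """Minimise N^T w / D^T w over a finite set of candidates.

    candidates: list of (N^T w, D^T w) pairs, assuming N^T w >= 0 and D^T w > 0.
    Returns (index of the best candidate, its ratio).
    """
    ratios = [num / den for num, den in candidates]
    best = 0                        # any feasible point gives an upper bound
    t_lo, t_hi = 0.0, ratios[0]     # valid because all ratios are non-negative
    for _ in range(max_iter):
        if t_hi - t_lo <= tol:
            break
        t0 = 0.5 * (t_lo + t_hi)
        # Inner problem: minimise psi(t0, w) = N^T w - t0 * D^T w.
        k = min(range(len(candidates)),
                key=lambda i: candidates[i][0] - t0 * candidates[i][1])
        psi = candidates[k][0] - t0 * candidates[k][1]
        if abs(psi) <= tol:         # the ratio equals t0 (up to tol)
            return k, ratios[k]
        if psi < 0:                 # a ratio below t0 exists: tighten upper bound
            best, t_hi = k, ratios[k]
        else:                       # no ratio below t0: raise lower bound
            t_lo = t0
    return best, t_hi


candidates = [(38.48, 52.0), (164.77, 288.0), (389.61, 865.0)]
idx, ratio = minimize_ratio(candidates)
print(idx, round(ratio, 4))
```

On these values the search settles on the third candidate, whose ratio 389.61 / 865 is the smallest of the three.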

Qualitative Results

[Figure columns: Original, Truth, BJ, SP, RW, GSCseq, EulerSeg, EulerSeg-0]

Quantitative Evaluation

F-Measure

$$P = \frac{|A \cap T|}{|A|}, \quad R = \frac{|A \cap T|}{|T|}, \quad F = \frac{2PR}{P + R}$$

How much effort to reach F = 0.95 (using a robot user)?

Method          BJ    RW    SP    GSCseq  EulerSeg
User Scribbles  5.51  6.48  4.54  2.30    2.06

Seeds tell MORE than link/cannot link

[Gulshan et al., 2010]
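The F-measure used in this evaluation can be computed directly from two foreground masks. The sketch below is a minimal illustration: it assumes A (the algorithm's segmentation) and T (the ground truth) are given as sets of pixel coordinates, and the function name `f_measure` and the toy masks are hypothetical.

```python
def f_measure(predicted, truth):
    """F = 2PR / (P + R) for two sets of foreground pixel coordinates."""
    overlap = len(predicted & truth)
    if overlap == 0:
        return 0.0  # convention chosen here for the degenerate case
    precision = overlap / len(predicted)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)


A = {(0, 0), (0, 1), (1, 0), (1, 1)}   # predicted foreground
T = {(0, 1), (1, 1), (2, 1)}           # ground-truth foreground
score = f_measure(A, T)                # P = 2/4, R = 2/3
print(score)
```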

Semantic Segmentation

Building Tree Boat Person

Bad Object Labels

Weakly Supervised Semantic Segmentation

Motivation
Annotation: presence of image classes
Tags readily available in online photo collections
Easier to obtain than segmentations

[Figure: an image tagged "sky, building, tree" and its segmentation into sky, building, and tree regions]

[X., Schwing, Urtasun, CVPR 2014]

Cosegmentation

Concurrently segment common foreground objects from a set of images

[Collins, X., Grady, Singh, CVPR 2012]
[Mukherjee, Singh, X., Collins, ECCV 2012]
[Collins, Liu, X., Mukherjee, Singh, ECCV 2014]

Latent Structured Prediction

Graphical Model

Presence/absence of a class: $y_i \in \{0, 1\}$
Semantic superpixel label: $h_j \in \{1, \ldots, C\}$
Image evidence: $x$

[Figure: two graphical models over class variables $y_1, \ldots, y_C$, superpixel labels $h_1, \ldots, h_N$, and evidence $x, x_1, \ldots, x_N$. Panel titles: Learning/Inference with Tags; Inference without tags]

[X., Schwing, Urtasun, CVPR 2014]

How About Other Forms of Weak Supervision

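[Figure panels: Tag, Bounding Box, Partial Label; region labels: Sky, Boat, Sea, Person]

Unified Model

$$\begin{aligned} \min_{W,H} \quad & \frac{1}{2}\operatorname{tr}(W^T W) + \lambda \sum_{p=1}^{n} \xi(W; x_p, h_p) \\ \text{s.t.} \quad & H \mathbf{1}_C = \mathbf{1}_n, \quad H \in \{0, 1\}^{n \times C}, \\ & H \in S \end{aligned}$$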

[X., Schwing, Urtasun, CVPR, 2015]

Max-Margin Objective

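Denote
$X = [x_1^T, x_p^T, \cdots, x_n^T] \in \mathbb{R}^{n \times d}$: feature matrix
$H = [h_1^T, h_p^T, \cdots, h_n^T] \in \{0, 1\}^{n \times C}$: hidden label matrix
$W \in \mathbb{R}^{d \times C}$: feature weighting matrix

$$\min_{W,H} \quad \frac{1}{2}\operatorname{tr}(W^T W) + \lambda \sum_{p=1}^{n} \sum_{c=1}^{C} \xi(w_c; x_p, h_p^c)$$

where

$$\xi(w_c; x_p, h_p^c) = \begin{cases} \max(0, 1 + w_c^T x_p), & h_p^c = 0 \\ \mu_c \max(0, 1 - w_c^T x_p), & h_p^c = 1 \end{cases}$$

$$\mu_c = \frac{\sum_{p=1}^{n} \mathbb{1}(h_p^c = 0)}{\sum_{p=1}^{n} \mathbb{1}(h_p^c = 1)}$$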

[Zhao et al., 2008, Zhao et al., 2009 ]
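The objective on this slide can be evaluated directly for given W and H. The following is a minimal sketch in plain Python with matrices as lists of rows; the function names and the toy data are assumptions, and setting the class weight to 0 when a class has no positive superpixel is a choice made here, not something stated in the slides.

```python
def class_weights(H):
    """mu_c = (# superpixels with h_p^c = 0) / (# superpixels with h_p^c = 1)."""
    n, num_classes = len(H), len(H[0])
    weights = []
    for c in range(num_classes):
        positives = sum(H[p][c] for p in range(n))
        # Assumption: weight 0 when a class has no positive superpixel.
        weights.append((n - positives) / positives if positives else 0.0)
    return weights


def objective(W, X, H, lam):
    """0.5 * tr(W^T W) + lam * sum_p sum_c xi(w_c; x_p, h_p^c)."""
    n, d, num_classes = len(X), len(W), len(W[0])
    mu = class_weights(H)
    reg = 0.5 * sum(W[i][c] ** 2 for i in range(d) for c in range(num_classes))
    loss = 0.0
    for p in range(n):
        for c in range(num_classes):
            score = sum(W[i][c] * X[p][i] for i in range(d))  # w_c^T x_p
            if H[p][c] == 1:
                loss += mu[c] * max(0.0, 1.0 - score)
            else:
                loss += max(0.0, 1.0 + score)
    return reg + lam * loss


X = [[1.0, 0.0], [0.0, 1.0]]   # two superpixels, two features
H = [[1, 0], [0, 1]]           # one superpixel per class
W = [[1.0, 0.0], [0.0, 1.0]]   # column c is the weight vector w_c
value = objective(W, X, H, lam=1.0)
print(value)
```

In this toy case both positives sit exactly on the margin and contribute no loss, while each negative has score 0 and contributes 1, so the value is 1 (regularizer) plus 2 (loss).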

Supervision Space as Constraints

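Unlabeled/Cosegmentation/Transductive: $S = \emptyset$
Image level tags: $S = \{H \le BZ, \; B^T H \ge Z\}$
Bounding boxes: $S = \{H \le BZ, \; B^T H \ge Z\}$
Semi-supervision: $S = \{H_\Omega = \hat{H}_\Omega\}$

An Example (2 images, 5 superpixels (2+3), 3 classes)

$$B = \begin{bmatrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \\ 0 & 1 \\ 0 & 1 \end{bmatrix}, \quad Z = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \end{bmatrix}, \quad H = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$H \le BZ = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad B^T H = \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \end{bmatrix} \ge Z$$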
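The two tag constraints in this example can be verified mechanically. The sketch below is illustrative: it reads B as the superpixel-to-image membership matrix and Z as the image-to-tag matrix, which is how the example's dimensions (5 x 2 and 2 x 3) appear to be used, and the helper names `matmul` and `transpose` are hypothetical.

```python
def matmul(P, Q):
    """Product of two list-of-rows matrices (hypothetical helper)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(col) for col in zip(*P)]


B = [[1, 0], [1, 0], [0, 1], [0, 1], [0, 1]]          # superpixel -> image
Z = [[1, 1, 0], [0, 1, 1]]                            # image -> tagged classes
H = [[0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]

BZ = matmul(B, Z)              # classes each superpixel is allowed to take
BtH = matmul(transpose(B), H)  # superpixels per class in each image

# H <= BZ: a superpixel may only take a class tagged in its image.
upper_ok = all(H[i][j] <= BZ[i][j] for i in range(5) for j in range(3))
# B^T H >= Z: every tagged class is used by at least one superpixel.
lower_ok = all(BtH[i][j] >= Z[i][j] for i in range(2) for j in range(3))
# H 1_C = 1_n: each superpixel gets exactly one label.
rows_ok = all(sum(row) == 1 for row in H)
print(upper_ok, lower_ok, rows_ok)
```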

Optimization Model

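$$\begin{aligned} \min_{W,H} \quad & \frac{1}{2}\operatorname{tr}(W^T W) + \lambda \sum_{p=1}^{n} \xi(W; x_p, h_p) \\ \text{s.t.} \quad & H \mathbf{1}_C = \mathbf{1}_n, \quad H \in \{0, 1\}^{n \times C}, \\ & H \in S \end{aligned}$$

Observations
Challenge: non-convex mixed integer programming
Optimization problem is bi-convex, i.e., it is convex w.r.t. W if H is fixed, and convex w.r.t. H if W is fixed
Constraints are linear and they only involve the super-pixel assignment matrix H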

Learning Algorithm

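$$\begin{aligned} \min_{W,H} \quad & \frac{1}{2}\operatorname{tr}(W^T W) + \lambda \sum_{p=1}^{n} \xi(W; x_p, h_p) \\ \text{s.t.} \quad & H \mathbf{1}_C = \mathbf{1}_n, \quad H \in \{0, 1\}^{n \times C}, \\ & H \in S \end{aligned}$$

Alternating Between
Fix H, solve for W independent of classes (1-vs-all linear SVM)
Fix W, infer super-pixel labels H in parallel w.r.t. images (small LP instances)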

Inference

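$$\begin{aligned} \max_{H} \quad & \operatorname{tr}((XW)^T H) \\ \text{s.t.} \quad & H \mathbf{1}_C = \mathbf{1}_n, \quad H \in \{0, 1\}^{n \times C}, \\ & H \in S \end{aligned}$$

Proposition
Fixing W, solving for H using a linear program gives the integral optimal solution.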
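To make the inference step concrete, the sketch below solves it for a single image under image-level tag constraints. It is an illustration only and swaps in a different technique: it enumerates all labelings by brute force instead of solving the linear program from the slides, so it is exponential in the number of superpixels and suitable only for tiny cases. The function name `infer_labels` and the scores are hypothetical; `scores[j][c]` stands for the entry of XW for superpixel j and class c.

```python
from itertools import product


def infer_labels(scores, tags):
    """Best labelling of one image's superpixels given its tags.

    scores[j][c]: score of class c for superpixel j (a row of XW).
    tags[c]: 1 if class c is tagged in the image, else 0.
    Returns a list of class indices, or None if no labelling is feasible.
    """
    present = [c for c, t in enumerate(tags) if t]
    best_value, best_labels = None, None
    # Choosing only among tagged classes enforces H <= BZ.
    for labels in product(present, repeat=len(scores)):
        # B^T H >= Z: every tagged class must be used at least once.
        if set(labels) != set(present):
            continue
        value = sum(scores[j][c] for j, c in enumerate(labels))
        if best_value is None or value > best_value:
            best_value, best_labels = value, list(labels)
    return best_labels


scores = [[0.9, 0.2, 0.5],
          [0.8, 0.1, 0.7],
          [0.3, 0.6, 0.4]]
tags = [1, 0, 1]               # classes 0 and 2 are tagged; class 1 is absent
labels = infer_labels(scores, tags)
print(labels)
```

The third superpixel scores highest on class 1, but that class is not tagged, so it falls back to class 2, which also satisfies the requirement that every tagged class appears.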

Theoretical Guarantee

Proposition
Fixing W, solving for H using a linear program gives the integral optimal solution.

Proof.
(Sketch) The main idea of our proof is to show our coefficient matrix is totally unimodular. By Grady 2010: If A is totally unimodular and b is integral, then linear programs of forms like $\min \{c^T x \mid Ax = b, x \ge 0\}$ have integral optima, for any c. Hence, the LP relaxation gives the optimal integral solution.


Computation Efficiency

Model nature:
- Decomposable
- Parallelizable
- Theoretical guarantee of relaxation quality

Running time:
- Orders of magnitude faster than the state of the art (20 min vs. 24 hours)
- 10 ms to test one image


Experimental Evaluation

Datasets:
- SIFT-Flow (a.k.a. LabelMe): 2688 images, 33 classes
- MSRC: 591 images, 21 classes

Accuracy metrics:
- Per-pixel: the number of correctly classified pixels divided by the total number of pixels to be classified
- Per-class: the average of the per-class accuracies over all classes
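The two metrics can be sketched as follows. This is a minimal illustration, assuming that pixels carrying a designated "ignore" label are excluded from evaluation and that classes absent from the ground truth are skipped in the per-class average; both conventions, and the function name `pixel_accuracies`, are assumptions made here.

```python
def pixel_accuracies(truth, pred, ignore=None):
    """Return (per_pixel, per_class) accuracy for flat label sequences.

    truth, pred -- equal-length sequences of class labels, one per pixel
    ignore      -- label marking pixels that are not evaluated (assumed convention)
    """
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    correct = {}  # class -> number of correctly classified pixels
    total = {}    # class -> number of ground-truth pixels
    for t, p in zip(truth, pred):
        if t == ignore:
            continue
        total[t] = total.get(t, 0) + 1
        if t == p:
            correct[t] = correct.get(t, 0) + 1
    n_pixels = sum(total.values())
    if n_pixels == 0:
        raise ValueError("no pixels to evaluate")
    per_pixel = sum(correct.values()) / n_pixels
    per_class = sum(correct.get(c, 0) / total[c] for c in total) / len(total)
    return per_pixel, per_class


truth = ["sky", "sky", "sky", "sky", "road", "road", "car", "unlabeled"]
pred = ["sky", "sky", "sky", "sky", "road", "sky", "road", "sky"]
per_pixel, per_class = pixel_accuracies(truth, pred, ignore="unlabeled")
```

In this toy example the frequent class dominates the per-pixel score (5 of 7 pixels correct), while the per-class score is pulled down to 0.5 by the missed rare class.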


Comparison to State-of-the-art on SIFT-Flow

Method                                Supervision   Per-class   Per-pixel
Liu et al., 2011 (PAMI)               full          24          76.7
Farabet et al., 2012 (ICML)           full          29.5        78.5
Farabet et al., 2012 (ICML) balanced  full          46.0        74.2
Eigen et al., 2012 (CVPR)             full          32.5        77.1
Singh et al., 2013 (CVPR)             full          33.8        79.2
Tighe et al., 2013 (IJCV)             full          30.1        77.0
Tighe et al., 2014 (CVPR)             full          39.3        78.6
Yang et al., 2014 (CVPR)              full          48.7        79.8
Vezhnevets et al., 2011 (ICCV)        weak (tags)   14          N/A
Vezhnevets et al., 2012 (CVPR)        weak (tags)   22          51
Xu et al., 2014 (CVPR)                weak (tags)   27.9        N/A
Ours (1-vs-all)                       weak (tags)   32.0        64.4
Ours (ILT)                            weak (tags)   35.0        65.0
Ours (1-vs-all + transductive)        weak (tags)   40.0        59.0
Ours (ILT + transductive)             weak (tags)   41.4        62.7


Comparison to State-of-the-art on MSRC

Method                          Supervision   Per-class   Per-pixel
Shotton et al., 2008 (ECCV)     full          67          72
Yao et al., 2012 (CVPR)         full          79          86
Vezhnevets et al., 2011 (ICCV)  weak (tags)   67          67
Liu et al., 2012 (TMM)          weak (tags)   N/A         71
Ours                            weak (tags)   73          70


Sample Results

[Figure: sample results in two groups of three columns (Input, Truth, Ours). Legend: unlabeled, sky, mountain, road, tree, car, sign, person, field, building.]


Sample Results (continued)

[Figure: further sample results in two groups of three columns (Input, Truth, Ours). Legend: unlabeled, sky, mountain, road, tree, car, sign, person, field, building.]


Other Forms of Weak Supervision

Semi-supervision

[Plots: per-class accuracy (%) and per-pixel accuracy (%) versus superpixel label sample ratio, from 0 to 0.5]

Bounding Box

[Plots: per-class accuracy (%) and per-pixel accuracy (%) versus box sample ratio, from 0 to 0.5]


Roadmap

Chapter  Parsing Task         Weak Supervision                                  Publication
Ch. 2    Object Segmentation  User Indication                                   CVPR 2013
Ch. 3    Scene Parsing        Image-level Tags                                  CVPR 2014
Ch. 4    Scene Parsing        Image-level Tags, Bounding Boxes, Partial Labels  CVPR 2015a
Ch. 5    Video Segmentation   Side Knowledge                                    ICCV 2013
Ch. 6    Video Summarization  Human Gaze                                        CVPR 2015b


Online Video Segmentation

- Background subspace is modeled on a Grassmannian manifold with online updating along the geodesic
- Spatially contiguous and structured foreground is modeled via group sparsity

[Figure: three panels: Input, Background, Foreground]

[X., Ithapu, Mukherjee, Rehg, Singh, ICCV 2013]


First Person Vision

Motivation:
- Life-logging with wearable cameras: SenseCam, GoPro, Google Glass
- Memory aid
- Gaze provides a form of weak supervision: window of mind


Gaze-enabled Egocentric Video Summarization

[Figure: video summarization over a timeline from 1:00PM to 5:00PM]

What makes a good summary?
- Relevance
- Diversity
- Compactness
- Personalization

[X., Mukherjee, Li, Warner, Rehg, Singh, CVPR 2015]


Relevance and Diversity Measurement

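Mutual information:

$$M(V \setminus S; S) = H(V \setminus S) - H(V \setminus S \mid S) = H(V \setminus S) + H(S) - H(V)$$

Entropy:

$$H(S) = \frac{1 + \log(2\pi)}{2} |S| + \frac{1}{2} \log(\det(L_S))$$

Maximizing:

$$M(S) = \frac{1}{2} \log(\det(L_{V \setminus S})) + \frac{1}{2} \log(\det(L_S))$$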

[Krause et al., 2008]
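The step from the entropy formula to the maximized objective is left implicit. A short derivation, assuming the entropy formula holds for every subset (in particular for S, for V \ S, and for V itself):

```latex
\begin{align*}
M(V \setminus S; S)
  &= H(V \setminus S) + H(S) - H(V) \\
  &= \frac{1 + \log(2\pi)}{2}\bigl(|V \setminus S| + |S| - |V|\bigr)
     + \frac{1}{2}\log(\det(L_{V \setminus S}))
     + \frac{1}{2}\log(\det(L_S))
     - \frac{1}{2}\log(\det(L_V)) \\
  &= \frac{1}{2}\log(\det(L_{V \setminus S}))
     + \frac{1}{2}\log(\det(L_S))
     - \frac{1}{2}\log(\det(L_V)).
\end{align*}
```

The cardinality terms cancel because |V \ S| + |S| = |V|, and the last term does not depend on S. Maximizing the mutual information over S is therefore equivalent to maximizing M(S) as stated.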


Relation to Determinantal Point Process

Positive semidefinite kernel matrix L indexed by elements of V:

$$L_{ij} = \frac{v_i^{T}}{\|v_i\|} \frac{v_j}{\|v_j\|}$$

For every S ⊆ V, we define a diversity score

$$D(S) = \log(\det(L_S))$$

[Kulesza and Taskar, 2012] (Acknowledgement to Jerry :)
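The kernel and the diversity score can be sketched in a few lines. This is an illustration only: the feature vectors are made up, the helper names are hypothetical, and the log-determinant is computed with a plain Cholesky factorization that returns negative infinity for (numerically) singular submatrices.

```python
import math


def cosine_kernel(vectors):
    """L[i][j] = v_i . v_j / (|v_i| |v_j|) for a list of nonzero feature vectors."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [[sum(a * b for a, b in zip(vi, vj)) / (ni * nj)
             for vj, nj in zip(vectors, norms)]
            for vi, ni in zip(vectors, norms)]


def log_det(matrix):
    """log(det(matrix)) for a symmetric positive definite matrix, via Cholesky.

    Returns -inf when the matrix is (numerically) singular.
    """
    n = len(matrix)
    chol = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = matrix[i][j] - sum(chol[i][k] * chol[j][k] for k in range(j))
            if i == j:
                if s <= 1e-12:
                    return float("-inf")
                chol[i][i] = math.sqrt(s)
                total += 2.0 * math.log(chol[i][i])
            else:
                chol[i][j] = s / chol[j][j]
    return total


def diversity(kernel, subset):
    """D(S) = log(det(L_S)), where L_S keeps the rows and columns indexed by S."""
    idx = sorted(subset)
    return log_det([[kernel[i][j] for j in idx] for i in idx])


# Hypothetical descriptors: items 0 and 1 are orthogonal, item 2 nearly repeats item 0.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.1]]
L = cosine_kernel(features)
```

With these toy vectors, `diversity(L, {0, 1})` is close to zero (the submatrix is the identity), while `diversity(L, {0, 2})` is strongly negative, which is the sense in which the score rewards dissimilar items.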


Gaze in Video Summarization

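[Figure: a gaze sequence labeled fixation, fixation, saccade, saccade, saccade, fixation, fixation, with κ values 0.91, 0.85, 0.5, 0.89, 0.81 compared against a threshold to split the video into subshot 1 and subshot 2]

- Better temporal segmentation: egocentric video is continuous, but gaze is discrete
- Personalization: attention measurement from gaze fixations

$$I(S) = \sum_{i \in S} c_i$$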


Partition Matroid Constraint

Motivation:
- Compactness: cardinality or knapsack constraint?
- High-level supervision: timeline

Partition matroid construction:
- Partition the video into b disjoint blocks $P_1, P_2, \cdots, P_b$
- Limit $f_m$ associated with each block:

$$\mathcal{I} = \{ A : |A \cap P_m| \leq f_m,\ m = 1, 2, \cdots, b \}$$

[Bilmes, 2013]
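Membership in the family I reduces to counting how many selected items fall into each block. A minimal sketch, with a made-up timeline and the hypothetical helper name `is_independent`:

```python
def is_independent(selection, blocks, limits):
    """True if selection takes at most limits[m] items from blocks[m] for every m."""
    selection = set(selection)
    return all(len(selection & set(block)) <= limit
               for block, limit in zip(blocks, limits))


# Hypothetical timeline: 9 subshots split into three blocks, with per-block limits.
blocks = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
limits = [1, 2, 1]
```

Unlike a single cardinality bound, the per-block limits spread the summary along the timeline: `{0, 3, 4, 8}` is allowed here, while `{0, 1}` is not, because both items come from the first block.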


Submodular Formulation

$$\max_{S} \; F(S) = M(S) + \lambda I(S) \quad \text{s.t.} \quad S \in \mathcal{I}$$

Corollary: F(S) is submodular.

Proposition: Greedy local search achieves a 1/4-approximation factor for our constrained submodular maximization problem.

[Lee et al., 2010] [Filmus and Ward, 2012]
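To make the constrained selection concrete, here is a plain greedy sketch under a partition matroid. It is a simplified stand-in and not the local search procedure analyzed in the cited work, so the 1/4 factor is not claimed for it. The objective is passed in as a function `f`; the toy coverage objective below is hypothetical and only stands in for F(S).

```python
def greedy_partition_matroid(ground_set, f, blocks, limits):
    """Greedily add the feasible element with the largest positive marginal gain.

    ground_set -- iterable of candidate items (e.g. subshot indices)
    f          -- set function to maximize, called as f(frozenset)
    blocks     -- list of disjoint sets partitioning the ground set
    limits     -- limits[m] is the maximum number of items allowed from blocks[m]
    """
    block_of = {item: m for m, block in enumerate(blocks) for item in block}
    used = [0] * len(blocks)
    chosen = set()
    remaining = set(ground_set)
    while True:
        base = f(frozenset(chosen))
        best_item, best_gain = None, 0.0
        for item in sorted(remaining):
            if used[block_of[item]] >= limits[block_of[item]]:
                continue  # adding this item would violate its block limit
            gain = f(frozenset(chosen | {item})) - base
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:
            return chosen
        chosen.add(best_item)
        remaining.discard(best_item)
        used[block_of[best_item]] += 1


# Toy objective: number of distinct "events" covered by the chosen subshots.
events = {0: {"a"}, 1: {"a", "b"}, 2: {"c"}, 3: {"b", "c"}, 4: {"d"}}


def coverage(selection):
    covered = set()
    for item in selection:
        covered |= events[item]
    return float(len(covered))


summary = greedy_partition_matroid(range(5), coverage,
                                   blocks=[{0, 1}, {2, 3}, {4}], limits=[1, 1, 1])
```

The loop stops as soon as no feasible item has a positive marginal gain, which matters for non-monotone objectives where adding items can lower the score.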


Dataset Collection

- 5 subjects recording their daily lives
- 21 videos with gaze
- 15 hours in total

Annotation:
- Subjects group subshots into events.


Systematic Evaluation

Evaluation metric:

$$P = \frac{|A \cap T|}{|A|}, \quad R = \frac{|A \cap T|}{|T|}, \quad F = \frac{2PR}{P + R}$$

F-measure on GTEA-GAZE+

Method      uniform   kmeans          uniform(gaze)   kmeans(gaze)    ours
F-measure   0.161     0.215 ± 0.016   0.526           0.475 ± 0.026   0.621

F-measure on Our New Dataset

Method      uniform   kmeans          uniform(gaze)   kmeans(gaze)    ours
F-measure   0.080     0.095 ± 0.030   0.476           0.509 ± 0.025   0.585
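The metric is straightforward to compute on sets. In the sketch below, A is taken to be the set of items selected by a summarization method and T the ground-truth set; this reading of the symbols is an assumption, as is the convention of returning zeros when the two sets do not overlap. The function name `f_measure` is hypothetical.

```python
def f_measure(selected, truth):
    """Return (precision, recall, F) for two sets of items."""
    selected, truth = set(selected), set(truth)
    overlap = len(selected & truth)
    if overlap == 0:
        # Assumed convention: no overlap gives zero for all three values.
        return 0.0, 0.0, 0.0
    precision = overlap / len(selected)
    recall = overlap / len(truth)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical example: 4 selected items, 3 ground-truth items, 2 in common.
p, r, f = f_measure(selected={1, 2, 4, 7}, truth={2, 4, 5})
```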


Qualitative Result

[Figures: summaries compared row by row: uniform, k-means, uniform (our subshots), k-means (our subshots), ours]

- Results from the GTEA-gaze+ pizza preparation video.
- Results from our new dataset: our subject mixes a shake, drinks it, washes his cup, plays chess and texts a friend.
- Results from our new dataset: our subject cooks chicken and has a conversation with his roommate.


Summary

Thesis contributions:
- An efficient approach for interactive segmentation while minimizing human effort (Ch. 2)
- A latent graphical model for semantic segmentation using only image-level tags (Ch. 3)
- A unified model for semantic segmentation with various forms of weak supervision (Ch. 4)
- An online foreground/background video segmentation method using Grassmannian subspace learning (Ch. 5)
- A submodular summarization framework for first-person videos (Ch. 6)


Future: Joint Visual and Textual Parsing

[Figure: graphical model over nodes y1, y2, ..., yC; x; h1, h2, ..., hN; x1, x2, ..., xN]

- Enhance the graphical model with richer prior knowledge: geometry (Hoiem et al., 2007), co-occurrence, etc.
- Other forms of supervision: Air Quality Index (AQI)
- Tackle noisy tags
- Extend to videos


Future: Egocentric/Robotic Vision

- Daily life logging / memory aid
- Predictive diagnosis for disease
- First-person vision for robotics
- Help the blind to sense the visual world


Acknowledgement

Thesis Committee:
- Vikas Singh (advisor)
- Chuck Dyer
- Jerry Zhu
- Jude Shavlik
- Mark Craven

Funding:
- UW-Epic RAship
- NSF RI 1116584
- NVIDIA Hardware Gift
- Adobe Gift

Collaborators:
- Maxwell Collins (UW-Madison)
- Chuck Dyer (UW-Madison)
- Leo Grady (Heartflow)
- Vamsi Ithapu (UW-Madison)
- Hyunwoo Kim (UW-Madison)
- Yin Li (Georgia Tech)
- Zhe Lin (Adobe Research)
- Ji Liu (URochester)
- Lopa Mukherjee (UW-Whitewater)
- James M. Rehg (Georgia Tech)
- Alexander Schwing (UToronto)
- Xiaohui Shen (Adobe Research)
- Vikas Singh (UW-Madison)
- Raquel Urtasun (UToronto)
- Baba Vemuri (UFlorida)
- Jamieson Warner (UW-Madison)
- Jerry Zhu (UW-Madison)