Overcoming Ambiguity in Visual Object Recognition
Prof. Trevor Darrell, UC Berkeley EECS Dept. & Intl. Computer Science Inst. (ICSI)

Jun 11, 2018

Page 1:

Overcoming Ambiguity in Visual Object Recognition

Prof. Trevor Darrell

UC Berkeley EECS Dept. & Intl. Computer Science Inst. (ICSI)

Page 2:

Sources of Ambiguity

• Cue saliency varies across categories
– Probabilistic multi‐kernel fusion methods… [Christhoudias]
– Joint regularization across categories… [Quattoni]

• Individual categories have multiple senses
– Learning dictionary grounded visual models and…

• Multiple surfaces confuse local features
– Local feature models for transparent objects

Page 4:

Sources of Ambiguity

• Cue saliency varies across categories
– Probabilistic multi‐kernel fusion… [Christhoudias]
– Joint regularization across categories… [Quattoni]

• Individual categories have multiple senses
– Dictionary grounded visual models… [Saenko]

• Multiple surfaces confuse local features
– Local feature models for transparent objects [Fritz]

Page 5:

Today: Snapshots

• Probabilistic multi‐kernel fusion

• Joint regularization across categories

• Multimodal sense grounding

• Local feature models for transparent objects

Page 7:

Local Representations

Wide variety of proposed local feature representations:

• Maximally Stable Extremal Regions [Matas et al.]
• Superpixels [Ren et al.]
• Shape context [Belongie et al.]
• SIFT [Lowe]
• Geometric Blur [Berg et al.]
• Salient regions [Kadir et al.]
• Harris-Affine [Schmid et al.]
• Spin images [Johnson and Hebert]

Page 8:

How to Compare Sets of Features?

• Each instance is an unordered set of vectors

• Varying number of vectors per instance

Page 9:

Pyramid Match

• Optimal matching

• Greedy matching

• Pyramid match: optimal partial matching

for sets with … features of dimension …

[Grauman and Darrell, ICCV 2005, JMLR 2007]
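To make the pyramid match idea concrete, here is a minimal sketch in Python. It assumes the usual formulation: multi-resolution histograms, histogram intersection at each level, and newly matched pairs at level i weighted by 1/2^i. The function names, the fixed grid anchored at the origin, and the absence of normalization are simplifications chosen for illustration, not details taken from the slides.

```python
from collections import Counter

import numpy as np


def _histogram(points, side):
    """Count points per grid cell for a uniform grid with the given cell side."""
    cells = np.floor(points / side).astype(int)
    return Counter(map(tuple, cells))


def _intersection(h1, h2):
    """Histogram intersection: number of points that can be matched within cells."""
    return sum(min(count, h2[cell]) for cell, count in h1.items())


def pyramid_match(X, Y, num_levels=5, finest_side=1.0):
    """Unnormalized pyramid match score between point sets X (n, d) and Y (m, d).

    Illustrative simplification: one fixed grid per level, no random shifts,
    no normalization by self-similarity.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    score, previous = 0.0, 0
    for level in range(num_levels):
        side = finest_side * 2 ** level  # cell side doubles at each level
        matched = _intersection(_histogram(X, side), _histogram(Y, side))
        # Only matches that are new at this level count, with weight 1 / 2^level.
        score += (matched - previous) / 2 ** level
        previous = matched
    return score


X = np.array([[0.2], [5.5]])
Y = np.array([[0.7], [4.1]])
print(pyramid_match(X, Y))  # 1.5: one match at level 0, one more at level 1
```

The sets may have different sizes, and the score is a sum of weighted intersection counts, so no explicit point-to-point assignment is ever computed.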

Page 10:

Gaussian Process PMK

• The Pyramid Match defines a Mercer Kernel suitable for SVM and Gaussian Process based regression and classification

• GP‐based classification offers a natural paradigm for Active Learning:

Active Learning Criteria

[Kapoor, Grauman, Urtasun and Darrell, ICCV 2007, IJCV 2009]

Page 12:

[ See http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS‐2009‐96.html for details… ]

Page 18:

Today: Snapshots

Probabilistic multi‐kernel fusion

• Joint regularization across categories

• Multimodal sense grounding

• Local feature models for transparent objects

Page 19:

Standard “1 vs. all” paradigm…

SVM/GPC – Category 1

SVM/GPC – Category 2

SVM/GPC – Category 3

SVM/GPC – Category 4

SVM/GPC – Category 256

SVM/GPC – Category 10,000?

How to exploit shared structure?

Page 20:

Consider ensemble of classifiers

Classifier weights:

SVM/GPC – Category 1: w1

SVM/GPC – Category 2: w2

SVM/GPC – Category 3: w3

SVM/GPC – Category 4: w4

SVM/GPC – Category 256

SVM/GPC – Category 10,000: w10,000

Page 21:

Consider ensemble of classifiers

SVM/GPC – Category 1: w1

SVM/GPC – Category 2: w2

SVM/GPC – Category 256

SVM/GPC – Category n: wn

W = [ w1 w2 … wn ]

Related tasks and/or object part structure will lead to correlated patterns in W…

[Quattoni, Collins, Darrell, CVPR 2007] explore Ando+Zhang style structure learning for scene recognition tasks.

Learn W jointly?

[Quattoni, Collins, Darrell, CVPR 2008] explore joint sparse optimization via matrix norm penalty.

[Quattoni, Carreras, Collins, Darrell, ICML 2009] report an efficient learning scheme for this approach…

Page 22:

Joint Sparse Approximation

• Consider learning a single sparse linear classifier of the form:

$$f(x) = w \cdot x$$

That is, we want only a few features with non-zero coefficients.

• L1 regularization is well known to yield sparse solutions:

$$\min_{w} \sum_{(x,y) \in D} l(f(x), y) + C \sum_{j=1}^{d} |w_j|$$

The first term is the classification error; the L1 term penalizes non-sparse solutions.

Page 23:

Joint Sparse Approximation

Optimization over several tasks jointly:

$$f_k(x) = w_k \cdot x$$

$$\min_{w_1, w_2, \ldots, w_m} \sum_{k=1}^{m} \frac{1}{|D_k|} \sum_{(x,y) \in D_k} l(f_k(x), y) + C \, R(w_1, w_2, \ldots, w_m)$$

The first term is the average loss on training set k; R penalizes solutions that utilize too many features.

Key idea: use a matrix norm… [Obozinski et al. 2006, Argyriou et al. 2006, Amit et al. 2007]

Page 24:

Joint Regularization Penalty

How do we penalize solutions that use too many features?

$$W = \begin{bmatrix} W_{1,1} & W_{1,2} & \cdots & W_{1,m} \\ W_{2,1} & W_{2,2} & \cdots & W_{2,m} \\ \vdots & \vdots & & \vdots \\ W_{d,1} & W_{d,2} & \cdots & W_{d,m} \end{bmatrix}$$

Each row holds the coefficients for one feature (row 2: coefficients for feature 2); each column holds the coefficients for one classifier (column 2: coefficients for classifier 2).

$$R(W) = \#\,\text{non-zero rows of } W$$

Would lead to a hard combinatorial problem.

Page 25:

Joint Regularization Penalty

We use an L1-∞ norm [Tropp 2006]

$$R(W) = \sum_{i=1}^{d} \max_{k} |W_{i,k}|$$

This norm combines:

• The L∞ norm on each row promotes non-sparsity on the rows. (Share features)

• An L1 norm on the maximum absolute values of the coefficients across tasks promotes sparsity. (Use few features)

The combination of the two norms results in a solution where only a few features are used but the features used will contribute in solving many classification problems.
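As a small illustration of the penalty above, the following sketch compares the L1-∞ norm with the row-count penalty it relaxes and with a plain elementwise L1 norm. It assumes W is stored as a d × m NumPy array (features in rows, tasks in columns); the function names and the example matrix are made up for illustration.

```python
import numpy as np


def l1_inf_norm(W):
    """Sum over features (rows) of the largest absolute coefficient across tasks."""
    return np.abs(W).max(axis=1).sum()


def nonzero_rows(W):
    """Number of features used by at least one task (the combinatorial penalty)."""
    return int(np.any(W != 0, axis=1).sum())


# Hypothetical weights: 3 features (rows), 3 tasks (columns).
W = np.array([
    [0.5, -2.0, 1.0],   # feature shared by all tasks
    [0.0,  0.0, 0.0],   # unused feature: contributes nothing to either penalty
    [0.1,  0.1, -0.3],  # weakly used feature
])

print(l1_inf_norm(W))   # 2.0 + 0.0 + 0.3
print(nonzero_rows(W))  # 2
print(np.abs(W).sum())  # elementwise L1 norm, for comparison
```

Once a feature is used by one task, letting other tasks use it up to the same magnitude adds no further L1-∞ cost, which is why the penalty encourages sharing.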

Page 26:

Joint Sparse Approximation

Using the L1-∞ norm we can rewrite our objective function as:

$$\min_{W} \sum_{k=1}^{m} \frac{1}{|D_k|} \sum_{(x,y) \in D_k} l(f_k(x), y) + C \sum_{i=1}^{d} \max_{k} |W_{i,k}|$$

For any convex loss this is a convex objective.

For the hinge loss the optimization problem can be expressed as a linear program. [Quattoni et al. CVPR 2008]

See also [Quattoni et al. ICML 2009] for efficient large scale solutions.
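The objective can be evaluated directly for a candidate W. The sketch below only computes the objective value with the hinge loss; it does not solve the optimization, and it is not the linear program or the large-scale method from the cited papers. The function name, the data layout (a list of per-task `(X_k, y_k)` pairs with labels in {-1, +1}) and the toy numbers are assumptions for illustration.

```python
import numpy as np


def joint_objective(W, tasks, C):
    """Average hinge loss per task plus C times the L1-inf norm of W.

    W:     (d, m) array, column k holds the weights w_k of task k.
    tasks: list of m pairs (X_k, y_k), X_k of shape (n_k, d), y_k in {-1, +1}.
    """
    data_term = 0.0
    for k, (X, y) in enumerate(tasks):
        margins = y * (X @ W[:, k])
        data_term += np.maximum(0.0, 1.0 - margins).mean()  # (1/|D_k|) * sum of hinge losses
    penalty = np.abs(W).max(axis=1).sum()  # sum_i max_k |W[i, k]|
    return data_term + C * penalty


# Toy problem: d = 2 features, m = 2 tasks.
W = np.array([[1.0, -2.0],
              [0.0,  0.0]])
tasks = [
    (np.array([[1.0, 0.0], [0.0, 1.0]]), np.array([1.0, -1.0])),
    (np.array([[1.0, 0.0]]), np.array([-1.0])),
]
print(joint_objective(W, tasks, C=0.1))  # 0.5 (data term) + 0.1 * 2.0 (penalty)
```

Both terms are convex in W, so any convex solver could in principle be applied to this function; the choice of solver is outside what the slides specify.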

Page 27:

News Image Classification Experiments

Reuters Dataset Results

[Figure: mean EER vs. # training examples per task (15, 30, 60, 120, 240) for L2, L1 and L1-INF regularization. Topics shown: SuperBowl, Danish Cartoons, Sharon, Australian open, Trapped coal miners, Golden globes, Grammys, Figure skating, Academy Awards, Iraq.]

[Figure: absolute weights, feature vs. task, for L1 and for L1-INF.]

Page 28:

Today: Snapshots

Probabilistic multi‐kernel fusion

Joint regularization across categories

• Multimodal sense grounding

• Local feature models for transparent objects

Page 29:

Goal: Object recognition in situated environments

• Imagine using natural dialogue to instantiate object models in a robot

“That’s a cat over there…”

“This is one of my purses.”

“There’s a lamp…”

Page 30:

Speech, image can be complementary…

a pan...

That’s a pen!

ant → fan

face → bass

piano → cannon

Copy machine.

Page 31:

Towards very large object vocabularies…

Learn visual models on the fly for N-best audio candidates…

Page 32:

Training images from online image search…

Page 33:

Problem: visual polysemy

Page 34:

Sources of visual polysemy

• Hurricane, tornado watch
• Celebrity watch
• Watch out!
• Would rather watch…
• Suicide watch

Page 35:

Take advantage of text contexts

icrystal rfid wrist watch features watch masterpiece innovative watch making craftsmanship absolute precision fine charm high scratch resistance anti-allergenic characteristics make chronometer true jewel s wrist water proof sleek stylish wrist watch solar powered available watch ticket key purse identity card special offer place order rfid wrist watch absolutely free rfid watch black wrist strap rfid watch orange wrist strap rfid watch stainless steel privacy disclaimer copyright icrystal pty website

Page 36:

Latent Topics

icrystal rfid wrist watch features watch masterpiece innovative watch making craftsmanship absolute precision fine charm high scratch resistance anti-allergenic characteristics make chronometer true jewel wrist waterproof sleek stylish wrist watch solar powered available watch ticket key purse identity card special offer place order rfid wrist watch absolutely free rfid watch black wrist strap rfid watch orange wrist strap rfid watch stainless steel privacy disclaimer copyright icrystal pty website

Page 37:

Overview of approaches to web-based object model learning

• Some learn only from image features
– (Li et al.07) bootstrap from labeled images
– (Fergus et al.05) select correct image topic

• Some incorporate text features
– (Schroff et al.07) use a category-independent text classifier
– (Berg and Forsyth 06) ask user to sort text topics

• None address polysemy directly
– (Loeff et al.06) do image sense discrimination, not identification

• All rely on labeled images of correct sense

Page 38:

WISDOM: Using dictionary entries to ground senses

• Use entry text to learn a probability distribution over words for that sense

• Problem: entries contain very little text
– Expand by adding synonyms, example sentences, etc.
– Still, very few words are covered!

•S: (n) mouse (any of numerous small rodents typically resembling diminutive rats having pointed snouts and small ears on elongated bodies with slender usually hairless tails)

•direct hyponym / full hyponym
  •S: (n) house mouse, Mus musculus (brownish-grey Old World mouse now a common household pest worldwide)
  •S: (n) harvest mouse, Micromyx minutus (small reddish-brown Eurasian mouse inhabiting e.g. cornfields)
  •S: (n) field mouse, fieldmouse (any nocturnal Old World mouse of the genus Apodemus inhabiting woods and fields and gardens)
  •S: (n) nude mouse (a mouse with a genetic defect that prevents them from growing hair and also prevents them from immunologically rejecting human cells and tissues; widely used in preclinical trials)
  •S: (n) wood mouse (any of various New World woodland mice)

•direct hypernym / inherited hypernym / sister term
  •S: (n) rodent, gnawer (relatively small placental mammals having a single pair of constantly growing incisor teeth specialized for gnawing)

Page 39:

WISDOM: Probabilistic dictionary-based model

• Main idea:

– Using LDA, learn latent sense-like dimensions on large amount of related unlabeled text

– Model dictionary senses in LDA space:

• Map image contexts to topics

• Map topics to senses

[Figure: unlabeled text (web search result snippets for “watch”, e.g. “Search Engine Watch” from searchenginewatch.com and “watch - MDC” from developer.mozilla.org) feeding into LDA.]
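The two mapping steps (image context → topics, topics → senses) can be read as a marginalization over latent topics, P(sense | context) = Σ_z P(sense | z) · P(z | context). The sketch below illustrates that reading under stated assumptions: both probability tables are made-up numbers, the function name is hypothetical, and the slides do not spell out how either table is estimated.

```python
import numpy as np


def sense_posterior(p_sense_given_topic, p_topic_given_context):
    """Marginalize over topics: P(s | context) = sum_z P(s | z) * P(z | context).

    p_sense_given_topic:   (num_senses, num_topics), each column sums to 1.
    p_topic_given_context: (num_topics,), sums to 1 (e.g. from LDA inference
                           on the text surrounding an image).
    """
    return p_sense_given_topic @ p_topic_given_context


# Hypothetical numbers: 2 senses of "watch" (timepiece, lookout), 3 topics.
p_sense_given_topic = np.array([
    [0.9, 0.2, 0.5],
    [0.1, 0.8, 0.5],
])
p_topic_given_context = np.array([0.7, 0.2, 0.1])

posterior = sense_posterior(p_sense_given_topic, p_topic_given_context)
print(posterior)  # probabilities over the two senses; they sum to 1
```

Images whose surrounding text scores highly for a given sense could then serve as training data for that sense, which matches the pipeline shown on the following slides.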

Page 40:

Web Image Sense DictiOnary Model

WISDOM does:

1. image sense disambiguation

2. dataset collection

3. classification of unseen images

[Figure: pipeline from a noun to its dictionary definitions, unlabeled text and web images; a dictionary model gives P( sense | data ), which selects training images (example: “fosil wrist watch”, 800 x 628, 107k, jpg, amgmedia.com) for a sense-specific classifier, e.g. watch-1 (ticker).]

Page 41:

WISDOM classifier

[Figure: same pipeline as the previous page (noun, dictionary definitions, unlabeled text, web images, dictionary model P( sense | data ), training images), ending in an SVM classifier, e.g. watch-1 (ticker).]

Page 42:

Evaluation datasets

Image labels: core / related / unrelated / ???

• Collected by querying Image Search
– MIT-ISD: bass, face, mouse, speaker, watch
– MIT-OFFICE: cellphone, fork, hammer, keyboard, mug, pliers, scissors, stapler, telephone, watch
– UIUC-ISD: bass, crane, squash

Page 43:

Experimental Setup

• Task: Image sense disambiguation (ISD) in search results
– Separate images according to visual sense
– “core” labels are positive class, “related” and “unrelated” negative
– Metrics: true positives vs. false positives (ROC), recall-precision curve (RPC)

• Task: object classification in a novel image
– Classify image as having correct object category or not
– “core” labels are positive class, other keyword’s “core” senses are negative class
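For reference, the ROC metric named above can be computed from per-image scores and binary labels as in the sketch below. It assumes “core” images are labeled 1 and all others 0; the function names and the toy scores are illustrative, and tied scores are not treated specially.

```python
import numpy as np


def roc_points(scores, labels):
    """Return (false positive rate, true positive rate) arrays, one point per
    ranked image, starting from (0, 0). Higher score means more likely positive."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    ranked = labels[np.argsort(-scores, kind="stable")]
    tpr = np.cumsum(ranked) / ranked.sum()
    fpr = np.cumsum(~ranked) / (~ranked).sum()
    return np.concatenate([[0.0], fpr]), np.concatenate([[0.0], tpr])


def area_under_curve(fpr, tpr):
    """Trapezoidal area under the ROC curve."""
    return float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))


# Toy example: 4 ranked search results, 2 of them "core" (label 1).
fpr, tpr = roc_points([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
print(fpr)                         # [0.  0.  0.5 0.5 1. ]
print(tpr)                         # [0.  0.5 0.5 1.  1. ]
print(area_under_curve(fpr, tpr))  # 0.75
```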

Page 44:

ISD Results: ROC using each WordNet sense for BASS

[Figure: ROC curves, true positive rate vs. false positive rate. Legend: yahoo, musical range, polyph. range, male singer, sea bass, freshwater bass, basso voice, instrument, spiny fish.]

Page 45:

Today: Snapshots

Probabilistic multi‐kernel fusion

Joint regularization across categories

Multimodal sense grounding

• Local feature models for transparent objects

Page 47:

Motivation

• Transparent objects made out of glass or plastic are ubiquitous in domestic environments

• Traditional local feature approach inappropriate

• Full physical model intractable

Page 48:

Local Additive Feature Model

• Significant variation in patch appearance

• ... but common latent structure

Page 49:

new LDA‐SIFT model

Page 50:

LDA‐SIFT

Page 51:

Transparent Visual Words

• For each patch we infer the latent mixture activations that characterize the additive structure

• We model the glass by learning a spatial layout of discrete “transparent local feature” activations

Page 52:

Training Data

Page 53:

Example Results

• Training on 4 different glasses in front of screen

• Testing on 49 glass instances in home environment

• Sliding window linear SVM‐BOW detection

Page 54:

Overcoming Ambiguity

• Cue saliency varies across categories
– Probabilistic multi‐kernel fusion methods… [Christhoudias]
– Joint regularization across categories… [Quattoni]

• Individual categories have multiple senses
– Learning dictionary grounded visual models and…

• Multiple surfaces confuse local features
– Local feature models for transparent objects

Page 55:

For more information…

• Probabilistic multi‐kernel fusion
– Christhoudias, Urtasun, Darrell, CVPR 2009

• Joint regularization across categories
– Quattoni, Carreras, Collins, Darrell, ICML 2009.

• Multimodal sense grounding
– Saenko and Darrell, NIPS 2008

• Local feature models for transparent objects
– Fritz, Bradski, Black, and Darrell, in review…

Page 56:

New ICSI/UCB Vision Group

Prof. Trevor Darrell

Research Scientist
Raquel Urtasun (TTI‐C)

Postdocs
Mario Fritz
Brian Kulis
Mathieu Salzmann

Mario Christhoudias
Kate Saenko (Boston)

Graduate Students
Ashley Eden
Alex Shyr
Trevor Owens
Dave Golland
Carl Ek (Visiting)
Sergey Karayev (’09/’10)

Next BAVM:  Berkeley, late Jan. 2010…..date conflicts?