Object Recognition as Machine Translation: Learning a Lexicon for a Fixed image Vocabulary Pinar Duygulu Middle East Technical University, Turkey Joint work with Kobus Barnard, Nando de Freitas and David Forsyth as a part of UC Berkeley Digital Library Project

Mar 16, 2023

Page 1: Object Recognition as Machine Translation: Learning a ...

Object Recognition as Machine Translation: Learning a Lexicon for a Fixed Image Vocabulary

Pinar Duygulu

Middle East Technical University, Turkey

Joint work with Kobus Barnard, Nando de Freitas and David Forsyth

as a part of the UC Berkeley Digital Library Project

Page 2: Object Recognition as Machine Translation: Learning a ...

Problems in Object Recognition

•How to model?

•Scale

•What is an object?

Page 3: Object Recognition as Machine Translation: Learning a ...

Our Approach

Object recognition on a large scale is linking words with image regions

[Figure: image regions labeled “tiger” and “grass”; image keywords: tiger grass cat]

Use joint probability of words and pictures in large datasets

Page 4: Object Recognition as Machine Translation: Learning a ...

Auto-Annotating Images

Finding words for the images

tiger grass cat

Barnard, Forsyth (ICCV 2001); Barnard, Duygulu, Forsyth (CVPR 2001)

Other related work: Maron 98, Mori 99

Page 5: Object Recognition as Machine Translation: Learning a ...

Annotation vs Recognition

tiger cat grass?

Cannot be solved with one example

Page 6: Object Recognition as Machine Translation: Learning a ...

Statistical Machine Translation

Data: Aligned sentences, but word correspondences are unknown

“the beautiful sun”

“le soleil beau”

Brown, Della Pietra, Della Pietra & Mercer 93

Page 7: Object Recognition as Machine Translation: Learning a ...

Statistical Machine Translation

Given the correspondences, we can estimate the translation probabilities, e.g. p(sun|soleil)

Given the probabilities, we can estimate the correspondences
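
The second direction (probabilities to correspondences) can be sketched in a few lines. The translation table below is made up for illustration and does not come from the slides, and the sketch assumes every alignment is equally likely a priori, as in the simplest model of Brown et al. 93. Under those assumptions, the correspondence posterior for each English word is its row of the table restricted to the French words in the sentence, renormalized.

```python
# Hypothetical translation table: t[e][f] stands for p(e | f).
# The numbers are invented for illustration; each French column sums to 1.
t = {
    "the":       {"le": 0.8, "soleil": 0.1, "beau": 0.1},
    "beautiful": {"le": 0.1, "soleil": 0.1, "beau": 0.8},
    "sun":       {"le": 0.1, "soleil": 0.8, "beau": 0.1},
}

english = ["the", "beautiful", "sun"]
french = ["le", "soleil", "beau"]


def correspondence_posteriors(english, french, t):
    """For each English word, return a distribution over the French words it may align to.

    Assumes a uniform prior over alignments, so the posterior is the
    translation probability renormalized over the French sentence.
    """
    posteriors = {}
    for e in english:
        scores = [t[e][f] for f in french]
        total = sum(scores)
        posteriors[e] = {f: s / total for f, s in zip(french, scores)}
    return posteriors


posteriors = correspondence_posteriors(english, french, t)
# Most probable French partner for each English word.
best = {e: max(p, key=p.get) for e, p in posteriors.items()}
print(best)
```

With this table the most probable partner of "sun" is "soleil". In practice the table is unknown at the start, which is why the two estimation steps have to be alternated.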

Page 8: Object Recognition as Machine Translation: Learning a ...

Statistical Machine Translation

With enough data and EM, we can obtain the translation probabilities, e.g. p(sun|soleil) = 1

“the beautiful sun”

“le soleil beau”

Page 9: Object Recognition as Machine Translation: Learning a ...

Multimedia Translation

“sun sea sky”

Page 10: Object Recognition as Machine Translation: Learning a ...

Corel Database

392 CDs, each consisting of 100 annotated images.

Page 11: Object Recognition as Machine Translation: Learning a ...

Input

[Figure: image with keywords “sun sky waves sea”, split into blobs by segmentation*]

Each blob is a large vector of features

• Region size

• Position

• Color

• Oriented energy (12 filters)

• Simple shape features

* Thanks to Blobworld team [Carson, Belongie, Greenspan, Malik], N-cuts team [Shi, Tal, Malik]

Page 12: Object Recognition as Machine Translation: Learning a ...

Tokenization

- Words → word tokens

- Image segments → blob tokens

•each segment is represented by 30 features (size, position, color, texture and shape)

•k-means is used to cluster the feature vectors

•the best cluster for each blob gives its blob token
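
The tokenization step can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the feature vectors are random stand-ins for real region features, the cluster count is 5 rather than the 500 blob tokens used in the experiments, and the function names are hypothetical.

```python
import random


def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def nearest_center(x, centers):
    """Index of the closest cluster center; this index serves as the blob token."""
    return min(range(len(centers)), key=lambda k: squared_distance(x, centers[k]))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain Lloyd's k-means; returns the list of cluster centers."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in vectors:
            clusters[nearest_center(x, centers)].append(x)
        for idx, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                dim = len(members[0])
                centers[idx] = [sum(m[d] for m in members) / len(members)
                                for d in range(dim)]
    return centers


# Synthetic stand-in for region features: 200 regions, 30 features each.
rng = random.Random(1)
features = [[rng.random() for _ in range(30)] for _ in range(200)]

centers = kmeans(features, k=5)
blob_tokens = [nearest_center(x, centers) for x in features]
print(blob_tokens[:10])
```

After this step every region is reduced to a single discrete symbol, so an image becomes a short "sentence" of blob tokens paired with its keyword tokens.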

Page 13: Object Recognition as Machine Translation: Learning a ...

Data

160 CDs from the Corel data set, 100 images in each

10 sets, each:

•80 randomly selected CDs

•~6000 training images

•~2000 test images

•150-200 word tokens

•500 blob tokens

Segmentation (using N-cuts): about a month

Page 14: Object Recognition as Machine Translation: Learning a ...

[Figure: example images with their keywords]

city mountain sky sun

jet plane sky

jet plane sky

cat forest grass tiger

cat grass tiger water

beach people sun water

Page 15: Object Recognition as Machine Translation: Learning a ...

Assignments

“sun sea sky”

$p(a_1=1)$, $p(a_1=2)$, $p(a_1=3)$, $p(a_1=4)$

$$\sum_{i=1}^{B_n} p(a_1 = i) = 1$$

Page 16: Object Recognition as Machine Translation: Learning a ...

Assignments

“sun sea sky”

$p(a_2=1)$, $p(a_2=2)$, $p(a_2=3)$, $p(a_2=4)$

$$\sum_{i=1}^{B_n} p(a_2 = i) = 1$$

Page 17: Object Recognition as Machine Translation: Learning a ...

Assignments

“sun sea sky”

$p(a_3=1)$, $p(a_3=2)$, $p(a_3=3)$, $p(a_3=4)$

$$\sum_{i=1}^{B_n} p(a_3 = i) = 1$$

Page 18: Object Recognition as Machine Translation: Learning a ...

Initialization

Initialize the translation table to blob-word co-occurrences (empirical joint distribution of blobs and words)

[Figure: co-occurrence table with word entries such as “sun” and “sea”]
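
A minimal sketch of this initialization is given below. The toy pairs reuse the blob and word lists that appear on the EM slides. Deriving a conditional table t(w|b) by normalizing each blob's row of the joint distribution is an assumption made here for illustration; the slide only states that the table starts from the empirical joint distribution.

```python
from collections import defaultdict

# Toy training pairs: (blob tokens of an image, word tokens of its annotation).
pairs = [
    (["b1", "b3", "b4"], ["w1", "w5"]),
    (["b2", "b1", "b5"], ["w1", "w2", "w4"]),
    (["b1", "b2"], ["w1", "w2", "w6"]),
]


def cooccurrence_joint(pairs):
    """Empirical joint distribution over (blob, word) co-occurrences."""
    counts = defaultdict(float)
    for blobs, words in pairs:
        for b in blobs:
            for w in words:
                counts[(b, w)] += 1.0
    total = sum(counts.values())
    return {bw: c / total for bw, c in counts.items()}


def conditional_from_joint(joint):
    """Assumed step: normalize each blob's row to get an initial t[b][w] = p(w | b)."""
    rows = defaultdict(dict)
    for (b, w), p in joint.items():
        rows[b][w] = p
    return {b: {w: p / sum(row.values()) for w, p in row.items()}
            for b, row in rows.items()}


joint = cooccurrence_joint(pairs)
t0 = conditional_from_joint(joint)
print(t0["b1"])
```

Blob b1 appears in all three images and w1 in all three annotations, so the pair (b1, w1) already has the largest initial weight in b1's row (3 of 8 co-occurrences).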

Page 19: Object Recognition as Machine Translation: Learning a ...

Using Expectation Maximization

Given the translation probabilities estimate the correspondences

Given the correspondences estimate the translation probabilities

Dempster et al., 77

$$p(w \mid b) = \prod_{n=1}^{N} \prod_{j=1}^{M_n} \sum_{i=1}^{L_n} p(a_{nj} = i)\, t(w = w_{nj} \mid b = b_{ni})$$

Page 20: Object Recognition as Machine Translation: Learning a ...

EM algorithm

E step (for one pair): predicting correspondences from translation probabilities

[Figure: translation probabilities (table over w1, w2, ... and b1, b2, ...) → correspondences, shown for the training pairs (b1 b3 b4; w1 w5), (b2 b1 b5; w1 w2 w4), ..., (b1 b2; w1 w2 w6)]

Page 21: Object Recognition as Machine Translation: Learning a ...

EM algorithm

M step (for one pair): predicting translation probabilities from correspondences

[Figure: correspondences for the training pairs (b1 b3 b4; w1 w5), (b2 b1 b5; w1 w2 w4), ..., (b1 b2; w1 w2 w6) → translation probabilities (table over w1, w2, ... and b1, b2, ...)]
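
The two steps can be put together in a short sketch that runs on the three toy pairs shown in the figures. It makes a simplifying assumption that is not stated on the slides: the assignment probabilities p(a_nj = i) are held uniform, so only the translation table t(w|b) is re-estimated. The function name is hypothetical.

```python
from collections import defaultdict

# Toy training pairs from the E-step and M-step figures: (blob tokens, word tokens).
pairs = [
    (["b1", "b3", "b4"], ["w1", "w5"]),
    (["b2", "b1", "b5"], ["w1", "w2", "w4"]),
    (["b1", "b2"], ["w1", "w2", "w6"]),
]


def em_translation_table(pairs, iterations=20):
    """Estimate t[b][w] = p(w | b) with EM, assuming uniform assignment priors."""
    # Initialization: blob-word co-occurrence counts, normalized per blob.
    t = defaultdict(dict)
    for blobs, words in pairs:
        for b in blobs:
            for w in words:
                t[b][w] = t[b].get(w, 0.0) + 1.0
    t = {b: {w: c / sum(row.values()) for w, c in row.items()}
         for b, row in t.items()}

    for _ in range(iterations):
        expected = defaultdict(lambda: defaultdict(float))
        for blobs, words in pairs:
            for w in words:
                # E step: posterior that word w corresponds to each blob of this image.
                scores = [t[b].get(w, 0.0) for b in blobs]
                norm = sum(scores)
                for b, s in zip(blobs, scores):
                    expected[b][w] += s / norm
        # M step: renormalize the expected counts into a new translation table.
        t = {b: {w: c / sum(row.values()) for w, c in row.items()}
             for b, row in expected.items()}
    return t


table = em_translation_table(pairs)
for blob in sorted(table):
    best_word = max(table[blob], key=table[blob].get)
    print(blob, best_word, round(table[blob][best_word], 3))
```

On this toy data the probability mass for b1 moves toward w1, the only word that accompanies b1 in every image. With so few pairs some entries stay ambiguous (for example w6, which occurs once), which is consistent with the slides' point that enough data is needed.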

Page 22: Object Recognition as Machine Translation: Learning a ...

Dictionary

sun

sky

cat

horse

Page 23: Object Recognition as Machine Translation: Learning a ...

Labeling Regions

On a new image

•Segment the image

•For each region

•Find the blob token

•Look at the word posterior given the blob
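
The labeling steps above can be sketched as follows, assuming the image has already been segmented and each region reduced to a feature vector. The cluster centers, the word posteriors and the two-dimensional features are invented values for illustration, and `label_regions` is a hypothetical helper name.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def label_regions(region_features, centers, word_posterior):
    """Label each region with the most probable word for its blob token."""
    labels = []
    for x in region_features:
        # Find the blob token: the index of the nearest cluster center.
        blob = min(range(len(centers)), key=lambda k: squared_distance(x, centers[k]))
        # Look at the word posterior given the blob and keep the best word.
        posterior = word_posterior[blob]
        labels.append(max(posterior, key=posterior.get))
    return labels


# Hypothetical values: two cluster centers in a 2-D feature space and
# a learned word posterior p(word | blob) for each of them.
centers = [[0.9, 0.8], [0.1, 0.7]]
word_posterior = {
    0: {"tiger": 0.6, "cat": 0.3, "grass": 0.1},
    1: {"grass": 0.7, "forest": 0.2, "tiger": 0.1},
}
regions = [[0.85, 0.75], [0.2, 0.6]]

labels = label_regions(regions, centers, word_posterior)
print(labels)
```

The first region is closest to center 0 and is labeled "tiger"; the second is closest to center 1 and is labeled "grass".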

Page 24: Object Recognition as Machine Translation: Learning a ...

Labeling Regions

[Figure: two regions, each shown with its word posterior over tiger, cat, horse, grass, sun, forest]

Page 25: Object Recognition as Machine Translation: Learning a ...

Labeling Regions

[Figure: a region with its word posterior over tiger, cat, horse, grass, sun, forest]

Display only the most probable word: tiger

Page 26: Object Recognition as Machine Translation: Learning a ...
Page 27: Object Recognition as Machine Translation: Learning a ...
Page 28: Object Recognition as Machine Translation: Learning a ...

Measuring Performance

First strategy--score by hand

Second strategy--use annotation performance as a proxy.

Page 29: Object Recognition as Machine Translation: Learning a ...

First Strategy: Score by hand

Average performance is four times better than guessing the most common word (“water”)

Page 30: Object Recognition as Machine Translation: Learning a ...

Second Strategy: Use Annotation

tiger cat grass water

Automatic: no need to score by hand

Page 31: Object Recognition as Machine Translation: Learning a ...

Annotating Images

. . .

Page 32: Object Recognition as Machine Translation: Learning a ...

Measuring Annotation Performance

GRASS TIGER CAT FOREST

Predicted Words

Actual Keywords

CAT HORSE GRASS WATER
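
One simple way to score an annotation is set overlap between the predicted words and the actual keywords. The sketch below is an illustration of that idea and not necessarily the exact measure used in the experiments. The transcript does not make clear which of the two word lists is the prediction; since both contain four words, the precision and recall computed here come out the same either way.

```python
def annotation_scores(predicted, actual):
    """Return (correct words, precision, recall) for one annotated image."""
    predicted = {w.lower() for w in predicted}
    actual = {w.lower() for w in actual}
    correct = predicted & actual
    precision = len(correct) / len(predicted) if predicted else 0.0
    recall = len(correct) / len(actual) if actual else 0.0
    return sorted(correct), precision, recall


# The two word lists from the slide; which one is the prediction is assumed here.
actual_keywords = "GRASS TIGER CAT FOREST".split()
predicted_words = "CAT HORSE GRASS WATER".split()

correct, precision, recall = annotation_scores(predicted_words, actual_keywords)
print(correct, precision, recall)
```

Two of the four words ("cat" and "grass") match, giving a precision and recall of 0.5 for this image.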

Page 33: Object Recognition as Machine Translation: Learning a ...


Page 34: Object Recognition as Machine Translation: Learning a ...
Page 35: Object Recognition as Machine Translation: Learning a ...
Page 36: Object Recognition as Machine Translation: Learning a ...
Page 37: Object Recognition as Machine Translation: Learning a ...

Improving the System

•Refusing to predict

•Merging indistinguishable words

Page 38: Object Recognition as Machine Translation: Learning a ...

Refusing to predict

Null and fertility problems

Simple solution to null: refusing to predict

if p(word | blob) > threshold, predict a word

otherwise, assign null
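
The rule is short enough to state directly in code. The posterior values below are invented, the default threshold of 0.2 matches the value used in the examples that follow in the deck, and `predict_or_null` is a hypothetical name; `None` stands in for the null word.

```python
def predict_or_null(posterior, threshold=0.2):
    """Return the most probable word, or None (null) if it is not confident enough."""
    word = max(posterior, key=posterior.get)
    return word if posterior[word] > threshold else None


# Hypothetical posteriors p(word | blob) for two blobs.
confident_blob = {"tiger": 0.6, "cat": 0.3, "grass": 0.1}
vague_blob = {"sun": 0.15, "sky": 0.15, "sea": 0.14, "water": 0.14,
              "grass": 0.14, "cat": 0.14, "tiger": 0.14}

print(predict_or_null(confident_blob))  # a word is predicted
print(predict_or_null(vague_blob))      # the blob is assigned null
```

Raising the threshold makes the system predict on fewer blobs, which trades recall for precision.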

Page 39: Object Recognition as Machine Translation: Learning a ...

Examples (null threshold = 0.2)

Page 40: Object Recognition as Machine Translation: Learning a ...

Recall and Precision (for null threshold from 0 to 0.5)

[Figure panels: selected good words; selected bad words]

Page 41: Object Recognition as Machine Translation: Learning a ...

Clustering Indistinguishable Words

merge words which can’t be told apart

e.g. locomotive vs. train
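
The slide does not say how "can't be told apart" is decided. One plausible reading, sketched below purely as an assumption, is to compare the blob distributions associated with each word and group words whose distributions are nearly identical. The distance (total variation), the tolerance, the greedy grouping and all the numbers are illustrative choices, not the authors' method.

```python
def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def merge_indistinguishable(blob_given_word, tolerance=0.1):
    """Greedy grouping: a word joins the first group whose first member is within tolerance."""
    groups = []
    for word in sorted(blob_given_word):
        for group in groups:
            if total_variation(blob_given_word[word], blob_given_word[group[0]]) <= tolerance:
                group.append(word)
                break
        else:
            # No existing group is close enough, so the word starts its own group.
            groups.append([word])
    return groups


# Hypothetical distributions over three blob tokens for each word.
blob_given_word = {
    "locomotive": {"b1": 0.50, "b2": 0.45, "b3": 0.05},
    "train":      {"b1": 0.48, "b2": 0.46, "b3": 0.06},
    "tiger":      {"b1": 0.05, "b2": 0.05, "b3": 0.90},
}

groups = merge_indistinguishable(blob_given_word)
print(groups)
```

Here "locomotive" and "train" end up in one group because they are associated with almost the same blobs, while "tiger" stays on its own.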

Page 42: Object Recognition as Machine Translation: Learning a ...

Examples

Page 43: Object Recognition as Machine Translation: Learning a ...

Applying Performance Measurement

•Feature Selection

•Segmentation Comparison

•Model Selection

Page 44: Object Recognition as Machine Translation: Learning a ...

Feature Selection

Propose good features to differentiate words that are not distinguishable (e.g., eagle and jet)

Page 45: Object Recognition as Machine Translation: Learning a ...

Segmentation Comparison

Blobworld segmentations

N-cuts segmentations

Page 46: Object Recognition as Machine Translation: Learning a ...

A comparison of two segmentation algorithms using word prediction performance

[Figure: x-axis: number of segments used for word prediction (2 to 18); y-axis: KL-divergence-based word prediction measure, compared with prior, bigger is better (0 to 0.8); curves: N-cuts and Blobworld, each on training, held-out and novel-CD data]

Page 47: Object Recognition as Machine Translation: Learning a ...

Model Selection

Model for joint probability of text and blobs

• Clustering models

• Aspect models

• Hierarchical models

• Bayesian models

• Co-occurrence models

Many of these are based on models proposed for text [ Brown, Della Pietra, Della Pietra & Mercer 93; Hofmann 98; Hofmann & Puzicha 98 ]

A comparison paper has been submitted to JMLR: ‘Matching Words and Pictures’, Barnard, Duygulu, Forsyth, de Freitas, Blei, Jordan

Page 48: Object Recognition as Machine Translation: Learning a ...

Discussion

Recognition on the large scale

Unsupervised - using the available data efficiently

Learn what to recognize

Page 49: Object Recognition as Machine Translation: Learning a ...

Future Directions

Estimate where a minimal amount of supervision can be most helpful (and provide it)

Page 50: Object Recognition as Machine Translation: Learning a ...

Using labelled data

500 hand-labeled images, modified to be added to each of the 10 sets

Very hard!

-takes a lot of time

-large vocabulary

-cheetah, leopard or cat?

Page 51: Object Recognition as Machine Translation: Learning a ...

Using labelled data

Page 52: Object Recognition as Machine Translation: Learning a ...

Using labelled data

Use them to supervise:

-add to data

-fix correspondences

-retrain

“sun sea sky”

Page 53: Object Recognition as Machine Translation: Learning a ...

Future Directions

Propose region merging based on posterior word probabilities

Propose merging

Page 54: Object Recognition as Machine Translation: Learning a ...

Preliminary Results

elephant plane cat

Page 55: Object Recognition as Machine Translation: Learning a ...

Future Directions (other data)

Corel Image Data: 40,000 images

Fine Arts Museum of San Francisco: 83,000 images online

Cal-flora: 20,000 images, species information

News photos with captions (yahoo.com): 1,500 images per day available from yahoo.com

Hulton Archive: 40,000,000 images (only 230,000 online)

internet.archive.org: 1,000 movies with no copyright

TV news archives (televisionarchive.org, informedia.cs.cmu.edu): several terabytes already available

Google Image Crawl: >330,000,000 images (with nearby text)

Satellite images (terrarserver.com, nasa.gov, usgs.gov) (and associated demographic information)

Medical images (and associated clinical information)

Page 56: Object Recognition as Machine Translation: Learning a ...

FAMSF Data (83,000 images online)

Page 57: Object Recognition as Machine Translation: Learning a ...

Natural Language Processing

• Parts of speech* (prefer nouns for now)

• Sense Disambiguation

• Expand semantics using WordNet†

* We use Eric Brill’s parts of speech tagger (available on-line)

† WordNet is an on-line lexical reference system from Princeton (Miller et al.)

Page 58: Object Recognition as Machine Translation: Learning a ...

Multiple Senses

212001 bank buildings trees city

125090 bank machine money currency bills

125084 piggy bank coins currency money

26078 water grass trees banks

173044 mink rodent bank grass

151096 snow banks hills winter

Page 59: Object Recognition as Machine Translation: Learning a ...

News data

News photos with captions (1,500 images per day available from yahoo.com)

learn topic structure using both images and text

different pictures for the same topic

different stories that use the same picture

Page 60: Object Recognition as Machine Translation: Learning a ...

Other Applications

• Auto Annotation

• Auto Illustration

• Organizing Image Collections for Browsing

Page 61: Object Recognition as Machine Translation: Learning a ...

Words from Pictures (Auto-annotation)

Keywords: GRASS TIGER CAT FOREST

Predicted Words (rank order): tiger cat grass people water bengal buildings ocean forest reef

Keywords: HIPPO BULL mouth walk

Predicted Words (rank order): water hippos rhino river grass reflection one-horned head plain sand

Keywords: FLOWER coralberry LEAVES PLANT

Predicted Words (rank order): fish reef church wall people water landscape coral sand trees

Page 62: Object Recognition as Machine Translation: Learning a ...

Pictures from Words (Auto-illustration)

Text Passage (Moby Dick)

“The large importance attached to the harpooneer's vocation is evinced by the fact, that originally in the old Dutch Fishery, two centuries and more ago, the command of a whale-ship …”

Extracted Query

large importance attached fact old dutch century more command whale ship was person was divided officer word means fat cutter time made days was general vessel whale hunting concern british title old dutch ...

Retrieved Images

Page 63: Object Recognition as Machine Translation: Learning a ...
Page 64: Object Recognition as Machine Translation: Learning a ...

Organizing Image Collections

Page 65: Object Recognition as Machine Translation: Learning a ...

Hierarchical model

[ Hofmann 98; Hofmann & Puzicha 98 ]

emit more general words and blobs (e.g. sky)

emit more specific words and blobs (e.g. waves)

sun waves sky sea

Page 66: Object Recognition as Machine Translation: Learning a ...
Page 67: Object Recognition as Machine Translation: Learning a ...
Page 68: Object Recognition as Machine Translation: Learning a ...

Browsing

Browsing gives users an overall understanding of what is in a collection--a prerequisite for effective searching.

Need to organize images in a way that is relevant to humans

Related studies: Sclaroff, Taycher, and La Cascia, 98; Rubner, Tomasi, and Guibas, 00; Smith and Kanade, 97.

Page 69: Object Recognition as Machine Translation: Learning a ...
Page 70: Object Recognition as Machine Translation: Learning a ...
Page 71: Object Recognition as Machine Translation: Learning a ...

The End