Nov 23, 2018

Page 1

Every Picture Tells a Story: Generating Sentences from Images

by Ali Farhadi, Mohsen Hejrati, Mohammad Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David Forsyth

PRESENTATION BY KERRY SEITZ

Page 2

The Problem

Generate sentences from images

Short, descriptive

A man is getting out of the red convertible.

A woman in green is talking on a phone.

[LAPTEV ET AL. 2008]

Page 3

Challenges

Lack of Dataset

Out-of-vocabulary words

Phrasing variations
◦ “Will gets out of the Chevrolet.”
◦ “A black car pulls up. Two army officers get out.”
◦ “Erin exits her new truck.”

[LAPTEV ET AL. 2008]

Page 4

Challenges

Synecdoche
◦ “Will you watch my animal this weekend?”

Page 5

Challenges

Synecdoche
◦ “Will you watch my animal this weekend?”
◦ “I just got a hot new set of wheels.”

Page 6

Dataset

Based on PASCAL 2008 images

Randomly select 50 images from each category
◦ 1000 images total

Generate 5 captions for each image
◦ Using Amazon’s Mechanical Turk

Manually add triples: <object, action, scene>

Page 7

Approach – Meaning Space

[FARHADI ET AL. 2010]

Page 8

Mapping Images to Meaning

Discrete set of values for each label

Solve small multi-label Markov random field

Use greedy method to do inference
◦ Linear combination of feature functions
◦ Train to score highest on ground truth triple
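A minimal sketch of how a candidate triple might be scored in a small three-node model, assuming the node and edge potentials are already available as plain dictionaries. The labels, the numeric values, the fully connected edge layout, and the function names `score` and `best_triple` are illustrative assumptions, not the authors' implementation. Because the label sets here are tiny, exhaustive enumeration stands in for the greedy inference mentioned on the slide.

```python
from itertools import product

def score(triple, node_pot, edge_pot):
    """Sum node potentials and pairwise edge potentials for one triple."""
    obj, act, scn = triple
    total = node_pot["object"][obj] + node_pot["action"][act] + node_pot["scene"][scn]
    # Assumed layout: one edge between every pair of nodes; missing edges score 0.
    for a, b in ((obj, act), (obj, scn), (act, scn)):
        total += edge_pot.get((a, b), 0.0)
    return total

def best_triple(node_pot, edge_pot):
    """Enumerate every <object, action, scene> combination and keep the best one."""
    candidates = product(node_pot["object"], node_pot["action"], node_pot["scene"])
    return max(candidates, key=lambda t: score(t, node_pot, edge_pot))

# Toy potentials, invented for illustration only.
node_pot = {
    "object": {"dog": 0.6, "bike": 0.4},
    "action": {"sit": 0.5, "ride": 0.5},
    "scene": {"ground": 0.7, "street": 0.3},
}
edge_pot = {("dog", "sit"): 0.4, ("bike", "ride"): 0.4, ("ride", "street"): 0.3}

print(best_triple(node_pot, edge_pot))  # ('dog', 'sit', 'ground')
```

In a trained model the potentials would be learned so that the ground truth triple scores highest; here they are fixed by hand.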

Page 9

Mapping Images to Meaning

[FARHADI ET AL. 2010]

Page 10

Image Features

Deformable Parts Model
◦ Get prediction for each class
◦ Consider max confidence of detectors, bounding box center, bounding box aspect ratio, and scale

Hoiem et al. classification
◦ Based on geometry, HOG features, and detection responses

Gist-based scene classification
◦ Global information
◦ AdaBoost-style classifiers

Page 11

Node Potentials

For test image, get kNN in training set
◦ By matching image features
◦ By deriving from classifiers and detectors

Compute average node features over neighbors
◦ Computed from image side
◦ Computed from sentence side
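The neighbor-averaging step can be sketched as follows. This is only an illustration: the Euclidean distance, the value of `k`, the toy feature vectors, and the function name `knn_average` are assumptions and are not taken from the paper.

```python
import math

def knn_average(test_feat, train_feats, train_node_feats, k=3):
    """Average the node features of the k training images closest to test_feat."""
    # Rank training images by Euclidean distance in image-feature space (assumed metric).
    order = sorted(range(len(train_feats)),
                   key=lambda i: math.dist(test_feat, train_feats[i]))
    nearest = order[:k]
    dim = len(train_node_feats[0])
    return [sum(train_node_feats[i][d] for i in nearest) / len(nearest)
            for d in range(dim)]

# Toy data: 2-D image features and 2-D node features per training image.
train_feats = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
train_node_feats = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

avg = knn_average([0.1, 0.1], train_feats, train_node_feats, k=3)
print(avg)  # approximately [0.667, 0.333]
```

The same routine could be fed node features derived from the image side or from the sentence side of the neighbors.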

Page 12

Edge Potentials

Find edge weights such that ground truth triples score highest

Linear combination of four estimates (from node A to node B):
◦ Normalized frequency of word A in corpus, f(A)
◦ Normalized frequency of word B in corpus, f(B)
◦ Normalized frequency of (A and B) at the same time, f(A, B)
◦ $\frac{f(A, B)}{f(A)\, f(B)}$
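As a rough sketch, the four estimates could be computed from a corpus like this. It assumes that "normalized frequency" means the fraction of sentences containing the word, which the slide does not specify; the toy corpus and the function name `edge_estimates` are also invented for illustration.

```python
def edge_estimates(corpus, a, b):
    """Return f(A), f(B), f(A, B), and f(A, B) / (f(A) f(B)).

    corpus is a list of sets of words, one set per sentence.
    """
    n = len(corpus)
    f_a = sum(a in s for s in corpus) / n
    f_b = sum(b in s for s in corpus) / n
    f_ab = sum(a in s and b in s for s in corpus) / n
    # Guard against words that never occur in the corpus.
    ratio = f_ab / (f_a * f_b) if f_a > 0 and f_b > 0 else 0.0
    return f_a, f_b, f_ab, ratio

# Toy corpus, invented for illustration.
corpus = [{"dog", "sit"}, {"dog", "run"}, {"cat", "sit"}, {"dog", "sit"}]
f_a, f_b, f_ab, ratio = edge_estimates(corpus, "dog", "sit")
print(f_a, f_b, f_ab)  # 0.75 0.75 0.5
```

The edge potential would then be a learned linear combination of these four numbers.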

Page 13

Sentence Potentials

Use parser to extract
◦ Subject and direct object (object, action)
◦ Head nouns of prepositional phrases (scene)
◦ Head noun of phrase “X in the background” (scene)

Page 14

Matching Triplets Between an Image and a Sentence

Matching score approximation
◦ Top k ranking triples from sentences, compute rank of each as image triple
◦ Top k ranking triples from images, compute rank of each as sentence triple
◦ Sum the ranks from both sets, weighted by inverse rank, to emphasize stronger triples
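One possible reading of this approximation, sketched under stated assumptions: each of the top k triples on one side is looked up in the ranking of the other side, and that cross rank is weighted by the inverse of the triple's own rank. The exact weighting is not given on the slide, so the formula, the choice that a lower total means a better match, and the names `ranks` and `match_cost` are all assumptions.

```python
def ranks(scores):
    """Map each triple to its 1-based rank under descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {t: i + 1 for i, t in enumerate(ordered)}

def match_cost(image_scores, sentence_scores, k=2):
    """Lower cost means the image and the sentence agree on their top triples.

    Both dictionaries must score the same set of triples.
    """
    img_rank, sen_rank = ranks(image_scores), ranks(sentence_scores)
    cost = 0.0
    for own, other in ((sen_rank, img_rank), (img_rank, sen_rank)):
        top_k = sorted(own, key=own.get)[:k]
        for t in top_k:
            # Cross rank weighted by the inverse of the triple's own rank.
            cost += other[t] / own[t]
    return cost

# Toy scores over three candidate triples, invented for illustration.
A, B, C = ("dog", "sit", "ground"), ("cat", "sit", "mat"), ("bike", "ride", "street")
image_scores = {A: 0.9, B: 0.5, C: 0.1}
agreeing_sentence = {A: 0.8, B: 0.4, C: 0.2}
conflicting_sentence = {A: 0.1, B: 0.5, C: 0.9}

print(match_cost(image_scores, agreeing_sentence))     # 4.0
print(match_cost(image_scores, conflicting_sentence))  # 8.0
```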

Page 15

Evaluation of Mappings to Meaning Space

Compare all triple elements

If ground truth is (dog, sit, ground), which is better:
◦ (cat, sit, mat) or (bike, ride, street)?
◦ (cat, sit, mat) or (object, do, scene)?

Page 16

Tree-F1 Measure

Example taxonomy tree: Object → Animal → {Cat, Dog}; Object → Vehicle → {Bicycle, Car}

$\text{Precision} = \frac{\#\,\text{edges matching ground truth}}{\#\,\text{edges in ground truth path}}$

$\text{Recall} = \frac{\#\,\text{edges matching ground truth}}{\#\,\text{edges in predicted path}}$

$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
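A small sketch of the measure for a single triple element, using the object taxonomy from this slide (Object, then Animal and Vehicle, then Cat, Dog, Bicycle, Car). Precision and recall follow the slide's definitions: matched edges over edges in the ground truth path, and matched edges over edges in the predicted path. The names `PARENT`, `path_edges`, and `tree_f1` are illustrative assumptions.

```python
# Child -> parent links for the example taxonomy on this slide.
PARENT = {
    "animal": "object", "vehicle": "object",
    "cat": "animal", "dog": "animal",
    "bicycle": "vehicle", "car": "vehicle",
}

def path_edges(label):
    """Edges on the path from the root to label, as (parent, child) pairs."""
    edges = set()
    while label in PARENT:
        edges.add((PARENT[label], label))
        label = PARENT[label]
    return edges

def tree_f1(predicted, truth):
    pred, gt = path_edges(predicted), path_edges(truth)
    matched = len(pred & gt)
    if matched == 0:
        return 0.0
    precision = matched / len(gt)   # as defined on the slide
    recall = matched / len(pred)    # as defined on the slide
    return 2 * precision * recall / (precision + recall)

print(tree_f1("cat", "dog"))      # 0.5
print(tree_f1("bicycle", "dog"))  # 0.0
print(tree_f1("dog", "dog"))      # 1.0
```

With ground truth "dog", predicting "cat" shares the Object → Animal edge and earns partial credit, while "bicycle" shares nothing. Since F1 is symmetric in precision and recall, swapping the two denominators would not change the score.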

Page 17

BLUE Measure

Check to see if triple is valid
◦ E.g., (bottle, walk, street) is not valid

If triple ever appeared in corpus, then it is valid
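The validity rule above amounts to a set-membership test. The sketch below assumes the corpus has already been reduced to a set of triples; the toy corpus and the names `is_valid` and `valid_fraction` are illustrative, and reporting the fraction of valid predictions is an assumption about how the check would be aggregated.

```python
def is_valid(triple, corpus_triples):
    """A triple is valid if it ever appeared in the corpus."""
    return triple in corpus_triples

def valid_fraction(predicted, corpus_triples):
    """Share of predicted triples that pass the validity check."""
    return sum(is_valid(t, corpus_triples) for t in predicted) / len(predicted)

# Toy corpus of triples, invented for illustration.
corpus_triples = {("dog", "sit", "ground"), ("person", "ride", "street")}
predicted = [("dog", "sit", "ground"), ("bottle", "walk", "street")]

print(is_valid(("bottle", "walk", "street"), corpus_triples))  # False
print(valid_fraction(predicted, corpus_triples))               # 0.5
```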

Page 18

Results – Images to Meaning

[FARHADI ET AL. 2010]

Page 19

Results – Generating Sentences

[FARHADI ET AL. 2010]

Page 20

Results – Generating Sentences

Trained annotators to evaluate sentences
◦ 1 – sentence is accurate
◦ 2 – sentence has rough idea about image
◦ 3 – sentence is not even close

Generated 10 sentences per image

Averages
◦ Total average: 2.33
◦ # sentences with score one per image: 1.48
◦ # sentences with score two per image: 3.8

Page 21

Results – Finding Images for Sentences

[FARHADI ET AL. 2010]

Page 22

Out of Vocabulary

[FARHADI ET AL. 2010]

Page 23

Failures

[FARHADI ET AL. 2010]

Page 24

Summary

Sentences are a descriptive and compact representation of information

This work can generate good sentences for images

The intermediate representation is crucial and allows us to look up images for sentences too

Page 25

Future Work

Sentence model is oversimplified

Iterative procedure for better sentence understanding

Identify adjectives and adverbs once sentence is generated

Page 26

Questions?

[FARHADI ET AL. 2010]

Page 27

References

Every Picture Tells a Story: Generating Sentences from Images. A. Farhadi, M. Hejrati, M. Sadeghi, P. Young, C. Rashtchian, J. Hockenmaier, and D. Forsyth. ECCV 2010.

Learning Realistic Human Actions from Movies. I. Laptev, M. Marszalek, C. Schmid, and B. Rozenfeld. CVPR 2008.
