Every Picture Tells a Story: Generating Sentences from Images

M.Sc. Seminar: Recent Developments in Computational Semantics

Nikolina Koleva

Saarland University, Department of Computational Linguistics

December 9, 2013

Nikolina Koleva (CoLi Saarland) Generating Sentences from Images December 9, 2013 1 / 24


Overview

1 Motivation

2 Approach
   Mapping Image to Meaning as MRF
   Node and Edge potentials
   Learning and Inference

3 Evaluation

4 Conclusion


Motivation

Motivation

Humans can give a concise description of a picture that focuses on its most important depicted parts. These descriptions are accurate, and descriptions by different people agree well.


Motivation

Motivation

Hypothesis

Automatic methods can do so, too.

Goal

Demonstrate that a description can be automatically associated with a given image, and an image with a given description.


Motivation

Motivation

[Figure: example pairs under the column headings "Given" and "Retrieve"]

A baby secured in a chair.

A man on a motorbike jumping with the sky behind him.


Approach

General Idea

• Meaning represented as triplets: <object, action, scene>

• Learn the projections from the image space and the sentence space into the meaning space


Approach Mapping Image to Meaning as MRF

Mapping Image to Meaning

Solve a (small) multi-label Markov Random Field (MRF) to predict the triplet of an image

The nodes A (action), O (object) and S (scene) each take values from a discrete label set
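Because each of the three nodes takes only a handful of discrete values, the best triplet of such a small MRF can be found exactly by scoring every combination. The sketch below is illustrative only: it assumes node potentials are per-label scores, edge potentials are looked up per label pair, and every pair of nodes is connected; the label sets, `triplet_score`, `map_triplet` and all numbers are hypothetical.

```python
from itertools import product

# Hypothetical label sets, for illustration only.
OBJECTS = ["dog", "person", "bottle"]
ACTIONS = ["run", "sit", "walk"]
SCENES = ["grass", "street", "room"]

def triplet_score(o, a, s, node, edge):
    """Sum of node potentials plus pairwise edge potentials.

    node: dict with keys "O", "A", "S", each mapping a label to a score.
    edge: dict mapping a (label, label) pair to a score; missing pairs add 0.
    Assumes the three nodes are pairwise connected.
    """
    total = node["O"][o] + node["A"][a] + node["S"][s]
    for pair in ((o, a), (a, s), (o, s)):
        total += edge.get(pair, 0.0)
    return total

def map_triplet(node, edge):
    # The label space is tiny (here 3 * 3 * 3 = 27), so brute force is exact.
    return max(product(OBJECTS, ACTIONS, SCENES),
               key=lambda t: triplet_score(*t, node, edge))

node = {
    "O": {"dog": 1.0, "person": 0.8, "bottle": 0.1},
    "A": {"run": 0.5, "sit": 0.6, "walk": 0.4},
    "S": {"grass": 0.7, "street": 0.6, "room": 0.2},
}
edge = {("dog", "run"): 0.5, ("run", "grass"): 0.4, ("person", "sit"): 0.1}
print(map_triplet(node, edge))  # ('dog', 'run', 'grass')
```

Without edges the best triplet here would be <dog, sit, grass>; the edge potentials pull the prediction towards the more plausible <dog, run, grass>.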


Approach Node and Edge potentials

Image Node Potentials

1 Image features → linear combination of scores from different detector and classification responses

   • detector responses (Felzenszwalb et al.)
   • classification responses (Hoiem et al.)
   • Gist-based scene classification responses (Oliva et al.)

2 Node features → predicted independently for each node by a discriminative classifier (a linear SVM) given the image features

   • a vector with one dimension per node
   • each element gives the score for a node given the image
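A minimal sketch of how a node feature vector can be read off per-label linear classifiers: each label has its own weight vector and bias, and the node feature is the list of scores w_l · x + b_l. SVM training itself is not shown; the weights, biases and feature names below are hypothetical.

```python
def linear_scores(x, weights, biases):
    """Score every label with its own linear classifier: w_l . x + b_l.

    x: image feature vector (e.g. detector, classifier and Gist responses).
    weights: one weight vector per label; biases: one bias per label.
    Returns one score per label, i.e. the node feature vector.
    """
    return [sum(wi * xi for wi, xi in zip(w, x)) + b
            for w, b in zip(weights, biases)]

# Hypothetical 3-dimensional image features and 2 object labels.
x = [0.9, 0.1, 0.4]           # e.g. [dog detector, cat detector, grass Gist]
weights = [[1.0, -1.0, 0.5],  # "dog" classifier
           [-1.0, 1.0, 0.0]]  # "cat" classifier
biases = [0.0, 0.1]
print(linear_scores(x, weights, biases))
```

Because each label is scored independently, this step ignores how labels interact; that information enters later through the edge potentials.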


Approach Node and Edge potentials

Image Similarity

1 Obtain the k nearest neighbours (KNN) of a test image in the training set by matching (1) image features and (2) node features derived from classifiers and detectors

2 Compute the average of the node features over those neighbours
   • image side: what are the node features (1) for similar images and (2) for images that produce similar classifier and detector output?
   • sentence side: what does the sentence representation look like (1) for similar images and (2) for images that produce similar classifier and detector output?
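The two steps above amount to a nearest-neighbour lookup followed by an average. This sketch assumes Euclidean distance (the distance used in the paper may differ) and requires Python 3.8+ for `math.dist`; `knn_average` and the toy vectors are hypothetical.

```python
import math

def knn_average(query, train_feats, train_node_feats, k):
    """Average the node features of the k training images closest to `query`.

    query: feature vector of the test image.
    train_feats: vectors of the training images used for matching.
    train_node_feats: node feature vectors of the same training images.
    """
    order = sorted(range(len(train_feats)),
                   key=lambda i: math.dist(query, train_feats[i]))
    nearest = order[:k]
    dim = len(train_node_feats[0])
    return [sum(train_node_feats[i][d] for i in nearest) / len(nearest)
            for d in range(dim)]

train_feats = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
train_node_feats = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
print(knn_average([0.2, 0.0], train_feats, train_node_feats, k=2))  # [0.5, 0.5]
```

Calling it once with image features and once with node features as `train_feats` gives the two averages (1) and (2); passing sentence node features of the neighbours as `train_node_feats` gives the sentence-side variant.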


Approach Node and Edge potentials

Sentence Similarity Measures

Compute the similarity between a sentence and the triplets.

1 Compute a dependency parse for each sentence

2 Extract triplets from the sentences
   • object and action: the subject, the direct object and any nmod relation connecting a noun and a verb
   • scene: the head nouns of prepositional phrases (except those introduced by "of" and "with")

3 Apply Lin's similarity to objects and scenes

4 Compute action co-occurrence scores: verbs count as similar if they appear in different captions for the same image

5 Estimate the sentence node potentials from steps 1-4
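Lin's similarity compares two concepts through the information content (IC) of their lowest common subsumer: sim(a, b) = 2 · IC(lcs(a, b)) / (IC(a) + IC(b)), with IC(c) = -log p(c). The toy taxonomy and probabilities below are hypothetical; in practice one would use WordNet with corpus-derived IC (NLTK, for example, offers `lin_similarity` on WordNet synsets).

```python
import math

# Hypothetical toy taxonomy (child -> parent) and concept probabilities.
# p(c) is the probability of seeing c or any of its descendants,
# so it never decreases towards the root.
PARENT = {"cat": "animal", "dog": "animal", "animal": "entity",
          "car": "vehicle", "vehicle": "entity"}
PROB = {"entity": 1.0, "animal": 0.3, "vehicle": 0.2,
        "cat": 0.05, "dog": 0.1, "car": 0.1}

def ancestors(c):
    """Return c followed by its ancestors up to the root."""
    chain = [c]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def ic(c):
    """Information content -log p(c), written as log(1/p) to avoid -0.0."""
    return math.log(1.0 / PROB[c])

def lin(a, b):
    """Lin similarity: 2 * IC(lcs) / (IC(a) + IC(b))."""
    b_anc = set(ancestors(b))
    lcs = next(c for c in ancestors(a) if c in b_anc)  # lowest common subsumer
    denom = ic(a) + ic(b)
    return 2 * ic(lcs) / denom if denom else 1.0

print(round(lin("cat", "dog"), 3), lin("cat", "car"))
```

"cat" and "dog" share the informative subsumer "animal" and score about 0.45, while "cat" and "car" only meet at the root, whose IC is 0, so their similarity is 0.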


Approach Node and Edge potentials

Sentence Node Potentials

• sentence node feature: similarity of the sentence to each object, scene and action

• average of the sentence node features of the other 4 captions of the same image

• KNN average of the sentence node features

• average of the image node features of the neighbouring images

• average of the sentence node features of the reference sentences of the neighbours


Approach Node and Edge potentials

Edge Potentials

Defined as a linear combination of several estimates for the edge between node A and node B:

• the normalized frequency of word A in the corpus, f(A)

• the normalized frequency of word B in the corpus, f(B)

• the normalized frequency of A and B occurring together in the corpus, f(A,B)

• the ratio $\frac{f(A,B)}{f(A) \cdot f(B)}$
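The four estimates can be computed from any collection of co-occurring labels. This sketch assumes the corpus is a list of annotated triplets and normalizes counts by the number of triplets; the paper's actual corpus and normalization may differ, and `edge_features`, `edge_potential` and the weights are hypothetical.

```python
from collections import Counter
from itertools import combinations

def edge_features(triplets):
    """Return a function giving [f(A), f(B), f(A,B), f(A,B) / (f(A) f(B))].

    triplets: list of <object, action, scene> tuples used as the corpus.
    All frequencies are normalized by the number of triplets.
    """
    n = len(triplets)
    single = Counter(label for t in triplets for label in set(t))
    pair = Counter(frozenset(p) for t in triplets
                   for p in combinations(set(t), 2))

    def features(a, b):
        fa, fb = single[a] / n, single[b] / n
        fab = pair[frozenset((a, b))] / n
        ratio = fab / (fa * fb) if fa and fb else 0.0
        return [fa, fb, fab, ratio]

    return features

def edge_potential(features, weights):
    # Linear combination of the estimates; the weights would be learned.
    return sum(w * f for w, f in zip(weights, features))

corpus = [("dog", "run", "grass"), ("dog", "sit", "grass"),
          ("person", "walk", "street"), ("dog", "run", "street")]
feats = edge_features(corpus)("dog", "run")
print(feats)
print(edge_potential(feats, [1.0, 1.0, 1.0, 0.0]))
```

The ratio is greater than 1 when A and B co-occur more often than their individual frequencies would predict, as for "dog" and "run" here (4/3).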


Approach Learning and Inference

Learning and Inference

Triplet prediction for images

discriminative learning based on a labeled training set

Learning the mapping from images to meaning:

find the weights of the linear combination of feature functions that maximize the scores of the ground-truth triplets
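To make "find the weights that maximize the ground-truth triplet scores" concrete, here is a structured perceptron, a simpler stand-in and not necessarily the structured learning method used in the paper. Whenever the highest-scoring triplet is wrong, the weights move towards the features of the ground truth and away from those of the mistake. `phi`, `score`, `perceptron_train` and the toy data are hypothetical.

```python
def score(w, x, y, phi):
    """Linear score w . phi(x, y) of triplet y for input x."""
    return sum(wi * fi for wi, fi in zip(w, phi(x, y)))

def perceptron_train(examples, candidates, phi, dim, epochs=10):
    """Structured perceptron (a stand-in for structured learning).

    examples: list of (x, y_true) pairs with ground-truth triplets y_true.
    candidates: all triplets the model may predict.
    phi(x, y): list of `dim` feature values, e.g. node and edge potentials.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y_true in examples:
            y_hat = max(candidates, key=lambda y: score(w, x, y, phi))
            if y_hat != y_true:
                # Move weights toward the ground truth, away from the mistake.
                f_true, f_hat = phi(x, y_true), phi(x, y_hat)
                w = [wi + a - b for wi, a, b in zip(w, f_true, f_hat)]
    return w

# Toy data: each "image" x directly stores phi(x, y) for both candidates.
T1, T2 = ("dog", "run", "grass"), ("cat", "sit", "room")
examples = [({T1: [1.0, 0.0], T2: [0.0, 1.0]}, T1),
            ({T1: [0.0, 1.0], T2: [1.0, 0.0]}, T2)]
w = perceptron_train(examples, [T1, T2], phi=lambda x, y: x[y], dim=2)
print(w)  # [1.0, -1.0]
```

A structured SVM would additionally enforce a margin between the ground truth and all other triplets; the perceptron only corrects outright mistakes.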

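Inference: search for the triplet that gives the best score

additive: $\hat{y} = \arg\max_y w^\top \phi(x_i, y) = \arg\max_y \sum_j w_j \phi_j(x_i, y)$
multiplicative: $\hat{y} = \arg\max_y \prod_j w_j \phi_j(x_i, y)$

$\phi$: the potential function ($\phi_j$: its $j$-th potential); $y$: the triplet label; $x_i$: the $i$-th image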
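The two inference rules differ only in how the weighted potentials are combined. This sketch reads the multiplicative rule as a product of the individual weighted potentials and assumes they are positive; `infer` and the numbers are hypothetical. The example is chosen so that the two rules disagree.

```python
import math

def infer(potentials, w, mode="additive"):
    """Return the triplet y with the best combined score for one image x_i.

    potentials: dict mapping each candidate triplet y to its list of
    potentials phi_j(x_i, y); w: one weight per potential.
    additive:       sum_j  w_j * phi_j(x_i, y)
    multiplicative: prod_j w_j * phi_j(x_i, y)  (assumes positive terms)
    """
    def combine(y):
        terms = [wj * pj for wj, pj in zip(w, potentials[y])]
        return sum(terms) if mode == "additive" else math.prod(terms)
    return max(potentials, key=combine)

# Hypothetical potentials: t1 has one strong and one weak potential,
# t2 has two moderate ones.
t1, t2 = ("dog", "run", "grass"), ("dog", "sit", "grass")
potentials = {t1: [0.9, 0.1], t2: [0.45, 0.45]}
w = [1.0, 1.0]
print(infer(potentials, w, "additive"))        # t1, since 1.0 > 0.9
print(infer(potentials, w, "multiplicative"))  # t2, since 0.2025 > 0.09
```

The multiplicative rule heavily penalizes a triplet if any single potential is weak, whereas the additive rule lets one strong potential compensate for a weak one. `math.prod` requires Python 3.8+.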


Evaluation

Evaluation

• PASCAL Sentence Dataset: random selection of 50 images from each of 20 categories, 1000 images in total

• each image annotated with 5 sentences, resulting in 5000 sentences

• manual assignment of triplets
   173 different triplets in the training set (600 images)
   123 different triplets in the test set (400 images)
   overlap: 80 triplets

• 15 nearest neighbours for building the potentials for images and sentences

• 50 closest triplets used for matching


Evaluation

Evaluation

• scoring a match between an image and a sentence:
   rank the k top triplets in the opposite space
   take the sum of the ranks weighted by the inverse rank
   → low score = high similarity
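One reading of this score: take the top-k triplets of one item (say, an image), look up each triplet's rank in the other item's ranking (the sentence's), and weight it by the inverse of its rank on the image side, so disagreements on the best triplets cost most. The direction, the penalty for unranked triplets and the name `match_score` are assumptions; k plays the role of the number of closest triplets used for matching.

```python
def match_score(ranked_a, ranked_b, k):
    """Dissimilarity of two items ranked over the same triplet space.

    ranked_a, ranked_b: triplets sorted from best to worst for item a
    (e.g. an image) and item b (e.g. a sentence).
    For each of a's top-k triplets, take its rank in b's list and weight it
    by the inverse of its rank in a's list. Lower score = more similar.
    """
    rank_b = {t: r for r, t in enumerate(ranked_b, start=1)}
    missing = len(ranked_b) + 1  # assumed penalty for triplets b never ranks
    return sum(rank_b.get(t, missing) / r
               for r, t in enumerate(ranked_a[:k], start=1))

t1 = ("dog", "run", "grass")
t2 = ("dog", "sit", "grass")
t3 = ("person", "ride", "street")
image = [t1, t2, t3]
sentence_1 = [t1, t2, t3]  # agrees with the image
sentence_2 = [t3, t2, t1]  # reverses the image's ranking
print(match_score(image, sentence_1, k=2), match_score(image, sentence_2, k=2))
```

With k = 2 the agreeing sentence scores 2.0 and the reversed one 4.0, so the first is the better match; a symmetric variant could average the scores computed in both directions.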


Evaluation

Evaluation

• out-of-vocabulary words are handled with distributional semantics: a word not seen during training is related to known words through semantic similarity
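A common way to do this is to represent each word by its context counts in a large corpus and map an unseen word to the known word with the most similar vector. Cosine similarity over sparse count vectors is an assumption here, not necessarily the paper's measure; the vectors and `nearest_known` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u[c] * v.get(c, 0.0) for c in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_known(word, vectors, vocabulary):
    """Map a (possibly unseen) word to the most similar known word."""
    return max(vocabulary, key=lambda w: cosine(vectors[word], vectors[w]))

# Hypothetical context-count vectors from some large text corpus.
vectors = {
    "puppy": {"bark": 3, "leash": 2, "tail": 1},
    "dog":   {"bark": 5, "leash": 4, "tail": 2},
    "car":   {"road": 6, "wheel": 3},
}
print(nearest_known("puppy", vectors, ["dog", "car"]))  # dog
```

Here "puppy" never occurs in the triplet vocabulary, but its contexts overlap almost entirely with those of "dog", so it is treated like "dog".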


Evaluation

Quantitative Measures

• Tree-F1 measure: reflects both accuracy and specificity, using taxonomy trees, e.g. Object → Animal → Cat

• standard F1 measure over taxonomy paths: precision and recall count the edges on the predicted path that match edges on the ground-truth path, normalized by the lengths of the paths

• BLEU measure: checks whether a triplet is logically valid or not, e.g. <bottle, walk, street> is not valid
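A sketch of Tree-F1 for a single node, using the standard definitions (precision = matching edges / edges on the predicted path, recall = matching edges / edges on the ground-truth path); the paper's exact normalization may differ, and the helper names and taxonomy paths are hypothetical.

```python
def path_edges(path):
    """Edges of a root-to-leaf taxonomy path, e.g. object -> animal -> cat."""
    return set(zip(path, path[1:]))

def tree_f1(predicted, truth):
    """F1 over taxonomy edges shared by the predicted and true paths."""
    pred_e, true_e = path_edges(predicted), path_edges(truth)
    matches = len(pred_e & true_e)
    if matches == 0:
        return 0.0
    precision = matches / len(pred_e)
    recall = matches / len(true_e)
    return 2 * precision * recall / (precision + recall)

# Wrong leaf: "cat" predicted, "dog" is correct.
print(tree_f1(["object", "animal", "cat"], ["object", "animal", "dog"]))
# Correct but less specific: stops at "animal".
print(tree_f1(["object", "animal"], ["object", "animal", "dog"]))
```

A wrong but closely related leaf scores 0.5, and a correct but under-specified prediction scores about 0.67, which is how the measure rewards both accuracy and specificity.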


Evaluation

Results for image → meaning space

Obj: consider only object potentials
No Edge: uniform potentials over edges
A: additive inference model
M: multiplicative inference model
FW: fixed weights
SL: structured learning


Evaluation

Results for sentence generation

Two annotators judged quality: for 208 of the 400 test images, at least one of the ten generated sentences is accurate


Evaluation

Retrieve images for sentences


Evaluation

Failure examples


Conclusion

Summary

1 Image, Sentence and Meaning Space

2 learning the projections from the image and sentence spaces to the triplets in the meaning space

3 annotation of 1000 images, each with five sentences as captions

4 meaning space used for generating appropriate sentences for a given image and for retrieving images for a given sentence

5 out-of-vocabulary words handled with word similarity based on distributional statistics


Conclusion

Thank you for your attention!

Any questions?


Conclusion

References

Ali Farhadi, Seyyed Mohammad Mohsen Hejrati, Mohammad Amin Sadeghi, Peter Young, Cyrus Rashtchian, Julia Hockenmaier, and David A. Forsyth. Every picture tells a story: Generating sentences from images. In ECCV (4), pages 15–29, 2010.

http://vision.cs.uiuc.edu/pascal-sentences/
