Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation
Presenter: Maresh Naresh Singh
Authors: Luo Jie, Barbara Caputo and Vittorio Ferrari

Feb 22, 2016

Transcript
Page 1: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Presenter: Maresh Naresh Singh

Authors: Luo Jie, Barbara Caputo and Vittorio Ferrari

Page 2: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Aim

• Given: News items consisting of images with their associated text.

• Goal: Figure out who is doing what.

Page 3: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Who is doing what?

• Guess the possible action of a person in the image.

• Use the pose as well as the verb for this purpose.

• Associate actions with the person in the image.

• Predict the name of the person.

Page 4: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

(b) US Democratic presidential candidate Senator Barack Obama waves to supporters together with his wife Michelle Obama standing beside him at his North Carolina and Indiana primary election night rally in Raleigh.

Page 5: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

(a) Four sets ... Roger Federer prepares to hit a backhand in a quarter-final match with Andy Roddick at the US Open.

Page 6: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Correspondence ambiguity problem.

• Multiple persons in the image and in the caption.

• A person appears in the image but is not mentioned in the caption.

• A person is mentioned in the caption but is not present in the image.
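To make the ambiguity concrete, the sketch below enumerates every candidate correspondence between detected persons and caption name-verb pairs. It is only an illustration: the function name and the use of `None` for "this person is not mentioned in the caption" are assumptions, not notation from the slides. A caption pair that no person receives corresponds to the "mentioned but not present" case.

```python
from itertools import permutations

def enumerate_correspondences(num_persons, num_pairs):
    """Return all one-to-one assignments of caption pairs to detected persons.

    Each assignment is a tuple with one entry per person: the index of the
    caption name-verb pair given to that person, or None if the person is
    left unmatched. (Hypothetical helper, for illustration only.)
    """
    # Pad with None so that any subset of persons may stay unmatched.
    labels = list(range(num_pairs)) + [None] * num_persons
    # dict.fromkeys drops the duplicates caused by the repeated None entries
    # while keeping a deterministic order.
    return list(dict.fromkeys(permutations(labels, num_persons)))

# Two detected persons and two name-verb pairs in the caption.
for assignment in enumerate_correspondences(2, 2):
    print(assignment)
```

Even this small case has several candidates, and the count grows quickly with more persons and names, which is why the correspondence is treated as a hidden variable rather than fixed by hand.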

Page 7: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Idea

• The title “Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation”

Page 8: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Generative Model

• Observed variables: Names and verbs in the caption. Detected persons in the image.

• Latent Variables: Image-caption correspondence.

• Parameters: Visual appearances of face and pose classes corresponding to different names and verbs.

• EM to compute hidden variables.
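The alternation between latent correspondences and appearance parameters can be sketched as follows. This is a deliberately simplified hard-assignment variant, not the probabilistic model from the paper: it assumes a single feature vector per person (rather than separate face and pose features), one center vector per caption label, squared Euclidean distance, and a hypothetical `null_cost` charged for leaving a person unlabelled. All function names are illustrative.

```python
from itertools import permutations

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def e_step(features, labels, centers, null_cost):
    """Choose the cheapest one-to-one labelling of the persons in one image."""
    # Pad with None so that any person may be left unlabelled.
    padded = list(labels) + [None] * len(features)
    best, best_cost = None, float("inf")
    for perm in dict.fromkeys(permutations(padded, len(features))):
        cost = sum(
            null_cost if label is None else sq_dist(f, centers[label])
            for f, label in zip(features, perm)
        )
        if cost < best_cost:
            best, best_cost = perm, cost
    return best

def m_step(images, assignments, centers):
    """Re-estimate each center as the mean of the features assigned to it."""
    updated = {}
    for label, old in centers.items():
        members = [
            f
            for (features, _), assignment in zip(images, assignments)
            for f, assigned in zip(features, assignment)
            if assigned == label
        ]
        # Keep the old center when no person was assigned to this label.
        updated[label] = (
            [sum(col) / len(members) for col in zip(*members)] if members else old
        )
    return updated

def hard_em(images, centers, iterations=10, null_cost=1.0):
    """images: list of (person_features, caption_labels) tuples."""
    assignments = []
    for _ in range(iterations):
        assignments = [e_step(f, l, centers, null_cost) for f, l in images]
        centers = m_step(images, assignments, centers)
    return assignments, centers
```

The brute-force search over permutations is only practical for the handful of persons found in a typical news image; it is used here to keep the sketch short.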

Page 9: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation
Page 10: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Features

Page 11: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Face and pose recognition

• Uses a face detector and an upper-body detector.

• A face and an upper body are considered to belong to the same person if the face lies in the center of the upper-body bounding box.
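The association rule can be sketched as a simple geometric test. The slides do not say how "in the center" is measured, so the box format `(x_min, y_min, x_max, y_max)`, the function names, and the `tolerance` fraction below are all assumptions made for illustration.

```python
def box_center(box):
    """Return the (x, y) center of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return (x_min + x_max) / 2.0, (y_min + y_max) / 2.0

def same_person(face_box, body_box, tolerance=0.25):
    """Heuristic check that a face sits in the middle of an upper-body box.

    The face center must fall inside the body box, and its horizontal offset
    from the body center must not exceed `tolerance` times the body width.
    The 0.25 default is an arbitrary illustrative value.
    """
    face_x, face_y = box_center(face_box)
    body_x, _ = box_center(body_box)
    x_min, y_min, x_max, y_max = body_box
    inside = x_min <= face_x <= x_max and y_min <= face_y <= y_max
    centered = abs(face_x - body_x) <= tolerance * (x_max - x_min)
    return inside and centered

print(same_person((40, 10, 60, 30), (0, 0, 100, 100)))  # face near the middle
print(same_person((0, 10, 20, 30), (0, 0, 100, 100)))   # face at the left edge
```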

Page 12: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Name-Verb pair.

• Language parser extracts name-verb pair from each caption.

• Uses OpenNLP.
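To show what a name-verb pair looks like, here is a toy stand-in for the parsing step. It does not use OpenNLP or any real parser: it assumes the person names and a small verb lexicon are already known, and pairs each name with the first lexicon verb that follows it. The function name and both input lists are hypothetical.

```python
import re

def extract_name_verb_pairs(caption, names, verbs):
    """Pair each known name with the first known verb that follows it.

    A lexicon lookup used only for illustration; a real language parser
    would identify names and verbs from the sentence structure instead.
    """
    # Locate every known name in the caption, ordered by position.
    hits = sorted(
        (m.start(), m.end(), name)
        for name in names
        for m in re.finditer(re.escape(name), caption)
    )
    pairs = []
    for i, (_, end, name) in enumerate(hits):
        # Only search the text between this name and the next one.
        stop = hits[i + 1][0] if i + 1 < len(hits) else len(caption)
        words = re.findall(r"[A-Za-z]+", caption[end:stop])
        verb = next((w.lower() for w in words if w.lower() in verbs), None)
        pairs.append((name, verb))
    return pairs

caption = ("Barack Obama waves to supporters together with his wife "
           "Michelle Obama standing beside him")
print(extract_name_verb_pairs(
    caption, ["Barack Obama", "Michelle Obama"], {"waves", "standing"}))
```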

Page 13: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Summary from last class.

Page 14: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Probability Function

• Uses EM to maximize the above function.

Page 15: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

• Maximizing the previous equation reduces to minimizing the equation:

Page 16: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

EM algorithm (Initialization)

• Compute distance matrix between faces/poses from images sharing some name/verb in the caption.

• For each name/verb pair, select all captions containing only that name/verb.

• If the corresponding images contain only one person, their faces/poses are used to initialize the center vectors.

• If the corresponding images always contain multiple persons, a person is assigned by random selection.
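The selection rules above can be sketched as follows for a single name or verb label. The sketch skips the distance-matrix step and assumes that the chosen features are simply averaged to form the initial center; the slides only say they are "used to initialize" it. The function names and the data layout are hypothetical.

```python
import random

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def init_center(label, items, rng=None):
    """Initialize the center vector for one label.

    items: list of (caption_labels, person_features) tuples, one per image.
    Returns None when no caption mentions this label exclusively.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed, only to keep the sketch repeatable
    # Keep images whose caption mentions this label and nothing else.
    exclusive = [feats for labels, feats in items
                 if list(labels) == [label] and feats]
    # Prefer images that show exactly one person: no ambiguity there.
    single = [feats[0] for feats in exclusive if len(feats) == 1]
    if single:
        return mean_vector(single)
    if exclusive:
        # Every such image shows several persons: pick one at random per image.
        return mean_vector([rng.choice(feats) for feats in exclusive])
    return None

items = [
    (["Obama"], [[1.0, 1.0]]),
    (["Obama"], [[3.0, 3.0]]),
    (["Obama", "Clinton"], [[9.0, 9.0], [8.0, 8.0]]),
]
print(init_center("Obama", items))
```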

Page 17: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

EM algorithm (E-Step)

Page 18: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

EM algorithm (M-Step)

Page 19: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Experiment and observations

Page 20: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Experiment and observations

Page 21: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation
Page 22: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation
Page 23: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Comments

• Better results on the chosen dataset.

• Somewhat successful in recognizing persons in images without captions.

Page 24: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Comments

• Assumes independence between persons in an image.

• Limited dataset of 1610 images used for experimentation.

• Manual involvement in writing captions.

• Images collected using search queries like “Barack Obama” + “Shake hands”.

• Such queries result in images with strong correspondence between pose and face.

Page 25: Who’s Doing What: Joint Modeling of Names and Verbs for Simultaneous Face and Pose Annotation

Thanks for tolerating me.