Finding Better Answers in Video Using Pseudo Relevance Feedback
Informedia Project, Carnegie Mellon University
Question Answering from Errorful Multimedia Streams
ARDA AQUAINT
Outline
• Pseudo-Relevance Feedback for Imagery
• Experimental Evaluation
• Results
• Conclusions
Motivation
• Question answering from multimedia streams
• Questions contain text and visual components
• Want a good image that represents the 'answer'
• Improve performance of images retrieved as answers
• Relevance feedback works for text retrieval!
Finding Similar Images by Color
Finding Similar Scenes
Similarity Challenge: Images containing similar content
What is Pseudo-Relevance Feedback?
• Relevance Feedback (Human intervention)
[Diagram: QUERY → SYSTEM → RESULTS, with a HUMAN relevance judgment fed back to the system]
• Why Pseudo?
[Diagram: QUERY → SYSTEM → RESULTS, with feedback but no human intervention]
Original System Architecture
• Simple weighted linear combination of video, audio, and text retrieval scores
[Diagram: Query → text and image retrieval agents → Text Score + Image Score → Final Score]
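The fusion step in this architecture can be sketched in a few lines of Python; the weights below are illustrative placeholders, not the values used in the actual system:

```python
# Sketch of the original architecture: a weighted linear combination
# of per-agent retrieval scores. The weights are illustrative only.

def combine_scores(text_score, image_score, w_text=0.7, w_image=0.3):
    """Fuse text and image retrieval scores into a single final score."""
    return w_text * text_score + w_image * image_score

final = combine_scores(text_score=0.8, image_score=0.4)
```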
System Architecture with PRF
• New step: classification through Pseudo-Relevance Feedback (PRF)
• Combine with all other information agents (text, image)
[Diagram: Query → text and image retrieval agents → Text Score, Image Score, and PRF Score → Final Score]
Classification from Modified PRF
• Automatic retrieval technique
• Modification: use negative data as feedback
• Step-by-step
  • Run base retrieval algorithm on image collection
    • K-nearest neighbor (KNN) on color and texture
  • Build classifier
    • Negative examples: least relevant images in the collection
    • Positive examples: image queries
  • Classify all data in the collection to obtain ranked results
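A minimal sketch of this modified PRF step, using plain nearest-neighbor distances as a stand-in for the system's color/texture features and trained classifier (the function names and the margin-style re-scoring are illustrative, not the paper's implementation):

```python
import math

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def prf_rank(queries, collection, n_neg=2):
    """One round of negative pseudo-relevance feedback (illustrative).

    queries:    list of query feature vectors (the positive examples)
    collection: list of target feature vectors
    Returns collection indices ranked best-first.
    """
    # 1. Base retrieval: distance to the nearest query example.
    base = [min(euclidean(t, q) for q in queries) for t in collection]
    # 2. Pseudo-negatives: the least relevant (most distant) targets.
    order = sorted(range(len(collection)), key=lambda i: base[i])
    negatives = [collection[i] for i in order[-n_neg:]]
    # 3. Re-score: nearest-negative distance minus nearest-positive
    #    distance, a simple stand-in for a trained classifier's margin.
    score = [
        min(euclidean(t, n) for n in negatives)
        - min(euclidean(t, q) for q in queries)
        for t in collection
    ]
    return sorted(range(len(collection)), key=lambda i: -score[i])
```

Targets far from the sampled negatives and close to the query examples rise to the top of the ranking.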
The Basic PRF Algorithm for Image Retrieval
=========================
Input
  Query examples q_1 … q_n
  Target examples t_1 … t_n
=========================
Output
  Final score F_i and a final ranking for every target t_i
=========================
Algorithm
  Given an initial score s^0_i for each t_i based on f^0(t_i, q_1 … q_n),
  using an initial similarity measure f^0 as a base
  Iterate k = 1 … max:
    Given score s^k_i, sample positive instances p^k_i and negative
    instances n^k_i using sampling strategy S
    Compute the updated retrieval score s^{k+1}_i = f^{k+1}(t_i), where
    f^{k+1} is trained/learned using n^k_i and p^k_i
  Combine all scores for the final score F_i = g(s^0_i … s^max_i)
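The loop above can be written as a small skeleton; every callable here (the initial measure f0, the sampling strategy, the trainer, and the combiner g) is a placeholder the caller supplies, not part of the original system:

```python
def basic_prf(targets, queries, f0, sample, train, combine, max_iter=2):
    """Skeleton of the basic PRF loop from the slide (hypothetical API).

    f0:      initial similarity, f0(t, queries) -> score
    sample:  sampling strategy S, scores -> (positives, negatives)
    train:   learns f^{k+1} from (negatives, positives); returns t -> score
    combine: g(list of per-iteration score lists) -> final scores F_i
    """
    scores = [[f0(t, queries) for t in targets]]       # s^0
    for _ in range(max_iter):
        pos, neg = sample(scores[-1])                  # p^k, n^k
        f_next = train(neg, pos)                       # learn f^{k+1}
        scores.append([f_next(t) for t in targets])    # s^{k+1}
    return combine(scores)                             # F_i = g(s^0 ... s^max)
```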
Analysis: PRF on Synthetic Data
PRF on Synthetic Data
Evaluation using the 2002 TREC Video Retrieval Task
• Independent collection, queries, relevant results available
• Search Collection
  • Total length: 40.16 hours
  • MPEG-1 format
  • Collected from the Internet Archive and Open Video websites; documentaries from the '50s
  • 14,000 shots
  • 292,000 I-frames (images)
• Query
  • 25 queries
  • Text, Image (optional), Video (optional)
Summary of '02 Video Queries

Q ID | Query text                       | # videos / # ext images
75   | Eddie Rickenbacker               | 2 / 2
76   | James Chandler                   | 3
77   | George Washington                | 1 / 1
78   | Abraham Lincoln                  | 1 / 1
79   | leisure/beach/people             | 4
80   | musicians playing instruments    | 2
81   | football players                 | 4
82   | women in long dresses            | 3
83   | Golden Gate Bridge               | 5
84   | Price Tower                      | 1
85   | Washington Square arch in NY     | 1
86   | city views from above            | 4
87   | oil fields rigs, oil drilling    | 1
88   | maps of the United States        | 4
89   | living butterfly                 | 1
90   | snow covered mountains           | 3
91   | parrots                          | 1 / 1
92   | sailboats                        | 2 / 4
93   | beef, cattle, cows               | 5
94   | people in city                   | 3
95   | nuclear explosion mushroom cloud | 3
96   | U.S. flag                        | 2
97   | living cells in microscopic view | 2
98   | locomotive                       | 5
99   | missile/rocket launch            | 2
Analysis of Queries (2002)
• Specific item or person
  • Eddie Rickenbacker, James Chandler, George Washington, Golden Gate Bridge, Price Tower in Bartlesville, OK
• Specific fact
  • Arch in Washington Square Park in NYC, map of continental US
• Instances of a category
  • football players, overhead views of cities, one or more women standing in long dresses
• Instances of events/activities
  • people spending leisure time at the beach, one or more musicians with audible music, crowd walking in an urban environment, locomotive approaching the viewer
Sample Query and Target
Query: Find pictures of Harry Hertz, Director
of the National Quality Program, NIST
Speech: We’re looking for people that have a broad range of expertise that have business knowledge that have knowledge on quality management on quality improvement and in particular …
OCR: H,arry Hertz a Director aro 7 wa-,i,,ty Program,Harry Hertz a Director
Example Images
Example Images Selected for PRF
Combination of Agents
• Multiple Agents
  • Text Retrieval Agent
  • Base Image Retrieval Agent
    • Nearest neighbor on color
    • Nearest neighbor on texture
  • Classification PRF Agent
• Combination of multiple agents
  • Convert scores to posterior probabilities
  • Linear combination of probabilities
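One way to realize this combination step, assuming a softmax as the score-to-posterior conversion (the deck does not specify the exact mapping) and hand-picked weights:

```python
import math

def to_posterior(scores):
    """Map raw agent scores to probabilities via a softmax over the
    result list -- one simple stand-in for the posterior conversion."""
    exps = [math.exp(s) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def fuse(agent_scores, weights):
    """Linear combination of per-agent posterior probabilities.

    agent_scores: one list of raw scores per agent, same result order
    weights:      one weight per agent (illustrative values)
    """
    posteriors = [to_posterior(s) for s in agent_scores]
    n = len(agent_scores[0])
    return [sum(w * p[i] for w, p in zip(weights, posteriors)) for i in range(n)]
```

With weights summing to one, the fused values remain a probability distribution over the results.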
2002 Results
Method                    | Precision | Recall | MAP
Speech Transcripts only   | 0.0348    | 0.1445 | 0.0724
SR + Color/Texture        | 0.0892    | 0.220  | 0.1046
SR + Color/Texture + PRF  | 0.0924    | 0.216  | 0.1124

*Video OCR was not relevant in this collection
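For reference, the mean average precision (MAP) figure reported above can be computed as follows; this is the standard definition, not code from the paper:

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision at each relevant hit,
    normalized by the total number of relevant items."""
    hits, precisions = 0, []
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP over a list of (ranked_list, relevant_set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```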
Distance Function for Query 75
Distance Function for Query 89
Effect of Pos/Neg Ratio and Combination Weight
Selection of Negative Images + Combination
Discussion & Future Work
• Discussion
  • Results are sensitive to queries with small numbers of answers
  • Images alone cannot fully represent the query semantics
• Future Work
  • Incorporate more agents
  • Utilize the relationships between the information from multiple agents
  • Better combination scheme
  • Include web image search (e.g., Google) as query expansion
Conclusions
• Pseudo-relevance feedback works for text retrieval
• It is not directly applicable to image retrieval from video, because precision in the top-ranked answers is too low
• Negative PRF, which uses the least-relevant images as negative examples, was effective for finding better images