Demo
Example
Spatial re-ranking
• improves precision
• but not recall …
[Precision–recall curves for several query images]
• high precision at low recall (like Google)
• variation in performance over queries
• none retrieve all instances
Obtaining visual words is like a sensor measuring the image: “noise” in the measurement process means that some visual words are missing or incorrect, e.g. due to
• Missed detections
• Changes beyond the built-in invariance
• Quantization effects
Consequence: a visual word present in the query is missing in the target image
Why aren’t all objects retrieved?
[Pipeline: query image → set of SIFT descriptors (Hessian-Affine regions + SIFT descriptors) → clustered and quantized to visual words → sparse frequency vector]
1. Query expansion
2. Better quantization
Query expansion
In text:
• Reissue top N results as queries
• Pseudo/blind relevance feedback
• Danger of topic drift – this is a big problem for text
Query Expansion
Original query: Hubble Telescope Achievements
Example from: Jimmy Lin, University of Maryland
Query expansion: Select top 20 terms from top 20 documents according to tf-idf
Added terms:
Telescope, hubble, space, nasa, ultraviolet, shuttle, mirror, telescopes, earth, discovery, orbit, flaw, scientists, launch, stars, universe, mirrors, light, optical, species
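The term-selection step above can be sketched in a few lines. This is a toy illustration of pseudo-relevance feedback, not Lin's actual system: the corpus, the `expand_query` helper, and the parameter values are all made up for the example; terms in the top-ranked documents are scored by tf-idf and the best new ones are appended to the query.

```python
# Toy pseudo-relevance feedback: expand a query with the highest tf-idf
# terms drawn from the top-ranked (feedback) documents.
import math
from collections import Counter

def expand_query(query_terms, top_docs, n_terms=5):
    """Append the highest-tf-idf terms from the feedback documents."""
    n_docs = len(top_docs)
    df = Counter()                      # document frequency in feedback set
    for doc in top_docs:
        df.update(set(doc))
    scores = Counter()
    for doc in top_docs:
        for term, f in Counter(doc).items():
            idf = math.log((1 + n_docs) / df[term])
            scores[term] += f * idf     # accumulate tf-idf over documents
    added = [t for t, _ in scores.most_common() if t not in query_terms]
    return query_terms + added[:n_terms]

docs = [
    "hubble telescope mirror flaw nasa shuttle".split(),
    "hubble space telescope orbit launch".split(),
    "nasa shuttle launch discovery".split(),
]
print(expand_query(["hubble", "telescope"], docs, n_terms=3))
```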
Query Expansion: Text
In text:
• Reissue top N results as queries
• Pseudo/blind relevance feedback
• Danger of topic drift – this is a big problem for text
In vision:
• Reissue spatially verified image regions as queries
• Spatial verification acts like an oracle of truth
Query Expansion
Visual query expansion - overview
1. Original query
2. Initial retrieval set
3. Spatial verification (the “oracle”)
4. New enhanced query
5. Additional retrieved images
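The five steps above can be sketched as a small loop. This is a minimal illustration, not the authors' implementation: the tiny random BoW "database", the cosine-similarity `retrieve` function, and the pretend verification step are all stand-ins; the point is that the verified results are averaged with the query to form the enhanced query, which is then reissued.

```python
# Minimal visual query-expansion loop over toy bag-of-words vectors.
import numpy as np

def retrieve(db, q, k=3):
    """Rank database BoW vectors by cosine similarity to the query."""
    sims = db @ q / (np.linalg.norm(db, axis=1) * np.linalg.norm(q) + 1e-9)
    return np.argsort(-sims)[:k]

def expand_and_requery(db, q, verified, k=5):
    """Average the query with spatially verified results, then reissue."""
    enhanced = np.mean(np.vstack([q] + [db[i] for i in verified]), axis=0)
    return retrieve(db, enhanced, k=k)

rng = np.random.default_rng(0)
db = rng.random((20, 8))             # 20 images, 8 visual words (toy)
q = db[3] + 0.05 * rng.random(8)     # query closely resembles image 3
top = retrieve(db, q, k=3)           # step 2: initial retrieval set
verified = top[:2]                   # step 3: pretend verification keeps 2
print(expand_and_requery(db, q, verified))
```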
[Columns: query image | originally retrieved image | originally not retrieved]
What Query Expansion Adds
Bag of visual words particular object retrieval
[Pipeline: query image → set of SIFT descriptors (Hessian-Affine regions + SIFT descriptors) → visual words + tf-idf weighting against centroids (visual words) → sparse frequency vector → inverted file → ranked image short-list → geometric verification → query expansion]
[Lowe 04, Chum et al. 2007]
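The inverted-file scoring step in the pipeline above can be sketched as follows. This is a toy illustration, not the Oxford implementation: images are just lists of visual-word ids, and the `InvertedFile` class is invented for the example. The inverted file maps each word to the images containing it, so scoring only touches images that share at least one word with the query.

```python
# Toy inverted file: tf-idf scoring of sparse visual-word histograms.
import math
from collections import defaultdict, Counter

class InvertedFile:
    def __init__(self, db):                  # db: {image_id: [word ids]}
        self.n = len(db)
        self.postings = defaultdict(list)    # word -> [(image, tf)]
        for img, words in db.items():
            for w, tf in Counter(words).items():
                self.postings[w].append((img, tf))

    def query(self, words, k=3):
        """Accumulate the tf-idf dot product via the posting lists."""
        scores = Counter()
        for w, q_tf in Counter(words).items():
            posting = self.postings.get(w, [])
            if not posting:
                continue
            idf = math.log(self.n / len(posting))
            for img, tf in posting:
                scores[img] += (q_tf * idf) * (tf * idf)
        return [img for img, _ in scores.most_common(k)]

db = {"a": [1, 1, 2, 5], "b": [2, 3, 3], "c": [1, 5, 5, 7]}
inv = InvertedFile(db)
print(inv.query([1, 5, 5]))   # -> ['c', 'a']  ('b' shares no query words)
```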
Demo
Query image Originally retrieved Retrieved only after expansion
Query Expansion
Query image
Original results (good) vs. expanded results (better)
[Precision–recall plots for the original and expanded results]
Better Quantization
Problems arising from quantization
• Typically, quantization has a significant impact on the final performance of the system [Sivic03,Nister06,Philbin07]
• Quantization errors split features that should be grouped together and confuse features that should be separated
Voronoi cells
And more …
i. Points 1, 2 and 3 are matched equally
ii. Points 3 and 4 are close, but never matched
Overcoming quantization errors
• Soft-assign each descriptor to multiple cluster centers
• Assignment weight according to Gaussian on distance
• Normalize weights to sum to one
[Philbin et al. CVPR 2008, Van Gemert et al. ECCV 2008]
Hard assignment: B: 1.0
Soft assignment: A: 0.1, B: 0.5, C: 0.4
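The soft-assignment rule above can be written in a few lines. A minimal sketch, assuming a nearest-neighbours variant: the number of centers `r`, the bandwidth `sigma`, and the toy centroids are illustrative values, not the parameters used in the cited papers.

```python
# Soft assignment: weight the r nearest centroids by a Gaussian on the
# descriptor-to-centroid distance, then normalize weights to sum to one.
import numpy as np

def soft_assign(desc, centroids, r=3, sigma=0.5):
    d = np.linalg.norm(centroids - desc, axis=1)      # distances to centers
    nearest = np.argsort(d)[:r]                       # keep r nearest words
    w = np.exp(-d[nearest] ** 2 / (2 * sigma ** 2))   # Gaussian weighting
    return nearest, w / w.sum()                       # normalized weights

centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
words, weights = soft_assign(np.array([0.4, 0.1]), centroids)
print(words, weights)   # nearest word gets the largest weight
```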
Learning a vocabulary to overcome quantization errors
[Mikulik et al. ECCV 2010, Philbin et al. ECCV 2010]
Several other solutions are possible …
e.g. Hamming embedding [Jegou & Schmid ECCV 2008]
• Standard quantization using bag-of-visual-words
• Additional localization in the Voronoi cell by a binary signature
• More on methods of soft assignment tomorrow
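The Hamming-embedding idea above can be sketched roughly as follows. This is a simplified illustration, not Jegou & Schmid's trained system: the random projection `P`, the all-zero per-cell medians, and the Hamming radius are toy stand-ins. Within a Voronoi cell, a projection of the descriptor is thresholded against per-cell medians to give a binary signature; two features match only if they share the visual word and their signatures are within a small Hamming distance.

```python
# Toy Hamming embedding: binary signature localizes a descriptor inside
# its Voronoi cell; matching requires same word + small Hamming distance.
import numpy as np

rng = np.random.default_rng(1)
P = rng.standard_normal((8, 4))   # projection to an 8-bit signature (toy)
medians = np.zeros(8)             # per-cell thresholds (toy: zeros)

def signature(desc):
    return (P @ desc > medians).astype(np.uint8)

def he_match(word_a, sig_a, word_b, sig_b, radius=2):
    return word_a == word_b and int(np.sum(sig_a != sig_b)) <= radius

a = rng.standard_normal(4)
sa = signature(a)
sb = signature(a + 0.01 * rng.standard_normal(4))   # near-duplicate feature
print(he_match(7, sa, 7, sb), he_match(7, sa, 8, sb))
```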
Soft Assignment: Implementation
Bag of words: score a match between two features by the scalar product of their weight vectors
Spatial re-ranking: also score the number of inliers using this measure
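The scoring rule above is just a scalar product of the two features' weight vectors. A minimal sketch with made-up numbers (reusing the A/B/C weights from the earlier slide): with hard assignment each feature has a single weight of 1.0, so soft assignment generalizes the hard case.

```python
# Score a putative feature match as the scalar product of the two
# features' visual-word weight vectors (sparse dicts: word -> weight).
def match_score(wa, wb):
    return sum(w * wb.get(word, 0.0) for word, w in wa.items())

hard_a = {"B": 1.0}                          # hard assignment: one word
soft_a = {"A": 0.1, "B": 0.5, "C": 0.4}      # soft assignment: several words
soft_b = {"B": 0.6, "C": 0.3, "D": 0.1}
print(match_score(hard_a, soft_b))   # 0.6
print(match_score(soft_a, soft_b))   # 0.5*0.6 + 0.4*0.3 = 0.42
```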
Soft Assignment: Results
Benefit 1: Helping Query Expansion
Query
HardAssignment
Only one good initial result – QE doesn't significantly improve results
Soft Assignment: Results
Benefit 1: Helping Query Expansion
Query
SoftAssignment
4 good results – allows query expansion to return these results in addition to the ones above
Soft Assignment: Results
Benefit 2: Better spatial localization
HardAssignment
SoftAssignment
Results: Baseline to State of the Art (Mean Average Precision)
1. Baseline Method (K = 10K)     0.389
2. Large Vocabulary (K = 1M)     0.618
3. Spatial Re-ranking            0.653
4. Soft Assignment (SA)          0.731
5. Query Expansion (QE)          0.801
6. SA & QE                       0.825
Disadvantages of soft assignment?
Outline
1. Object recognition cast as nearest neighbour matching
2. Object recognition cast as text retrieval
3. Large scale search and improving performance
4. Applications
• accessing expert knowledge, data mining, inpainting, location search, large scale reconstruction, mobile apps, …
5. The future and challenges
Application
Accessing expert knowledge:
• Use an image query to access an annotated dataset
• Search with a query image → retrieve its annotation
Visual Access to Classical Art Archives
Currently: 111 thousand Greek vase images
http://explore.clarosnet.org/XDB/ASP/clarosHome/
Application:
Object Mining in Large Datasets
Objective …
Automatically find and group images of same object/scene
Motivation
Applications:
• Dataset summarization
• Efficient retrieval
• Efficient pre-processing for automatic 3-D reconstruction (e.g. PhotoSynth)
Matching Graph
Build a ‘matching graph’ over all the images in the dataset
Each image is a node and a link represents two images having some object in common
Given this graph structure, apply various clustering algorithms to group the data
Finding Commonly Occurring Objects
Simple idea: strong spatial constraints give a 'link' between two images
Edge strength = # inliers
Finding Commonly Occurring Objects
Use these links to build up a graph over all images in the dataset
Nodes = images, edges = spatially verified matches
Building the Matching Graph
• Use each image to query the dataset
• Each query gives a list of results scored by a measure of the spatial consistency to the query
• Threshold this consistency measure to determine the links in the matching graph
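The three steps above can be sketched as follows. A minimal illustration: the `toy_inliers` function is a made-up stand-in for the query-plus-spatial-verification step; in the real system each image queries the index and the retained edge stores the number of spatially verified inliers.

```python
# Build a matching graph: an edge links two images whose spatially
# verified inlier count passes the threshold; edge strength = # inliers.
from itertools import combinations

def build_matching_graph(images, inliers, threshold=20):
    graph = {img: {} for img in images}
    for a, b in combinations(images, 2):
        n = inliers(a, b)            # spatially verified inliers (stubbed)
        if n >= threshold:
            graph[a][b] = n
            graph[b][a] = n
    return graph

# Toy "dataset": images sharing a letter prefix depict the same object.
def toy_inliers(a, b):
    return 50 if a[0] == b[0] else 3

g = build_matching_graph(["a1", "a2", "b1", "b2", "c1"], toy_inliers)
print(g["a1"])   # {'a2': 50}
```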
Connected Components
In a collection of images of multiple disjoint objects we expect the matching graph to also be disjoint
A simple first step is to take connected components of the matching graph and examine the clusters returned
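The connected-components step above is a standard graph traversal; a minimal sketch via breadth-first search over an adjacency-dict representation of the matching graph (the toy graph below is illustrative):

```python
# Connected components of the matching graph via breadth-first search.
from collections import deque

def connected_components(graph):
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.append(node)
            for nb in graph[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        components.append(sorted(comp))
    return components

graph = {"a1": ["a2"], "a2": ["a1", "a3"], "a3": ["a2"],
         "b1": ["b2"], "b2": ["b1"], "c1": []}
print(connected_components(graph))
# -> [['a1', 'a2', 'a3'], ['b1', 'b2'], ['c1']]
```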
Connected Components
Example: five connected components from the Oxford dataset
56 images 71 images 26 images 25 images 56 images
Connected Components
A problem with connected components is that 'connecting images' can sometimes join two disjoint objects
[Figure: linking images]
This can be overcome by a divide-and-merge strategy
Datasets
Statue of Liberty dataset (37,034 images)
• Crawled from Flickr by querying for 'statue of liberty'
• Lots of images of the Statue of Liberty but also of New York and other sites
Rome dataset (1,021,986 images) [1]
• Again, crawled from Flickr
• Contains too much stuff to mention
[1] Photo tourism: Exploring photo collections in 3D, Noah Snavely, Steven M. Seitz, Richard Szeliski
Results: Statue of Liberty
Largest cluster – 8461 images of the Statue of Liberty
2nd largest – 276 aerial views of New York
Results: Statue of Liberty
3rd largest – 80 American flags
Results: Statue of Liberty
Results: Statue of Liberty
Smaller clusters
Lego Statue of Liberty 59 images
Staten Island 52 images
Results: Rome
18676 images
15818 images
9632 images
4869 images
Timings
21,339 high resolution images from Flickr tagged with 'statue of liberty'
Querying with every image in the database to build the graph takes ~2 hours
Finding connected components (very quick) using a threshold of 20 spatially verified inliers gives 11 clusters with more than 20 images
As an aside … Better matching with fewer features [Turcot & Lowe, ICCV Workshop 2009]
• Build matching graph
• Augment image bag-of-word histograms using neighbours
• Like query expansion, but done in advance on the 'server side'
Application: Internet-based inpainting
Photo-editing using images of the same place
[Whyte, Sivic and Zisserman, 2009], but see also [Hays and Efros, 2007]
Application: place recognition (retrieval in a structured (on a map) database)
[Knopp, Sivic, Pajdla, ECCV 2010] http://www.di.ens.fr/willow/research/confusers/
[System overview: query image → query expansion (Panoramio, Flickr, …) → image indexing with spatial verification → optimized image database → best match; confuser suppression uses only negative training data (from geotags)]
Correctly recognized examples
More correctly recognized examples
Application: Matching and 3D reconstruction in large unstructured datasets.
Building Rome in a Day, Sameer Agarwal, Noah Snavely, Ian Simon, Steven M. Seitz and Richard Szeliski, International Conference on Computer Vision, 2009
http://grail.cs.washington.edu/rome/
See also [Havlena, Torii, Knopp and Pajdla, CVPR 2009]. Figure: N. Snavely
Example of the final 3D point cloud and cameras
57,845 downloaded images, 11,868 registered images. This video: 4,619 images.
The Old City of Dubrovnik
Bing visual scan
Application: Mobile visual search apps
and others… Snaptell.com, Moodstocks.com
Example
Slide credit: I. Laptev
Papers and Demos

Sivic, J. and Zisserman, A.
Video Google: A Text Retrieval Approach to Object Matching in Videos
Proceedings of the International Conference on Computer Vision (2003)
http://www.robots.ox.ac.uk/~vgg/publications/papers/sivic03.pdf
Demo: http://www.robots.ox.ac.uk/~vgg/research/vgoogle/

Chum, O., Philbin, J., Isard, M., Sivic, J. and Zisserman, A.
Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval
Proceedings of the International Conference on Computer Vision (2007)
http://www.robots.ox.ac.uk/~vgg/publications/papers/chum07b.pdf
Demo: http://www.robots.ox.ac.uk/~vgg/research/oxbuildings/

Philbin, J. and Zisserman, A.
Object Mining using a Matching Graph on Very Large Image Collections
Proc. of the Indian Conference on Vision, Graphics and Image Processing (2008)
http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin08b.pdf