Epitomic Location Recognition
A generative approach for location recognition
K. Ni, A. Kannan, A. Criminisi and J. Winn
In Proc. CVPR 2008, Anchorage, Alaska.
Geometry-Based Recognition
SLAM & structure from motion
Why do we need metric reconstruction?
Lose the flexibility to do class recognition.
F. Schaffalitzky and A. Zisserman
Training Images
Testing Image
Geometry & Labels
Features
Local Feature Database
G. Schindler, M. Brown, R. Szeliski
Appearance-Based Recognition
Capture global appearance information
Gaussian mixture model used by A. Torralba et al.
Training Images
Image Vectors
Appearance Model
Preprocessing Training
A. Torralba, K. Murphy, W. T. Freeman and M. A. Rubin
M. Cummins and P. Newman
(e.g. PCA)
Appearance or Geometry?
Can we do better by fusing both kinds of information together?
A small example with 2 location labels: cubicle and corridor
The Simplest Model
Nearest neighbor classification
Naive but still effective with enough samples.
A small shift may disrupt the recognition.
Does not capture uncertainty.
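The nearest-neighbor baseline above can be sketched in a few lines. This is a minimal illustration, assuming images are flattened to plain vectors and compared with squared Euclidean distance; the toy 1-D "images" and the function name `nearest_neighbor_label` are hypothetical and not taken from the paper. It also shows how a small shift of the input can flip the answer, and that the output is a hard label with no confidence attached.

```python
def nearest_neighbor_label(query, train_vectors, train_labels):
    """Return the label of the training vector closest to `query` (squared L2)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train_vectors)),
               key=lambda i: sq_dist(query, train_vectors[i]))
    return train_labels[best]

# Hypothetical 1-D "images": a bright bar for the cubicle, a dimmer bar for the corridor.
train_vectors = [
    [9, 9, 9, 0, 0, 0, 0, 0],  # cubicle
    [0, 0, 0, 0, 5, 5, 5, 0],  # corridor
]
train_labels = ["cubicle", "corridor"]

# The unshifted cubicle pattern is recognized correctly.
exact = nearest_neighbor_label([9, 9, 9, 0, 0, 0, 0, 0], train_vectors, train_labels)
# The same bright bar shifted by three pixels lands closer to the corridor vector.
shifted = nearest_neighbor_label([0, 0, 0, 9, 9, 9, 0, 0], train_vectors, train_labels)
print(exact, shifted)  # cubicle corridor
```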
How to Incorporate Translation Invariance?
We need something better than a “bag of frames” model
Training images
Testing image
Panorama
It models both appearance & geometry
Adapts to camera rotation and focal length change
M. Brown and D. G. Lowe
Generative
An image is a patch “extracted” from the panorama
Cons of Panoramas
Not easy to build a panorama due to parallax
Do not capture uncertainty
Only work for location instance recognition
No compact representation for repetitive scenes
Gaussian Mixture Model
Six mixtures trained as in Torralba et al.'s paper
Handles uncertainties but no translation invariance
Means
Variances
Remove boundaries
Much more blurred
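To make the "handles uncertainties" point concrete, the following is a rough sketch of how a per-location Gaussian mixture yields a posterior over labels instead of a hard decision. It assumes diagonal covariances, a uniform prior over labels, and two components per location for brevity (the slide mentions six); all parameter values and function names are hypothetical.

```python
import math

def diag_gaussian_logpdf(x, mean, var):
    """Log density of a diagonal-covariance Gaussian."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))

def gmm_loglik(x, weights, means, variances):
    """Log-likelihood of x under a mixture, computed with log-sum-exp."""
    logs = [math.log(w) + diag_gaussian_logpdf(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))

def label_posterior(x, models):
    """models: {label: (weights, means, variances)}; uniform prior over labels."""
    ll = {label: gmm_loglik(x, *params) for label, params in models.items()}
    top = max(ll.values())
    unnorm = {label: math.exp(v - top) for label, v in ll.items()}
    z = sum(unnorm.values())
    return {label: u / z for label, u in unnorm.items()}

# Hypothetical 2-D image vectors (for example, after PCA), two components per location.
models = {
    "cubicle":  ([0.5, 0.5], [[0.0, 0.0], [1.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "corridor": ([0.5, 0.5], [[4.0, 0.0], [5.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
confident = label_posterior([0.5, 0.0], models)   # close to the cubicle components
ambiguous = label_posterior([2.5, 0.0], models)   # halfway between the two locations
```

The ambiguous point receives roughly equal posterior mass for both labels, which is the kind of uncertainty a nearest-neighbor rule cannot express. Nothing in this model compensates for a shifted view, however.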
A Weak Panorama
3D motions can be roughly modeled by 2D translation + scaling.
2D translation
Scaling
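As an illustration of the "2D translation + scaling" mapping, the sketch below samples a patch from a larger canvas at a given offset and scale. It assumes nearest-neighbor sampling and clamps indices at the canvas border; the function `extract_patch` and the toy canvas are hypothetical and only meant to show what one mapping looks like.

```python
def extract_patch(canvas, top, left, height, width, scale=1.0):
    """Sample a height x width patch from `canvas` (a list of rows).

    The patch starts at (top, left) and advances `scale` canvas pixels per
    patch pixel, using nearest-neighbor sampling with clamped indices.
    """
    rows, cols = len(canvas), len(canvas[0])
    patch = []
    for i in range(height):
        r = min(rows - 1, max(0, int(round(top + i * scale))))
        row = []
        for j in range(width):
            c = min(cols - 1, max(0, int(round(left + j * scale))))
            row.append(canvas[r][c])
        patch.append(row)
    return patch

# Hypothetical 4 x 6 canvas whose pixel value encodes its position (10 * row + column).
canvas = [[10 * r + c for c in range(6)] for r in range(4)]

translated = extract_patch(canvas, top=1, left=2, height=2, width=2)
scaled = extract_patch(canvas, top=0, left=0, height=2, width=3, scale=2.0)
print(translated)  # [[12, 13], [22, 23]]
print(scaled)      # [[0, 2, 4], [20, 22, 24]]
```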
Epitome = Panorama + GMM
Epitome
Generative model for image patches / video frames
Captures repetitive patterns in the original image
EM Iterations
E-step: infer the posteriors over all mappings
M-step: use the posteriors as weights to update the mean and variance of epitome pixels
Free energy EM iterations
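The E-step and M-step described above can be sketched on a toy 1-D epitome. This is a simplified illustration rather than the authors' implementation: it assumes translation-only mappings with wrap-around, a uniform prior over mappings, one independent Gaussian per epitome pixel, and a variance floor `min_var`; all names and the toy signal are hypothetical.

```python
import math
import random

def e_step(patch, mean, var):
    """Posterior over all wrap-around offsets t of `patch` within the epitome."""
    size = len(mean)
    logp = []
    for t in range(size):
        total = 0.0
        for i, x in enumerate(patch):
            e = (t + i) % size
            total += -0.5 * (math.log(2 * math.pi * var[e]) + (x - mean[e]) ** 2 / var[e])
        logp.append(total)
    top = max(logp)  # subtract the maximum for numerical stability
    weights = [math.exp(l - top) for l in logp]
    z = sum(weights)
    return [w / z for w in weights]

def m_step(patches, posteriors, mean, var, min_var=1e-2):
    """Posterior-weighted update of each epitome pixel's mean and variance."""
    size = len(mean)
    s0, s1, s2 = [0.0] * size, [0.0] * size, [0.0] * size
    for patch, q in zip(patches, posteriors):
        for t, qt in enumerate(q):
            for i, x in enumerate(patch):
                e = (t + i) % size
                s0[e] += qt
                s1[e] += qt * x
                s2[e] += qt * x * x
    new_mean, new_var = list(mean), list(var)
    for e in range(size):
        if s0[e] > 1e-12:  # keep the old values where no patch maps with any weight
            new_mean[e] = s1[e] / s0[e]
            new_var[e] = max(min_var, s2[e] / s0[e] - new_mean[e] ** 2)
    return new_mean, new_var

def learn_epitome(patches, size, iterations=10, seed=0):
    """Alternate E and M steps from a noisy initialization around the data mean."""
    rng = random.Random(seed)
    flat = [x for p in patches for x in p]
    mu = sum(flat) / len(flat)
    spread = sum((x - mu) ** 2 for x in flat) / len(flat) + 1e-2
    mean = [mu + 0.1 * rng.gauss(0.0, 1.0) for _ in range(size)]
    var = [spread] * size
    for _ in range(iterations):
        posteriors = [e_step(p, mean, var) for p in patches]
        mean, var = m_step(patches, posteriors, mean, var)
    return mean, var

# Hypothetical repetitive 1-D signal, cut into overlapping 3-pixel patches.
signal = [0, 0, 5, 5] * 4
patches = [signal[i:i + 3] for i in range(len(signal) - 2)]
mean, var = learn_epitome(patches, size=6)
```

Because every offset shares the same underlying pixels, a 6-pixel epitome here plays the role of six overlapping mixture components, which is how repeated structure in the signal gets a compact representation.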
Model Comparison
The epitome is a smart mixture-of-Gaussians model with parameter sharing among components
For the same number of parameters, the epitome generalizes better
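A quick parameter count illustrates the sharing argument. The sketch assumes diagonal-covariance Gaussians, ignores mixing weights, and assumes a translation-only epitome with wrap-around, so that every epitome pixel is the corner of one patch-sized window. The patch and epitome sizes are hypothetical and chosen only so the two budgets match.

```python
def gmm_param_count(num_components, patch_pixels):
    """One mean and one variance per pixel per component (mixing weights ignored)."""
    return 2 * num_components * patch_pixels

def epitome_param_count(epitome_h, epitome_w):
    """One mean and one variance per epitome pixel."""
    return 2 * epitome_h * epitome_w

def epitome_num_components(epitome_h, epitome_w):
    """With wrap-around, each epitome pixel anchors one patch-sized window."""
    return epitome_h * epitome_w

patch_pixels = 8 * 8                           # hypothetical 8 x 8 patches
gmm_params = gmm_param_count(6, patch_pixels)  # six independent components
epi_params = epitome_param_count(16, 24)       # hypothetical 16 x 24 epitome
epi_components = epitome_num_components(16, 24)
print(gmm_params, epi_params, epi_components)  # 768 768 384
```

With the same 768 parameters, the mixture offers 6 components while the epitome offers 384 overlapping ones, each a shifted window over shared pixels.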
Build Label Maps
The label maps are the posterior of the label given the mapping
Epitome
Label maps
Corridor label map
Cubicle label map
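One way to picture a label map is as a per-pixel tally of where labeled training patches landed in the epitome. The sketch below is an assumption about how such maps could be accumulated, not the paper's exact formulation: each training patch spreads its mapping posterior over the epitome pixels it covers, and the tallies are normalized per pixel. It uses the same 1-D wrap-around convention as a toy epitome; `build_label_maps` and the small `prior` count are hypothetical.

```python
def build_label_maps(posteriors, labels, patch_len, epitome_len, prior=1e-6):
    """Per-pixel label posteriors accumulated from mapping posteriors.

    posteriors[n][t] is the posterior of offset t for training patch n,
    and labels[n] is that patch's location label.
    """
    classes = sorted(set(labels))
    votes = {c: [prior] * epitome_len for c in classes}
    for q, label in zip(posteriors, labels):
        for t, qt in enumerate(q):
            for i in range(patch_len):
                votes[label][(t + i) % epitome_len] += qt
    maps = {c: [0.0] * epitome_len for c in classes}
    for e in range(epitome_len):
        total = sum(votes[c][e] for c in classes)
        for c in classes:
            maps[c][e] = votes[c][e] / total
    return maps

# Hypothetical mapping posteriors for two training patches of length 2.
posteriors = [
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # cubicle patch maps to offset 0
    [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],  # corridor patch maps to offset 3
]
labels = ["cubicle", "corridor"]
label_maps = build_label_maps(posteriors, labels, patch_len=2, epitome_len=6)
```

Epitome pixels that no training patch touches fall back to an even split between the labels, because only the small prior count contributes there.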
Recognition from Location Epitomes
Fast correlation: infer the best mapping region
Sum the pixel-wise votes
Temporal smoothing using HMM
Input testing image
Best matching patch
Corridor label map
Cubicle label map
Location epitome
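The three recognition steps above can be sketched end to end on a toy 1-D epitome. This is a simplified illustration under stated assumptions: a brute-force search over offsets stands in for the fast correlation, votes are the label-map values summed under the best-matching region, and the HMM is a forward filter with a single hypothetical "stay" probability. It assumes at least two labels; all names and values are illustrative.

```python
import math

def best_offset(patch, mean, var):
    """Offset maximizing the Gaussian log-likelihood of `patch` in the epitome."""
    size = len(mean)
    def loglik(t):
        return sum(-0.5 * (math.log(2 * math.pi * var[(t + i) % size])
                           + (x - mean[(t + i) % size]) ** 2 / var[(t + i) % size])
                   for i, x in enumerate(patch))
    return max(range(size), key=loglik)

def frame_votes(patch, mean, var, label_maps):
    """Sum the pixel-wise label votes under the best-matching region, then normalize."""
    size = len(mean)
    t = best_offset(patch, mean, var)
    votes = {c: sum(m[(t + i) % size] for i in range(len(patch)))
             for c, m in label_maps.items()}
    z = sum(votes.values())
    return {c: v / z for c, v in votes.items()}

def hmm_smooth(frame_scores, stay=0.9):
    """Forward filtering with a transition model that favors staying in place."""
    classes = sorted(frame_scores[0])
    switch = (1.0 - stay) / (len(classes) - 1)
    belief = {c: 1.0 / len(classes) for c in classes}
    decoded = []
    for scores in frame_scores:
        predicted = {c: sum(belief[p] * (stay if p == c else switch) for p in classes)
                     for c in classes}
        posterior = {c: predicted[c] * (scores[c] + 1e-9) for c in classes}
        z = sum(posterior.values())
        belief = {c: v / z for c, v in posterior.items()}
        decoded.append(max(classes, key=belief.get))
    return decoded

# Hypothetical epitome: dark pixels belong to the cubicle, bright ones to the corridor.
mean, var = [0, 0, 0, 5, 5, 5], [1.0] * 6
label_maps = {"cubicle": [0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
              "corridor": [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]}
bright = frame_votes([5, 5], mean, var, label_maps)

# One noisy frame in the middle of a cubicle sequence is smoothed away.
sequence = [{"cubicle": 0.9, "corridor": 0.1}, {"cubicle": 0.9, "corridor": 0.1},
            {"cubicle": 0.4, "corridor": 0.6}, {"cubicle": 0.9, "corridor": 0.1}]
smoothed = hmm_smooth(sequence)
```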
Color is not always the best feature
Other features besides RGB
For example, the stereo feature captures depth information.
Do not need high stereo accuracy (efficient DP here)
Enable better translation invariance and more generalization
Error rate: 0.49 → 0.36 on a 4-class test dataset
Improve the efficiency dramatically: 30 times speed-up
Supervised Learning
Incorporates training image labels
Helps discriminate images with similar features but different location labels.
An example epitome
An example label feature
A monitor in the cubicle
A microwave in the kitchen
Discriminative features
MIT Image Database
Created by Antonio Torralba et al.
17 sequences, 62 locations, 7 categories, 72077 images
Results on Recognizing Location Instances
Location epitome vs. GMM: 10% better on average
Results on Recognizing Location Classes
Location epitome vs. GMM: 10%-20% better
MSRC Data Set
Captured with a stereo camera
5409 images collected at the speed of 4 fps
11 sequences and 7 classes
Instance Recognition with Multiple Features
RGB & stereo outperform the other features
Learning: 5.7 fps
Recognition: 116 fps = 29 times the capture speed
Summary
A generative model for the recognition of both location instances and classes
Fast: capable of real-time applications
Flexible: capable of integrating various features
Probabilistic: capable of capturing uncertainties
Future applications
Navigation for visually impaired people
Appearance-based loop closing for SLAM problems