Outline
'Big Data' in Computer Vision
Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval
'Big Data' in Vision
Traditional Vision: Focus on the model
  Pose Estimation: 2D image -> virtual 3D model + camera
    Under-constrained, slow, sensitive to noise
  Object Recognition: SVM + features
    Breaks down with many classes (e.g., every Flickr tag)
New Trend: Focus on the data
  DB of images (w/ metadata) -> query image
  Problem becomes similar-image search
  Transfer metadata from DB images to the query image
  KNN methods are simple and scalable
    Clustering, hashing, metric learning
  NLP: rule-based models -> statistical models
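A minimal sketch of the data-driven recipe: find the database images closest to the query and treat them as its neighbours. It assumes each image is already summarized by one global feature vector; the image ids and vectors are made up. The scan is exact and brute-force, which is the part that clustering, hashing, or metric learning would replace at scale.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn(query, database, k):
    """Ids of the k database images closest to `query` (exact brute-force scan)."""
    return sorted(database, key=lambda img_id: euclidean(query, database[img_id]))[:k]

# Hypothetical global descriptors for three database images.
db = {
    "beach1": [0.9, 0.1, 0.0],
    "beach2": [0.8, 0.2, 0.1],
    "city1": [0.1, 0.9, 0.7],
}
print(knn([0.88, 0.12, 0.02], db, k=2))  # ['beach1', 'beach2']
```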
Example: Image Search -> Metadata
[Figure: a query image and similar images retrieved from Flickr; each retrieved image carries metadata: Tags, Location (GPS), Title, Date, Groups, Comments, Owner, Views]
Output Metadata: Tags, Location (GPS)
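The slide does not say how the retrieved images' tags are combined into the output metadata. One simple rule, assumed here for illustration, is a vote: each retrieved image votes once for each of its tags, and tags with enough votes are transferred to the query. The tag lists and the `min_votes` threshold are hypothetical; weighted or learned schemes are also possible.

```python
from collections import Counter

def transfer_tags(neighbor_tags, min_votes=2):
    """Vote over the tags of retrieved images.

    neighbor_tags: one list of tags per retrieved image.
    Returns tags attached to at least `min_votes` neighbours, most voted first.
    """
    votes = Counter()
    for tags in neighbor_tags:
        votes.update(set(tags))  # each neighbour votes at most once per tag
    return [tag for tag, n in votes.most_common() if n >= min_votes]

retrieved = [
    ["beach", "sunset", "hawaii"],
    ["beach", "ocean"],
    ["sunset", "beach", "vacation"],
]
print(transfer_tags(retrieved))  # ['beach', 'sunset']
```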
Big Data in Vision: Pose Estimation
Goal: Given an image of a person, estimate the 3D pose.
G. Shakhnarovich, P. Viola, and T. Darrell, "Fast pose estimation with parameter-sensitive hashing," October 2003.
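The cited work speeds up pose lookup with parameter-sensitive hashing, which learns hash functions tuned to pose similarity. The sketch below is not that method; it is a simpler, generic stand-in (random-hyperplane locality-sensitive hashing) that shows the shared idea: similar vectors tend to land in the same bucket, so only a small candidate set needs an exact comparison. Class and item names are hypothetical.

```python
import random

class HyperplaneLSH:
    """Random-hyperplane LSH: bit i records which side of random hyperplane i
    a vector falls on, so nearby vectors tend to share the same key."""

    def __init__(self, dim, n_bits, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
        self.buckets = {}

    def key(self, vec):
        return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0) for plane in self.planes)

    def add(self, item_id, vec):
        self.buckets.setdefault(self.key(vec), []).append(item_id)

    def candidates(self, vec):
        """Items sharing the query's bucket; re-rank these with an exact distance."""
        return self.buckets.get(self.key(vec), [])

lsh = HyperplaneLSH(dim=3, n_bits=4)
lsh.add("pose_a", [1.0, 0.2, 0.1])
print(lsh.candidates([1.0, 0.2, 0.1]))  # ['pose_a'] (identical vector, identical key)
```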
Big Data in Vision: Scene Completion
Goal: Given an image and a selected region, fill the region with a plausible texture.
J. Hays and A. A. Efros, "Scene completion using millions of photographs," in SIGGRAPH '07: ACM SIGGRAPH 2007 papers. New York, NY, USA: ACM, 2007, pp. 4+.
Big Data in Vision: IM2GPS
Goal: Given an image, guess where in the world it was taken.
J. Hays and A. A. Efros, "Im2gps: estimating geographic information from a single image," Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 0, pp. 1-8, 2008.
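Once similar images are retrieved, their geotags have to be reduced to one guess. The aggregation rule below is an assumption for illustration, not necessarily the paper's: pick the neighbour location supported by the most other neighbours within a radius, a crude density mode that ignores isolated outliers. The coordinates and the 200 km radius are made up.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def estimate_location(neighbor_gps, radius_km=200.0):
    """Return the neighbour location with the most neighbours within `radius_km`."""
    def support(p):
        return sum(haversine_km(p, q) <= radius_km for q in neighbor_gps)
    return max(neighbor_gps, key=support)

gps = [(48.86, 2.35), (48.85, 2.29), (40.71, -74.01), (48.80, 2.12)]
print(estimate_location(gps))  # (48.86, 2.35)
```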
Big Data in Vision: Object Recognition
Goal: Given an image, select a noun that describes it.
A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11, pp. 1958-1970, May 2008.
Big Data in Vision: Pixel Annotation
Goal: Given an image, annotate every pixel (e.g., building).
C. Liu, J. Yuen, and A. Torralba, "Nonparametric scene parsing: Label transfer via dense scene alignment," Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 0, pp. 1972-1979, 2009.
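The label-transfer step can be illustrated with a per-pixel vote. This sketch assumes the retrieved images' label maps have already been aligned to the query's pixel grid (the cited work does this with dense scene alignment and adds spatial regularization, which is omitted here). The tiny 2x2 label maps are made up.

```python
from collections import Counter

def transfer_labels(neighbor_label_maps):
    """Per-pixel majority vote over aligned neighbour label maps.

    Each map is a list of rows of labels, all with the query's height and width.
    Ties go to the label seen first in neighbour order.
    """
    height, width = len(neighbor_label_maps[0]), len(neighbor_label_maps[0][0])
    return [
        [Counter(m[y][x] for m in neighbor_label_maps).most_common(1)[0][0]
         for x in range(width)]
        for y in range(height)
    ]

maps = [
    [["sky", "sky"], ["building", "road"]],
    [["sky", "tree"], ["building", "road"]],
    [["sky", "sky"], ["car", "road"]],
]
print(transfer_labels(maps))  # [['sky', 'sky'], ['building', 'road']]
```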
Big Data in Vision: One Frame Motion
Goal: Given an image, estimate the pixel motion.
C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, "Sift flow: Dense correspondence across different scenes," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 28-42.
Outline
'Big Data' in Computer Vision
Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval
Hadoop+CV: No Reducer
Example Maps:
  Object detection (e.g., cars, faces)
  Feature computation (e.g., SIFT)
  Sliding windows (given a region + image)
[Diagram: three independent Map tasks, no Reduce]
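A map-only job works when each record can be processed independently and the map output is already the final output. The sketch below uses sliding windows as the example: each input line (a hypothetical `image_id<TAB>width<TAB>height` format) expands into one output line per window. In a Hadoop Streaming job the mapper would read lines from standard input and write tab-separated key/value lines to standard output, with the job configured for zero reduce tasks; here the lines are passed in as a list so the logic can be run directly. Window size and stride are assumptions.

```python
def sliding_windows(width, height, win, stride):
    """Yield (x, y, w, h) boxes of size win x win stepping by `stride`."""
    for y in range(0, height - win + 1, stride):
        for x in range(0, width - win + 1, stride):
            yield (x, y, win, win)

def mapper(lines, win=64, stride=32):
    """Map-only step: one 'image_id<TAB>x,y,w,h' output line per window."""
    for line in lines:
        image_id, width, height = line.rstrip("\n").split("\t")
        for box in sliding_windows(int(width), int(height), win, stride):
            yield image_id + "\t" + ",".join(map(str, box))

for out in mapper(["img1\t128\t64\n"]):
    print(out)
```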
Hadoop+CV: Model Creation
Map: Feature computation
Reduce: Model creation
Examples:
  Classifiers (e.g., SVM, Bayes)
  Geometry problems (e.g., RANSAC, SfM)
[Diagram: three Map tasks feeding a single Reduce]
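The pattern above, parallel feature computation followed by a single model-building reduce, can be simulated in-process. This is a sketch under stated assumptions: the "feature" is a toy (mean, max) intensity pair rather than a real descriptor, and the model is a nearest-centroid classifier standing in for an SVM or Bayes model. The point is the data flow: every mapper output must reach one reducer, which is what makes the model global.

```python
from collections import defaultdict

def map_features(record):
    """Map: (label, pixels) -> (label, feature), with a toy (mean, max) feature."""
    label, pixels = record
    return label, (sum(pixels) / len(pixels), max(pixels))

def reduce_model(pairs):
    """Single reducer: sees every (label, feature) pair and builds one model,
    here a nearest-centroid classifier (one mean feature per label)."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for label, (f0, f1) in pairs:
        sums[label][0] += f0
        sums[label][1] += f1
        counts[label] += 1
    return {lbl: (s[0] / counts[lbl], s[1] / counts[lbl]) for lbl, s in sums.items()}

def classify(model, feature):
    """Label whose centroid is closest (squared Euclidean) to `feature`."""
    return min(model, key=lambda lbl: sum((c - f) ** 2 for c, f in zip(model[lbl], feature)))

training = [("dark", [10, 20, 30]), ("dark", [5, 15, 25]), ("bright", [200, 220, 250])]
model = reduce_model(map(map_features, training))
_, query_feature = map_features(("unknown", [210, 230, 240]))
print(classify(model, query_feature))  # bright
```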
Hadoop+CV: Expectation Maximization
Map: Fit data to the model given the current parameters (E-step)
Reduce: Compute new model parameters given the data (M-step)
Iterate until stopping conditions are met.
Examples:
  Clustering (e.g., K-Means)
  Mixture models (e.g., MoG)
[Diagram: input vectors (Vec0, Vec1, Vec2) -> three Map tasks, each reading the current parameter estimate (in the JAR or cache) -> Reduce]
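The E-step/M-step split maps naturally onto map and reduce when the mappers emit sufficient statistics. The sketch below fits a 1D mixture of Gaussians (the MoG example above): each map call computes a point's responsibilities under the current parameters and emits, per component, (r, r*x, r*x^2); the reducer sums these and re-estimates weight, mean, and variance. The driver loop stands in for re-submitting the job with updated parameters (which on Hadoop would travel in the JAR or cache). Function names, data, and the stopping rule are assumptions, and the sketch assumes no component ever loses all responsibility.

```python
import math
from collections import defaultdict

def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def e_step_map(x, params):
    """Map (E-step): emit (component, (r, r*x, r*x^2)) for one data point."""
    dens = [w * gaussian_pdf(x, m, v) for w, m, v in params]
    total = sum(dens)
    for k, d in enumerate(dens):
        r = d / total
        yield k, (r, r * x, r * x * x)

def m_step_reduce(emitted, n_points):
    """Reduce (M-step): sum statistics per component, return (weight, mean, var)."""
    stats = defaultdict(lambda: [0.0, 0.0, 0.0])
    for k, (r, rx, rxx) in emitted:
        stats[k][0] += r
        stats[k][1] += rx
        stats[k][2] += rxx
    new = []
    for k in sorted(stats):
        nk, sx, sxx = stats[k]
        mean = sx / nk
        new.append((nk / n_points, mean, max(sxx / nk - mean ** 2, 1e-6)))
    return new

def fit(data, params, max_iters=50, tol=1e-6):
    """Driver: rerun map + reduce until the means stop moving."""
    for _ in range(max_iters):
        emitted = (kv for x in data for kv in e_step_map(x, params))
        new = m_step_reduce(emitted, len(data))
        if max(abs(a[1] - b[1]) for a, b in zip(params, new)) < tol:
            return new
        params = new
    return params

data = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
params = fit(data, [(0.5, 0.0, 1.0), (0.5, 6.0, 1.0)])
print([round(m, 2) for _, m, _ in params])  # [1.0, 5.0]
```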
Outline
'Big Data' in Computer Vision
Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval
Image Retrieval with Hadoop
Analogies between image and text retrieval:
  Bag of Words -> Bag of Features
  Document -> Image
  Visual Word: cluster of similar visual features
Compute local image features (e.g., SIFT)
Cluster features (i.e., create visual words)
Find cluster medians
Make Hamming embeddings (compact feature) [1]
  Efficient binary code (256 -> 8 bytes per feature)
  Compared using Hamming distance
  Benefit: small size means more features fit in memory
Build an inverted index
[1] H. Jegou, M. Douze, and C. Schmid, "Hamming embedding and weak geometric consistency for large scale image search," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 304-317.
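The "256 -> 8 bytes" figure follows if a feature is 64 float32 values (such as the 64-D SURF features in the workflow that follows) and its code is 64 bits. The sketch below shows that arithmetic and why the code is cheap to compare: packed into an integer, the Hamming distance is the popcount of an XOR. The bit patterns are arbitrary.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one int (bit i = bits[i])."""
    code = 0
    for i, b in enumerate(bits):
        code |= (b & 1) << i
    return code

def hamming(a, b):
    """Number of differing bits between two packed codes."""
    return bin(a ^ b).count("1")

# Storage for one 64-D feature: 64 float32 values vs a 64-bit binary code.
float_bytes = 64 * 4   # 256 bytes
code_bytes = 64 // 8   # 8 bytes
print(float_bytes, code_bytes)  # 256 8

a = pack_bits([1, 0, 1, 1] + [0] * 60)
b = pack_bits([1, 1, 1, 0] + [0] * 60)
print(hamming(a, b))  # 2
```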
Hadoop Job Workflow
(Database Images) -> Image Features (SURF 64D) -> Remove Dupes (Curr./Prev.) -> K-Means Clustering (Initial) -> K-Means Clustering -> Median Computation -> Hamming Embedding
Hadoop Job Workflow: Image Features
(Database Images) -> Image Features (SURF 64D)
Map In: (image_url, image_hash, image_data, image_tags)
Map Out: (image_hash, image_url, image_features)
Hadoop Job Workflow: Remove Dupes
Map In: [image_hash, image_url, image_features]
  or Map In: [image_hash] (for images already in the DB)
Map Out Key: image_hash
Map Out Val: image_features
Reduce Out: [image_hash, image_feature]
Image Features (SURF 64D) -> Remove Dupes (Curr./Prev.)
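The slide gives the record shapes but not the reduce logic. A plausible version, assumed here, is: after the shuffle groups records by `image_hash`, drop any hash that was already indexed in a previous run, and for new hashes keep only the first copy's features, emitting one [image_hash, image_feature] record per feature. The `ALREADY_INDEXED` marker value is hypothetical; `groupby` over key-sorted pairs stands in for Hadoop's grouping.

```python
from itertools import groupby

ALREADY_INDEXED = None  # hypothetical marker for hashes that are already in the DB

def remove_dupes_reduce(sorted_pairs):
    """Reduce: pairs arrive sorted by image_hash, as after the shuffle.

    Each value is either a list of features (new image) or ALREADY_INDEXED.
    Hashes already in the DB are dropped; otherwise only the first copy's
    features are emitted, one (image_hash, feature) record each.
    """
    for image_hash, group in groupby(sorted_pairs, key=lambda kv: kv[0]):
        values = [v for _, v in group]
        if any(v is ALREADY_INDEXED for v in values):
            continue
        for feature in values[0]:
            yield image_hash, feature

pairs = sorted([
    ("h1", [[0.1, 0.2]]),
    ("h1", [[0.1, 0.2]]),         # same image submitted twice in this batch
    ("h2", ALREADY_INDEXED),      # indexed in a previous run
    ("h2", [[0.5, 0.5]]),
    ("h3", [[0.9, 0.0], [0.3, 0.3]]),
], key=lambda kv: kv[0])
print(list(remove_dupes_reduce(pairs)))
```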
Hadoop Job Workflow: K-Means (init)
Map In: [image_hash, image_feature]
Map Out Key: random [0,1]
Map Out Val: image_feature (extended by 1 dim to get count)
1 Reducer (outputs once per cluster)
Reduce Out: [cluster_num, cluster_mean]
Remove Dupes (Curr./Prev.) -> K-Means Clustering (Initial)
Hadoop Job Workflow: K-Means
File: cluster_means
Map In: [image_hash, image_feature]
Map Out Key: cluster_num (nearest cluster)
Map Out Val: image_feature (extended by 1 dim to get count)
Reduce Out: [cluster_num, cluster_mean]
K-Means Clustering (Initial) -> K-Means Clustering
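One K-Means iteration with the "extended by 1 dim to get count" trick, simulated in-process: the mapper appends a 1 to each feature, so when the reducer sums the extended vectors the last entry is the number of features in the cluster, and the mean is the remaining sums divided by it. The dictionary of lists stands in for the shuffle; function names and data are hypothetical.

```python
from collections import defaultdict

def nearest(feature, means):
    """Index of the closest cluster mean (squared Euclidean distance)."""
    return min(range(len(means)),
               key=lambda k: sum((f - m) ** 2 for f, m in zip(feature, means[k])))

def kmeans_map(feature, means):
    """Map: key = nearest cluster, value = feature extended by a trailing 1."""
    return nearest(feature, means), list(feature) + [1.0]

def kmeans_reduce(cluster_num, values):
    """Reduce: sum the extended vectors; the last entry is the count."""
    total = [sum(col) for col in zip(*values)]
    count = total[-1]
    return cluster_num, [s / count for s in total[:-1]]

def kmeans_iteration(features, means):
    groups = defaultdict(list)      # stands in for Hadoop's shuffle
    for f in features:
        k, v = kmeans_map(f, means)
        groups[k].append(v)
    new_means = list(means)         # an empty cluster keeps its old mean
    for k, vals in groups.items():
        _, new_means[k] = kmeans_reduce(k, vals)
    return new_means

feats = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [12.0, 10.0]]
print(kmeans_iteration(feats, [[1.0, 1.0], [9.0, 9.0]]))  # [[0.0, 1.0], [11.0, 10.0]]
```

Carrying the count inside the vector means partial sums from different mappers can be added without extra bookkeeping, so a combiner that emits summed (not averaged) extended vectors could pre-aggregate map output before the shuffle.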
Hadoop Job Workflow: Medians
File: cluster_means
Map In: [image_hash, image_feature]
Map Out Key: cluster_num (nearest cluster)
Map Out Val: image_feature
Reduce Out: [cluster_num, cluster_median]
K-Means Clustering -> Median Computation
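The median reducer takes the per-dimension median of every feature routed to one cluster. A sketch, assuming features arrive as equal-length lists (in Jegou et al.'s Hamming embedding the medians are taken over projected components; whether this job projects before emitting is not shown on the slide):

```python
import statistics

def median_reduce(cluster_num, features):
    """Reduce: per-dimension median of all features assigned to one cluster."""
    return cluster_num, [statistics.median(col) for col in zip(*features)]

print(median_reduce(3, [[1.0, 9.0], [2.0, 7.0], [10.0, 8.0]]))  # (3, [2.0, 8.0])
```

Unlike the mean in the K-Means job, an exact median cannot be assembled from partial results, so this reducer has to see all of a cluster's values and cannot use a summing combiner.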
Hadoop Job Workflow: Ham. Emb.
File: cluster_means, cluster_medians
Map In: [image_hash, image_feature]
Map Out Key: cluster_num (nearest cluster)
Map Out Val: hamming_embedding
Reduce Out: [cluster_num, hamming_embedding]
Median Computation -> Hamming Embedding
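A sketch of the Hamming-embedding mapper: find the nearest cluster mean, project the feature, and set bit i when projected component i exceeds that cluster's median for component i. Assumptions: a fixed random Gaussian projection shared by all mappers (a simplification; Jegou et al. use a random orthogonal matrix), toy sizes instead of 64-D SURF and 64 bits, and made-up means and medians.

```python
import random

DIM, N_BITS = 4, 8   # toy sizes; the workflow uses 64-D SURF features

def random_projection(dim, n_bits, seed=0):
    """Fixed random Gaussian projection; the same seed gives every mapper the same matrix."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def project(P, feature):
    return [sum(p * f for p, f in zip(row, feature)) for row in P]

def hamming_embedding_map(feature, means, medians, P):
    """Map: (nearest cluster, packed code); bit i set if projected value i > median."""
    k = min(range(len(means)),
            key=lambda c: sum((f - m) ** 2 for f, m in zip(feature, means[c])))
    code = 0
    for i, (z, med) in enumerate(zip(project(P, feature), medians[k])):
        if z > med:
            code |= 1 << i
    return k, code

P = random_projection(DIM, N_BITS)
means = [[0.0] * DIM, [5.0] * DIM]
medians = [[0.0] * N_BITS, [0.0] * N_BITS]   # per cluster, per projected component
print(hamming_embedding_map([0.1, 0.2, 0.0, 0.1], means, medians, P))
```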
Image Retrieval Overview: Query
(Query Image) -> Image Features (SURF 64D)
For each feature:
  Find the nearest cluster
  Compute the Hamming embedding (using the cluster medians)
  Vote (tf-idf) for a DB image if one of its features in that cluster has Hamming distance < Thresh
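The query steps can be sketched with an inverted index from cluster (visual word) to postings. Assumptions: each posting stores the DB image id alongside its code (the job output above shows only cluster and embedding), idf is log(#images / #images containing the word), and each matching feature adds that word's idf to its image's score, so repeated matches play the role of term frequency. Normalization is omitted and all names are hypothetical.

```python
import math
from collections import defaultdict

def build_index(db_features):
    """db_features: iterable of (image_id, cluster_num, code).
    Returns cluster_num -> [(image_id, code)] and cluster_num -> idf."""
    index = defaultdict(list)
    images = set()
    for image_id, k, code in db_features:
        index[k].append((image_id, code))
        images.add(image_id)
    idf = {k: math.log(len(images) / len({img for img, _ in postings}))
           for k, postings in index.items()}
    return index, idf

def query(index, idf, query_features, thresh):
    """query_features: (cluster_num, code) per query feature.
    Each DB feature in the same cluster within Hamming distance < thresh
    adds idf[cluster] to its image's score; returns images by score."""
    scores = defaultdict(float)
    for k, q_code in query_features:
        for image_id, code in index.get(k, []):
            if bin(q_code ^ code).count("1") < thresh:
                scores[image_id] += idf[k]
    return sorted(scores.items(), key=lambda kv: -kv[1])

db = [("imgA", 0, 0b1010), ("imgA", 1, 0b1111), ("imgB", 0, 0b0101), ("imgB", 2, 0b0000)]
index, idf = build_index(db)
print(query(index, idf, [(0, 0b1010), (1, 0b1110)], thresh=2))
```

A visual word present in every DB image gets idf 0, so matching it adds nothing, which is the usual tf-idf behaviour for uninformative words.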
Outline
'Big Data' in Computer Vision
Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval
Current Work: PC Help Doc. Retrieval
Goal: Take a screenshot and retrieve books and websites that provide relevant help documentation.
Tom Yeh, Brandyn White, Larry Davis, and Boris Katz
Outline
'Big Data' in Computer Vision
Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval
Conclusion
Vision has 'Big Data' applications
Many image search applications
Common design patterns for M/R + Vision
Hadoop is useful for image search