Large Scale Image Processing

Large Scale Image Processing with Hadoop

Brandyn White

Advisor: Prof. Larry Davis

Outline

'Big Data' in Computer Vision
Map/Reduce and Computer Vision
Map/Reduce Image Search
Application: Screenshot Retrieval

'Big Data' in Vision

Traditional Vision: Focus on the model

Pose Est.: 2D Image -> Virtual 3D model + Camera
  Under-constrained, slow, sensitive to noise
Object Recognition: SVM + features
  Breaks with many classes (e.g., every flickr tag)

New Trend: Focus on the data

DB of images (w/ metadata) -> query image
Problem becomes similar image search
Transfer metadata from DB images to query image
KNN methods simple and scalable
  Clustering, hashing, metric learning
NLP: rule-based models -> statistical models
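The metadata-transfer idea above can be sketched as a small nearest-neighbor vote. This is only an illustration: the toy database, the 2-D feature vectors, and the function name `knn_transfer_tags` are assumptions, not part of the talk.

```python
from collections import Counter
import math

def knn_transfer_tags(query_vec, database, k=3, top_n=2):
    """Return the most common tags among the k nearest database images."""
    # Rank database entries by Euclidean distance to the query feature vector.
    ranked = sorted(database, key=lambda item: math.dist(query_vec, item["features"]))
    votes = Counter()
    for item in ranked[:k]:
        votes.update(item["tags"])
    return [tag for tag, _ in votes.most_common(top_n)]

# Hypothetical toy database: 2-D feature vectors with flickr-style tags.
database = [
    {"features": (0.0, 0.1), "tags": ["beach", "sunset"]},
    {"features": (0.1, 0.0), "tags": ["beach", "ocean"]},
    {"features": (0.2, 0.1), "tags": ["beach", "sunset"]},
    {"features": (5.0, 5.0), "tags": ["city", "night"]},
]
transferred = knn_transfer_tags((0.05, 0.05), database)
```

A brute-force sort like this only works for small databases; the clustering and hashing methods listed above are what make the search scale.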

Example: Image Search -> Metadata

Query Image -> Retrieved Images (flickr) -> Output Metadata

Metadata on retrieved images: Tags, Location (GPS), Title, Date, Groups, Comments, Owner, Views

Output Metadata: Tags, Location (GPS)

Big Data in Vision: Pose Estimation

Goal: Given an image of a person, estimate 3D pose.

G. Shakhnarovich, P. Viola, and T. Darrell, "Fast pose estimation with parameter-sensitive hashing," October 2003.

Big Data in Vision: Scene Completion

Goal: Given an image and a selected region, fill the region with a plausible texture.

J. Hays and A. A. Efros, "Scene completion using millions of photographs," in SIGGRAPH '07: ACM SIGGRAPH 2007 papers. New York, NY, USA: ACM, 2007, pp. 4+.

Big Data in Vision: IM2GPS

Goal: Given an image, guess where in the world it was taken.

J. Hays and A. A. Efros, "Im2gps: estimating geographic information from a single image," Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 0, pp. 1-8, 2008.

Big Data in Vision: Object Recognition

Goal: Given an image, select a noun that describes it.

A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11, pp. 1958-1970, May 2008.

Big Data in Vision: Pixel Annotation

Goal: Given an image, annotate every pixel (e.g., building).

C. Liu, J. Yuen, and A. Torralba, "Nonparametric scene parsing: Label transfer via dense scene alignment," Computer Vision and Pattern Recognition, IEEE Computer Society Conference on, vol. 0, pp. 1972-1979, 2009.

Big Data in Vision: One Frame Motion

Goal: Given an image, estimate the pixel motion.

C. Liu, J. Yuen, A. Torralba, J. Sivic, and W. T. Freeman, "Sift flow: Dense correspondence across different scenes," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 28-42.



Hadoop+CV: No Reducer

Example Maps
  Object Detection (e.g., cars, faces)
  Feature Computation (e.g., SIFT)
  Sliding Windows (given a region+image)

[Diagram: three independent Map tasks, no Reduce]
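A map-only job of this kind might look like the following local simulation. The "detector" here is a hypothetical stand-in (a brightness threshold on a flat pixel list), and `run_map_only` only imitates how Hadoop would call the mapper once per record.

```python
def detect_map(key, value):
    """Map: emit (image_id, detections) for one image; no reducer is needed."""
    # Hypothetical stand-in detector: indices of pixels brighter than 200.
    detections = [i for i, pixel in enumerate(value) if pixel > 200]
    yield key, detections

def run_map_only(records, map_fn):
    """Apply map_fn to each record independently, as a map-only job would."""
    output = []
    for key, value in records:
        output.extend(map_fn(key, value))
    return output

images = [("img0", [10, 250, 30]), ("img1", [0, 0, 0]), ("img2", [255, 255, 1])]
results = run_map_only(images, detect_map)
```

Because each record is processed on its own, the work parallelizes across as many map tasks as there are input splits.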

Hadoop+CV: Model Creation

Map: Feature Computation
Reduce: Model Creation
Examples
  Classifiers (e.g., SVM, Bayes)
  Geometry Problems (e.g., RANSAC, SfM)

[Diagram: three Map tasks feeding a single Reduce]
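As a rough sketch of this pattern, the mappers below compute one feature per labeled image and a single reducer builds a model from all of them. The "model" is just a per-class mean, standing in for the SVM or Bayes classifiers named above; `run_job` and all data are illustrative assumptions.

```python
from collections import defaultdict

def feature_map(label, pixels):
    """Map: compute a feature for one labeled image (here, mean intensity)."""
    # A constant key sends every feature to the same reducer.
    yield "model", (label, sum(pixels) / len(pixels))

def model_reduce(key, values):
    """Reduce: build one model from all features (here, per-class means)."""
    sums = defaultdict(lambda: [0.0, 0])
    for label, feature in values:
        sums[label][0] += feature
        sums[label][1] += 1
    yield key, {label: total / count for label, (total, count) in sums.items()}

def run_job(records, map_fn, reduce_fn):
    """Group map output by key, then reduce each group (local simulation)."""
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return [pair for key in sorted(groups) for pair in reduce_fn(key, groups[key])]

training = [("dark", [10, 20]), ("dark", [30, 40]), ("bright", [200, 220])]
model = run_job(training, feature_map, model_reduce)[0][1]
```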

Hadoop+CV: Expectation Maximization

Map: Fit data to model given parameters (E-Step)
Reduce: Compute new model parameters given data (M-Step)
Iterate until stopping conditions are met.
Examples
  Clustering (e.g., K-Means)
  Mixture Models (e.g., MoG)

[Diagram: Vec0, Vec1, Vec2 -> three Map tasks (parameter estimate in JAR or cache) -> Reduce]
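The iterate-until-converged loop can be illustrated with 1-D K-Means run locally. Each pass of the loop corresponds to one Map/Reduce job; the stopping rule (means move less than `tol`, or `max_iters` is reached) is an assumed choice, since the slide does not specify one.

```python
def e_step(point, means):
    """Map (E-step): assign a point to the index of its nearest mean."""
    nearest = min(range(len(means)), key=lambda i: abs(point - means[i]))
    return nearest, point

def m_step(assigned_points):
    """Reduce (M-step): the new mean of the points assigned to one cluster."""
    return sum(assigned_points) / len(assigned_points)

def kmeans_1d(points, means, max_iters=20, tol=1e-9):
    """Alternate E and M steps until the means stop moving."""
    for _ in range(max_iters):
        groups = {}
        for point in points:
            cluster, value = e_step(point, means)
            groups.setdefault(cluster, []).append(value)
        # Keep the old mean for any cluster that received no points.
        new_means = [m_step(groups[i]) if i in groups else means[i]
                     for i in range(len(means))]
        if max(abs(a - b) for a, b in zip(means, new_means)) < tol:
            return new_means
        means = new_means
    return means

final_means = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], [0.0, 5.0])
```

In a real job the current `means` would reach every mapper through the JAR or the distributed cache, as the diagram indicates.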



Image Retrieval with Hadoop

Analogies between image and text retrieval
  Bag of Words -> Bag of Features
  Document -> Image
  Visual Word: Cluster of similar visual features

Compute Local Image Features (e.g., SIFT)
Cluster Features (i.e., create visual words)
Find cluster medians
Make Hamming Embeddings (compact feature) [1]
  Efficient binary code (256 -> 8 Bytes per feature)
  Hamming Distance
  Benefit: Small size means more in memory
Inverted Index

[1] H. Jegou, M. Douze, and C. Schmid, "Hamming embedding and weak geometric consistency for large scale image search," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 304-317.
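The 256 -> 8 byte figure is consistent with a 64-D descriptor stored as 4-byte floats (64 * 4 = 256 bytes) reduced to one bit per dimension (64 bits = 8 bytes). The sketch below assumes exactly that: each bit records whether a dimension exceeds the cluster median. This simplifies [1], which projects the descriptor before thresholding; the function names are illustrative.

```python
def hamming_embed(feature, median):
    """Pack one bit per dimension: 1 where the feature exceeds the cluster median."""
    code = 0
    for i, (value, threshold) in enumerate(zip(feature, median)):
        if value > threshold:
            code |= 1 << i
    return code

def hamming_distance(code_a, code_b):
    """Number of differing bits between two binary codes."""
    return bin(code_a ^ code_b).count("1")

DIMS = 64
median = [0.5] * DIMS                      # hypothetical cluster median
feature_a = [0.9] * DIMS
feature_b = [0.9] * 60 + [0.1] * 4         # differs from feature_a in 4 dimensions
code_a = hamming_embed(feature_a, median)
code_b = hamming_embed(feature_b, median)
packed = code_a.to_bytes(8, "little")      # 64 bits fit in 8 bytes
distance = hamming_distance(code_a, code_b)
```

Comparing two codes is a single XOR plus a bit count, which is far cheaper than a 64-D Euclidean distance.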

Hadoop Job Workflow

(Database Images)
-> Image Features (SURF 64D)
-> Remove Dupes (Curr./Prev.)
-> K-Means Clustering (Initial)
-> K-Means Clustering
-> Median Computation
-> Hamming Embedding

Hadoop Job Workflow: Image Features


(Database Images)

Map In: (image_url, image_hash, image_data, image_tags)

Map Out: (image_hash, image_url, image_features)

Hadoop Job Workflow: Remove Dupes

Map In: [image_hash, image_url, image_features]
or
Map In: [image_hash] (for images already in the DB)

Map Out Key: image_hash
Map Out Val: image_features

Reduce Out: [image_hash, image_feature]
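One way to read this job: every record is keyed by `image_hash`, so duplicates within the current batch and hashes already in the DB meet in the same reduce call. The sketch assumes DB-only records carry `None` as their value and that the reducer emits one output record per feature; neither detail is stated on the slide, and the URLs are placeholders.

```python
from collections import defaultdict

def dedupe_map(record):
    """Map: key by image_hash; DB-only records carry None as features."""
    image_hash = record[0]
    features = record[2] if len(record) == 3 else None
    yield image_hash, features

def dedupe_reduce(image_hash, values):
    """Reduce: skip hashes already in the DB; otherwise keep a single copy."""
    if any(v is None for v in values):
        return
    for feature in values[0]:
        yield image_hash, feature

def run_dedupe(records):
    """Group map output by hash and reduce each group (local simulation)."""
    groups = defaultdict(list)
    for record in records:
        for key, value in dedupe_map(record):
            groups[key].append(value)
    return [pair for key in sorted(groups) for pair in dedupe_reduce(key, groups[key])]

records = [
    ("h1", "http://example.com/a.jpg", [[1.0], [2.0]]),
    ("h1", "http://example.com/a_copy.jpg", [[1.0], [2.0]]),  # duplicate in batch
    ("h2", "http://example.com/b.jpg", [[3.0]]),
    ("h2",),                                                   # already in the DB
]
unique_features = run_dedupe(records)
```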



Hadoop Job Workflow: K-Means (init)

Map In: [image_hash, image_feature]

Map Out Key: random [0,1]
Map Out Val: image_feature (extended by 1 dim to get count)

1 Reducer (outputs once per cluster)
Reduce Out: [cluster_num, cluster_mean]
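The random key means Hadoop's sort delivers the features to the single reducer in shuffled order. The slide does not say how the reducer turns that stream into initial means; the sketch below assumes it deals features round-robin into k groups and averages each one, using the appended count. Treat it as one plausible reading rather than the actual implementation.

```python
import random

def init_map(image_hash, feature, rng):
    """Map: key each feature by a random number so that sorting shuffles them."""
    # Append a 1 so that summing vectors also accumulates a count.
    yield rng.random(), list(feature) + [1.0]

def init_reduce(sorted_values, k):
    """Single reducer: average k round-robin groups of the shuffled features."""
    sums = [None] * k
    for i, vec in enumerate(sorted_values):
        slot = i % k
        if sums[slot] is None:
            sums[slot] = list(vec)
        else:
            sums[slot] = [a + b for a, b in zip(sums[slot], vec)]
    for cluster_num, total in enumerate(sums):
        if total is not None:
            count = total[-1]
            yield cluster_num, [x / count for x in total[:-1]]

def kmeans_init(records, k, seed=0):
    """Run the map, the sort by key, and the single reduce locally."""
    rng = random.Random(seed)
    keyed = [pair for h, f in records for pair in init_map(h, f, rng)]
    keyed.sort(key=lambda pair: pair[0])  # stands in for Hadoop's sort by key
    return list(init_reduce([vec for _, vec in keyed], k))

records = [("h%d" % i, [float(i), float(2 * i)]) for i in range(6)]
initial_means = kmeans_init(records, k=2)
```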



Hadoop Job Workflow: K-Means

File: cluster_means
Map In: [image_hash, image_feature]

Map Out Key: cluster_num (nearest cluster)
Map Out Val: image_feature (extended by 1 dim to get count)

Reduce Out: [cluster_num, cluster_mean]
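Extending each feature by one dimension set to 1 lets plain vector addition accumulate both the sum and the count, so partial sums can be merged in any order (for example by a combiner, which is an assumption here) before the final divide. A minimal local sketch with illustrative names:

```python
def nearest_cluster(feature, means):
    """Index of the cluster mean with the smallest squared distance."""
    return min(range(len(means)),
               key=lambda c: sum((x - m) ** 2 for x, m in zip(feature, means[c])))

def kmeans_map(image_hash, feature, means):
    """Map: emit (nearest cluster_num, feature with a trailing count of 1)."""
    yield nearest_cluster(feature, means), list(feature) + [1.0]

def vector_sum(vectors):
    """Add extended vectors; the last slot accumulates the count."""
    return [sum(column) for column in zip(*vectors)]

def kmeans_reduce(cluster_num, vectors):
    """Reduce: divide the summed feature by the summed count to get the mean."""
    total = vector_sum(vectors)
    count = total[-1]
    yield cluster_num, [x / count for x in total[:-1]]

def kmeans_step(records, means):
    """One K-Means iteration: map every record, group by cluster, reduce."""
    groups = {}
    for image_hash, feature in records:
        for cluster_num, vec in kmeans_map(image_hash, feature, means):
            groups.setdefault(cluster_num, []).append(vec)
    return dict(pair for c in sorted(groups) for pair in kmeans_reduce(c, groups[c]))

records = [("a", [0.0, 0.0]), ("b", [2.0, 0.0]), ("c", [10.0, 10.0])]
cluster_means = [[1.0, 1.0], [9.0, 9.0]]
new_means = kmeans_step(records, cluster_means)
```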



Hadoop Job Workflow: Medians

File: cluster_means
Map In: [image_hash, image_feature]

Map Out Key: cluster_num (nearest cluster)
Map Out Val: image_feature

Reduce Out: [cluster_num, cluster_median]
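Unlike a mean, a median cannot be built from running sums, which is presumably why this job emits the plain feature without the count dimension. The reducer sketch below assumes the median is taken independently per dimension, as a per-dimension threshold for the Hamming embedding would require.

```python
from statistics import median

def median_reduce(cluster_num, features):
    """Reduce: per-dimension median of all features assigned to one cluster."""
    # zip(*features) turns a list of vectors into one column per dimension.
    yield cluster_num, [median(column) for column in zip(*features)]

features = [[1.0, 9.0], [2.0, 7.0], [100.0, 8.0]]
cluster_num, cluster_median = next(median_reduce(3, features))
```

The outlier value 100.0 leaves the median at 2.0, whereas it would pull the mean of that dimension to about 34.3.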


Hadoop Job Workflow: Ham. Emb.

File: cluster_means, cluster_medians
Map In: [image_hash, image_feature]

Map Out Key: cluster_num (nearest cluster)
Map Out Val: hamming_embedding

Reduce Out: [cluster_num, hamming_embedding]


Image Retrieval Overview: Query


(Query Image)

For each feature...
  Find nearest cluster
  Compute hamming embedding (using cluster median)
  Vote (tf-idf) for a DB image if one of its features in that cluster has hamming dist < Thresh
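The voting step might be sketched as follows, assuming the query features have already been quantized to `(cluster_num, code)` pairs and the inverted index maps each cluster to `(image_hash, code)` postings. The slide only says "tf-idf"; weighting each vote by the cluster's idf is one simple form of that, chosen here for illustration.

```python
import math
from collections import defaultdict

def query_votes(query_features, inverted_index, num_images, threshold):
    """Score DB images: each close-enough feature match adds an idf-weighted vote."""
    scores = defaultdict(float)
    for cluster_num, query_code in query_features:
        postings = inverted_index.get(cluster_num, [])
        if not postings:
            continue
        # Rare visual words (found in few images) get a larger weight.
        images_with_word = len({image_hash for image_hash, _ in postings})
        idf = math.log(num_images / images_with_word)
        for image_hash, db_code in postings:
            if bin(query_code ^ db_code).count("1") < threshold:
                scores[image_hash] += idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))

# Hypothetical index with 4-bit codes for readability.
inverted_index = {
    0: [("imgA", 0b1010), ("imgB", 0b0101)],
    1: [("imgA", 0b1111)],
}
query = [(0, 0b1011), (1, 0b1110)]
ranking = query_votes(query, inverted_index, num_images=4, threshold=2)
```

Here `imgB` shares a visual word with the query but receives no vote, because its code is 3 bits away from the query code.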



Current Work: PC Help Doc. Retrieval

Goal: Take a screenshot and retrieve books and websites that provide relevant help documentation.

Tom Yeh, Brandyn White, Larry Davis, and Boris Katz



Conclusion

Vision has 'Big Data' applications
Many image search applications
Common design patterns for M/R+Vision
Hadoop is useful for image search

References

[1] P. Duygulu, K. Barnard, J. de Freitas, and D. Forsyth, "Object recognition as machine translation: Learning a lexicon for a fixed image vocabulary," in Computer Vision - ECCV 2002, ser. Lecture Notes in Computer Science, 2002, ch. 7, pp. 349-354.

[2] A. Makadia, V. Pavlovic, and S. Kumar, "A new baseline for image annotation," in ECCV '08: Proceedings of the 10th European Conference on Computer Vision. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 316-329.
http://www.cs.rutgers.edu/~vladimir/pub/makadia08eccv.pdf

[3] M. Guillaumin, T. Mensink, J. Verbeek, and C. Schmid, "TagProp: Discriminative metric learning in nearest neighbor models for image auto-annotation," ICCV 2009.
http://lear.inrialpes.fr/pubs/2009/GMVS09/GMVS09.pdf

[4] A. Torralba, R. Fergus, and W. T. Freeman, "80 million tiny images: A large data set for nonparametric object and scene recognition," Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 30, no. 11, pp. 1958-1970, May 2008.
http://people.csail.mit.edu/torralba/tmp/tiny.pdf
