Page 1
Visual Data on the Internet
With slides from Alexei Efros, James Hays, Antonio Torralba, and Frederic Heger
15-463: Computational Photography
Jean-Francois Lalonde, CMU, Fall 2012
Visualization of 53,464 English nouns, credit: A. Torralba, http://groups.csail.mit.edu/vision/TinyImages/
Page 2
The Art of Cassandra Jones
http://www.youtube.com/watch?v=5H7WrIBrDRg
Page 3
Big Issues
What is out there on the Internet?
How do we get it?
What can we do with it?
Page 4
Subject-specific Data
Photos of Coliseum (Snavely et al.)
Portraits of Bill Clinton
Page 5
Much of Captured World is “Generic”
Page 6
Generic Data
pedestrians, faces
street scenes, food plates
Page 7
The Internet as a Data Source
Page 8
How big is Flickr?
Credit: Franck_Michel (http://www.flickr.com/photos/franckmichel/)
100M photos updated daily
6B photos as of August 2011!
• ~3B public photos
Page 9
How Annotated is Flickr? (tag search)
Party – 23,416,126
Paris – 11,163,625
Pittsburgh – 1,152,829
Chair – 1,893,203
Violin – 233,661
Trashcan – 31,200
Page 10
“Trashcan” Results
• http://www.flickr.com/search/?q=trashcan+NOT+party&m=tags&z=t&page=5
Page 11
Big Issues
What is out there on the Internet?
How do we get it?
What can we do with it?
• Let’s see a motivating example...
Page 12
[Hays and Efros. Scene Completion Using Millions of Photographs. SIGGRAPH 2007 and CACM October 2008.]
Page 15
Efros and Leung result
Page 17
Scene Matching for Image Completion
Page 18
Scene Completion Result
Page 22
Scene Descriptor
Scene Gist Descriptor (Oliva and Torralba 2001)
Page 23
Scene Descriptor
+
Scene Gist Descriptor (Oliva and Torralba 2001)
Page 24
2 Million Flickr Images
Page 27
Graph cut + Poisson blending
Page 28
Result Ranking
We assign each of the 200 results a score which is the sum of:
The scene matching distance
The context matching distance (color + texture)
The graph cut cost
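The ranking rule above can be sketched in a few lines of Python. This is a minimal illustration, assuming the three costs are already computed and on comparable scales; the `Candidate` class, its field names, and the unweighted sum are assumptions made for the example, not details taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str                # hypothetical identifier of a matched image
    scene_distance: float    # scene matching distance
    context_distance: float  # context matching distance (color + texture)
    graph_cut_cost: float    # cost of the graph cut seam


def total_score(c: Candidate) -> float:
    """Sum of the three costs; lower is better."""
    return c.scene_distance + c.context_distance + c.graph_cut_cost


def rank(candidates):
    """Return candidates sorted from best (lowest score) to worst."""
    return sorted(candidates, key=total_score)


# Toy values, invented for illustration.
candidates = [
    Candidate("a", 0.40, 0.30, 0.20),
    Candidate("b", 0.10, 0.20, 0.10),
    Candidate("c", 0.25, 0.50, 0.05),
]
ranked_names = [c.name for c in rank(candidates)]
```

With these toy values the candidate with the smallest summed cost comes first, so a small graph cut cost alone is not enough to win the ranking.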
Page 36
… 200 scene matches
Page 44
Nearest neighbors from a collection of 20 thousand images
Page 45
Nearest neighbors from a collection of 2 million images
Page 46
“Unreasonable Effectiveness of Data”
Parts of our world can be explained by elegant mathematics
• physics, chemistry, astronomy, etc.
But much cannot
• psychology, economics, genetics, etc.
Enter The Data!
• Great advances in several fields:
– e.g. speech recognition, machine translation
– Case study: Google
[Halevy, Norvig, Pereira 2009]
Page 47
A.I. for the postmodern world:
• all questions have already been answered… many times, in many ways
• Google is dumb, the “intelligence” is in the data
Page 48
How about visual data?
Text is simple:
• clean, segmented, compact, 1D
Visual data is much harder:
• Noisy, unsegmented, high entropy, 2D/3D
Quick Overview
• Comparing Images
• Uses of Visual Data
• The Dangers of Data
Page 49
Distance Metrics
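(point) - (point) = Euclidean distance of 5 units
(gray value) - (gray value) = Gray value distance of 50 values
(image) - (image) = ?
[Figure: the operands are shown graphically; axes are labeled x and y]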
Page 50
SSD says these are not similar
?
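To make the SSD (sum of squared differences) comparison concrete, here is a minimal sketch in plain Python. The three toy images are invented for illustration and are not the ones shown on the slide; nested lists stand in for grayscale image arrays.

```python
def ssd(a, b):
    """Sum of squared differences between two equally sized grayscale images."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("images must have the same shape")
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# A vertical edge, the same edge shifted by one pixel, and a flat gray image.
edge = [[0, 0, 255, 255]] * 4
shifted = [[0, 255, 255, 255]] * 4
flat = [[128, 128, 128, 128]] * 4

ssd_shifted = ssd(edge, shifted)
ssd_flat = ssd(edge, flat)
```

In this toy case the one-pixel shift costs almost exactly as much as replacing the edge with a featureless gray image, which is one way SSD can disagree with a human judgment of similarity.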
Page 51
Tiny Images
A. Torralba, R. Fergus, and W. T. Freeman, “80 million tiny images: a large dataset for non-parametric object and scene recognition,” PAMI, 2008.
Page 52
Image Segmentation (by humans)
Page 53
Image Segmentation (by humans)
Page 54
Human Scene Recognition
Page 55
Tiny Images Project Page
http://groups.csail.mit.edu/vision/TinyImages/
Page 56
Powers of 10
Number of images on my hard drive: 10^4
Number of images seen during my first 10 years: 10^8 (3 images/second * 60 * 60 * 16 * 365 * 10 = 630720000)
Number of images seen by all humanity: 10^20 (106,456,367,669 humans [1] * 60 years * 3 images/second * 60 * 60 * 16 * 365)
Number of photons in the universe: 10^88
Number of all 8-bit 32x32 images: 10^7373 (256^(32*32*3) ~ 10^7373)
[1] from http://www.prb.org/Articles/2002/HowManyPeopleHaveEverLivedonEarth.aspx
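The arithmetic on this slide can be checked with a short script. The constant names are made up for the example; the inputs are the ones quoted on the slide. For the last quantity, the exponent depends on how log10(2) is rounded: the slide's exponent of 7373 matches the rough value log10(2) ~ 0.3, while the unrounded logarithm gives an exponent of about 7398.

```python
import math

SECONDS_AWAKE_PER_YEAR = 60 * 60 * 16 * 365  # 16 waking hours per day
IMAGES_PER_SECOND = 3

# Images seen during the first 10 years of life.
first_ten_years = IMAGES_PER_SECOND * SECONDS_AWAKE_PER_YEAR * 10

# Images seen by all humanity: people ever born * 60 years each.
all_humanity = 106_456_367_669 * 60 * IMAGES_PER_SECOND * SECONDS_AWAKE_PER_YEAR

# Number of distinct 8-bit 32x32 RGB images is 256 ** (32 * 32 * 3).
# Work with the base-10 exponent rather than the full integer.
n_values = 32 * 32 * 3
exponent_exact = n_values * math.log10(256)
exponent_rough = n_values * 8 * 0.3  # log10(256) = 8 * log10(2), with log10(2) ~ 0.3

print(first_ten_years)
print(f"{all_humanity:.2e}")
print(round(exponent_exact), round(exponent_rough))
```

Either way, the number of possible tiny images exceeds the other quantities on the slide by thousands of orders of magnitude.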
Page 57
Scenes are unique
Page 58
But not all scenes are so original
Page 59
But not all scenes are so original
Page 60
A. Torralba, R. Fergus, W.T.Freeman. PAMI 2008
Page 61
A. Torralba, R. Fergus, W.T.Freeman. PAMI 2008
Page 63
Automatic Colorization Result
Grayscale input
High resolution
Colorization of input using average
A. Torralba, R. Fergus, W.T.Freeman. 2008
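One way to read "colorization of input using average" is: find the database images closest to the grayscale input, then average their colors pixel by pixel. The sketch below follows that reading. It is an assumption-laden toy, not the procedure of Torralba et al.: the grayscale conversion, the SSD matching, the value of `k`, and the two-pixel "images" are all invented for illustration.

```python
def to_gray(img):
    """img is a flat list of (r, g, b) pixels; returns a flat list of gray values."""
    return [sum(p) / 3 for p in img]


def ssd(a, b):
    """Sum of squared differences between two flat lists of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def colorize_by_average(gray_query, color_db, k=2):
    """Average, pixel by pixel, the colors of the k database images whose
    grayscale versions are closest to the query under SSD."""
    neighbors = sorted(color_db, key=lambda img: ssd(gray_query, to_gray(img)))[:k]
    return [
        tuple(sum(img[i][c] for img in neighbors) / len(neighbors) for c in range(3))
        for i in range(len(gray_query))
    ]


# Toy 2-pixel "images" (hypothetical data).
db = [
    [(200, 100, 0), (0, 0, 90)],    # gray values: 100, 30
    [(220, 80, 0), (0, 0, 120)],    # gray values: 100, 40
    [(0, 0, 0), (255, 255, 255)],   # gray values: 0, 255
]
result = colorize_by_average([100, 35], db, k=2)
```

Here the first two database images are the nearest neighbors, so the output is their per-pixel color average and the third image does not contribute.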
Page 64
Automatic Orientation
Many images have ambiguous orientation
Look at top 25% by confidence
• correlation score
Examples of high and low confidence images
Page 65
Automatic Orientation Examples
A. Torralba, R. Fergus, W.T.Freeman. 2008
Page 66
Tiny Images Discussion
Why SSD?
Can we build a better image descriptor?
Page 67
Space Shuttle Cargo Bay
Image Representations: Histograms
Global histogram
• Represent distribution of features
• Color, texture, depth, …
Images from Dave Kauchak
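A global histogram of a single feature can be sketched as follows. The function name, the four uniform bins, and the sample red-channel values are assumptions chosen to keep the example small.

```python
def global_histogram(values, n_bins=4, lo=0, hi=256):
    """Normalized histogram of scalar feature values in [lo, hi)."""
    if not values:
        raise ValueError("need at least one value")
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        # Clamp so that v == hi - 1 (or rounding) never falls outside the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return [c / len(values) for c in counts]


# Hypothetical red-channel values of eight pixels.
red_channel = [10, 20, 70, 130, 200, 250, 255, 64]
hist = global_histogram(red_channel)
```

The result describes how the feature is distributed over the whole image and discards where in the image each value occurred.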
Page 68
Image Representations: Histograms
Joint histogram
• Requires lots of data
• Loss of resolution to avoid empty bins
Marginal histogram
• Requires independent features
• More data/bin than joint histogram
Images from Dave Kauchak
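The trade-off between joint and marginal histograms shows up even on a handful of pixels. In this sketch (function names and pixel values are invented), a joint RGB histogram with 4 bins per channel has 64 possible bins, while the marginal version has only 12.

```python
from collections import Counter


def bin_index(v, n_bins=4, hi=256):
    """Uniform bin index for an integer value in [0, hi)."""
    return min(v * n_bins // hi, n_bins - 1)


def joint_histogram(pixels, n_bins=4):
    """Counts over (r_bin, g_bin, b_bin) triples: up to n_bins ** 3 bins."""
    return Counter(tuple(bin_index(c, n_bins) for c in p) for p in pixels)


def marginal_histograms(pixels, n_bins=4):
    """One histogram per channel: 3 * n_bins bins in total."""
    hists = [[0] * n_bins for _ in range(3)]
    for p in pixels:
        for ch, c in enumerate(p):
            hists[ch][bin_index(c, n_bins)] += 1
    return hists


# Two reddish pixels, one green, one blue (hypothetical data).
pixels = [(250, 10, 10), (240, 20, 5), (10, 250, 10), (10, 10, 250)]
joint = joint_histogram(pixels)
marginals = marginal_histograms(pixels)
```

The joint histogram keeps the fact that high red goes with low green and blue, but only 3 of its 64 bins are occupied. The marginals get more data per bin and lose that co-occurrence information, which is why they are only adequate when the features are independent.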
Page 69
Space Shuttle Cargo Bay
Image Representations: Histograms
Adaptive binning
• Better data/bin distribution, fewer empty bins
• Can adapt available resolution to relative feature importance
Images from Dave Kauchak
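One simple form of adaptive binning places bin edges at quantiles of the data, so that each bin receives a similar number of samples. The slides do not say which scheme is meant; quantile edges are an assumption used here for illustration, as are the function names and sample values.

```python
import bisect


def quantile_edges(values, n_bins):
    """Bin edges chosen so each bin holds roughly the same number of samples.
    Heavily repeated values can produce duplicate edges; this sketch ignores that."""
    ordered = sorted(values)
    n = len(ordered)
    inner = [ordered[(i * n) // n_bins] for i in range(1, n_bins)]
    return [ordered[0]] + inner + [ordered[-1]]


def histogram_with_edges(values, edges):
    """Count values per bin; bin i covers [edges[i], edges[i + 1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        idx = bisect.bisect_right(edges, v) - 1
        idx = max(0, min(idx, len(counts) - 1))  # clamp out-of-range values
        counts[idx] += 1
    return counts


# Mostly dark values with a few bright ones (hypothetical data).
values = [1, 2, 2, 3, 3, 3, 4, 5, 100, 200, 220, 250]
uniform = histogram_with_edges(values, [0, 64, 128, 192, 256])
adaptive = histogram_with_edges(values, quantile_edges(values, 4))
```

With uniform edges most samples pile into the first bin and one bin stays empty; with quantile edges the samples spread evenly, at the price of bins with very different widths.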
Page 70
EASE Truss Assembly
Space Shuttle Cargo Bay
Image Representations: Histograms
Clusters / Signatures
• “super-adaptive” binning
• Does not require discretization along any fixed axis
Images from Dave Kauchak
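A signature can be sketched as a set of cluster centers with weights, where the clusters come from k-means run directly in feature space. The slides do not specify a clustering method; Lloyd's k-means with hand-picked initial centers is an assumption made here, and the pixel data is invented.

```python
def kmeans_signature(points, centers, n_iter=10):
    """Lloyd's k-means from given initial centers (n_iter must be >= 1).
    Returns a signature: a list of (cluster_center, weight) pairs,
    with weights summing to 1."""
    centers = [tuple(c) for c in centers]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            d = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            groups[d.index(min(d))].append(p)
        # Update step: each center moves to the mean of its group.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return [(c, len(g) / len(points)) for c, g in zip(centers, groups)]


# Three reddish pixels and one blue pixel (hypothetical RGB data).
points = [(250, 10, 10), (240, 20, 0), (245, 15, 5), (10, 10, 250)]
signature = kmeans_signature(points, centers=[(255, 0, 0), (0, 0, 255)])
```

The centers land wherever the data is, so no axis has to be discretized in advance, and the representation uses only as many "bins" as there are clusters.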
Page 71
Issue: How to Compare Histograms?
Bin-by-bin comparison
Sensitive to bin size.
Could use wider bins …
… but at a loss of resolution
Cross-bin comparison
How much cross-bin influence is necessary/sufficient?
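The difference between the two families of comparison can be seen on three one-dimensional histograms. The sketch uses the L1 distance as a bin-by-bin measure and the one-dimensional earth mover's distance as a cross-bin measure; both are choices made for this example, since the slide does not name specific measures.

```python
def l1_distance(h1, h2):
    """Bin-by-bin: only bins with the same index are compared."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def emd_1d(h1, h2):
    """Cross-bin: earth mover's distance between 1D histograms of equal
    total mass, computed as the L1 distance between their cumulative sums."""
    total, carried = 0.0, 0.0
    for a, b in zip(h1, h2):
        carried += a - b       # mass that still has to move to later bins
        total += abs(carried)  # moving it one bin further costs |carried|
    return total


h = [1.0, 0.0, 0.0, 0.0]
near = [0.0, 1.0, 0.0, 0.0]  # all mass moved by one bin
far = [0.0, 0.0, 0.0, 1.0]   # all mass moved by three bins
```

The bin-by-bin distance rates `near` and `far` as equally different from `h`, while the cross-bin distance grows with how far the mass has moved.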
Page 72
Red Car Retrievals (Color histograms)
Histogram matching distance
Page 73
Capturing the “essence” of texture
…for real images
We don’t want an actual texture realization, we want a texture invariant
What are the tools for capturing statistical properties of some signal?
Page 74
Multi-scale filter decomposition
Filter bank
Input image
Page 75
Filter response histograms
Page 76
Heeger & Bergen ’95
Start with a noise image as output
Main loop:
• Match pixel histogram of output image to input
• Decompose input and output images using multi-scale filter bank (Steerable Pyramid)
• Match sub-band histograms of input and output pyramids
• Reconstruct input and output images (collapse the pyramids)
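The histogram matching step used inside the loop above can be sketched on flat lists of values. This covers only that one step, under the simplifying assumption that both inputs have the same number of samples; the steerable pyramid decomposition and reconstruction are omitted, and the sample values are invented.

```python
def match_histogram(source, reference):
    """Remap `source` so that its values, taken in rank order, become the
    sorted values of `reference`. Both are flat lists of equal length."""
    if len(source) != len(reference):
        raise ValueError("this sketch assumes equal sample counts")
    order = sorted(range(len(source)), key=lambda i: source[i])
    ref_sorted = sorted(reference)
    out = [0] * len(source)
    for rank, i in enumerate(order):
        out[i] = ref_sorted[rank]  # the rank-th smallest source pixel gets the rank-th reference value
    return out


noise = [0.9, 0.1, 0.5, 0.3]  # stands in for the noise image
texture = [10, 200, 200, 50]  # stands in for the input texture
matched = match_histogram(noise, texture)
```

After matching, the output has exactly the value distribution of the reference while keeping the spatial ordering of the noise. In the full algorithm the same operation would be applied to the pixel values and to each sub-band.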
Page 77
Image Descriptors
• Blur + SSD
• Color / Texture histograms
• Gradients + Histogram (GIST, SIFT, HOG, etc)
• “Bag of Visual Words”
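As a small illustration of the "gradients + histogram" idea, the sketch below builds a magnitude-weighted histogram of gradient orientations for a grayscale image. It is written in the spirit of that family of descriptors and is not an implementation of GIST, SIFT, or HOG; the central-difference gradients, the four unsigned orientation bins, and the toy images are assumptions.

```python
import math


def orientation_histogram(img, n_bins=4):
    """Histogram of unsigned gradient orientations in [0, pi), weighted by
    gradient magnitude, over the interior pixels of a grayscale image."""
    hist = [0.0] * n_bins
    for y in range(1, len(img) - 1):
        for x in range(1, len(img[0]) - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal central difference
            gy = img[y + 1][x] - img[y - 1][x]  # vertical central difference
            mag = math.hypot(gx, gy)
            if mag == 0:
                continue
            angle = math.atan2(gy, gx) % math.pi
            idx = min(int(angle / math.pi * n_bins), n_bins - 1)
            hist[idx] += mag
    total = sum(hist)
    return [h / total for h in hist] if total else hist


vertical_edge = [[0, 0, 1, 1]] * 4
horizontal_edge = [[0] * 4, [0] * 4, [1] * 4, [1] * 4]
h_vertical = orientation_histogram(vertical_edge)
h_horizontal = orientation_histogram(horizontal_edge)
```

The two edges have identical intensity histograms but land in different orientation bins, which is the kind of structure that gradient-based descriptors capture and plain color histograms miss.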