Features
VBM686 - Bilgisayarli Goru (Computer Vision)
Pinar Duygulu
Hacettepe University
Digital Color Images
[Figure: CMOS sensor with a Bayer filter mosaic]
Slide credit: Derek Hoiem
Color Image
[Figure: a color image separated into its R, G, B channels]
Slide credit: Derek Hoiem
Images in Matlab
• Images are represented as matrices
• Suppose we have an NxM RGB image called “im”
– im(1,1,1) = top-left pixel value in the R channel
– im(y, x, b) = pixel y rows down and x columns right, in the bth channel
– im(N, M, 3) = bottom-right pixel in the B channel
• imread(filename) returns a uint8 image (values 0 to 255)
– Convert to double format (values 0 to 1) with im2double
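A minimal sketch of these operations (using peppers.png, a demo image that ships with MATLAB):

    % Read an RGB image; imread returns uint8 values in [0, 255].
    im = imread('peppers.png');
    % Convert to double format, with values in [0, 1].
    im = im2double(im);
    [N, M, ~] = size(im);       % N rows, M columns, 3 channels
    r = im(1, 1, 1);            % top-left pixel, R channel
    g = im(20, 35, 2);          % 20 rows down, 35 columns right, G channel
    b = im(N, M, 3);            % bottom-right pixel, B channel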
[Figure: an image as a matrix in Matlab: an 11x11 grid of intensity values in [0, 1], repeated once per R, G, B channel and indexed by (row, column)]
Slide credit: Derek Hoiem
Color spaces
• How can we represent color?
http://en.wikipedia.org/wiki/File:RGB_illumination.jpg
Slide credit: Derek Hoiem
Color spaces: RGB
Default color space
Some drawbacks
• Strongly correlated channels
• Non-perceptual
[Figure: RGB color cube with corners R = (1,0,0), G = (0,1,0), B = (0,0,1); channel slices R(G=0,B=0), G(R=0,B=0), B(R=0,G=0). Image from: http://en.wikipedia.org/wiki/File:RGB_color_solid_cube.png]
Slide credit: Derek Hoiem
Color spaces: HSV
Intuitive color space
[Figure: channel slices H(S=1,V=1), S(H=1,V=1), V(H=1,S=0)]
Slide credit: Derek Hoiem
Color spaces: YCbCr
Fast to compute, good for compression, used by TV
[Figure: channel slices Y(Cb=0.5,Cr=0.5), Cb(Y=0.5,Cr=0.5), Cr(Y=0.5,Cb=0.5); the Cb-Cr plane shown at Y = 0, Y = 0.5, and Y = 1]
Slide credit: Derek Hoiem
Color spaces: CIE L*a*b*
“Perceptually uniform” color space
Luminance = brightness
Chrominance = color
[Figure: channel slices L(a=0,b=0), a(L=65,b=0), b(L=65,a=0)]
Slide credit: Derek Hoiem
A qualitative rendering of the CIE (x, y) space. The blobby region represents visible colors. There are sets of (x, y) coordinates that don’t represent real colors, because the primaries are not real lights (so that the color matching functions could be positive everywhere).
Adapted from David Forsyth, UC Berkeley
Hue is a "pure" color, i.e., one with no black or white in it.
Variations in color matches on a CIE x, y space. At the center of each ellipse is the color of a test light; the size of the ellipse represents the scatter of lights that the human observers tested would match to the test color; the boundary shows where the just noticeable difference is. The ellipses on the left have been magnified 10x for clarity; on the right they are plotted to scale. The ellipses are known as MacAdam ellipses after their inventor. The ellipses at the top are larger than those at the bottom of the figure, and they rotate as they move up. This means that the magnitude of the difference in x, y coordinates is a poor guide to the difference in color.
CIE u’v’ is a projective transform of x, y, chosen so that the ellipses are as similar to one another as possible. The figure shows the transformed ellipses.
Adapted from David Forsyth, UC Berkeley
Which contains more information?
(a) intensity (1 channel)
(b) chrominance (2 channels)
Slide credit: Derek Hoiem
Most information in intensity
Only color shown – constant intensity
Slide credit: Derek Hoiem
Most information in intensity
Only intensity shown – constant color
Slide credit: Derek Hoiem
Most information in intensity
Original image
Slide credit: Derek Hoiem
Color Space Transformations
• Why?
– To print (RGB → CMYK or grayscale)
– To compress images (RGB → YUV)
• Color information (U, V) can be compressed 4x without significant degradation in perceptual quality
– To compare images (RGB → CIELAB)
• CIELAB space is more perceptually uniform
• Euclidean distance in LAB space is hence meaningful
• e.g. Photoshop operations
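A sketch of these transformations in MATLAB (rgb2ycbcr, rgb2gray, and rgb2lab assume the Image Processing Toolbox):

    im_rgb = im2double(imread('peppers.png'));
    im_hsv = rgb2hsv(im_rgb);      % hue, saturation, value
    im_ycc = rgb2ycbcr(im_rgb);    % luma + two chroma channels (compression, TV)
    im_gry = rgb2gray(im_rgb);     % intensity only (grayscale printing)
    im_lab = rgb2lab(im_rgb);      % CIE L*a*b*, more perceptually uniform
    % Euclidean distance between two pixels is more meaningful in L*a*b*:
    d = norm(squeeze(im_lab(1,1,:) - im_lab(1,2,:)));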
Color Channels
Example: learning skin colors
• We can represent a class-conditional density using a histogram (a “non-parametric” distribution)
[Figure: histograms of feature x = Hue: P(x|skin) and P(x|not skin), where each bin gives the percentage of skin (or non-skin) pixels with that hue]
Kristen Grauman
Now we get a new image, and want to label each pixel as skin or non-skin.
What’s the probability we care about to do skin detection?
Kristen Grauman
Bayes rule
P(skin | x) = P(x | skin) P(skin) / P(x)
posterior = likelihood × prior / evidence
P(skin | x) ∝ P(x | skin) P(skin)
Example: classifying skin pixels
Now for every pixel in a new image, we can estimate the probability that it was generated by skin.
Classify pixels based on these probabilities.
Brighter pixels → higher probability of being skin
Kristen Grauman
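A minimal sketch of this pipeline, assuming a hand-labeled training image in HSV (train_hsv with logical skin_mask) and a test image test_rgb; all three names are hypothetical:

    % Training: estimate P(x|skin) and P(x|not skin) as hue histograms.
    H = train_hsv(:,:,1);                        % hue channel of training image
    edges = linspace(0, 1, 33);                  % 32 hue bins
    p_skin = histcounts(H(skin_mask),  edges, 'Normalization', 'probability');
    p_bkg  = histcounts(H(~skin_mask), edges, 'Normalization', 'probability');
    prior  = nnz(skin_mask) / numel(skin_mask);  % P(skin)

    % Testing: per-pixel posterior P(skin|x) via Bayes rule.
    Ht    = rgb2hsv(test_rgb);
    bin   = discretize(Ht(:,:,1), edges);        % hue bin of every pixel
    lik_s = p_skin(bin);                         % likelihood under skin model
    lik_b = p_bkg(bin);                          % likelihood under background
    post  = lik_s * prior ./ (lik_s * prior + lik_b * (1 - prior) + eps);
    imshow(post)                                 % brighter = more likely skin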
Example: classifying skin pixels
Gary Bradski, 1998
Kristen Grauman
Gary Bradski, 1998
Example: classifying skin pixels
Using skin color-based face detection and pose estimation as a video-based interface
Kristen Grauman
Simple holistic descriptions of image content
• grayscale / color histogram
• vector of pixel intensities
Window-based models
Building an object model
Kristen Grauman
Window-based models
Building an object model
• Pixel-based representations are sensitive to small shifts
• Color or grayscale-based appearance descriptions can be sensitive to illumination and intra-class appearance variation
Kristen Grauman
Window-based models
Building an object model
• Consider edges, contours, and (oriented) intensity gradients
Kristen Grauman
Window-based models
Building an object model
• Consider edges, contours, and (oriented) intensity gradients
• Summarize the local distribution of gradients with a histogram
Locally orderless: offers invariance to small shifts and rotations
Contrast normalization: try to correct for variable illumination
Kristen Grauman
Revisit Texture
• Texture depicts spatially repeating patterns
• Many natural phenomena are textures
[Figure: example textures (radishes, rocks, yogurt)]
Alyosha Efros, CMU
Texton Discrimination (Julesz)
Human vision is sensitive to differences between some types of elements and appears to be “numb” to other types of differences.
Alyosha Efros, CMU
Search Experiment I
The subject is told to detect a target element in a number of background elements.
In this example, the detection time is independent of the number of background elements.
Alyosha Efros, CMU
Search Experiment II
In this example, the detection time is proportional to the number of background elements, which suggests that the subject is doing element-by-element scrutiny.
Alyosha Efros, CMU
Heuristic (Axiom) I
Julesz then conjectured the following axiom:
Human vision operates in two distinct modes:
1. Preattentive vision
parallel, instantaneous (~100-200 ms), without scrutiny,
independent of the number of patterns, covering a large visual field.
2. Attentive vision
serial search by focal attention in 50 ms steps, limited to a small aperture.
Then what are the basic elements?
Alyosha Efros, CMU
Heuristic (Axiom) II
Julesz’s second heuristic answers this question:
Textons are the fundamental elements in preattentive vision, including
1. Elongated blobs
rectangles, ellipses, line segments with attributes
color, orientation, width, length, flicker rate.
2. Terminators
ends of line segments.
3. Crossings of line segments.
But it is worth noting that Julesz’s conclusions are largely based on an ensemble of artificial texture patterns. It was infeasible to synthesize natural textures for controlled experiments at that time.
Alyosha Efros, CMU
Textons
Malik, Belongie, Shi, Leung, 1999
Filter bank
Vector of filter responses at each pixel
K-means over a set of vectors on a collection of images
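A rough sketch of this pipeline; the small oriented filter bank below is an illustrative stand-in for the actual Leung-Malik bank, the image name is hypothetical, and kmeans assumes the Statistics and Machine Learning Toolbox:

    % A few oriented derivative filters (stand-in for the LM filter bank).
    angles = 0:30:150;
    bank = arrayfun(@(a) imrotate(fspecial('sobel'), a, 'bilinear', 'crop'), ...
                    angles, 'UniformOutput', false);

    % Per-pixel vectors of filter responses.
    im = im2double(rgb2gray(imread('texture.png')));   % hypothetical image
    resp = cellfun(@(f) imfilter(im, f, 'replicate'), bank, 'UniformOutput', false);
    X = cell2mat(cellfun(@(r) r(:), resp, 'UniformOutput', false));  % pixels x filters

    % K-means over the response vectors; cluster centers are the textons.
    K = 16;
    [labels, textons] = kmeans(X, K);
    texton_map = reshape(labels, size(im));     % texton label per pixel
    bow = histcounts(labels, 0.5:1:K+0.5);      % bag-of-words histogram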
Bag of words
Spatially organized textures
Bag of words model
[Figure: bag-of-words representation, with each image summarized by a vector of visual-word counts]
Torralba, MIT
Bag of words & spatial pyramid matching
Grauman & Darrell; S. Lazebnik et al., CVPR 2006
Torralba, MIT
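A rough sketch of the spatial-pyramid representation, assuming a precomputed per-pixel map of visual-word labels (word_map, a hypothetical input); the level weights follow the 1/2^L and 1/2^(L-l+1) scheme from Lazebnik et al.:

    K = 200; L = 2;                         % vocabulary size, pyramid levels 0..L
    [rows, cols] = size(word_map);
    feat = [];
    for l = 0:L
        g = 2^l;                            % g x g grid of cells at level l
        if l == 0, w = 1/2^L; else, w = 1/2^(L - l + 1); end
        for i = 1:g
            for j = 1:g
                blk = word_map(floor((i-1)*rows/g)+1 : floor(i*rows/g), ...
                               floor((j-1)*cols/g)+1 : floor(j*cols/g));
                h = histcounts(blk(:), 0.5:1:K+0.5);   % word counts in this cell
                feat = [feat, w * h];       %#ok<AGROW> weighted, concatenated
            end
        end
    end
    % Two images are then compared by histogram intersection of their feat vectors.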
Histogram intersection
Slide credit: Kristen Grauman
Histogram based distances
Given two histograms h1, h2, such that sum(h1) = sum(h2) = 1:
• Euclidean
D(h1, h2) = sum((h1 - h2).^2)
• Histogram intersection
D(h1, h2) = 1 - sum(min(h1, h2))
• Chi-squared
D(h1, h2) = sum((h1 - h2).^2 ./ (h1 + h2))
(using Matlab notation)
Torralba, MIT
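A runnable sketch comparing two grayscale images by their normalized intensity histograms (the image names are illustrative):

    im1 = im2double(rgb2gray(imread('scene1.png')));
    im2 = im2double(rgb2gray(imread('scene2.png')));

    edges = linspace(0, 1, 33);             % 32 intensity bins
    h1 = histcounts(im1(:), edges, 'Normalization', 'probability');
    h2 = histcounts(im2(:), edges, 'Normalization', 'probability');

    D_euc = sum((h1 - h2).^2);                      % Euclidean
    D_int = 1 - sum(min(h1, h2));                   % histogram intersection
    D_chi = sum((h1 - h2).^2 ./ (h1 + h2 + eps));   % chi-squared; eps avoids 0/0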
Capturing the “essence” of texture
• …for real images
• We don’t want an actual texture realization, we want a texture invariant
• What are the tools for capturing statistical properties of some signal?
Alyosha Efros, CMU
Multi-scale filter decomposition
Filter bank
Input image
Alyosha Efros, CMU
Filter response histograms
Alyosha Efros, CMU
Textons (Malik et al, IJCV 2001)
• K-means on vectors of filter responses
Textons (cont.)
Varma, M. and Zisserman, A., IJCV 2005
Textons
Walker, Malik, 2004
Torralba, MIT
Dalal & Triggs, CVPR 2005
• Map each grid cell in the input window to a histogram counting the gradients per orientation.
• Train a linear SVM using a training set of pedestrian vs. non-pedestrian windows.
Code available:
http://pascal.inrialpes.fr/soft/olt/
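A minimal sketch of the per-cell orientation histograms (8x8 cells, 9 bins); block normalization, vote interpolation, and the SVM stage are omitted, the image name is hypothetical, and signed gradients are used for simplicity (Dalal & Triggs bin unsigned orientations over 0-180):

    im = im2double(rgb2gray(imread('window.png')));   % detection window
    [gmag, gdir] = imgradient(im);                    % magnitude, direction (degrees)

    cell_sz = 8; nbins = 9;
    edges = linspace(-180, 180, nbins + 1);
    [rows, cols] = size(im);
    ncy = floor(rows/cell_sz); ncx = floor(cols/cell_sz);
    hog = zeros(ncy*ncx, nbins);
    k = 0;
    for r = 1:ncy
        for c = 1:ncx
            k  = k + 1;
            ys = (r-1)*cell_sz + (1:cell_sz);
            xs = (c-1)*cell_sz + (1:cell_sz);
            b  = discretize(gdir(ys, xs), edges);     % orientation bin per pixel
            % Sum gradient magnitudes into the cell's orientation histogram.
            hog(k, :) = accumarray(b(:), reshape(gmag(ys, xs), [], 1), [nbins 1]);
        end
    end
    feature = hog(:)';    % concatenated descriptor, the input to a linear SVM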
Person detection with HOGs & linear SVMs
• Navneet Dalal and Bill Triggs, "Histograms of Oriented Gradients for Human Detection," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2005
• http://lear.inrialpes.fr/pubs/2005/DT05/
Histograms of oriented gradients
Shape context
Belongie, Malik, Puzicha, NIPS 2000
SIFT, D. Lowe, ICCV 1999
Image features: bin gradients from 8x8 pixel neighborhoods into 9 orientations (Dalal & Triggs, CVPR 05)
Histograms of oriented gradients (HOG)
Source: Deva Ramanan
Why local features?