Features
VBM686 - Bilgisayarli Goru (Computer Vision)
Pinar Duygulu
Hacettepe University
Digital Color Images
[Figure: CMOS sensor with a Bayer filter mosaic]
Slide credit: Derek Hoiem
Color Image
[Figure: a color image separated into its R, G, B channels]
Slide credit: Derek Hoiem
Images in Matlab
• Images are represented as matrices
• Suppose we have an NxM RGB image called “im”
– im(1,1,1) = top-left pixel value in the R channel
– im(y, x, b) = pixel y rows down and x columns right, in the bth channel
– im(N, M, 3) = bottom-right pixel in the B channel
• imread(filename) returns a uint8 image (values 0 to 255)
– Convert to double format (values 0 to 1) with im2double
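A minimal sketch of these operations (using peppers.png, a demo image that ships with MATLAB):

    % Read an RGB image; imread returns uint8 values in [0, 255].
    im = imread('peppers.png');
    % Convert to double format, with values in [0, 1].
    im = im2double(im);
    [N, M, ~] = size(im);       % N rows, M columns, 3 channels
    r = im(1, 1, 1);            % top-left pixel, R channel
    g = im(20, 35, 2);          % 20 rows down, 35 columns right, G channel
    b = im(N, M, 3);            % bottom-right pixel, B channel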
[Figure: an image as a matrix in Matlab: an 11x11 grid of intensity values in [0, 1], repeated once per R, G, B channel and indexed by (row, column)]
Slide credit: Derek Hoiem
Color spaces
• How can we represent color?
http://en.wikipedia.org/wiki/File:RGB_illumination.jpg
Slide credit: Derek Hoiem
Color spaces: RGB
Default color space
Some drawbacks
• Strongly correlated channels
• Non-perceptual
[Figure: RGB color cube with corners R = (1,0,0), G = (0,1,0), B = (0,0,1); channel slices R(G=0,B=0), G(R=0,B=0), B(R=0,G=0). Image from: http://en.wikipedia.org/wiki/File:RGB_color_solid_cube.png]
Slide credit: Derek Hoiem
Color spaces: HSV
Intuitive color space
[Figure: channel slices H(S=1,V=1), S(H=1,V=1), V(H=1,S=0)]
Slide credit: Derek Hoiem
Color spaces: YCbCr
Fast to compute, good for compression, used by TV
[Figure: channel slices Y(Cb=0.5,Cr=0.5), Cb(Y=0.5,Cr=0.5), Cr(Y=0.5,Cb=0.5); the Cb-Cr plane shown at Y = 0, Y = 0.5, and Y = 1]
Slide credit: Derek Hoiem
Color spaces: CIE L*a*b*
“Perceptually uniform” color space
Luminance = brightness
Chrominance = color
[Figure: channel slices L(a=0,b=0), a(L=65,b=0), b(L=65,a=0)]
Slide credit: Derek Hoiem
A qualitative rendering of the CIE (x, y) space. The blobby region represents visible colors. There are sets of (x, y) coordinates that don’t represent real colors, because the primaries are not real lights (so that the color matching functions could be positive everywhere).
Adapted from David Forsyth, UC Berkeley
Hue is a "pure" color, i.e., one with no black or white in it.
Variations in color matches on a CIE x, y space. At the center of each ellipse is the color of a test light; the size of the ellipse represents the scatter of lights that the human observers tested would match to the test color; the boundary shows where the just noticeable difference is. The ellipses on the left have been magnified 10x for clarity; on the right they are plotted to scale. The ellipses are known as MacAdam ellipses after their inventor. The ellipses at the top are larger than those at the bottom of the figure, and they rotate as they move up. This means that the magnitude of the difference in x, y coordinates is a poor guide to the difference in color.
CIE u’v’ is a projective transform of x, y, chosen so that the ellipses are as similar to one another as possible. The figure shows the transformed ellipses.
Adapted from David Forsyth, UC Berkeley
Which contains more information?
(a) intensity (1 channel)
(b) chrominance (2 channels)
Slide credit: Derek Hoiem
Most information in intensity
Only color shown – constant intensity
Slide credit: Derek Hoiem
Most information in intensity
Only intensity shown – constant color
Slide credit: Derek Hoiem
Most information in intensity
Original image
Slide credit: Derek Hoiem
Color Space Transformations
• Why?
– To print (RGB → CMYK or grayscale)
– To compress images (RGB → YUV)
• Color information (U, V) can be compressed 4x without significant degradation in perceptual quality
– To compare images (RGB → CIELAB)
• CIELAB space is more perceptually uniform
• Euclidean distance in LAB space is hence meaningful
• e.g. Photoshop operations
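A sketch of these transformations in MATLAB (rgb2ycbcr, rgb2gray, and rgb2lab assume the Image Processing Toolbox):

    im_rgb = im2double(imread('peppers.png'));
    im_hsv = rgb2hsv(im_rgb);      % hue, saturation, value
    im_ycc = rgb2ycbcr(im_rgb);    % luma + two chroma channels (compression, TV)
    im_gry = rgb2gray(im_rgb);     % intensity only (grayscale printing)
    im_lab = rgb2lab(im_rgb);      % CIE L*a*b*, more perceptually uniform
    % Euclidean distance between two pixels is more meaningful in L*a*b*:
    d = norm(squeeze(im_lab(1,1,:) - im_lab(1,2,:)));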
Color Channels
Example: learning skin colors
• We can represent a class-conditional density using a histogram (a “non-parametric” distribution)
[Figure: histograms of feature x = Hue: P(x|skin) and P(x|not skin), where each bin gives the percentage of skin (or non-skin) pixels with that hue]
Kristen Grauman
Now we get a new image, and want to label each pixel as skin or non-skin.
What’s the probability we care about to do skin detection?
Kristen Grauman
Bayes rule
P(skin | x) = P(x | skin) P(skin) / P(x)
posterior = likelihood × prior / evidence
P(skin | x) ∝ P(x | skin) P(skin)
Example: classifying skin pixels
Now for every pixel in a new image, we can estimate the probability that it was generated by skin.
Classify pixels based on these probabilities.
Brighter pixels → higher probability of being skin
Kristen Grauman
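A minimal sketch of this pipeline, assuming a hand-labeled training image in HSV (train_hsv with logical skin_mask) and a test image test_rgb; all three names are hypothetical:

    % Training: estimate P(x|skin) and P(x|not skin) as hue histograms.
    H = train_hsv(:,:,1);                        % hue channel of training image
    edges = linspace(0, 1, 33);                  % 32 hue bins
    p_skin = histcounts(H(skin_mask),  edges, 'Normalization', 'probability');
    p_bkg  = histcounts(H(~skin_mask), edges, 'Normalization', 'probability');
    prior  = nnz(skin_mask) / numel(skin_mask);  % P(skin)

    % Testing: per-pixel posterior P(skin|x) via Bayes rule.
    Ht    = rgb2hsv(test_rgb);
    bin   = discretize(Ht(:,:,1), edges);        % hue bin of every pixel
    lik_s = p_skin(bin);                         % likelihood under skin model
    lik_b = p_bkg(bin);                          % likelihood under background
    post  = lik_s * prior ./ (lik_s * prior + lik_b * (1 - prior) + eps);
    imshow(post)                                 % brighter = more likely skin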
Example: classifying skin pixels
Gary Bradski, 1998
Kristen Grauman
Gary Bradski, 1998
Example: classifying skin pixels
Using skin color-based face detection and pose estimation as a video-based interface
Kristen Grauman
Simple holistic descriptions of image content
• grayscale / color histogram
• vector of pixel intensities
Window-based models
Building an object model
Kristen Grauman
Window-based models
Building an object model
• Pixel-based representations are sensitive to small shifts
• Color or grayscale-based appearance descriptions can be sensitive to illumination and intra-class appearance variation
Kristen Grauman
Window-based models
Building an object model
• Consider edges, contours, and (oriented) intensity gradients
Kristen Grauman
Window-based models
Building an object model
• Consider edges, contours, and (oriented) intensity gradients
• Summarize the local distribution of gradients with a histogram
Locally orderless: offers invariance to small shifts and rotations
Contrast normalization: try to correct for variable illumination
Kristen Grauman
Revisit Texture
• Texture depicts spatially repeating patterns
• Many natural phenomena are textures
[Figure: example textures (radishes, rocks, yogurt)]
Alyosha Efros, CMU
Texton Discrimination (Julesz)
Human vision is sensitive to differences between some types of elements and appears to be “numb” to other types of differences.
Alyosha Efros, CMU
Search Experiment I
The subject is told to detect a target element in a number of background elements.
In this example, the detection time is independent of the number of background elements.
Alyosha Efros, CMU
Search Experiment II
In this example, the detection time is proportional to the number of background elements, which suggests that the subject is doing element-by-element scrutiny.
Alyosha Efros, CMU
Heuristic (Axiom) I
Julesz then conjectured the following axiom:
Human vision operates in two distinct modes:
1. Preattentive vision
parallel, instantaneous (~100-200 ms), without scrutiny,
independent of the number of patterns, covering a large visual field.
2. Attentive vision
serial search by focal attention in 50 ms steps, limited to a small aperture.
Then what are the basic elements?
Alyosha Efros, CMU
Heuristic (Axiom) II
Julesz’s second heuristic answers this question:
Textons are the fundamental elements in preattentive vision, including
1. Elongated blobs
rectangles, ellipses, line segments with attributes
color, orientation, width, length, flicker rate.
2. Terminators
ends of line segments.
3. Crossings of line segments.
But it is worth noting that Julesz’s conclusions are largely based on an ensemble of artificial texture patterns. It was infeasible to synthesize natural textures for controlled experiments at that time.
Alyosha Efros, CMU
Textons
Malik, Belongie, Shi, Leung, 1999
Filter bank
Vector of filter responses at each pixel
K-means over a set of vectors on a collection of images
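A rough sketch of this pipeline; the small oriented filter bank below is an illustrative stand-in for the actual Leung-Malik bank, the image name is hypothetical, and kmeans assumes the Statistics and Machine Learning Toolbox:

    % A few oriented derivative filters (stand-in for the LM filter bank).
    angles = 0:30:150;
    bank = arrayfun(@(a) imrotate(fspecial('sobel'), a, 'bilinear', 'crop'), ...
                    angles, 'UniformOutput', false);

    % Per-pixel vectors of filter responses.
    im = im2double(rgb2gray(imread('texture.png')));   % hypothetical image
    resp = cellfun(@(f) imfilter(im, f, 'replicate'), bank, 'UniformOutput', false);
    X = cell2mat(cellfun(@(r) r(:), resp, 'UniformOutput', false));  % pixels x filters

    % K-means over the response vectors; cluster centers are the textons.
    K = 16;
    [labels, textons] = kmeans(X, K);
    texton_map = reshape(labels, size(im));     % texton label per pixel
    bow = histcounts(labels, 0.5:1:K+0.5);      % bag-of-words histogram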
Bag of words
Spatially organized textures
Bag of words model
[Figure: bag-of-words representation, with each image summarized by a vector of visual-word counts]
Torralba, MIT
Bag of words & spatial pyramid matching
Grauman & Darrell; S. Lazebnik et al., CVPR 2006
Torralba, MIT
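A rough sketch of the spatial-pyramid representation, assuming a precomputed per-pixel map of visual-word labels (word_map, a hypothetical input); the level weights follow the 1/2^L and 1/2^(L-l+1) scheme from Lazebnik et al.:

    K = 200; L = 2;                         % vocabulary size, pyramid levels 0..L
    [rows, cols] = size(word_map);
    feat = [];
    for l = 0:L
        g = 2^l;                            % g x g grid of cells at level l
        if l == 0, w = 1/2^L; else, w = 1/2^(L - l + 1); end
        for i = 1:g
            for j = 1:g
                blk = word_map(floor((i-1)*rows/g)+1 : floor(i*rows/g), ...
                               floor((j-1)*cols/g)+1 : floor(j*cols/g));
                h = histcounts(blk(:), 0.5:1:K+0.5);   % word counts in this cell
                feat = [feat, w * h];       %#ok<AGROW> weighted, concatenated
            end
        end
    end
    % Two images are then compared by histogram intersection of their feat vectors.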
Histogram intersection
Slide credit: Kristen Grauman
Histogram based distances
Given two histograms h1, h2, such that sum(h1) = sum(h2) = 1:
• Euclidean
D(h1, h2) = sum((h1 - h2).^2)
• Histogram intersection
D(h1, h2) = 1 - sum(min(h1, h2))
• Chi-squared
D(h1, h2) = sum((h1 - h2).^2 ./ (h1 + h2))
(using Matlab notation)
Torralba, MIT
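A runnable sketch comparing two grayscale images by their normalized intensity histograms (the image names are illustrative):

    im1 = im2double(rgb2gray(imread('scene1.png')));
    im2 = im2double(rgb2gray(imread('scene2.png')));

    edges = linspace(0, 1, 33);             % 32 intensity bins
    h1 = histcounts(im1(:), edges, 'Normalization', 'probability');
    h2 = histcounts(im2(:), edges, 'Normalization', 'probability');

    D_euc = sum((h1 - h2).^2);                      % Euclidean
    D_int = 1 - sum(min(h1, h2));                   % histogram intersection
    D_chi = sum((h1 - h2).^2 ./ (h1 + h2 + eps));   % chi-squared; eps avoids 0/0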
Capturing the “essence” of texture
• …for real images
• We don’t want an actual texture realization, we want a texture invariant
• What are the tools for capturing statistical properties of some signal?
Alyosha Efros, CMU
Multi-scale filter decomposition
Filter bank
Input image
Alyosha Efros, CMU
Filter response histograms
Alyosha Efros, CMU
Textons (Malik et al, IJCV 2001)
• K-means on vectors of filter responses
Textons (cont.)
Varma, M. and Zisserman, A., IJCV 2005
Textons
Walker, Malik, 2004
Torralba, MIT
Dalal & Triggs, CVPR 2005
• Map each grid cell in the input window to a histogram counting the gradients per orientation.
• Train a linear SVM using a training set of pedestrian vs. non-pedestrian windows.
Code available:
http://pascal.inrialpes.fr/soft/olt/
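A minimal sketch of the per-cell orientation histograms (8x8 cells, 9 bins); block normalization, vote interpolation, and the SVM stage are omitted, the image name is hypothetical, and signed gradients are used for simplicity (Dalal & Triggs bin unsigned orientations over 0-180):

    im = im2double(rgb2gray(imread('window.png')));   % detection window
    [gmag, gdir] = imgradient(im);                    % magnitude, direction (degrees)

    cell_sz = 8; nbins = 9;
    edges = linspace(-180, 180, nbins + 1);
    [rows, cols] = size(im);
    ncy = floor(rows/cell_sz); ncx = floor(cols/cell_sz);
    hog = zeros(ncy*ncx, nbins);
    k = 0;
    for r = 1:ncy
        for c = 1:ncx
            k  = k + 1;
            ys = (r-1)*cell_sz + (1:cell_sz);
            xs = (c-1)*cell_sz + (1:cell_sz);
            b  = discretize(gdir(ys, xs), edges);     % orientation bin per pixel
            % Sum gradient magnitudes into the cell's orientation histogram.
            hog(k, :) = accumarray(b(:), reshape(gmag(ys, xs), [], 1), [nbins 1]);
        end
    end
    feature = hog(:)';    % concatenated descriptor, the input to a linear SVM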
Person detection with HOGs & linear SVMs
• Navneet Dalal and Bill Triggs, "Histograms of Oriented Gradients for Human Detection," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2005
• http://lear.inrialpes.fr/pubs/2005/DT05/
Histograms of oriented gradients
Shape context
Belongie, Malik, Puzicha, NIPS 2000
SIFT, D. Lowe, ICCV 1999
Image features: bin gradients from 8x8 pixel neighborhoods into 9 orientations (Dalal & Triggs, CVPR 05)
Histograms of oriented gradients (HOG)
Source: Deva Ramanan
Why local features?