Fundamentals of Image Analysis
Alexandre Xavier Falcão
Institute of Computing - UNICAMP
[email protected]
MC940/MO445 - Image Analysis
Introduction
Image analysis requires learning models for object detection and/or delineation (sample extraction), feature extraction (descriptor learning), and classification (classifier learning).
Introduction
We will start by taking object detection as an example that uses pixels as samples and requires feature extraction and classification.
Object detection evaluates candidate locations for the object(s) of interest in the image.
Agenda
Object detection and main concepts for image analysis.
Machine learning and basic concepts from Statistics.
Classic pattern recognition techniques.
Object detection
The important concepts for object detection are
Multiband image, adjacency relation, and multiband kernel.
Convolution between image and kernels (i.e., multiband image filtering).
Fast filtering through integral images for feature extraction.
Feature selection using systems of weak classifiers.
Multiband image
A multiband image $\hat{I}$ is a pair $(D_I, \mathbf{I})$, in which
$D_I \subset \mathbb{Z}^d$ is the image domain and
$\mathbf{I}$ assigns to each space element (spel) $p \in D_I$ a feature vector $\mathbf{I}(p) = (I_1(p), I_2(p), \ldots, I_n(p))$, i.e., a point in the pixel feature space $\mathbb{R}^n$.
We will focus on $d = 2$ (spel is pixel) and $n \geq 1$.
Example of a 2D grayscale image
We will use $\hat{I} = I$ for binary and grayscale images ($n = 1$).
The pixel coordinates are $p = (x_p, y_p) \in D_I$, where the image domain $D_I = \{(0,0), (1,0), \ldots, (n_x-1, 0), (0,1), (1,1), \ldots, (n_x-1, 1), \ldots, (0, n_y-1), (1, n_y-1), \ldots, (n_x-1, n_y-1)\}$.
Image domain and feature space
Pixels with similar features should be mapped onto nearby positions in the feature space $\mathbb{R}^n$.
The feature space can be changed by image filtering.
Adjacency relation
An adjacency relation $A \subseteq D_I \times D_I$ may be defined in the image domain and/or feature space as a binary relation.
$A_1: \{(p,q) \in D_I \times D_I \mid \|q - p\| \leq \alpha_1\}$,
$A_2: \{(p,q) \in D_I \times D_I \mid \|\mathbf{I}(q) - \mathbf{I}(p)\| \leq \alpha_2\}$,
$A_3: \{(p,q) \in D_I \times D_I \mid \|q - p\| \leq \alpha_1 \text{ and } \|\mathbf{I}(q) - \mathbf{I}(p)\| \leq \alpha_2\}$,
$\alpha_1, \alpha_2 \in \mathbb{R}^+$, such that $A(p)$ is the set of pixels $q$ adjacent to $p$.
For the image on the right, what is the adjacency set of $p = (2,3)$ for $A_1$, $A_2$, and $A_3$, when $\alpha_1 = \sqrt{5}$ and $\alpha_2 = 0$?
Adjacency relation
We will consider only translation-invariant adjacency relations: $A: \{(p, q_k) \in D_I \times D_I \mid (x_{q_k}, y_{q_k}) - (x_p, y_p) = (dx_k, dy_k)\}$, $k = 1, 2, \ldots, K$, where $\{(dx_k, dy_k)\}$ is a set of $K$ displacements.
One can store the displacements and generate the set $A(p) = \{q_k\}$, $q_k = (x_{q_k}, y_{q_k}) = (x_p + dx_k, y_p + dy_k)$, $k = 1, 2, \ldots, K$, for any $p \in D_I$.
For fixed displacements $\{(-2,-1), (0,2)\}$, examples of sets $A(p) = \{q_1, q_2\}$ are
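This stored-displacement representation can be sketched in a few lines of Python (a hypothetical helper, with the image-domain check included so adjacents outside $D_I$ are discarded):

```python
def adjacency_set(p, displacements, shape):
    """Generate A(p) from stored displacements (dx_k, dy_k), discarding
    adjacents that fall outside an image domain of size shape = (nx, ny)."""
    xp, yp = p
    nx, ny = shape
    return [(xp + dx, yp + dy)
            for (dx, dy) in displacements
            if 0 <= xp + dx < nx and 0 <= yp + dy < ny]

# The fixed displacements (-2, -1) and (0, 2) from the slide:
print(adjacency_set((5, 5), [(-2, -1), (0, 2)], (10, 10)))  # [(3, 4), (5, 7)]
print(adjacency_set((0, 0), [(-2, -1), (0, 2)], (10, 10)))  # [(0, 2)]
```

Note that for the pixel at the origin, the first displacement leaves the domain, so its adjacency set has a single element.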
Kernel
A kernel is a pair $(A, W)$ in which $A$ is translation-invariant and $W(q_k - p) = w_k$, $k = 1, 2, \ldots, K$.
One can then store the set $\{(dx_k, dy_k, w_k)\}$, $k = 1, 2, \ldots, K$, and interpret a kernel as a moving image whose domain $A(p)$ changes with $p \in D_I$ for a fixed scalar function $W$.
In our example, for $w_1 = 2$ and $w_2 = -3$:
Multiband kernel
A multiband kernel is a pair $(A, \mathbf{W})$ in which $A$ is translation-invariant and $\mathbf{W}(q_k - p) = \mathbf{w}_k = (w_{k1}, w_{k2}, \ldots, w_{kn})$, $k = 1, 2, \ldots, K$.
One can then store the set $\{(dx_k, dy_k, \mathbf{w}_k)\}$, $k = 1, 2, \ldots, K$, as a multiband kernel (a moving multiband image).
As we will see next, the dimension $n$ of each kernel coefficient must be the number of image bands when computing the convolution between a multiband image and a multiband kernel.
Convolution
The convolution (in fact, correlation) between a multiband image $\hat{I} = (D_I, \mathbf{I})$ and a multiband kernel $(A, \mathbf{W})$ can be defined as a grayscale image $\hat{J} = (D_J, J)$, where

$$J(p) = \sum_{k=1}^{K} \mathbf{I}(q_k) \cdot \mathbf{w}_k, \qquad \mathbf{I}(q_k) \cdot \mathbf{w}_k = \sum_{j=1}^{n} I_j(q_k)\, w_{kj},$$

for $q_k \in A(p)$ and $p \in D_J \supseteq D_I$. We usually force $D_J = D_I$.
Convolution
The moving image translates from $p = (x_p, y_p) = (-\infty, -\infty)$ to $p = (x_p, y_p) = (+\infty, +\infty)$, but $J(p)$ is computed only for $p \in D_J = D_I$.
The convolution algorithm
Input: $\hat{I} = (D_I, \mathbf{I})$ and $\{(dx_k, dy_k, \mathbf{w}_k)\}$, $k = 1, 2, \ldots, K$.
Output: $\hat{J} = (D_J, J)$.
1. For each $p = (x_p, y_p) \in D_J$, do
2.   $J(p) \leftarrow 0$.
3.   For $k \leftarrow 1, 2, \ldots, K$, do
4.     $q = (x_q, y_q) \leftarrow (x_p + dx_k, y_p + dy_k)$.
5.     If $q = (x_q, y_q) \in D_I$, then
6.       $J(p) \leftarrow J(p) + \mathbf{I}(q) \cdot \mathbf{w}_k$.
By choice of the kernel coefficients, different local features of $\hat{I}$ are extracted in the filtered image $\hat{J}$.
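The algorithm translates directly to Python; a sketch under the assumption that the multiband image is stored as a (ny, nx, n) NumPy array and the kernel as a list of displacements plus a (K, n) weight array:

```python
import numpy as np

def convolve(I, displacements, weights):
    """Steps 1-6 of the algorithm: for each pixel p, accumulate the
    inner products I(q_k) . w_k over the adjacents q_k inside D_I."""
    ny, nx, n = I.shape
    J = np.zeros((ny, nx))
    for yp in range(ny):
        for xp in range(nx):
            for (dx, dy), wk in zip(displacements, weights):
                xq, yq = xp + dx, yp + dy
                if 0 <= xq < nx and 0 <= yq < ny:  # step 5: q in D_I?
                    J[yp, xp] += I[yq, xq] @ wk    # step 6
    return J

# Two-band constant image filtered with the single-displacement kernel
# w_1 = (2, -3): every J(p) = 1*2 + 1*(-3) = -1.
I = np.ones((3, 3, 2))
J = convolve(I, [(0, 0)], np.array([[2.0, -3.0]]))
```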
Image filtering for feature extraction
The Sobel filters, for example, can enhance vertical and horizontal edges of $\hat{I}$. The corresponding moving images, in which the origin $p$ is the central pixel, are shown on the slide.
One can also compute the convolution between $\hat{I}$ and a kernel bank $(A, \mathbf{W}_1, \mathbf{W}_2, \ldots, \mathbf{W}_m)$ such that the result is a multiband image $\hat{J} = (D_J, \mathbf{J})$, where $\mathbf{J}(p) = (J_1(p), J_2(p), \ldots, J_m(p))$.
The kernel coefficients and the adjacent pixel values of each pixel can be organized into matrices, and the convolution can be efficiently computed by matrix multiplication.
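As an illustration, the two 3x3 Sobel kernels and a minimal border-aware correlation in NumPy (adjacents outside the domain contribute zero, as in the algorithm above; names are illustrative):

```python
import numpy as np

# Sobel kernels as 3x3 moving images, origin at the central pixel.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)  # enhances vertical edges
sobel_y = sobel_x.T                            # enhances horizontal edges

def filter2d(I, W):
    """Correlate grayscale image I with a square kernel W, skipping
    adjacents that fall outside the image domain."""
    ny, nx = I.shape
    J = np.zeros((ny, nx))
    r = W.shape[0] // 2
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            w = W[dy + r, dx + r]
            ys, ye = max(0, -dy), ny - max(0, dy)
            xs, xe = max(0, -dx), nx - max(0, dx)
            J[ys:ye, xs:xe] += w * I[ys + dy:ye + dy, xs + dx:xe + dx]
    return J

# A vertical step edge responds strongly to sobel_x and not at all to
# sobel_y at interior pixels.
I = np.zeros((5, 5))
I[:, 2:] = 1.0
```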
Image filtering for feature extraction
Let $(A, \mathbf{W}_i)$, where $\mathbf{W}_i(q_k - p) = \mathbf{w}_{ki} \in \mathbb{R}^n$, $i = 1, 2, \ldots, m$, $k = 1, 2, \ldots, K$, be the $i$-th kernel of the bank, such that each kernel is organized as a column of a matrix $\mathbf{K}$:

$$\mathbf{K} = \begin{bmatrix} \mathbf{w}_{11} & \mathbf{w}_{12} & \ldots & \mathbf{w}_{1m} \\ \mathbf{w}_{21} & \mathbf{w}_{22} & \ldots & \mathbf{w}_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{w}_{K1} & \mathbf{w}_{K2} & \ldots & \mathbf{w}_{Km} \end{bmatrix}_{nK \times m}$$

where each vector $\mathbf{w}_{ki} \in \mathbb{R}^n$ is a column matrix.
Image filtering for feature extraction
For an image $(D_I, \mathbf{I})$ and adjacency relation $A$, we can organize the feature vectors $\mathbf{I}(q_{jk}) \in \mathbb{R}^n$ of the adjacent pixels $q_{jk} \in A(p_j)$, $k = 1, 2, \ldots, K$, of each pixel $p_j \in D_I$, $j = 1, 2, \ldots, |D_I|$, along the rows of a matrix $\mathbf{X}_I$:

$$\mathbf{X}_I = \begin{bmatrix} \mathbf{I}(q_{11}) & \mathbf{I}(q_{12}) & \ldots & \mathbf{I}(q_{1K}) \\ \mathbf{I}(q_{21}) & \mathbf{I}(q_{22}) & \ldots & \mathbf{I}(q_{2K}) \\ \vdots & \vdots & \ddots & \vdots \\ \mathbf{I}(q_{|D_I|1}) & \mathbf{I}(q_{|D_I|2}) & \ldots & \mathbf{I}(q_{|D_I|K}) \end{bmatrix}_{|D_I| \times nK}$$

where each vector $\mathbf{I}(q_{jk}) \in \mathbb{R}^n$ is a row matrix.
Image filtering for feature extraction
The multiplication $\mathbf{X}_I \mathbf{K}$ outputs a matrix $\mathbf{X}_J$,

$$\mathbf{X}_J = \begin{bmatrix} \mathbf{J}(p_1) \\ \mathbf{J}(p_2) \\ \vdots \\ \mathbf{J}(p_{|D_I|}) \end{bmatrix}_{|D_I| \times m}$$

where each vector $\mathbf{J}(p_j) \in \mathbb{R}^m$, $j = 1, 2, \ldots, |D_I|$, is a row matrix.
That is, $\mathbf{X}_J$ is the matrix organization of the output image $(D_J, \mathbf{J})$ for $D_J = D_I$.
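The matrix formulation can be sketched as an im2col-style routine (hypothetical names; adjacents outside $D_I$ are zero-padded):

```python
import numpy as np

def conv_as_matmul(I, displacements, Kmat):
    """Build X_I (|D_I| x nK) row by row, then filter the whole image
    with a single product X_I @ Kmat, where Kmat is nK x m with one
    kernel of the bank per column."""
    ny, nx, n = I.shape
    rows = []
    for yp in range(ny):
        for xp in range(nx):
            feats = []
            for (dx, dy) in displacements:
                xq, yq = xp + dx, yp + dy
                if 0 <= xq < nx and 0 <= yq < ny:
                    feats.append(I[yq, xq])
                else:                       # adjacent outside D_I
                    feats.append(np.zeros(n))
            rows.append(np.concatenate(feats))
    XI = np.stack(rows)             # |D_I| x nK
    XJ = XI @ Kmat                  # |D_I| x m
    return XJ.reshape(ny, nx, -1)   # back to image form (D_J = D_I)

# Sanity check: a single (0, 0) displacement with identity weights
# reproduces the input image band for band.
I = np.arange(8, dtype=float).reshape(2, 2, 2)
J = conv_as_matmul(I, [(0, 0)], np.eye(2))
```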
Image filtering for feature extraction
The Sobel vertical-edge kernel can enhance the characters of a car plate and, as we will see later, the integral image can be exploited to assign higher scores to the best candidate locations.
Integral image
The pixel values of the integral image $\hat{I}_{int} = (D_I, I_{int})$ of an image $\hat{I} = (D_I, I)$ are defined by

$$I_{int}(p) = \sum_{\forall q \in A(p)} I(q), \qquad A(p) = \{q \in D_I \mid x_q \leq x_p \text{ and } y_q \leq y_p\}.$$
Integral image
The integral value within any rectangular region $R$, delimited by pixels $p_1$, $p_2$, $p_3$, and $p_4$, is

$$\sum_{\forall p \in R} I(p) = I_{int}(p_4) - I_{int}(p_2) - I_{int}(p_3) + I_{int}(p_1).$$

This corresponds to the convolution between the image and a unitary kernel with adjacency defined by $R$ with respect to some origin $p$.
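A minimal NumPy sketch of both definitions, assuming inclusive rectangle coordinates; the integral image is padded with a zero row and column so the four-corner formula needs no border cases:

```python
import numpy as np

def integral_image(I):
    """I_int(p): sum of I(q) over all q with x_q <= x_p and y_q <= y_p,
    with a zero-padded first row and column."""
    Iint = np.zeros((I.shape[0] + 1, I.shape[1] + 1))
    Iint[1:, 1:] = I.cumsum(axis=0).cumsum(axis=1)
    return Iint

def rect_sum(Iint, y1, x1, y2, x2):
    """Sum of I over the rectangle [y1..y2] x [x1..x2] (inclusive),
    computed from the four corner values of the integral image only."""
    return (Iint[y2 + 1, x2 + 1] - Iint[y1, x2 + 1]
            - Iint[y2 + 1, x1] + Iint[y1, x1])

I = np.arange(12, dtype=float).reshape(3, 4)
Iint = integral_image(I)
```

Whatever the rectangle size, the sum costs four lookups, which is what makes the feature extraction on the next slides fast.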
Image filtering for feature extraction
By defining $R$ around each pixel $p \in D_I$, the integral of the edge-enhanced image can be used to define candidates for the plate location by thresholding (i.e., a weak classifier) and connected component analysis.
Image filtering for feature extraction
One may define kernels of different sizes and configurations based on integral images (Haar-like features): the weights are $w \geq 1$ in the white region(s) and $-w$ in the black region(s), or the other way around.
Image filtering for feature extraction
The convolution between an image and a bank of kernels generates a multiband image $\hat{J} = (D_J, \mathbf{J})$ for feature selection and classification. Viola & Jones introduced a fast scheme based on a cascade of weak classifiers for face detection [5].
Systems of weak classifiers
By providing a training set with images and the corresponding annotations of the object location in each image,
one can find the threshold that minimizes the classification error in the training set for each feature (and even at each pixel).
Object detection on an unseen image set, named the test set, can be based on the weighted combination of the decisions from all classifiers.
Some post-processing, such as the analysis of the resulting components, is likely to be required.
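The per-feature threshold search can be sketched as a decision stump (a simplification of the weak learners used in [5]; names are illustrative):

```python
import numpy as np

def best_stump(feature, labels):
    """Exhaustively pick the threshold and polarity minimizing the
    training error of the weak classifier
    h(x) = +1 if polarity * x > polarity * threshold else -1."""
    best_t, best_pol, best_err = None, None, float("inf")
    for t in np.unique(feature):
        for pol in (+1, -1):
            pred = np.where(pol * feature > pol * t, 1, -1)
            err = np.mean(pred != labels)
            if err < best_err:
                best_t, best_pol, best_err = t, pol, err
    return best_t, best_pol, best_err

# A feature that separates the two categories perfectly:
t, pol, err = best_stump(np.array([1.0, 2.0, 3.0, 4.0]),
                         np.array([-1, -1, 1, 1]))
```

A boosting scheme would then reweight the training samples and combine many such stumps, one per selected feature.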
Descriptor and classifier learning
One may rely on knowledge about a given application to define feature extraction.
However, methods based on data processing and analysis, named data-driven approaches, are more popular nowadays.
Data-driven approaches for descriptor and classifier learning may be divided into
supervised (discriminative, wrapper), unsupervised (generative, filter), or semi-supervised (transductive) methods.
Supervised deep neural networks, for instance, design the descriptor and the classifier at the same time [1].
Supervised and unsupervised learning
A sample (which has a different meaning in Statistics) may be an image, pixel, superpixel, object, or subimage around the object.
In supervised learning, we are interested in the case where a classifier assigns samples to one out of $c$ possible categories $\{\omega_k\}_{k=1}^{c}$.
In unsupervised learning, samples are grouped into one out of $g$ clusters $\{G_k\}_{k=1}^{g}$ (clustering) based on their proximity in the feature space.
A "good" descriptor should map samples from the same category into the same group and separate the groups as much as possible in the feature space, despite the absence of category (label) information.
Unsupervised learning
In unsupervised learning, one can
extract features from training samples,
group samples into $g$ clusters $\{G_k\}_{k=1}^{g}$, and
apply some clustering validity measure to evaluate each candidate solution and find the best one among them.
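The grouping step can be sketched with Lloyd's k-means (a minimal illustration, not a method from the slides; initial centers are random data points):

```python
import numpy as np

def kmeans(X, g, iters=50, seed=0):
    """Group the rows of the feature matrix X into g clusters by
    proximity in the feature space (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), g, replace=False)]
    for _ in range(iters):
        # assign each sample to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # recompute each center (keep the old one if its cluster emptied)
        centers = np.stack([X[labels == k].mean(axis=0)
                            if np.any(labels == k) else centers[k]
                            for k in range(g)])
    return labels, centers

# Two well-separated blobs are recovered as two clusters.
X = np.array([[0., 0.], [0., 1.], [1., 0.],
              [10., 10.], [10., 11.], [11., 10.]])
labels, centers = kmeans(X, 2)
```

A validity measure (e.g., within- versus between-cluster distances) would then score runs with different $g$ to pick the best solution.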
Supervised learning
In supervised learning, one can
take into account the labels of the samples to improve feature extraction,
classify samples into $c$ categories $\{\omega_k\}_{k=1}^{c}$, and
apply an effectiveness measure to evaluate the candidate solution, improve the process, and find the best one among all candidates.
Training samples and feature matrix
For a given training set $Z_{tr} = \{s_j\}_{j=1}^{m}$, a descriptor $D$ is a mapping $Z_{tr} \rightarrow \mathbf{X}_{tr}$ that creates an $n \times m$ feature matrix $\mathbf{X}_{tr}$:

$$\mathbf{X}_{tr} = \begin{bmatrix} x_{11} & x_{12} & \ldots & x_{1m} \\ x_{21} & x_{22} & \ldots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \ldots & x_{nm} \end{bmatrix}$$

where the columns $[x_{1j}, x_{2j}, \ldots, x_{nj}]^t$ are the feature vectors $\mathbf{x}_j = \mathbf{x}(s_j) = (x_1(s_j), x_2(s_j), \ldots, x_n(s_j))$ of the samples $s_j \in Z_{tr}$.
Once $D$ is defined, the pair $(Z_{tr}, \mathbf{X}_{tr})$ is called the training dataset $\mathcal{D}_{tr}$. Similarly, one can use $D: Z_{ts} \rightarrow \mathbf{X}_{ts}$ to obtain a test dataset $\mathcal{D}_{ts} = (Z_{ts}, \mathbf{X}_{ts})$ from a test set $Z_{ts}$.
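In code, building $\mathbf{X}_{tr}$ amounts to stacking the feature vectors as columns (`descriptor` here stands for any hypothetical function mapping a sample to an $n$-vector):

```python
import numpy as np

def feature_matrix(samples, descriptor):
    """Create the n x m matrix X_tr whose j-th column is the feature
    vector x(s_j) of training sample s_j."""
    return np.stack([descriptor(s) for s in samples], axis=1)

# Toy descriptor mapping a scalar sample s to (s, s^2):
X_tr = feature_matrix([1.0, 2.0, 3.0], lambda s: np.array([s, s ** 2]))
```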
Good practice in machine learning
Good practice in machine learning should
evaluate a method multiple times for statistically independent training and test sets,
use the same sets for each method, and
compare methods using a statistical test suitable for the experiment.
The methods may be distinct classifiers based on the same descriptor, distinct descriptors using the same classification model, image segmentation algorithms, etc.
Unfortunately, the number of labeled samples is not always enough to draw reliable conclusions.
Good practice in machine learning
This process must be repeated multiple times for the statistical analysis of a method or the statistical comparison among methods.
Sample selection
Samples are often randomly selected from a larger set $Z$ to compose the training set $Z_{tr}$ and the test set $Z_{ts}$, such that $Z_{tr} \cap Z_{ts} = \emptyset$ and $Z_{tr} \cup Z_{ts} = Z$.
When their true labels in $Z$ are known a priori, the selection of the same number of samples per category creates balanced sets.
Alternatively, the same percentage of samples per category creates imbalanced sets whenever $Z$ is imbalanced.
For descriptor learning, it is usually better to use balanced training sets, whereas classifier learning must respect the characteristics of the problem as represented by $Z$.
Sample selection: random variable and field
Given that $\mathbf{x}(s) = (x_1(s), x_2(s), \ldots, x_n(s)) \in \mathbb{R}^n$ changes with the random choice of $s \in Z$, $\mathbf{x}$ is said to be a random field with probability density $\rho(\mathbf{x}): \mathbf{x} \rightarrow [0,1]$.
Likewise, each feature $x_i(s) \in \mathbb{R}$, $i \in [1, n]$, changes with the random choice of $s \in Z$, so $x_i$ is said to be a random variable.
Therefore, the random choices of $s_j \in Z$ generate different sequences of observations $\mathbf{x}_j = \mathbf{x}(s_j)$, $j = 1, 2, \ldots, |Z_{tr}| = m$, for training, and likewise for testing.
Sample selection
Whenever the descriptor (or the classifier) has parameters to be optimized, the use of a third statistically independent set $Z_{ev}$, named the evaluation set, for optimization may reduce the chances of overfitting.
For a given descriptor, an apprentice classifier can improve performance along learning iterations as it selects training samples for user supervision. This is called active learning.
Sample selection is never perfect, but cross-validation methods are the preferred ones.
Sample selection
Cross-validation may be called $K$-hold-out, $K$-fold, or $N \times K$-fold [2].
$K$-hold-out: $Z$ is split $K$ times into $P\%$ of samples for $Z_{tr}$ and $(100 - P)\%$ for $Z_{ts}$, $0 < P < 100$, to obtain the statistics of the effectiveness measure in the test sets. The instances of $Z_{tr}$ and $Z_{ts}$ are not statistically independent.
$K$-fold: $Z$ is split into $K$ parts of approximately equal sizes, using each of the parts for testing and the rest for training, $K$ times. The instances of $Z_{ts}$ are statistically independent, but not the instances of $Z_{tr}$.
$N \times K$-fold: $K$-fold is repeated $N$ times, usually with $K = 2$.
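The $K$-fold scheme can be sketched as follows (a hypothetical helper; each sample appears in exactly one test fold):

```python
import numpy as np

def k_fold_splits(m, K, seed=0):
    """Split sample indices 0..m-1 into K parts of approximately equal
    size; yield (train, test) index arrays, each part used once for
    testing and the remaining parts for training."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(m), K)
    for k in range(K):
        train = np.concatenate([folds[i] for i in range(K) if i != k])
        yield train, folds[k]

splits = list(k_fold_splits(10, 5))
```

Stratified variants would additionally preserve the per-category proportions inside each fold, which matters for imbalanced $Z$.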
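The splitting protocols above can be sketched in plain Python; the function names and the toy sample list are illustrative, not part of the course material:

```python
import random

def k_fold_splits(samples, k, seed=0):
    """K-fold: split samples into k folds of near-equal size; each fold
    serves once as the test set Zts, the remaining folds as Ztr."""
    idx = list(range(len(samples)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = [samples[j] for j in folds[i]]
        train = [samples[j] for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

def n_times_k_fold(samples, n, k):
    """N x K-fold: repeat the K-fold protocol N times with new shuffles."""
    for rep in range(n):
        yield from k_fold_splits(samples, k, seed=rep)
```

Note that, as the slide states, every sample appears in exactly one test fold per K-fold run, which is what makes the test instances statistically independent.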
Effectiveness and confusion matrix

Let nij be the number of times test samples from category ωi have been classified into category ωj, for i, j ∈ [1, c] and mts test samples. The confusion matrix is defined as

$$\begin{array}{c|cccc}
 & \omega_1 & \omega_2 & \cdots & \omega_c \\ \hline
\omega_1 & n_{11} & n_{12} & \cdots & n_{1c} \\
\omega_2 & n_{21} & n_{22} & \cdots & n_{2c} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\omega_c & n_{c1} & n_{c2} & \cdots & n_{cc}
\end{array}$$

The total number of correct classifications is $\sum_{i=1}^{c} n_{ii}$, and $m_{ts} - \sum_{i=1}^{c} n_{ii}$ is the total number of misclassifications.

Several effectiveness measures can be obtained from the confusion matrix (sensitivity, accuracy, specificity, precision, etc.). A good choice is Cohen's kappa, which is robust to imbalanced categories.
Cohen's kappa

Cohen's kappa κ measures the agreement between two raters, A and B, such that nij indicates the number of samples rater A says are from category ωi while rater B says are from category ωj.

$$\kappa = \frac{P_o - P_e}{1 - P_e}, \qquad
P_o = \frac{1}{m_{ts}} \sum_{i=1}^{c} n_{ii}, \qquad
P_e = \frac{1}{m_{ts}^2} \sum_{i=1}^{c} N_A(i)\, N_B(i),$$

where $N_A(i) = \sum_{j=1}^{c} n_{ij}$ is the total number of samples rater A says are from category ωi, and $N_B(i) = \sum_{j=1}^{c} n_{ji}$ is the total number of samples rater B says are from category ωi.
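The totals NA(i), NB(i) and the agreement probabilities Po and Pe translate directly into code; this is a minimal sketch (the function name is illustrative) that computes κ from a c × c confusion matrix n[i][j]:

```python
def cohens_kappa(conf):
    """Cohen's kappa from a c x c confusion matrix conf[i][j]."""
    c = len(conf)
    m_ts = sum(sum(row) for row in conf)
    p_o = sum(conf[i][i] for i in range(c)) / m_ts            # observed agreement P_o
    n_a = [sum(conf[i]) for i in range(c)]                    # row totals N_A(i)
    n_b = [sum(conf[j][i] for j in range(c)) for i in range(c)]  # column totals N_B(i)
    p_e = sum(n_a[i] * n_b[i] for i in range(c)) / m_ts ** 2  # chance agreement P_e
    return (p_o - p_e) / (1 - p_e)
```

For a perfectly diagonal matrix κ = 1; for agreement no better than chance κ approaches 0.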
Statistical tests
Statistical tests provide a formal way to decide whether the results of an experiment are significant or accidental.

For example, one can measure the Cohen's kappa κi(t) of each execution t = 1, 2, ..., T of each classifier Ci, i ∈ [1, n], on T statistically independent test sets during cross validation.

A statistical test starts from a null hypothesis, such as "all classifiers are equivalent", and verifies whether it can be rejected at some significance level p (e.g., p = 0.05).
Statistical tests

First, some measure mo that indicates differences among the classifiers is obtained from the experiment. For example, for n = 2 classifiers and a 5 × 2-fold cross validation, one can compute the variances st² of the differences κ1(t) − κ2(t) over the two folds, for t = 1, 2, ..., 5, and define

$$m_o = \frac{\kappa_1(1) - \kappa_2(1)}{\sqrt{\frac{1}{5}\sum_{t=1}^{5} s_t^2}}.$$

It can be shown that mo (a random variable) satisfies some probability density ρ(mo) when the null hypothesis is satisfied; for this example, a t-distribution with five degrees of freedom.
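The statistic above can be computed as in the following sketch of the classical 5×2cv paired t statistic; the input format (five pairs of per-fold kappa scores per classifier) and the function name are illustrative assumptions:

```python
import math

def t_5x2cv(k1, k2):
    """5 x 2-fold cross-validation t statistic m_o.

    k1, k2: lists of 5 pairs (fold 1 score, fold 2 score) for
    classifiers C1 and C2. Under the null hypothesis, the result
    follows a t-distribution with 5 degrees of freedom."""
    assert len(k1) == len(k2) == 5
    variances = []
    for (a1, a2), (b1, b2) in zip(k1, k2):
        d1, d2 = a1 - b1, a2 - b2            # per-fold score differences
        mean = (d1 + d2) / 2.0
        variances.append((d1 - mean) ** 2 + (d2 - mean) ** 2)  # s_t^2
    d11 = k1[0][0] - k2[0][0]                # kappa_1(1) - kappa_2(1)
    return d11 / math.sqrt(sum(variances) / 5.0)
```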
Statistical tests
The areas under the curve ρ(mo) are tabulated for each value of mo, representing the probability p of the null hypothesis being correct.

If mo is observed above a critical value such that p < 0.05, for instance, we reject the null hypothesis with less than a 5% chance of being wrong.

The most popular tests are Student's t-test, the Wilcoxon signed-rank test, analysis of variance (ANOVA), Tukey's range test, the Nemenyi test, and the Friedman test.
Probability density function

A probability density function (pdf) ρ of a random variable x (e.g., a feature) is a mapping ρ : x → ℜ, such that ρ(x) ≥ 0 and $P(x_o \leq x \leq x_f) = \int_{x_o}^{x_f} \rho(x)\,dx \in [0, 1]$ measures the probability of the value x being in the interval [xo, xf].

The pdf may also be called a probability distribution and, for discrete random variables (e.g., pixel intensity), a probability mass function.

The normalized image histogram, for instance, represents the pdf of the pixel intensity. However, the color x = (x1, x2, x3) of the pixels is a discrete random field. In this case, ρ(x) ≥ 0 defines a manifold in ℜ^4.
Probability density function

In the general case, x = (x1, x2, ..., xn) defines a manifold in ℜ^{n+1}.

The simplest approach to estimate the pdf starts by counting the number Ω(x(s)) of samples t ∈ Ztr whose point x(t) falls within a hypercube of volume h^n (Parzen window) around x(s) ∈ ℜ^n.

Let A be an adjacency relation defined by

$$A(s) = \left\{ t \in Z_{tr} \;\middle|\; |x_i(t) - x_i(s)| \leq \frac{h}{2},\; i = 1, 2, \ldots, n \right\},$$

and w(t) be a kernel weight defined by

$$w(t) = \begin{cases} 1 & \text{if } t \in A(s), \\ 0 & \text{otherwise.} \end{cases}$$
Probability density function

The counting Ω(x(s)) is defined by

$$\Omega(\mathbf{x}(s)) = \sum_{\forall t \in A(s)} w(t).$$

Clearly, the choice of h is important, and a fixed scale k ≥ 1 can make it adaptive:

$$A_k(s) = \{ t \in Z_{tr} \mid \mathbf{x}(t) \text{ is a } k\text{-closest observation of } \mathbf{x}(s) \},$$

$$w(t) = \begin{cases} \exp\left[ \dfrac{-\|\mathbf{x}(t) - \mathbf{x}(s)\|^2}{2\sigma^2} \right] & \text{if } t \in A_k(s), \\ 0 & \text{otherwise,} \end{cases}$$

for $\sigma = \frac{1}{3} \max_{\forall (s,t) \in A_k} \|\mathbf{x}(t) - \mathbf{x}(s)\|$ [4].
Probability density function

Let u(1), u(2), ..., u(L) be the set of distinct observations x(s), ∀s ∈ Ztr. Then the probability density function ρ can be estimated at any point x(s) = u(j) ∈ ℜ^n, 1 ≤ j ≤ L, as

$$\rho(\mathbf{u}(j)) = \frac{\Omega(\mathbf{u}(j))}{\sum_{i=1}^{L} \Omega(\mathbf{u}(i))},$$

with Ω(u(j)) ← Ω(x(s)) and ρ(s) ← ρ(x(s)) = ρ(u(j)).
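The hypercube counting and normalization above can be sketched as follows; this minimal version assumes all observations are distinct tuples, so the estimate is computed directly at every training point (function name illustrative):

```python
def parzen_pdf(points, h):
    """Estimate a pdf at each training point using a hypercube
    Parzen window of side h: w(t) = 1 inside the window, 0 outside.
    Returns the normalized counts rho(u(j))."""
    def in_window(p, q):
        # q is adjacent to p when every coordinate differs by at most h/2
        return all(abs(pi - qi) <= h / 2.0 for pi, qi in zip(p, q))
    omega = [sum(1 for q in points if in_window(p, q)) for p in points]
    total = float(sum(omega))
    return [w / total for w in omega]
```

The values sum to 1 by construction, matching the normalization over all distinct observations.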
Probability density function
For an image I = (DI, I), for instance, one can create a pdf image by assigning to each pixel p ∈ DI a pdf value ρ(p) as estimated in the feature space defined by I.
More basic concepts in Statistics

ρ(x) ≥ 0, x = (x1, x2, ..., xn), is called the joint pdf.

When ρ(x) = ρ(x1)ρ(x2)···ρ(xn), the variables are said to be statistically independent.

Let U = {u1, u2, ..., uLi} and V = {v1, v2, ..., vLj} be the respective sets of distinct observations xi(s) and xj(s), 1 ≤ i ≠ j ≤ n, ∀s ∈ Ztr (i.e., observations of two random variables of x).
More basic concepts in Statistics
The Entropy H of a pdf ρ(xi ) measures the unpredictability ofxi — i.e., less uniform is ρ(xi ), lower is H, higher is thepredictability.
H(ρ) = −Li∑
k=1
ρ(uk) log2 ρ(uk).
Given ρ1(xi ) and ρ2(xi ), obtained from two training sets (orimages), the relative entropy, or Kullback-Leibler distanceD(ρ1, ρ2), measures their cross entropy.
D(ρ1, ρ2) =
Li∑k=1
ρ2(uk) lnρ2(uk)
ρ1(uk).
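Both quantities above fit in a few lines; this sketch uses illustrative function names, and the KL distance follows the slide's convention with ρ2 in the numerator:

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution p."""
    return -sum(pk * math.log2(pk) for pk in p if pk > 0)

def kl_distance(p1, p2):
    """Kullback-Leibler distance D(p1, p2) = sum p2 ln(p2 / p1)."""
    return sum(q * math.log(q / p) for p, q in zip(p1, p2) if q > 0)
```

A uniform two-valued distribution has the maximum entropy of 1 bit, and D(ρ1, ρ2) = 0 exactly when the two distributions coincide.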
More basic concepts in Statistics

Given ρ1(xi) and ρ2(xj), from possibly distinct training sets, the mutual information I(ρ1, ρ2) is the reduction in uncertainty about one variable due to the knowledge of the other.

$$I(\rho_1, \rho_2) = H(\rho_1) - H(\rho_1 \mid \rho_2),$$

$$I(\rho_1, \rho_2) = \sum_{k=1}^{L_i} \sum_{l=1}^{L_j} \rho(u_k, v_l) \log_2 \frac{\rho(u_k, v_l)}{\rho_1(u_k)\,\rho_2(v_l)}.$$

Mutual information is widely used when aligning two images in the same domain (image registration): the alignment aims at maximizing the mutual information.
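The double sum above can be sketched as follows, assuming the joint distribution is given as a table joint[k][l] = ρ(uk, vl) and the marginals are obtained by summing rows and columns:

```python
import math

def mutual_information(joint):
    """Mutual information in bits from a joint distribution table."""
    rows, cols = len(joint), len(joint[0])
    p1 = [sum(row) for row in joint]                         # marginal of u
    p2 = [sum(joint[k][l] for k in range(rows)) for l in range(cols)]  # marginal of v
    mi = 0.0
    for k in range(rows):
        for l in range(cols):
            p = joint[k][l]
            if p > 0:
                mi += p * math.log2(p / (p1[k] * p2[l]))
    return mi
```

For an independent joint (ρ(u, v) = ρ1(u)ρ2(v)) the result is 0, and for two perfectly dependent binary variables it is 1 bit.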
More basic concepts in Statistics

The mean μi = E[xi] of a random variable xi can be estimated by the first moment

$$\mu_i = \sum_{k=1}^{L_i} u_k\, \rho(u_k).$$

The variance σi² = E[(xi − μi)²] can be estimated by the second moment of (xi − μi)

$$\sigma_i^2 = \sum_{k=1}^{L_i} [u_k - \mu_i]^2\, \rho(u_k).$$
More basic concepts in Statistics

The cross-moment σij = σji = E[(xi − μi)(xj − μj)] (covariance) of xi and xj can be estimated as

$$\sigma_{ij} = \sum_{k=1}^{L_i} \sum_{l=1}^{L_j} [(u_k - \mu_i)(v_l - \mu_j)]\, \rho(u_k, v_l).$$

The mean vector is μ = E[x] ∈ ℜ^n and the covariance matrix Σ = E[(x(s) − μ)(x(s) − μ)ᵗ] is

$$\Sigma = \begin{bmatrix}
\sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1n} \\
\sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
\sigma_{n1} & \sigma_{n2} & \cdots & \sigma_n^2
\end{bmatrix}.$$
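The first and second moments above can be estimated directly from a list of observations; this sketch assumes equal sample weights (ρ is the empirical distribution) and divides by m, matching the expectation-based definitions:

```python
def mean_and_covariance(X):
    """Sample mean vector and covariance matrix of observations X,
    a list of m points, each a list of n coordinates."""
    m, n = len(X), len(X[0])
    mu = [sum(x[i] for x in X) / m for i in range(n)]
    sigma = [[sum((x[i] - mu[i]) * (x[j] - mu[j]) for x in X) / m
              for j in range(n)] for i in range(n)]
    return mu, sigma
```

The diagonal of the result holds the variances σi² and the matrix is symmetric, as in the slide.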
More basic concepts in Statistics

The joint distribution satisfies ρ(uk, vl) = ρ(uk)ρ(vl | uk) = ρ(vl)ρ(uk | vl).

The Cauchy-Schwarz inequality says that σij² ≤ σi²σj².

The Pearson correlation coefficient is defined as σij / (σiσj).

The variables xi and xj are said to be uncorrelated when σij = 0.
More basic concepts in Statistics

For c categories {ωk}, k = 1, ..., c, the Bayes rule says that the occurrence probability of a category ωk given an observation x is

$$P(\omega_k \mid \mathbf{x}) = \frac{P(\omega_k)\, \rho(\mathbf{x} \mid \omega_k)}{\rho(\mathbf{x})},$$

where P(ωk | x) is named the posterior probability, P(ωk) is the prior probability, the conditional density function ρ(x | ωk) is the likelihood, and ρ(x) is the evidence.

The evidence is $\rho(\mathbf{x}) = \sum_{i=1}^{c} P(\omega_i)\, \rho(\mathbf{x} \mid \omega_i)$.

The estimation of ρ(x | ωk) can be similar to that of ρ(x), but using only adjacent samples t ∈ Ztr from category ωk.
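A minimal numeric sketch of the Bayes rule above; the priors and likelihoods are illustrative inputs, and the evidence is simply the normalizing sum over categories:

```python
def posterior(priors, likelihoods):
    """Posterior probabilities P(w_k | x) from prior probabilities
    P(w_k) and likelihoods rho(x | w_k), one entry per category."""
    evidence = sum(p * l for p, l in zip(priors, likelihoods))  # rho(x)
    return [p * l / evidence for p, l in zip(priors, likelihoods)]
```

Because the evidence normalizes the products, the posteriors always sum to 1.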
Clustering methods
As we will see in the course, it is possible to separate the g domes of the pdf manifold ρ(x) into g clusters.

[Figure: heat map of the pdf; 4 groups; 2 categories.]
This is the result of the method in [4].
Clustering methods
For images, using the Lab color space, one can obtain the following results.

[Figure: image; pdf; groups.]
Clustering methods
The most used method, however, assumes that the clusters are hyper-spheres: the g-means clustering.

[Figure: g = 2 groups; g = 4 groups; 2 categories.]
Clustering methods

The g-means clustering algorithm finds g groups {Gk}, k = 1, ..., g (clusters), by assigning each sample s ∈ Ztr = {sj}, j = 1, ..., m, m ≫ g, to one group, such that

$$\sum_{k=1}^{g} \sum_{\forall s \in G_k} \|\mathbf{x}(s) - \boldsymbol{\mu}_k\|^2$$

is minimized, where

$$\boldsymbol{\mu}_k = \frac{1}{|G_k|} \sum_{\forall s \in G_k} \mathbf{x}(s)$$

is the centroid of group Gk. The algorithm works as follows.
Clustering methods

Input: A training dataset (Ztr, Xtr).
Output: A label map L : Ztr → {k}, k = 1, ..., g (i.e., L(s) = k ⇒ s ∈ Gk).

1. Select g random centroids {μk}, k = 1, ..., g, from {x(sj)}, j = 1, ..., m.
2. For each iteration t = 1, 2, ..., T do:
3.   For each sample s ∈ {sj}, j = 1, ..., m, do:
4.     Set L(s) ← arg min_{k=1,2,...,g} ‖x(s) − μk‖².
5.   For each group Gk, k = 1, 2, ..., g, do:
6.     Update μk ← (1/|Gk|) Σ_{∀s ∈ Ztr | L(s)=k} x(s).
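The steps above can be sketched as follows; the initialization by sampling g distinct training points and the fixed iteration count T are simplifications of the stopping criterion discussed next:

```python
import random

def g_means(X, g, iterations=20, seed=0):
    """g-means (k-means) clustering of points X (lists of floats).
    Returns the label map L as a list of group indices in [0, g)."""
    rnd = random.Random(seed)
    centroids = [list(x) for x in rnd.sample(X, g)]  # step 1: random centroids
    labels = [0] * len(X)
    for _ in range(iterations):                      # step 2: T iterations
        # steps 3-4: assign each sample to its closest centroid
        for i, x in enumerate(X):
            labels[i] = min(range(g), key=lambda k: sum(
                (a - b) ** 2 for a, b in zip(x, centroids[k])))
        # steps 5-6: recompute each centroid as the mean of its group
        for k in range(g):
            members = [X[i] for i in range(len(X)) if labels[i] == k]
            if members:
                centroids[k] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels
```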
Clustering methods

The algorithm may be interrupted when the differences between previous and current centroids are negligible, and a test sample s ∈ Zts is assigned to the group of its closest centroid in ℜ^n.

The representative x(s) of group Gk can also be selected as the observation closest to the others in Gk:

$$\mathbf{x}(s) \leftarrow \arg\min_{\mathbf{x}(s'),\ \forall s', t \in Z_{tr} \mid L(t) = L(s') = k} \|\mathbf{x}(t) - \mathbf{x}(s')\|^2.$$

The observation x(s) is called a medoid and the method becomes g-medoids.

Other popular clustering approaches are mean-shift, normalized cut, Gaussian mixture models, and single-linkage.
Classification methods

Let Li(x) be a discriminant function that assigns a sample s ∈ Ztr with observation x(s) to category ωi, by setting label L(s) ← i, 1 ≤ i ≤ c, when

$$\omega_i = \arg\max_{j=1,2,\ldots,c} L_j(\mathbf{x}(s)).$$

A Bayesian classifier adopts Li(x) = P(ωi | x).

Indeed, one can use any other equivalent function, such as Li(x) = log[P(ωi)ρ(x | ωi)].
Bayesian classifier

Let ρ(x | ωi) be a Normal distribution N(μi, Σi). Then,

$$L_i(\mathbf{x}) = \log[P(\omega_i)] + \log\left[ \frac{1}{(2\pi)^{\frac{n}{2}} \sqrt{|\Sigma_i|}} \exp\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^t \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right] \right],$$

where μi and Σi can be estimated as the mean vector and covariance matrix of the observations x(s), s ∈ Ztr, whose true label λ(s) = i (i.e., s ∈ ωi). The argument (x − μi)ᵗ Σi⁻¹ (x − μi) is the squared Mahalanobis distance between x(s) and μi.
Quadratic discriminant classifier

A quadratic discriminant classifier (QDC) adopts

$$L_i(\mathbf{x}) = w_{i0} + \mathbf{w}_i^t \mathbf{x} + \mathbf{x}^t W_i \mathbf{x},$$

where x, wi ∈ ℜ^n, Wi is an n × n matrix, and wi0 ∈ ℜ.

By adopting ρ(x | ωi) = N(μi, Σi), the Bayesian classifier becomes a QDC, where

$$w_{i0} = \log[P(\omega_i)] - \frac{1}{2}\boldsymbol{\mu}_i^t \Sigma_i^{-1} \boldsymbol{\mu}_i - \frac{1}{2}\log(|\Sigma_i|), \qquad
\mathbf{w}_i^t = \boldsymbol{\mu}_i^t \Sigma_i^{-1}, \qquad
W_i = -\frac{1}{2}\Sigma_i^{-1}.$$

Obs: Category-independent terms are eliminated.
Linear discriminant classifier

A linear discriminant classifier (LDC) adopts

$$L_i(\mathbf{x}) = w_{i0} + \mathbf{w}_i^t \mathbf{x}.$$

By adopting ρ(x | ωi) = N(μi, ST), with $S_T = \frac{1}{m}\sum_{i=1}^{c} m_i \Sigma_i$, where mi is the number of training samples from category ωi, the Bayesian classifier becomes an LDC, where

$$w_{i0} = \log[P(\omega_i)] - \frac{1}{2}\boldsymbol{\mu}_i^t S_T^{-1} \boldsymbol{\mu}_i, \qquad
\mathbf{w}_i^t = \boldsymbol{\mu}_i^t S_T^{-1}.$$

Obs: Category-independent terms are eliminated.
The k-nearest neighbor classifier

The k-nearest neighbor classifier, k ≥ 1, adopts the k-closest adjacency relation

$$A_k(s) = \{ t \in Z_{tr} \mid \mathbf{x}(t) \text{ is a } k\text{-closest observation of } \mathbf{x}(s) \}$$

and counts the number ki of samples t whose true label λ(t) = i, for each i = 1, 2, ..., c.

It then approximates Li(x) = P(ωi | x) ≈ ki/k and assigns to s the label L(s) ∈ {i}, i = 1, ..., c (i.e., it classifies s as belonging to ωi), when Li(x(s)) = max_{j=1,2,...,c} Lj(x(s)).
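The rule above can be sketched in plain Python; the training set is assumed to be a list of (observation, label) pairs, and ties among the estimated posteriors ki/k are broken arbitrarily:

```python
def knn_classify(train, x, k):
    """k-NN: return the label with the largest count k_i among the
    k nearest neighbors of x (squared Euclidean distance)."""
    neighbors = sorted(train, key=lambda pt: sum(
        (a - b) ** 2 for a, b in zip(pt[0], x)))[:k]
    counts = {}
    for _, label in neighbors:
        counts[label] = counts.get(label, 0) + 1
    # maximizing k_i is the same as maximizing the estimate k_i / k
    return max(counts, key=counts.get)
```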
Role of feature space reduction

The reduction of the feature space ℜ^n to some dimension 1 ≤ k < n is also useful to handle the curse of high dimensionality or to understand the distribution of the observations x(s), ∀s ∈ Ztr, in ℜ^n.

The reduction can use a linear (e.g., PCA, LDA) or a non-linear (e.g., MDS, t-SNE) projection.

A linear projection is Ytr = W Xtr, where W is a k × n matrix and Xtr is the n × m feature matrix of the m training samples.

In the reduction by principal component analysis (PCA), the rows of W are the eigenvectors corresponding to the k highest eigenvalues of the covariance matrix Σ of the observations.
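Assuming NumPy is available, the PCA projection above can be sketched as follows; the shapes follow the slide's convention (Xtr is n × m with one column per sample, W is k × n), and centering the data before projecting is a standard choice not stated explicitly on the slide:

```python
import numpy as np

def pca_projection(X, k):
    """PCA reduction: X is an n x m feature matrix (one column per
    sample). Returns the k x n projection matrix W, whose rows are
    the eigenvectors of the k highest eigenvalues of the covariance
    matrix, and the projected k x m data Y = W (X - mean)."""
    mu = X.mean(axis=1, keepdims=True)
    cov = np.cov(X)                        # n x n covariance matrix
    vals, vecs = np.linalg.eigh(cov)       # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:k]     # indices of the k highest
    W = vecs[:, order].T                   # rows are eigenvectors
    return W, W @ (X - mu)
```

For data that already lies on a k-dimensional subspace, projecting and back-projecting with W recovers the centered data exactly.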
Role of feature space reduction

The reduction by linear discriminant analysis (LDA) considers the true label λ(s) ∈ {i}, i = 1, ..., c (i.e., s ∈ ωi), of the training samples s ∈ Ztr, and the rows of W are the eigenvectors corresponding to the k = c − 1 highest eigenvalues of the matrix $S_B S_W^{-1}$, where

$$S_B = \sum_{i=1}^{c} m_i (\boldsymbol{\mu}_i - \boldsymbol{\mu})(\boldsymbol{\mu}_i - \boldsymbol{\mu})^t, \qquad
S_W = \sum_{i=1}^{c} S_i, \qquad
S_i = \sum_{\forall s \in Z_{tr} \mid \lambda(s) = i} (\mathbf{x}(s) - \boldsymbol{\mu}_i)(\mathbf{x}(s) - \boldsymbol{\mu}_i)^t,$$

$$\boldsymbol{\mu} = \frac{1}{m} \sum_{\forall s \in Z_{tr}} \mathbf{x}(s), \qquad
\boldsymbol{\mu}_i = \frac{1}{m_i} \sum_{\forall s \in Z_{tr} \mid \lambda(s) = i} \mathbf{x}(s).$$
Role of feature space reduction
From Rauber et al. [3].
[1] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. http://www.deeplearningbook.org.

[2] Ludmila I. Kuncheva. Combining Pattern Classifiers: Methods and Algorithms. Wiley-Interscience, 2004.

[3] P.E. Rauber, A.X. Falcao, and A.C. Telea. Projections as visual aids for classification system design. Information Visualization, 2017.

[4] L.M. Rocha, F.A.M. Cappabianco, and A.X. Falcao. Data clustering as an optimum-path forest problem with applications in image analysis. Int. J. Imaging Syst. Technol., 19(2):50-68, June 2009.

[5] P. Viola and M. Jones. Rapid object detection using a boosted cascade of simple features. In Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), volume 1, pages I-511 to I-518, 2001.