2017-11-30
Bird Nest Images Classification Based on Canny Edge Detection and Local Binary Pattern
Chunbo Song
University of Delaware, Newark, DE 19711, USA
Abstract
This paper presents a method for classifying images as bird nest images or non-nest images. The input image is first processed by Canny edge detection, and Local Binary Patterns (LBP) are then used to extract local textural features. K-means is applied to cluster the LBP features. Each image is then divided into several blocks, and the dominant local textural feature of each block is used to describe that block. The performance of the method is evaluated with support vector machines, random forests and K-nearest neighbors.
1. Introduction
Image classification has become a very popular topic in computer vision research. Contextual image classification, a topic of pattern recognition in computer vision, is an approach to classification based on contextual information in images. "Contextual" means that the approach focuses on the relationship between nearby pixels, also called the neighborhood. The goal of this approach is to classify images using this contextual information [1, 3]. The key issues in image classification are image preprocessing, feature extraction and the choice of classification method. Methods such as Gabor filters, Principal Component Analysis (PCA) and Local Binary Patterns are all used for feature extraction [3], and they have been widely applied to face recognition and other areas.
However, a major difficulty remains for bird nest images. Most bird nests are attached to trees, so interference from the trees or from the background (e.g. sky, building walls) is introduced. Intuitively, a bird nest is easy for the human eye to recognize if nothing obstructs it. But how to describe a bird nest to a machine, so that the machine can decide whether an image is a bird nest image or not, is a very important problem for classification. Moreover, because the bird nests in our image collection are seen from different perspectives and are of different kinds, it is difficult to obtain good classification results.
In this research, two image feature extraction methods are used for bird nest classification: Canny edge detection and Local Binary Patterns. Over almost two decades, many image classification studies have been based on LBP, which has achieved great success in face recognition. Canny edge detection is a very useful tool for detecting edges. In addition, to describe local textural features more generally, a clustering method (K-means) is used so that similar local binary patterns are grouped into one cluster.
The pipeline of the bird nest image classification in this research is shown in Figure 1.
Figure 1 Pipeline
After the image features are extracted, support vector machines, K-nearest neighbors and random forests are used for classification, and the results of these classifiers are compared in the classification experiments. Considering the popularity of deep learning methods in image recognition, we also propose a possible way to improve the classification accuracy using convolutional neural networks.
Section 2 discusses data creation, including where the raw images come from and how the bounding boxes are generated. Canny edge detection, Local Binary Patterns and the division of images into blocks for feature extraction are discussed in Section 3. The three machine learning methods used for classification are briefly described in Section 4, and our experimental results are shown in Section 5. Based on these results, conclusions and further discussion are given in Section 6.
2. Data Creation
2.1 Image Source
All 114 images containing bird nests were collected from the web. Some of them contain a single nest, while others contain several nests, as shown in Figure 2.
Figure 2 All Source Images
Intuitively, these images fall into three scales: far, medium and close. Far-scale images may show fewer details, because the nest occupies a small, low-resolution region of the image. In close-scale images, the bird nest region has higher resolution. In Figure 3, (a) is far scale, (b) is medium scale and (c) is close scale.
Figure 3 Far Scale, Medium Scale and Close Scale Source Images
2.2 Generating Bounding Box
Since some source images contain watermarks, such images are cropped to avoid interference from the watermark; if the watermark spreads across the whole image, the image is simply discarded. To facilitate later tasks, both positive and negative samples were generated from images that contain bird nests. The positive samples are picked manually, so their coordinates are known. Because negative samples are generated randomly from an image, two bounding boxes (either positive or negative samples) may overlap each other, which would heavily affect the following steps. A straightforward way to address this problem is to set a limit on the overlapping ratio, which is calculated as
$$\text{overlapping ratio} = \frac{\text{overlapping area}}{\text{self area}}$$
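As a worked example with hypothetical numbers (not taken from the experiments), consider a 100x80-pixel candidate box that shares a 40x50-pixel region with another box, taking the candidate as the "self" box:

$$\text{overlapping ratio} = \frac{40 \times 50}{100 \times 80} = \frac{2000}{8000} = 0.25$$

Since 0.25 does not exceed the limit of 0.3 used in the experiments, such a candidate would be kept. The ratio is not symmetric: the same overlapping area measured against a larger box gives a smaller ratio.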
In my experiments, the ratio is set to 0.3. Thus, a negative sample cannot contain an obvious bird nest, and no two negative samples can overlap each other by more than this ratio. When the overlapping ratio is greater than 0.3, we try generating another negative sample, as long as the number of attempts has not exceeded a fixed limit. As shown in Figure 4, the blue box contains a bird nest and the red box contains a negative sample. The size of the generated negative bounding boxes is defined as follows.
Each image contains at least one positive bounding box, so one positive bounding box is picked at random from the current image, and the size of the generated bounding boxes is defined as:
For some close-scale images, the positive sample region may occupy most of the image, so generating negative samples of similar size is very hard under our overlapping-ratio limit. Negative samples are therefore not generated from such images. Figure 5 illustrates how different numbers of bounding boxes are generated from different kinds of images. With this method, 142 positive samples are obtained and 695 negative samples are randomly generated.
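The sampling procedure can be sketched as follows. This is a minimal illustration, not the author's code: boxes are assumed to be (x, y, w, h) tuples in pixels, the attempt limit of 1000 is hypothetical, and because the size formula is not reproduced here, each negative box is assumed to copy the width and height of a randomly picked positive box. `overlap_ratio` and `generate_negatives` are illustrative names.

```python
import random


def overlap_ratio(box, other):
    """Overlapping area divided by the area of `box` itself (the "self area").
    Boxes are (x, y, w, h) tuples with (x, y) the top-left corner."""
    x1, y1, w1, h1 = box
    x2, y2, w2, h2 = other
    ix = max(0, min(x1 + w1, x2 + w2) - max(x1, x2))  # overlap width
    iy = max(0, min(y1 + h1, y2 + h2) - max(y1, y2))  # overlap height
    return (ix * iy) / (w1 * h1)


def generate_negatives(img_w, img_h, positives, n_wanted,
                       max_ratio=0.3, max_attempts=1000, seed=0):
    """Randomly place up to n_wanted negative boxes whose overlapping ratio with
    every positive box and every accepted negative box is at most max_ratio.
    ASSUMPTION: a negative box copies the size of a randomly picked positive box."""
    rng = random.Random(seed)
    negatives = []
    for _ in range(max_attempts):            # give up after a fixed number of tries
        if len(negatives) == n_wanted:
            break
        _, _, w, h = rng.choice(positives)
        candidate = (rng.randint(0, img_w - w), rng.randint(0, img_h - h), w, h)
        if all(overlap_ratio(candidate, b) <= max_ratio
               for b in positives + negatives):
            negatives.append(candidate)
    return negatives


# Hypothetical example: one hand-labelled nest box in a 300x200 image.
positives = [(50, 40, 60, 60)]
negatives = generate_negatives(300, 200, positives, n_wanted=5)
```

Rejection sampling keeps the logic simple; the attempt limit matters for close-scale images, where a large positive box leaves little room, which is why such images are skipped in the text.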
Figure 5 Generating different numbers of negative samples
Figure 6 Image Resizing
Because the bounding boxes differ in size, the sample images must be resized to a common size, even though their shapes may change. In addition, some bounding boxes are quite small, so enough features or useful information cannot be obtained unless they are enlarged. Using cubic interpolation, both positive and negative sample images are resized to the same scale, as shown in Figure 6.
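The paper does not give the target size or the exact cubic kernel. The sketch below is a minimal pure-Python illustration of bicubic resizing for a grayscale image stored as a list of rows, assuming the Keys cubic-convolution kernel with a = -0.5, pixel-centre alignment and clamped borders; `cubic_weight` and `resize_bicubic` are illustrative names. Libraries such as OpenCV (`cv2.resize` with `cv2.INTER_CUBIC`) provide optimized versions, possibly with a different kernel parameter.

```python
import math


def cubic_weight(t, a=-0.5):
    """Keys cubic-convolution kernel; a = -0.5 is an assumed, common choice."""
    t = abs(t)
    if t <= 1:
        return (a + 2) * t**3 - (a + 3) * t**2 + 1
    if t < 2:
        return a * t**3 - 5 * a * t**2 + 8 * a * t - 4 * a
    return 0.0


def resize_bicubic(img, out_h, out_w):
    """Resize a grayscale image (list of rows) to out_h x out_w.
    Each output pixel mixes a 4x4 neighbourhood of source pixels."""
    in_h, in_w = len(img), len(img[0])
    sy, sx = in_h / out_h, in_w / out_w
    out = []
    for i in range(out_h):
        y = (i + 0.5) * sy - 0.5            # source row coordinate (pixel centres)
        y0 = math.floor(y)
        row = []
        for j in range(out_w):
            x = (j + 0.5) * sx - 0.5        # source column coordinate
            x0 = math.floor(x)
            acc = 0.0
            for m in range(-1, 3):
                wy = cubic_weight(y - (y0 + m))
                yy = min(max(y0 + m, 0), in_h - 1)      # clamp at the border
                for n in range(-1, 3):
                    wx = cubic_weight(x - (x0 + n))
                    xx = min(max(x0 + n, 0), in_w - 1)
                    acc += wy * wx * img[yy][xx]
            row.append(acc)
        out.append(row)
    return out


# Enlarge a small 4x4 gradient patch to 8x8.
patch = [[0, 50, 100, 150] for _ in range(4)]
enlarged = resize_bicubic(patch, 8, 8)
```

The kernel weights for any sub-pixel offset sum to 1, so flat regions stay flat, while bicubic mixing can slightly overshoot near sharp edges; real code would clip results back to 0..255.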
3. Image Processing Methods
3.1 Canny Edge Detection
The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986.
Edge detection, especially step-edge detection, has been widely applied in various computer vision systems. It is an important technique for extracting useful structural information from different visual objects while dramatically reducing the amount of data to be processed. Canny found that the requirements for applying edge detection in diverse vision systems are relatively similar. Thus, an edge detection solution that addresses these requirements can be implemented in a wide range of situations [6]. The general criteria for edge detection include:
1. Detection of edges with a low error rate, which means that the detector should accurately catch as many of the edges shown in the image as possible.
2. The edge points detected by the operator should be accurately localized on the center of the edge.
3. A given edge in the image should be marked only once, and, where possible, image noise should not create false edges.
To satisfy these requirements, Canny used the calculus of variations, a technique that finds the function which optimizes a given functional. The optimal function in Canny's detector is described by the sum of four exponential terms, but it can be approximated by the first derivative of a Gaussian [6].
Among the edge detection methods developed so far, the Canny algorithm is one of the most strictly defined and provides good, reliable detection. Owing to its optimality with respect to the three criteria for edge detection and its simple implementation, it has become one of the most popular edge detection algorithms [6].
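To make the multi-stage algorithm concrete, here is a minimal pure-Python sketch of the usual Canny stages for a small grayscale image stored as a list of rows: Gaussian smoothing (a 3x3 kernel, standing in for the derivative-of-Gaussian filter), Sobel gradients, non-maximum suppression and double-threshold hysteresis. It is an illustration under these assumptions, not the implementation used in this work; in practice a library routine such as OpenCV's `cv2.Canny` would be used, and its threshold values are not directly comparable to the ones here.

```python
import math


def convolve(img, kernel):
    """3x3 (or larger, odd) correlation with edge-replicated borders."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            s = 0.0
            for ky in range(-r, r + 1):
                for kx in range(-r, r + 1):
                    yy = min(max(y + ky, 0), h - 1)
                    xx = min(max(x + kx, 0), w - 1)
                    s += kernel[ky + r][kx + r] * img[yy][xx]
            out[y][x] = s
    return out


def canny(img, low, high):
    """Return a 0/255 edge map: blur, gradient, thinning, hysteresis."""
    gauss = [[1 / 16, 2 / 16, 1 / 16], [2 / 16, 4 / 16, 2 / 16], [1 / 16, 2 / 16, 1 / 16]]
    blur = convolve(img, gauss)
    gx = convolve(blur, [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
    gy = convolve(blur, [[-1, -2, -1], [0, 0, 0], [1, 2, 1]])
    h, w = len(img), len(img[0])
    mag = [[math.hypot(gx[y][x], gy[y][x]) for x in range(w)] for y in range(h)]

    # Non-maximum suppression along the gradient direction (4 quantised bins).
    nms = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            angle = math.degrees(math.atan2(gy[y][x], gx[y][x])) % 180
            if angle < 22.5 or angle >= 157.5:
                dy, dx = 0, 1
            elif angle < 67.5:
                dy, dx = 1, 1
            elif angle < 112.5:
                dy, dx = 1, 0
            else:
                dy, dx = 1, -1
            m = mag[y][x]
            # Strict on one side so a plateau of equal values gives a 1-pixel edge.
            if m >= mag[y + dy][x + dx] and m > mag[y - dy][x - dx]:
                nms[y][x] = m

    # Hysteresis: start from strong pixels, grow into connected weak pixels.
    edges = [[0] * w for _ in range(h)]
    stack = [(y, x) for y in range(h) for x in range(w) if nms[y][x] >= high]
    while stack:
        y, x = stack.pop()
        if edges[y][x]:
            continue
        edges[y][x] = 255
        for ny in range(y - 1, y + 2):
            for nx in range(x - 1, x + 2):
                if 0 <= ny < h and 0 <= nx < w and not edges[ny][nx] and nms[ny][nx] >= low:
                    stack.append((ny, nx))
    return edges


# A vertical step edge: dark left half, bright right half.
step = [[0] * 4 + [255] * 4 for _ in range(8)]
edges = canny(step, low=50, high=100)
```

On the step image the gradient peaks on two equal columns, and the one-sided strict comparison keeps only one of them, which is how criterion 3 (mark each edge once) shows up in code.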
Canny edge detection is therefore helpful for feature extraction. However, since the output of Canny edge detection is a binary image, we take the following steps to keep the intensity variation: we first invert every bit of the input image and then compute the Canny edge image from the result. Because the Canny edge image is binary, it can be used as a mask, so that combining the input image with the mask yields a Canny edge image that retains the color intensity variation, as shown in Figure 7.
Figure 7 Canny Edge Image Processing
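The inversion and masking steps can be sketched as below. This is a minimal illustration assuming 8-bit grayscale images stored as lists of rows (for colour images the same mask would be applied to each channel); the binary edge map is taken as an input, for example the 0/255 output of any Canny implementation run on the inverted image. `invert` and `apply_edge_mask` are illustrative names.

```python
def invert(img):
    """Invert every bit of an 8-bit image; for values 0..255 this equals 255 - p."""
    return [[~p & 0xFF for p in row] for row in img]


def apply_edge_mask(img, edges):
    """Keep the input intensity where the binary edge map is non-zero, 0 elsewhere."""
    return [[p if e else 0 for p, e in zip(img_row, edge_row)]
            for img_row, edge_row in zip(img, edges)]


image = [[10, 200, 90], [30, 40, 250]]
edge_map = [[0, 255, 0], [255, 0, 255]]     # hypothetical Canny output for this image
masked = apply_edge_mask(image, edge_map)   # [[0, 200, 0], [30, 0, 250]]
```

The mask only selects which pixels survive; the surviving values are the original intensities, which is what lets the edge image keep its intensity variation.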
3.2 Local Binary Pattern
The LBP feature vector, in its simplest form, is created in the following manner [4]:
1. Divide the examined window into cells (e.g. 16x16 pixels for each cell).
2. For each pixel in a cell, compare the pixel to each of its 8 neighbors (on its left-top, left-middle, left-bottom, right-top, etc.), following the pixels along a circle, i.e. clockwise or counter-clockwise.
3. Where the center pixel's value is greater than the neighbor's value, write "0". Otherwise, write "1". This gives an 8-digit binary number (which is usually converted to decimal for convenience).
4. Compute the histogram, over the cell, of the frequency of each "number" occurring (i.e., each combination of which pixels are smaller and which are greater than the center).
5. Optionally normalize the histogram.
6. Concatenate the (normalized) histograms of all cells. This gives the feature vector for the window.
The feature vector can then be processed by a support vector machine or another machine-learning algorithm to classify images. Such classifiers can be used for face recognition or texture analysis.
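The steps above can be sketched in pure Python as follows. Assumptions, labelled here because the paper gives no code: the image is a grayscale list of rows; a bit is 1 where the neighbour is greater than or equal to the centre (the common convention); neighbours are read clockwise from the top-left, with the first one as the most significant bit; border pixels without a full 8-neighbourhood are skipped. `lbp_code` and `lbp_histogram` are illustrative names.

```python
# Neighbour offsets (dy, dx), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(img, y, x):
    """8-bit LBP code of pixel (y, x): bit = 1 where neighbour >= centre."""
    centre = img[y][x]
    code = 0
    for bit, (dy, dx) in enumerate(OFFSETS):
        if img[y + dy][x + dx] >= centre:
            code |= 1 << (7 - bit)          # first neighbour is the most significant bit
    return code


def lbp_histogram(img, cell=16):
    """Concatenate normalised 256-bin histograms of LBP codes, one per cell."""
    h, w = len(img), len(img[0])
    feature = []
    for cy in range(0, h, cell):
        for cx in range(0, w, cell):
            hist = [0] * 256
            for y in range(max(cy, 1), min(cy + cell, h - 1)):
                for x in range(max(cx, 1), min(cx + cell, w - 1)):
                    hist[lbp_code(img, y, x)] += 1
            total = sum(hist) or 1          # avoid dividing by zero for empty cells
            feature.extend(v / total for v in hist)
    return feature


sample = [[5, 9, 1],
          [4, 6, 7],
          [2, 6, 3]]
code = lbp_code(sample, 1, 1)   # neighbours 5,9,1,7,3,6,2,4 vs 6 -> 0b01010100 = 84
feat = lbp_histogram(sample)
```

Fixing one start position and direction is what makes codes comparable across pixels; any fixed choice works as long as it is used consistently.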
Figure 8 Three neighborhood examples used to define a texture and calculate a local binary pattern (LBP)
A useful extension to the original operator is the so-called uniform pattern, which can be used to reduce the length of the feature vector and to implement a simple rotation-invariant descriptor. This idea is motivated by the fact that some binary patterns occur more commonly in texture images than others. A local binary pattern is called uniform if it contains at most two 0-1 or 1-0 transitions when the bit pattern is traversed circularly. For example, 00010000 (2 transitions) is a uniform pattern, whereas 01010100 (6 transitions) is not. In the computation of the LBP histogram, there is a separate bin for every uniform pattern, and all non-uniform patterns are assigned to a single bin. Using uniform patterns, the length of the feature vector for a 3x3 window reduces from 256 to 59 [4, 5].
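The 256-to-59 reduction can be checked with a short sketch that counts circular transitions and builds a lookup table from LBP codes to histogram bins. The function names and the bin ordering (uniform patterns in increasing code order, non-uniform bin last) are illustrative assumptions.

```python
def transitions(code, bits=8):
    """Count 0-1 and 1-0 changes when the bit pattern is read circularly."""
    mask = (1 << bits) - 1
    rotated = ((code << 1) | (code >> (bits - 1))) & mask   # rotate left by one bit
    return bin(code ^ rotated).count("1")                   # differing neighbours


def uniform_lut(bits=8):
    """Map every LBP code to a bin: one bin per uniform pattern
    (at most two transitions) plus one shared bin for all other patterns."""
    uniform = [c for c in range(1 << bits) if transitions(c, bits) <= 2]
    nonuniform_bin = len(uniform)
    lut = [nonuniform_bin] * (1 << bits)
    for b, code in enumerate(uniform):
        lut[code] = b
    return lut, nonuniform_bin + 1


lut, n_bins = uniform_lut()   # n_bins == 59: 58 uniform patterns + 1 shared bin
```

The 58 uniform patterns are the two constant codes (all 0s, all 1s) plus 8 x 7 = 56 codes consisting of a single circular run of 1s, which gives the 59 bins quoted above.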
3.3 Clustering and Dividing Image into Blocks
Since LBP produces up to 256 different gray-level codes (0 to 255) that form a one-dimensional local feature, similar codes need to be combined into clusters, and the K-means clustering method is used for this purpose. K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
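A minimal sketch of K-means (Lloyd's algorithm) on scalar LBP codes follows. The number of clusters k, the deterministic initialisation at evenly spaced sorted values and the iteration cap are assumptions for illustration; the paper does not specify them, and a library implementation would normally be used. `kmeans_1d` is an illustrative name.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar values.
    Returns (centres, labels), where labels[i] is the cluster index of values[i]."""
    data = sorted(values)
    # ASSUMPTION: initialise centres at evenly spaced positions in the sorted data.
    centres = [float(data[(i * (len(data) - 1)) // max(k - 1, 1)]) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:                                  # assignment step
            j = min(range(k), key=lambda i: abs(v - centres[i]))
            clusters[j].append(v)
        new = [sum(c) / len(c) if c else centres[i]       # update step (mean)
               for i, c in enumerate(clusters)]
        if new == centres:                                # converged
            break
        centres = new
    labels = [min(range(k), key=lambda i: abs(v - centres[i])) for v in values]
    return centres, labels


codes = [0, 1, 2, 100, 101, 102, 250, 251, 252]
centres, labels = kmeans_1d(codes, k=3)   # centres [1.0, 101.0, 251.0]
```

Replacing each LBP code by its cluster label is the vector-quantization step: nearby codes collapse to one prototype.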
After clustering, to generate usable attributes for classification, we divide the processed image into several blocks of the same size, so that each block carries information about the gray-level distribution at its own location. Then, according to the K clusters, a dominant local feature is obtained for each block, and each block is labeled with its dominant feature. Thus, if an image is divided into n blocks, n attributes are generated.
Figure 9 Dominant Feature in Block
Example block:
12 25 25 70
25 153 25 70
12 210 25 25
70 25 70 210
25 is the dominant feature in this block.
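Dividing a label map into blocks and taking the most frequent label per block can be sketched as below. This is an illustration assuming the image height and width are divisible by the number of block rows and columns (remainder pixels are ignored); `block_attributes` is an illustrative name. With 20x20 blocks, for instance, each image yields 400 attributes.

```python
from collections import Counter


def block_attributes(labels, n_rows, n_cols):
    """Split a 2-D label map into n_rows x n_cols equal blocks and return the
    dominant (most frequent) label of each block, row by row.
    Ties go to the label encountered first in the block."""
    h, w = len(labels), len(labels[0])
    bh, bw = h // n_rows, w // n_cols        # ASSUMPTION: remainder pixels are ignored
    attrs = []
    for r in range(n_rows):
        for c in range(n_cols):
            counts = Counter(labels[y][x]
                             for y in range(r * bh, (r + 1) * bh)
                             for x in range(c * bw, (c + 1) * bw))
            attrs.append(counts.most_common(1)[0][0])
    return attrs


fig9_block = [[12, 25, 25, 70],
              [25, 153, 25, 70],
              [12, 210, 25, 25],
              [70, 25, 70, 210]]
dominant = block_attributes(fig9_block, 1, 1)   # [25]: 25 occurs 7 times out of 16
```

Using the mode rather than the mean keeps each attribute a valid cluster label, so the classifier sees categorical texture types per block.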
4. Classification Methods
4.1 Support Vector Machines (SVMs)
SVM is a powerful and widely used machine learning method for supervised classification problems because of its generalization ability. Given a set of training examples for a two-class problem, each marked as belonging to one of the two categories, an SVM classifier maximizes the margin between the training patterns and the decision boundary, which can be formulated as a quadratic optimization problem in the feature space. The vectors closest to the decision boundary are called support vectors [11].
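For a linear-kernel SVM, finding the hyperplane $W^T x + b = 0$ that maximizes the margin between the hyperplane and the nearest training points can be posed as the following optimization problem:
$$\min_{W,\, b,\, \xi} \; \frac{1}{2} W^T W + C \sum_{i=1}^{N} \xi_i$$
subject to $y_i \left( W^T x_i + b \right) \ge 1 - \xi_i$ and $\xi_i \ge 0$ for $i = 1, \dots, N$, where the slack variables $\xi_i$ allow some training points to violate the margin and $C > 0$ controls how strongly such violations are penalized. By introducing Lagrange multipliers $\alpha_i$ and using the KKT conditions of optimization theory, the decision function for a vector $x$ becomes
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{N} y_i \alpha_i \langle x, x_i \rangle + b \right)$$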
By replacing the inner product $\langle x, x_i \rangle = x^T x_i$ with a kernel function $K(x, x_i)$, the input data are implicitly mapped to a higher-dimensional space, in which a separating hyperplane is built to maximize the margin. In the original lower-dimensional data space, this hyperplane becomes a non-linear separating function. Popular kernels used in SVMs include the polynomial kernel and the RBF kernel [11].
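To illustrate the kernelised decision function, the sketch evaluates f(x) with the RBF kernel K(x, x_i) = exp(-gamma * ||x - x_i||^2). It does not train an SVM: the support vectors, multipliers alpha_i and bias b below are hand-picked, hypothetical values, whereas in practice they come from solving the dual problem (for example with scikit-learn's `SVC`, whose RBF kernel has the same form). gamma = 0.5 matches the value reported as best in Section 5.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision(x, support_vectors, labels, alphas, b, gamma):
    """f(x) = sgn(sum_i y_i * alpha_i * K(x, x_i) + b); a score of exactly 0 maps to +1."""
    score = sum(y * a * rbf_kernel(x, sv, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas))
    return 1 if score + b >= 0 else -1


# Hypothetical model: one support vector per class, equal multipliers, no bias.
svs = [(1.0, 0.0), (-1.0, 0.0)]
ys = [1, -1]
alphas = [1.0, 1.0]
pred_pos = decision((2.0, 0.0), svs, ys, alphas, b=0.0, gamma=0.5)
pred_neg = decision((-2.0, 0.0), svs, ys, alphas, b=0.0, gamma=0.5)
```

Larger gamma makes each support vector's influence more local, which is why the gamma sweep in Section 5 changes the results noticeably.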
4.2 Random Forests
Random forests are an ensemble learning method for classification, regression and other tasks. They operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the individual trees' classes (for classification) or their mean prediction (for regression). Whereas single decision trees tend to overfit their training set, a random forest corrects for this habit: small changes in the training data induce high variance in single classification trees and often lead to rather low classification accuracy [9]. The idea of random forests is to grow multiple decision trees on random subsets of the training data and variables. To classify previously unseen data, random forests take advantage of the variance among individual trees: each tree votes for a class, and the class receiving the majority of votes is assigned. Such ensembles demonstrate robust and accurate performance on complex datasets with little need for fine-tuning, even in the presence of many noisy variables [9].
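The bootstrap-and-vote idea can be sketched in pure Python. This is a deliberate simplification for illustration: each "tree" is a depth-1 decision stump rather than a fully grown tree, and the number of trees, features per tree and toy data are assumptions. `fit_stump`, `fit_forest` and `predict` are illustrative names, not a library API.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Depth-1 tree: the single-feature threshold with the fewest training errors.
    Returns (feature, threshold, label_if_le, label_if_gt)."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2                       # midpoint between observed values
            left = Counter(lab for row, lab in zip(X, y) if row[f] <= t)
            right = Counter(lab for row, lab in zip(X, y) if row[f] > t)
            errors = (sum(left.values()) - max(left.values())
                      + sum(right.values()) - max(right.values()))
            if best is None or errors < best[0]:
                best = (errors, f, t, left.most_common(1)[0][0],
                        right.most_common(1)[0][0])
    if best is None:                                # no split possible: predict majority
        majority = Counter(y).most_common(1)[0][0]
        return features[0], float("inf"), majority, majority
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=1, seed=0):
    """Grow each stump on a bootstrap sample and a random subset of features."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]   # sample with replacement
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def predict(forest, x):
    """Each stump votes; the majority class is returned."""
    votes = Counter(le if x[f] <= t else gt for f, t, le, gt in forest)
    return votes.most_common(1)[0][0]


# Toy data: class 0 near the origin, class 1 near (10, 10).
X = [(0, 1), (1, 0), (1, 1), (0, 0), (2, 1), (1, 2),
     (9, 10), (10, 9), (10, 10), (11, 10), (10, 11), (9, 9)]
y = [0] * 6 + [1] * 6
forest = fit_forest(X, y)
```

Individual stumps see different resamples and features and may disagree; the majority vote averages away much of that per-tree variance, which is the effect the text describes.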
4.3 K-Nearest Neighbors
The K-Nearest Neighbors (KNN) algorithm is a non-parametric method in pattern recognition used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. KNN is a type of instance-based learning, in which the function is only approximated locally and all computation is deferred until classification [10].
For both classification and regression, it can be useful to weight the contributions of the neighbors, so that nearer neighbors contribute more to the average than more distant ones. For example, a common weighting scheme gives each neighbor a weight of 1/d, where d is the distance to the neighbor [10].
The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object
property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm,
though no explicit training step is required [10].
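The 1/d weighting can be sketched in a few lines. The sketch assumes Euclidean distance (`math.dist`, Python 3.8+) and plain tuples as feature vectors; `knn_predict` is an illustrative name. Note that "training" is just keeping the labelled examples, and all work happens at prediction time.

```python
import math
from collections import defaultdict


def knn_predict(train_X, train_y, x, k=3, weighted=True):
    """Classify x by its k nearest training examples (Euclidean distance).
    With weighted=True each neighbour votes with weight 1/d."""
    neighbours = sorted((math.dist(x, xi), yi) for xi, yi in zip(train_X, train_y))[:k]
    votes = defaultdict(float)
    for d, label in neighbours:
        if weighted and d == 0:
            return label                    # exact match: 1/d is undefined, use its label
        votes[label] += 1.0 / d if weighted else 1.0
    return max(votes, key=votes.get)


train_X = [(0, 0), (0, 1), (5, 5), (6, 5), (5, 6)]
train_y = [0, 0, 1, 1, 1]
plain = knn_predict(train_X, train_y, (1, 1), k=5, weighted=False)   # 3 votes beat 2 -> 1
weighted = knn_predict(train_X, train_y, (1, 1), k=5)                # close pair wins -> 0
```

With k = 5 every training point votes; unweighted, the three distant class-1 points outvote the two nearby class-0 points, while 1/d weighting lets the nearby points dominate.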
A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm is unrelated to, and should not be confused with, k-means, another popular machine learning technique [10].
5. Experiments
In the classification experiments, support vector machines, K-nearest neighbors and random forests are used to test the classification accuracy on the 142 positive samples and 695 negative samples.
Table 1 shows the classification results obtained with SVMs. Although the results with the RBF kernel seem acceptable, many positive samples are misclassified: with 20x20 blocks, 42 positive samples are misclassified. Different gamma values for the RBF kernel were tested (0.1, 0.2, 0.25, 0.4, 0.5 and 0.75), and the best result was obtained with gamma = 0.5.
Table 1 Classification Results by SVMs (RBF kernel’s gamma = 0.5, Canny Edge Detection’s threshold is [10, 19])
Kernel Methods Clusters Blocks Number Training Accuracy