Bird Nest Images Classification Based on Canny Edge Detection and Local Binary Pattern · 2017-11-30 · Bird Nest Images Classification Based on Canny Edge Detection and Local Binary

Bird Nest Images Classification Based on Canny Edge Detection and Local Binary Pattern

Chunbo Song [email protected] University of Delaware, Newark, DE 19711, USA

Abstract

This paper presents a method for classifying images as nest or non-nest. The input image is first processed by Canny Edge Detection, and Local Binary Pattern (LBP) is then used to extract local textural features. K-means is applied to cluster the LBP features. Each image is divided into several blocks, and the dominant local textural feature of each block is used to describe that block. The performance of the method is tested with support vector machines, random forest and K-nearest neighbors.

1. Introduction

Image classification has become a very popular topic in computer vision research. Contextual image classification, a topic of pattern recognition in computer vision, is an approach to classification based on contextual information in images. "Contextual" means that the approach focuses on the relationship between nearby pixels, also called the neighborhood. The goal is to classify images by using this contextual information [1, 3]. The key issues in image classification include image preprocessing, feature extraction and classification methods. Methods such as the Gabor filter, Principal Component Analysis (PCA) and Local Binary Pattern are all used for feature extraction [3]. These methods have been widely applied to face recognition and other areas.

Several difficulties remain. For example, most bird nests are attached to trees, so interference from either the tree or the background (e.g. sky, building wall) is introduced. Intuitively, a bird nest is easily observed by the human eye when nothing obstructs it. How to describe a bird nest to a machine, so that the machine can tell whether an image is a bird nest image or not, is an important problem for classification. Moreover, because our image collection shows bird nests from different perspectives and contains different kinds of bird nests, it is difficult to get a good classification result.

In this research, two image feature extraction methods are used for bird nest classification: Canny Edge Detection and Local Binary Pattern. Over almost the past two decades, many studies of image classification have been based on the local binary pattern (LBP), which has achieved great success in face recognition. Canny Edge Detection is a very useful tool for detecting edges. In addition, to describe local textural features more generally, a clustering method (K-means) is used so that similar local binary patterns are grouped into one cluster.

The pipeline of the bird nest image classification is shown in Figure 1.

Figure 1 Pipeline

After extracting image features, support vector machines, K-nearest neighbors and Random Forest are used for classification. The classification experiment section compares the results from these classifiers. Considering the popularity of deep learning methods in image recognition, we also suggest convolutional neural networks as a possible way to improve the classification accuracy.

Section 2 discusses data creation, including where the raw images come from and how the bounding boxes are generated. Canny Edge Detection, Local Binary Pattern and image dividing are discussed in Section 3 as feature extraction steps. The three machine learning methods used for classification are briefly discussed in Section 4, and the experimental results are shown in Section 5. Based on these results, the conclusion and future work are discussed in Section 6.

2. Data Creation

2.1 Image Source

All 114 images containing bird nests were found on the web. Some of them contain one nest per image and others contain more than one, as shown in Figure 2.

(Figure 1 box labels: Bird Nest Images; Canny Edge Detection; Local Binary Pattern; K Means Clustering; Dividing Image into Blocks; Classification; Generate positive samples and negative samples; Resize Bounding Box)

Figure 2 All Source Images

Intuitively, these images fall into three scales: far scale, medium scale and closed scale. Far-scale images may show fewer details, because the nest occupies a small, low-resolution region of the image. In closed-scale images, the bird nest region has higher resolution. In Figure 3, (a) is far scale, (b) is medium scale, and (c) is closed scale.

Figure 3 Far Scale, Medium Scale and Closed Scale Source Image

2.2 Generating Bounding Box

Some source images contain a watermark. To avoid interference from it, the watermarked part of the image is simply cropped away; if the watermark is spread across the whole image, the image is ignored.

To facilitate later tasks, both the positive and the negative samples were generated from images that contain bird nests. The positive samples were picked manually, so their coordinates are known. Because negative samples are generated randomly from an image, two bounding boxes (either positive or negative samples) may overlap each other, which would heavily affect the following work. The straightforward way to address this problem is to set an overlapping ratio, calculated by

$$\text{overlapping ratio} = \frac{\text{overlapping area}}{\text{self area}}$$

In my experiment, the ratio limit is set to 0.3. Thus, a negative sample image cannot contain an obvious bird nest, and no two negative samples can overlap each other by more than the overlapping ratio. When the overlapping ratio is greater than 0.3, we try again to generate the negative sample, as long as the number of attempts does not exceed a fixed limit. As shown in Figure 4, the blue box contains a bird nest and the red box contains a negative sample. The size of the generated negative bounding boxes is defined as follows.

Each image has at least one positive bounding box. One positive bounding box is picked at random from the current image, and the size of the generated bounding boxes is bounded by:

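$$\begin{aligned}
\text{lowerbound}_{\text{height}} &= 0.9 \times \text{height of the bounding box} \\
\text{higherbound}_{\text{height}} &= 1.1 \times \text{height of the bounding box} \\
\text{lowerbound}_{\text{length}} &= 0.9 \times \text{length of the bounding box} \\
\text{higherbound}_{\text{length}} &= 1.1 \times \text{length of the bounding box}
\end{aligned}$$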

Figure 4 Generating Bounding Box
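The generation of negative boxes described in this section can be sketched as a rejection-sampling loop. This is only an illustration under stated assumptions, not the code used in the experiments: boxes are stored as (x, y, width, height), "self area" is taken to be the area of the candidate box, both height and length are drawn between 0.9 and 1.1 times the reference box, and the names `overlap_ratio`, `sample_negative` and the attempt limit of 50 are hypothetical.

```python
import random

def overlap_ratio(box, other):
    """Overlapping area divided by the area of `box` itself.
    Boxes are (x, y, width, height) with (x, y) the top-left corner."""
    x1, y1, w1, h1 = box
    x2, y2, w2, h2 = other
    inter_w = min(x1 + w1, x2 + w2) - max(x1, x2)
    inter_h = min(y1 + h1, y2 + h2) - max(y1, y2)
    if inter_w <= 0 or inter_h <= 0:
        return 0.0
    return (inter_w * inter_h) / (w1 * h1)

def sample_negative(image_w, image_h, positives, negatives,
                    max_ratio=0.3, max_attempts=50, rng=random):
    """Try to draw one negative box; return None when every attempt fails."""
    ref = rng.choice(positives)  # reference positive box that sets the size
    for _ in range(max_attempts):
        w = max(1, round(rng.uniform(0.9, 1.1) * ref[2]))
        h = max(1, round(rng.uniform(0.9, 1.1) * ref[3]))
        if w > image_w or h > image_h:
            continue  # candidate does not fit in the image, try again
        x = rng.randint(0, image_w - w)
        y = rng.randint(0, image_h - h)
        cand = (x, y, w, h)
        # Reject the candidate if it overlaps any existing box too much.
        if all(overlap_ratio(cand, b) <= max_ratio for b in positives + negatives):
            return cand
    return None

# Toy usage: one positive box in a 200x150 image (made-up coordinates).
rng = random.Random(0)
positives = [(40, 40, 50, 30)]
negatives = []
for _ in range(5):
    box = sample_negative(200, 150, positives, negatives, rng=rng)
    if box is not None:
        negatives.append(box)
```

Returning `None` after the attempt limit mirrors the case where a closed-scale positive box leaves no room for a negative sample of similar size.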

For some closed-scale images, the positive sample region occupies most of the image, so generating negative samples of similar size is very hard under the overlapping-ratio limit. In light of this, no negative samples are generated from such images. Figure 5 illustrates that different numbers of bounding boxes are generated from different kinds of images. With this method, 142 positive samples were obtained and 695 negative samples were randomly generated.

Figure 5 Generating different number of negative samples

Figure 6 Image Resizing

Because the bounding boxes differ in size, it is necessary to resize the sample images to the same size, even though their shape may change. In addition, some bounding boxes are so small that enough features or useful information cannot be obtained unless they are enlarged. Both positive and negative sample images are resized to the same scale by cubic interpolation, as shown in Figure 6.

3. Image Processing Methods

3.1 Canny Edge Detection

The Canny edge detector is an edge detection operator that uses a multi-stage algorithm to detect a wide range of edges in images. It was developed by John F. Canny in 1986.

Edge detection, especially step edge detection, has been widely applied in computer vision systems. It is an important technique for extracting useful structural information from different vision objects while dramatically reducing the amount of data to be processed. Canny found that the requirements for applying edge detection in diverse vision systems are relatively similar, so an edge detection solution that addresses these requirements can be used in a wide range of situations [6]. The general criteria for edge detection include:

1. Detection of edges with a low error rate, which means that the detector should accurately catch as many of the edges shown in the image as possible.

2. The edge point detected from the operator should accurately localize on the center of the edge.

3. A given edge in the image should only be marked once, and where possible, image noise should

not create false edges.


To satisfy these requirements Canny used the calculus of variations – a technique which finds the function

which optimizes a given functional. The optimal function in Canny's detector is described by the sum of

four exponential terms, but it can be approximated by the first derivative of a Gaussian [6].

Among the edge detection methods developed so far, the Canny edge detection algorithm is one of the most strictly defined, and it provides good and reliable detection. Because it is optimal with respect to the three criteria above and simple to implement, it has become one of the most popular algorithms for edge detection [6].

Canny Edge Detection is therefore quite helpful for feature extraction. However, the output of Canny Edge Detection is a binary image. To keep the intensity variance, we first inverted every bit of the input image and then used the result to compute the Canny edge image. The Canny edge image can then be used as a mask: combining the input image with the mask yields a Canny edge image that retains the color intensity variance, as shown in Figure 7.

Figure 7 Canny Edge Image Processing
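The invert, detect and mask steps can be illustrated on a tiny grayscale image. The sketch below is an assumption-laden toy: `toy_edge_mask` is a simple difference threshold standing in for a real Canny implementation (it is not the Canny algorithm), the image is a list of rows of 8-bit values, and the mask is applied to the source image, which is one possible reading of the procedure.

```python
def invert(image):
    """Invert an 8-bit grayscale image stored as a list of rows."""
    return [[255 - v for v in row] for row in image]

def toy_edge_mask(image, threshold):
    """Stand-in for Canny: mark pixels whose right or lower neighbour
    differs by more than `threshold`. This is NOT the Canny algorithm."""
    h, w = len(image), len(image[0])
    mask = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            dx = abs(image[r][c] - image[r][c + 1]) if c + 1 < w else 0
            dy = abs(image[r][c] - image[r + 1][c]) if r + 1 < h else 0
            if max(dx, dy) > threshold:
                mask[r][c] = 1
    return mask

def apply_mask(image, mask):
    """Keep the intensity where the mask is set, zero elsewhere."""
    return [[v if m else 0 for v, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

# A dark region next to a bright region: one vertical edge.
source = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
mask = toy_edge_mask(invert(source), threshold=50)
edges_with_intensity = apply_mask(source, mask)
```

The result is no longer binary: edge pixels carry their original intensity, which is what the later LBP step operates on.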

3.2 Local Binary Pattern

The LBP feature vector, in its simplest form, is created in the following manner [4]:

1. Divide the examined window into cells (e.g. 16x16 pixels for each cell).

2. For each pixel in a cell, compare the pixel to each of its 8 neighbors (on its left-top, left-middle,

left-bottom, right-top, etc.). Follow the pixels along a circle, i.e. clockwise or counter-clockwise.

3. Where the center pixel's value is greater than the neighbor's value, write "1". Otherwise, write

"0". This gives an 8-digit binary number (which is usually converted to decimal for convenience).

4. Compute the histogram, over the cell, of the frequency of each "number" occurring (i.e., each

combination of which pixels are smaller and which are greater than the center).

5. Optionally normalize the histogram.

6. Concatenate (normalized) histograms of all cells. This gives the feature vector for the window.

The feature vector can now be processed using the Support vector machine or some other machine-

learning algorithm to classify images. Such classifiers can be used for face recognition or texture analysis.
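Steps 2 to 5 can be sketched for a single cell as follows. The comparison follows step 3 as written in this paper (bit 1 when the center is greater than the neighbor); many libraries use the complementary convention, which yields an equivalent descriptor with inverted bits. The clockwise start at the top-left neighbor and the names `lbp_code` and `lbp_histogram` are assumptions made for the illustration.

```python
# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def lbp_code(image, r, c):
    """8-bit LBP code of the pixel at (r, c); bit is 1 when center > neighbour."""
    center = image[r][c]
    code = 0
    for dr, dc in OFFSETS:
        bit = 1 if center > image[r + dr][c + dc] else 0
        code = (code << 1) | bit
    return code

def lbp_histogram(image):
    """256-bin histogram over interior pixels, normalised to sum to 1."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    total = sum(hist)
    return [h / total for h in hist] if total else hist

cell = [
    [5, 9, 1],
    [4, 6, 7],
    [2, 8, 3],
]
code = lbp_code(cell, 1, 1)   # bits 1,0,1,0,1,0,1,1 -> 0b10101011 = 171
hist = lbp_histogram(cell)
```

Concatenating such histograms over all cells of a window gives the feature vector of step 6.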

(Figure 7 box labels: Source; Copy; Invert; Canny Edge Detection; Mask)

Figure 8 Three neighborhood examples used to define a texture and calculate a local binary pattern (LBP)

A useful extension to the original operator is the so-called uniform pattern, which can be used to reduce

the length of the feature vector and implement a simple rotation invariant descriptor. This idea is

motivated by the fact that some binary patterns occur more commonly in texture images than others. A

local binary pattern is called uniform if the binary pattern contains at most two 0-1 or 1-0 transitions. For

example, 00010000 (2 transitions) is a uniform pattern, while 01010100 (6 transitions) is not. In the computation

of the LBP histogram, the histogram has a separate bin for every uniform pattern, and all non-uniform

patterns are assigned to a single bin. Using uniform patterns, the length of the feature vector for a 3x3

window reduces from 256 to 59 [4, 5].
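The reduction from 256 to 59 bins can be checked by enumeration. The short sketch below counts transitions with the 8-bit pattern read as a circle; the helper names `transitions` and `uniform_bin` are illustrative only.

```python
def transitions(code, bits=8):
    """Number of 0-1 / 1-0 changes when the pattern is read as a circle."""
    count = 0
    for i in range(bits):
        a = (code >> i) & 1
        b = (code >> ((i + 1) % bits)) & 1
        count += a != b
    return count

# Every 8-bit pattern with at most two transitions is uniform.
uniform = [c for c in range(256) if transitions(c) <= 2]
bin_of = {c: i for i, c in enumerate(uniform)}   # one bin per uniform pattern
shared_bin = len(uniform)                        # single bin for all the rest

def uniform_bin(code):
    """Histogram bin index of an LBP code under the uniform-pattern scheme."""
    return bin_of.get(code, shared_bin)

print(len(uniform) + 1)   # 58 uniform patterns + 1 shared bin = 59
```

The two examples from the text behave as expected: `transitions(0b00010000)` is 2 and `transitions(0b01010100)` is 6, so only the first gets its own bin.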

3.3 Clustering and Dividing Image into Blocks

After LBP, each pixel carries one of up to 256 gray intensities (codes 0 to 255), which is a one-dimensional feature describing the local texture. Similar features need to be combined into clusters, so the K-means clustering method is used. K-means clustering is a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. It aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, which serves as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells.
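For one-dimensional LBP codes, K-means reduces to alternating between assigning each code to its nearest center and recomputing each center as the mean of its members. The following is a minimal sketch of that loop (Lloyd's algorithm) with a deterministic initialisation chosen for reproducibility; the initialisation, the function name `kmeans_1d` and the sample codes are assumptions, not details taken from the experiments.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's algorithm on scalar values; returns (centers, labels)."""
    distinct = sorted(set(values))
    if k > len(distinct):
        raise ValueError("k exceeds the number of distinct values")
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [float(distinct[(2 * i + 1) * len(distinct) // (2 * k)])
               for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

codes = [12, 25, 25, 70, 25, 153, 25, 70, 12, 210, 25, 25, 70, 25, 70, 210]
centers, labels = kmeans_1d(codes, k=3)
```

With these sample codes, 12 and 25 fall into the same cluster, 70 forms its own, and 153 and 210 share the third, so each pixel can be relabelled with a cluster index instead of its raw code.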

After clustering, in order to generate attributes usable for classification, we divide the post-processed image into several blocks of equal size, so that each block holds the gray intensity distribution at its location in the image. Then, using the K clusters, we obtain a dominant local feature for each block and label the block with it. Thus, if the image has n blocks, n attributes are generated in this way.

Figure 9 Dominant Feature in Block

Example block from Figure 9:

12   25   25   70
25   153  25   70
12   210  25   25
70   25   70   210

25 is the dominant feature in this block.
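Finding the dominant feature of each block amounts to taking the most frequent label inside it. A small sketch using the 4x4 example of Figure 9 follows; the function name `dominant_per_block` and the restriction to square images whose side is a multiple of the block count are assumptions for the illustration.

```python
from collections import Counter

def dominant_per_block(label_image, blocks_per_side):
    """Split a square label image into blocks_per_side**2 equal blocks and
    return the most frequent label of each block, row by row."""
    size = len(label_image)
    step = size // blocks_per_side
    features = []
    for br in range(blocks_per_side):
        for bc in range(blocks_per_side):
            block = [label_image[r][c]
                     for r in range(br * step, (br + 1) * step)
                     for c in range(bc * step, (bc + 1) * step)]
            features.append(Counter(block).most_common(1)[0][0])
    return features

block = [
    [12, 25, 25, 70],
    [25, 153, 25, 70],
    [12, 210, 25, 25],
    [70, 25, 70, 210],
]
print(dominant_per_block(block, 1))   # whole 4x4 image as one block -> [25]
four_blocks = dominant_per_block(block, 2)
```

Calling the function with 10, 15, 20 or 25 blocks per side on a resized sample would give the 100, 225, 400 or 625 attributes used in the experiments.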

4. Classification Methods

4.1 Support Vector Machines (SVMs)

SVM is a powerful and widely used machine learning method for supervised classification problems because of its generalization ability. In a two-class problem, each training pattern is marked as belonging to one of two categories. SVM classifiers maximize the margin between the training patterns and the decision boundary, which can be formulated as a quadratic optimization problem in the feature space. The vectors closest to the decision boundary are called support vectors [11].

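For a linear SVM, constructing a hyperplane $W^T x + b = 0$ such that the margin between the hyperplane and the nearest point is maximized can be posed as the following optimization problem:

$$\min_{W} \; \frac{1}{2} W^T W \tag{1}$$

subject to $y_i (W^T x_i + b) \ge 1 - \xi_i$ for any $i = 1, \dots, n$, where $\xi_i \ge 0$ are slack variables (all zero when the classes are linearly separable; in the soft-margin case a penalty term $C \sum_i \xi_i$ is added to the objective). By introducing Lagrange multipliers $\alpha$ and using the KKT theorem of optimization theory, the decision function for a vector $x$ becomes

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} y_i \alpha_i \langle x, x_i \rangle + b \right) \tag{2}$$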

By replacing the inner product $\langle x, x_i \rangle = x^T x_i$ with a kernel function $K(x, x_i)$, the input data are implicitly mapped to a higher dimensional space. In this higher dimensional space, a separating hyperplane is built to maximize the margin. In the lower dimensional data space, this hyperplane becomes a non-linear separating function. Popular kernels widely used in SVMs include the polynomial kernel and the RBF kernel [11].
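To make the kernelised decision function concrete, the sketch below evaluates it with an RBF kernel in its common form $K(x, z) = \exp(-\gamma \lVert x - z \rVert^2)$. The support vectors, multipliers and bias are hypothetical hand-picked values, not a trained model; only gamma = 0.5 is taken from the experiments.

```python
import math

def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def decision(x, support_vectors, labels, alphas, b, gamma):
    """sgn(sum_i y_i * alpha_i * K(x, x_i) + b), returned as +1 or -1."""
    score = sum(y * a * rbf_kernel(x, sv, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if score >= 0 else -1

# Hypothetical, hand-picked values; a real model would learn them in training.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [-1, 1]
alphas = [1.0, 1.0]

near_positive = decision((1.8, 1.9), support_vectors, labels, alphas, b=0.0, gamma=0.5)
near_negative = decision((0.1, 0.2), support_vectors, labels, alphas, b=0.0, gamma=0.5)
```

A larger gamma makes each support vector's influence more local, which is why the choice of gamma matters for the RBF results reported later.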

4.2 Random Forests

Random Forests is an ensemble learning method for classification, regression and other tasks. It operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or the mean prediction (regression) of the individual trees. Random Forests correct for the tendency of single decision trees to overfit their training set: small changes in the training data induce a high variance in single classification trees and often lead to rather low classification accuracies [9]. The idea of Random Forests is to grow multiple decision trees on random subsets of the training data and related variables. For the classification of previously unseen data, Random Forests take advantage of the high variance among individual trees, letting each tree vote for the class membership and assigning the class according to the majority of the votes. Such ensembles demonstrate robust and accurate performance on complex datasets with little need for fine-tuning and in the presence of many noisy variables [9].
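The two ingredients named above, random subsets of the training data and majority voting, can be shown in a few lines. This is a deliberately reduced sketch: the "trees" are hypothetical one-split rules written by hand rather than grown from data, and `bootstrap_sample` and `forest_predict` are illustrative names.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items with replacement: one random training subset."""
    return [rng.choice(data) for _ in data]

def forest_predict(trees, x):
    """Let every tree vote and return the majority class."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical one-split "trees", each looking at a different attribute.
trees = [
    lambda x: "nest" if x[0] > 0.5 else "non-nest",
    lambda x: "nest" if x[1] > 0.5 else "non-nest",
    lambda x: "nest" if x[2] > 0.5 else "non-nest",
]
prediction = forest_predict(trees, (0.8, 0.2, 0.9))   # two of three vote "nest"
subset = bootstrap_sample(list(range(10)), random.Random(0))
```

In a full implementation each tree would be trained on its own bootstrap sample and a random subset of the attributes before voting.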

4.3 K-Nearest Neighbors

The K-Nearest Neighbors (KNN) algorithm, in pattern recognition, is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. KNN is a type of instance-based learning, where the function is only approximated locally and all computation is deferred until classification [10].

Both for classification and regression, it can be useful to assign weight to the contributions of the

neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For

example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the

distance to the neighbor [10].
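The 1/d weighting scheme can be sketched as follows, assuming Euclidean distance and a vote over class labels. The sample points and labels are made up for the illustration, and `knn_predict` is a hypothetical name.

```python
import math
from collections import defaultdict

def knn_predict(query, samples, labels, k, weighted=True):
    """Classify `query` by an (optionally 1/d-weighted) vote of its k nearest samples."""
    dists = sorted((math.dist(query, s), lab) for s, lab in zip(samples, labels))
    votes = defaultdict(float)
    for d, lab in dists[:k]:
        if weighted and d == 0:
            return lab              # exact match: avoid division by zero
        votes[lab] += 1.0 / d if weighted else 1.0
    return max(votes, key=votes.get)

samples = [(0, 0), (0, 1), (3, 3), (3, 4), (4, 3)]
labels = ["non-nest", "non-nest", "nest", "nest", "nest"]

first = knn_predict((1, 1), samples, labels, k=3)       # two close non-nest points win
second = knn_predict((2.5, 2.5), samples, labels, k=3)  # all three neighbours are nest
```

With an even k and no weighting, ties between the two classes are possible, which is one reason odd values of k or distance weighting are often preferred for two-class problems.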

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object

property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm,

though no explicit training step is required [10].

A shortcoming of the k-NN algorithm is that it is sensitive to the local structure of the data. The algorithm

has nothing to do with and is not to be confused with k-means, another popular machine learning

technique [10].

5. Experiments

In the classification experiments, Support Vector Machines, K-Nearest Neighbors and Random Forest are used to test the classification accuracy on the 142 positive samples and 695 negative samples.

Table 1 shows the classification results for SVMs. Even though the result with the RBF kernel seems acceptable, many positive samples are misclassified: with 20*20 blocks, 42 positive samples are misclassified. Different gamma values for the RBF kernel were tested (0.1, 0.2, 0.25, 0.4, 0.5 and 0.75), and the best result was obtained with gamma equal to 0.5.

Table 1 Classification Results by SVMs (RBF kernel’s gamma = 0.5, Canny Edge Detection’s threshold is [10, 19])

Kernel Method   Clusters   Blocks Number   Training Accuracy
Linear          8          10*10           22.1027%
RBF             8          10*10           87.0968%
RBF             10         10*10           89.4863%
RBF             20         10*10           85.5436%
RBF             40         10*10           83.2736%
RBF             10         15*15           94.6237%
RBF             10         20*20           94.9821%
RBF             10         25*25           94.9821%

Table 2 Confusion Matrix (SVM with RBF kernel, gamma = 0.5)

Ground Truth      Predicted Nest   Predicted Non-Nest
Nest (142)        100              42
Non-Nest (695)    0                695

With the SVM classifier and RBF kernel, many nests are misclassified as non-nest, as shown in Table 2. Some misclassified nests are listed in Figure 10.

Figure 10 Misclassified Image
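The relation between the confusion matrix and the reported accuracy can be verified with a few lines of arithmetic. The counts below are those of Table 2, read with rows as ground truth and columns as prediction; the recall, precision and majority-class baseline are derived here for illustration and are not figures reported in the experiments.

```python
# Counts from Table 2 (rows: ground truth, columns: prediction).
tp, fn = 100, 42     # nests predicted as nest / as non-nest
fp, tn = 0, 695      # non-nests predicted as nest / as non-nest

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)                 # share of nests that were found
precision = tp / (tp + fp)              # share of nest predictions that were right
majority_baseline = (fp + tn) / total   # always answering "non-nest"

print(f"accuracy          = {accuracy:.4%}")            # 94.9821%
print(f"recall (nest)     = {recall:.4%}")              # 70.4225%
print(f"precision (nest)  = {precision:.4%}")           # 100.0000%
print(f"majority baseline = {majority_baseline:.4%}")   # 83.0346%
```

Because the classes are imbalanced (142 against 695), a classifier that labels every sample as non-nest already reaches 83.0346% accuracy, so accuracy alone hides how many nests are missed.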

With Random Forest, the classification result is worse than with SVMs, as shown in Table 3. Checking the result, we found that most positive samples are misclassified.

Table 3 Classification Result by Random Forest

Clusters   Blocks Number   Training Accuracy
10         10*10           83.0346%
10         15*15           83.7515%
10         20*20           84.9462%
10         25*25           84.1099%
20         20*20           83.0346%

Besides SVMs and Random Forest, KNN is an alternative method for most classification tasks. However, with the proposed image processing method, it cannot differentiate between nest and non-nest images. In my experiment, with 15*15 blocks, all negative samples are misclassified by KNN when K is equal to 2, and all positive samples are misclassified when K is equal to 3. We varied K from 2 to 10 and the results were poor. In light of this, the proposed image processing method may not suit the KNN classifier.

6. Conclusion and Future Work

In this study, I extracted image textural features and used them for classification with three machine learning methods. The experiments showed that the Support Vector Machine with RBF kernel (gamma = 0.5) gives the best classification result (10 clusters, 20*20 blocks), with an accuracy of 0.949821. However, checking the result shows that many positive images are misclassified, so improving the classification result is our next step.

Furthermore, convolutional neural networks, a dominant deep learning method, have achieved great success in pattern recognition [2]. How they perform on bird nest classification is also worth testing.

Once a satisfactory classification result is obtained, the main problem will shift to detecting nests in an input image. Studies of object detection in other areas, such as face detection and vehicle detection, can provide useful ideas and methodologies for the bird nest detection task.

References

[1] Ahonen, Timo, Abdenour Hadid, and Matti Pietikainen. "Face description with local binary patterns: Application to face recognition." Pattern Analysis and Machine Intelligence, IEEE Transactions on 28.12 (2006): 2037-2041.

[2] Farfade, Sachin Sudhakar, Mohammad Saberian, and Li-Jia Li. "Multi-view Face Detection Using Deep Convolutional Neural Networks." arXiv preprint arXiv:1502.02766 (2015).

[3] Shan, Caifeng, Shaogang Gong, and Peter W. McOwan. "Robust facial expression recognition using local binary patterns." Image Processing, 2005. ICIP 2005. IEEE International Conference on. Vol. 2. IEEE, 2005.

[4] Heikkilä, Marko, Matti Pietikäinen, and Cordelia Schmid. "Description of interest regions with local binary patterns." Pattern Recognition 42.3 (2009): 425-436.

[5] Zhang, Wenchao, et al. "Local Gabor binary pattern histogram sequence (LGBPHS): A novel non-statistical model for face representation and recognition." Computer Vision, 2005. ICCV 2005. Tenth IEEE International Conference on. Vol. 1. IEEE, 2005.

[6] Bao, Paul, D. Zhang, and Xiaolin Wu. "Canny edge detection enhancement by scale multiplication." Pattern Analysis and Machine Intelligence, IEEE Transactions on 27.9 (2005): 1485-1490.

[7] Maini, Raman, and Himanshu Aggarwal. "Study and comparison of various image edge detection techniques." International Journal of Image Processing (IJIP) 3.1 (2009): 1-11.

[8] Harris, Chris, and Mike Stephens. "A combined corner and edge detector." Alvey Vision Conference. Vol. 15. 1988.

[9] Stumpf, André, and Norman Kerle. "Object-oriented mapping of landslides using Random Forests." Remote Sensing of Environment 115.10 (2011): 2564-2577.

[10] Altman, Naomi S. "An introduction to kernel and nearest-neighbor nonparametric regression." The American Statistician 46.3 (1992): 175-185.

[11] Duda, Richard O., Peter E. Hart, and David G. Stork. Pattern Classification. 2nd Edition. Wiley, 2000. ISBN 978-0-471-05669-0.
