Computer aided plant identification system

Ngoc-Hai Pham
International Research Institute MICA – UMI2954
Hanoi University of Science and Technology
[email protected]

Thi-Lan Le
International Research Institute MICA – UMI2954
Hanoi University of Science and Technology
[email protected]

Pierre Grard
Agricultural Research for Development - CIRAD
[email protected]

Van-Ngoc Nguyen
International Research Institute MICA – UMI 2954
Hanoi University of Science and Technology
[email protected]
Abstract— Human pressure on the environment has steadily
increased throughout the last few decades, especially in Southeast Asia. This part of the world is also particularly rich in biodiversity, and appropriate biological knowledge is in high demand from civil society, which is asking for a “greener” life as quality-of-life concerns grow. Unfortunately, the plant recognition and identification skills of people in these countries are still limited. These skills can, however, be improved with the aid of information technology. Recently, a number of works have been dedicated to plant information collection, plant information management and plant recognition, but the developed systems remain far from user requirements. In this paper, we present our computer aided plant identification system. The system contains two main parts: a semi-automatic graphical tool and an automatic plant identification method based on leaf images. Feedback from end users shows that the graphical tool does not have the disadvantages of classical tools (too technical, not sufficiently precise), while the experimental results of the automatic plant identification method based on leaf image analysis are promising.
Keywords- plant identification; leaf identification
I. INTRODUCTION
Human pressure on the environment has steadily increased
throughout the last few decades, especially in Southeast Asia.
This part of the world is also particularly rich in biodiversity,
and appropriate biological knowledge is in high demand from
civil society, which is asking for a “greener” life as
quality-of-life concerns grow.
Unfortunately, the plant recognition skills of people in these
countries are still limited. However, we believe that these skills
can be improved with the aid of information technology.
Recently, a number of works have been dedicated to plant
information collection, plant information management and
plant recognition. However, the developed systems remain far
from user requirements.
In this paper, we present our system for plant identification,
which ranges from a semi-automatic graphical tool to an automatic
plant identification method. The graphical tool uses graphical
icons of plant features. It lets users choose features freely and
tolerates missing information.
The rest of this paper is organized as follows. In Section 2, we
describe our plant identification system in detail.
Implementation and experimental results are given in Section 3.
We give some discussion in Section 4. Finally, conclusions and
future work are presented in Section 5.
II. COMPUTER AIDED PLANT IDENTIFICATION SYSTEM
A. General description
Because the intended end users of the plant identification system
are the general public, the system should be intuitive and easy to
use, especially for non-botanists. The difficulties encountered by
non-botanists when identifying plants with classical tools such as
floras or handbooks (too technical, incomplete specimens, not
sufficiently precise) led us to develop a new plant identification
system. In the following sections, we describe our plant
identification system, which consists of two parts: a graphical
tool and automatic plant identification.
B. Graphical tool for plant identification
The main idea behind the graphical tool is that it has to address
the four main problems encountered with classical tools: they use
technical jargon, do not let users choose features freely, do not
allow missing information, and do not tolerate errors. For the
first problem, we design graphical icons corresponding to the
jargon used by botanists. The graphical icons are expressive and
language-independent. The second problem is resolved by a
graphical interface that allows users to choose as many features
as they want. The third and fourth problems are addressed by an
efficient identification method and by the way results are
presented.
The architecture of the graphical tool for plant identification is
shown in Fig. 1. This tool has three main components:
graphical interface, plant identification and result interface.
The graphical interface presents the characteristics of different
parts of plants, such as leaf, venation, and bark, as graphical
icons. The designed icons represent well the jargon used by
botanists. Figure 2 illustrates the icons for 4 types of margin of
the lamina with the corresponding technical terms. With this
graphical interface, non-botanists can easily describe the
characteristics of the plants of interest.
The plant identification process computes the similarity
between the plants in the database and the user query. In our work,
each plant is represented by a vector indicating the presence of
defined features. If the plant has a feature, the vector element
corresponding to this feature is set to 1. Otherwise, it is set to 0.
Figure 3a illustrates several plants in the database. For
example, Plant 1 has an ‘entire’ margin of the lamina and is a
‘tree’. In a similar way, the query is represented as a vector
of defined features. In this vector, the elements corresponding
to the chosen features are set to 1. An example of a user query is
shown in Fig. 3c. Because features may play different roles in
978-1-4673-2088-7/13/$31.00 ©2013 IEEE 134
plant identification, we represent their relative importance by a
weight vector. The values of this vector are defined by botanists.
An example of a weight vector is given in Fig. 3b.
Figure 1. Architecture of the graphical tool consisting of three main
components: graphical interface, plant identification and result interface.
Entire Undulate Dentate Crenate
Figure 2. Margin of the lamina feature icons and the corresponding technical
terms
The similarity (S) between one plant and a query is defined
as follows:

\[
S = \frac{\sum_i w_i \, A_i}{\sum_i w_i} \times 100\% \tag{1}
\]
where $w_i$ is the weight of the ith feature and $A_i$ is the result
of the AND operator between the ith element of the plant vector and
that of the query vector. The similarity between the plants
(cf. Fig. 3a) and the query (cf. Fig. 3c) calculated by
Eq. 1 is shown in Tab. 1. This similarity calculation tolerates
missing information.
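As an illustration, the weighted similarity of Eq. 1 can be sketched in a few lines of Python. This is a minimal sketch, not the tool's actual code. The function name `similarity`, the feature order and the weight values are hypothetical. The range of the sum in the denominator is not specified in the text, so the sketch assumes it runs over the features selected in the query, which lets a plant that matches every selected feature score 100.

```python
def similarity(plant, query, weights):
    """Weighted similarity (in percent) between a plant and a query.

    plant, query: lists of 0/1 feature-presence flags of equal length.
    weights: per-feature importance weights (hypothetical values below).
    Assumption: the denominator sums the weights of the features selected
    in the query, so unselected (missing) features do not lower the score.
    """
    if not (len(plant) == len(query) == len(weights)):
        raise ValueError("plant, query and weights must have the same length")
    # A_i = plant_i AND query_i: only features present in both contribute.
    matched = sum(w for p, q, w in zip(plant, query, weights) if p and q)
    selected = sum(w for q, w in zip(query, weights) if q)
    return 100.0 * matched / selected if selected else 0.0


# Hypothetical feature order: entire, undulate, dentate, crenate, tree, shrub
weights = [3, 3, 3, 3, 5, 5]
plant_a = [1, 0, 0, 0, 1, 0]  # 'entire' margin, 'tree'
plant_b = [0, 0, 1, 0, 0, 1]  # 'dentate' margin, 'shrub'
query_1 = [1, 0, 0, 0, 1, 0]  # user selects 'entire' and 'tree'
query_2 = [1, 0, 0, 0, 0, 1]  # user selects 'entire' and 'shrub'

print(similarity(plant_a, query_1, weights))
print(similarity(plant_b, query_1, weights))
print(similarity(plant_a, query_2, weights))
```

Because a query only sets the features the user actually chose, leaving a feature unselected never penalizes a plant under this assumption, which is one way to read the tolerance to missing information.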
The result interface presents the identification results to the
user. Plants in the database are sorted in decreasing order of
similarity to the query. Presenting a ranked list in this way
tolerates errors in the feature selection step. Moreover, for each
plant in the result list, users can see its images, botanical
drawings and descriptive information such as distribution and
control method.
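The ranking step can be sketched as follows. This is only an illustration: the helper name `rank_plants` is hypothetical, and the scores are the ones listed in Table I. Since the whole list is returned rather than a single best match, a plant that loses a few points to a wrongly selected feature still appears near the top.

```python
def rank_plants(scores, top_k=None):
    """Return (plant, similarity) pairs sorted by decreasing similarity.

    scores: dict mapping a plant name to its similarity in percent.
    top_k: optional cap on the number of results shown to the user.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


# Similarity values taken from Table I.
scores = {"Sp. 1": 0.0, "Sp. 2": 100.0, "Sp. 3": 0.0, "Sp. N": 37.5}
ranked = rank_plants(scores)
for name, value in ranked:
    print(name, value)
```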
Readers are referred to [1] for a more detailed description of
this graphical tool.
Figure 3. (a) Plants in the database; (b) The corresponding weight for each
plant characteristic; (c) The sample created from a list of characteristics
provided by users.
TABLE I. SIMILARITY BETWEEN THE SAMPLE DEFINED IN FIG. 3C AND THE
PLANTS IN THE DATABASE PRESENTED IN FIG. 3A.
Plants S
Sp. 1 0
Sp. 2 100
Sp. 3 0
... …
Sp. N 37.5
C. Toward automatic plant identification method
As discussed in the previous section, the proposed
graphical tool can help users in the identification process.
However, feature selection is still manual. Users are not
always willing to provide enough features to the system, so the
identification result can be inaccurate.
Current achievements in image processing and computer vision
lead us to believe that these issues in our graphical tool can be
addressed. We can take advantage of image processing to extract
characteristics from the plants we want to identify. This
exploratory idea brings together researchers from different
disciplines (signal processing specialists, specialists in
interactive embedded systems and botanists).
There are two main ways to move from the graphical
tool to automatic plant identification. The first learns plant
characteristics from image features by applying machine learning
techniques. The second tries to identify plants directly from
images. The work presented in this section belongs to the latter.
Plants differ from one another in their components such as
leaf, root, flower, etc. Among these components, leaf
characteristics play an important role in plant identification.
Therefore, we exploit leaf information for plant
identification.
Many works have addressed plant identification based on leaf
image analysis; we do not attempt an exhaustive survey here.
There are many approaches that investigate leaf features, using
either implicit or explicit features. The work presented in [2]
calculates several leaf features such as compactness, roundness,
elongation and roughness. Qing-Ping Wang et al. [3] use Hu
invariant moments, which are normalized with respect to changes
in scale. Rahmadhani et al. [4] use the Hough transform and
Fourier descriptors. Some other teams choose venation features or
texture features to identify leaves. Yunyoung et al. [5] use leaf
arrangement and a venation representation to decide the type of
leaf. They construct a weighted graph G for the leaf venation.
After that, they analyze the leaf arrangement (e.g. alternate,
opposite) and the feather venation (e.g. pinnate venation).
Rahmadhani et al. [4] also use B-splines to analyze venation
features. Hanife Kebapci [6] employs Gabor wavelets to extract
plant texture.
Among image descriptors, the histogram of oriented gradients
(HOG) has proved robust for object detection and object
identification. For leaf images, HOG can represent the shape and
appearance of a leaf [7]. In our work, we propose to employ HOG
for plant identification based on leaf information. The plant
identification method based on HOG consists of three steps.
First, HOG is computed for all images in the database. Second,
since a large number of HOG descriptor elements are computed for
each image, we use the Maximum Margin Criterion (MMC) to reduce
the descriptor dimension. Finally, for leaf identification, we
apply a Support Vector Machine (SVM).
The HOG descriptor was proposed in [8]. To compute HOG, the
image is first divided into square cells, which are grouped into
blocks. Then, the gradient magnitude and direction are calculated
for the pixels in each cell. After this step, a histogram is
created for each cell. Each bin of this histogram contains the
number of pixels whose gradient has the same direction. The
histogram of each block is created by accumulating the histograms
of its cells. Finally, all histograms are arranged to form the HOG
descriptor of the image.
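The cell and block steps described above can be sketched in plain Python. This is a simplified illustration, not the implementation used in the experiments. The function names are hypothetical, gradients are taken with central differences on interior pixels only, orientations are treated as unsigned (0 to 180 degrees), and each pixel casts one unweighted vote, following the count-based description in the text. Block normalization, as used in [8], is left out.

```python
import math


def cell_histogram(cell, n_bins=9):
    """Orientation histogram of one cell (a list of rows of gray levels).

    Each interior pixel with a non-zero gradient casts one vote for the
    bin that contains its unsigned gradient orientation in [0, 180).
    """
    hist = [0] * n_bins
    bin_width = 180.0 / n_bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]  # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]  # vertical central difference
            if gx == 0 and gy == 0:
                continue  # flat pixel: no orientation to vote for
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            # The extra modulo guards against an angle rounding up to 180.
            hist[int(angle // bin_width) % n_bins] += 1
    return hist


def block_descriptor(cell_histograms):
    """Concatenate the histograms of the cells that make up one block."""
    return [count for hist in cell_histograms for count in hist]


# Two synthetic 8x8 cells: intensity rising left-to-right and top-to-bottom.
ramp_x = [[c for c in range(8)] for _ in range(8)]
ramp_y = [[r for _ in range(8)] for r in range(8)]
print(cell_histogram(ramp_x))
print(cell_histogram(ramp_y))
print(len(block_descriptor([cell_histogram(ramp_x)] * 4)))
```

A block of 2×2 cells with 9 bins per cell gives 36 values, which is the per-block factor 9×2×2 that appears in the descriptor length.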
The dimension of the HOG descriptor is relatively high (for an
image with a resolution of 1600×1200, a cell size of 8×8 pixels
and a block size of 2×2 cells, the HOG descriptor dimension is
5940), and not all elements of the HOG descriptor are relevant for
leaf representation. Therefore, before applying the classification
method, we have to reduce the dimension of HOG. In this paper, we
propose to use MMC since it is efficient and robust.
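The descriptor length can be checked with a little arithmetic. The sketch below is an illustration under stated assumptions: the helper name `hog_dimension` is hypothetical, and blocks are assumed to slide by one cell in each direction, which the text does not state. With a 16×12 grid of cells it gives 15×11 = 165 blocks and 5940 values, the figures reported in the experiments. A 16×12 grid of 8×8-pixel cells spans 128×96 pixels, so the count would hold at that working size; this is an inference from the numbers, not something stated in the text.

```python
def hog_dimension(cells_x, cells_y, block_cells=2, n_bins=9, stride_cells=1):
    """Return (blocks_x, blocks_y, descriptor_length) for a grid of cells.

    Assumes square blocks of block_cells x block_cells cells that slide by
    stride_cells cells; each block contributes block_cells**2 * n_bins values.
    """
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    length = blocks_x * blocks_y * block_cells * block_cells * n_bins
    return blocks_x, blocks_y, length


print(hog_dimension(16, 12))  # (15, 11, 5940)
```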
Figure 4. HOG computed from image of leaf
In this paper, we compare the performance of the HOG and Hu
[9] descriptors for plant recognition based on leaf images. The Hu
descriptor consists of 7 moment features that describe shapes.
These features are invariant to rotation, translation and
scaling.
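To make the invariance property concrete, the sketch below computes the first two of the seven Hu invariants from normalized central moments and checks them on a shifted and on a transposed copy of a small binary shape. It is an illustration only: the function name `hu_first_two` and the toy shape are hypothetical, and the remaining five invariants are omitted for brevity.

```python
def hu_first_two(image):
    """Return the first two Hu invariants (phi1, phi2) of a 2-D image.

    image: list of rows of non-negative pixel values (binary or gray).
    """
    m00 = sum(sum(row) for row in image)
    if m00 == 0:
        raise ValueError("image has no foreground mass")
    # Centroid of the shape.
    xc = sum(x * v for row in image for x, v in enumerate(row)) / m00
    yc = sum(y * v for y, row in enumerate(image) for v in row) / m00

    def mu(p, q):
        # Central moment: translation invariant by construction.
        return sum(((x - xc) ** p) * ((y - yc) ** q) * v
                   for y, row in enumerate(image) for x, v in enumerate(row))

    def eta(p, q):
        # Normalized central moment: compensates for changes in scale.
        return mu(p, q) / m00 ** (1 + (p + q) / 2)

    phi1 = eta(2, 0) + eta(0, 2)
    phi2 = (eta(2, 0) - eta(0, 2)) ** 2 + 4 * eta(1, 1) ** 2
    return phi1, phi2


shape = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
shifted = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
transposed = [list(col) for col in zip(*shape)]

print(hu_first_two(shape))
print(hu_first_two(shifted))
print(hu_first_two(transposed))
```

The three printed pairs agree up to floating-point rounding, since moving the shape or swapping its axes leaves these two invariants unchanged.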
Concerning the classification method, SVM was selected in our
research because of its high accuracy, its ability to work with
high-dimensional data, and its ability to generate non-linear as
well as high-dimensional classifiers.
Let $(x_i, y_i)$, $i = 1, \dots, l$, with $y_i \in \{-1, 1\}$ and
$x_i \in \mathbb{R}^d$, be the training data with labels $y$. The
support vector machine (SVM) using the C-Support Vector
Classification (C-SVC) algorithm will find the optimal hyperplane:

\[
f(x) = w^T \Phi(x) + b \tag{1}
\]
to separate the training data by solving the following
optimization problem:

\[
\min \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i \tag{2}
\]

subject to

\[
y_i \left( w^T \Phi(x_i) + b \right) \ge 1 - \xi_i \quad \text{and} \quad \xi_i \ge 0, \quad i = 1, \dots, l \tag{3}
\]
The optimization problem (2) maximizes the hyperplane margin
while minimizing the cost of errors. $\xi_i$, $i = 1, \dots, l$, are
non-negative slack variables introduced to relax the constraints
of the separable data problem to the constraints of the
non-separable data problem. For an error to occur, the
corresponding $\xi_i$ must exceed unity (3), so $\sum_i \xi_i$ is an
upper bound on the number of training errors. Hence an extra cost
$C \sum_i \xi_i$ for errors is added to the objective function (2),
where $C$ is a parameter chosen by the user.
The Lagrangian formulation of the primal problem is:

\[
L_P = \frac{1}{2} \|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left\{ y_i \left( w^T \Phi(x_i) + b \right) - 1 + \xi_i \right\} - \sum_i \mu_i \xi_i \tag{4}
\]
We need the Karush-Kuhn-Tucker conditions of the primal
problem to obtain the dual problem:

\[
L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \Phi(x_i)^T \Phi(x_j) \tag{5}
\]
subject to:

\[
0 \le \alpha_i \le C \quad \text{and} \quad \sum_i \alpha_i y_i = 0 \tag{6}
\]
The solution is given by:

\[
w = \sum_{i=1}^{N_S} \alpha_i y_i \Phi(x_i) \tag{7}
\]

where $N_S$ is the number of support vectors.
Note that the data only appear in the training problems (4) and
(5) in the form of the dot product $\Phi(x_i)^T \Phi(x_j)$, which can
be replaced by any kernel $K$ with
$K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$, where $\Phi$ is a mapping of
the data to some other (possibly infinite-dimensional) Euclidean
space. One example is the Radial Basis Function (RBF) kernel:

\[
K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}
\]
In the test phase, an SVM is used by computing the sign of

\[
f(x) = \sum_{i=1}^{N_S} \alpha_i y_i \Phi(s_i)^T \Phi(x) + b = \sum_{i=1}^{N_S} \alpha_i y_i K(s_i, x) + b \tag{8}
\]

where the $s_i$ are the support vectors.
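The test-phase rule of Eq. 8 with an RBF kernel can be sketched as follows. This is a minimal illustration for a two-class problem, not the classifier trained in the experiments. The function names, the two support vectors, the multipliers, the bias and the value of gamma are all hypothetical toy values; a real model would obtain them by solving the dual problem (5) and (6). How the two-class rule is extended to many plant classes is not described in the text and is not covered here.

```python
import math


def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


def decision_value(x, support_vectors, alphas, labels, bias, gamma):
    """f(x) = sum_i alpha_i * y_i * K(s_i, x) + b, as in Eq. 8."""
    return sum(alpha * y * rbf_kernel(s, x, gamma)
               for s, alpha, y in zip(support_vectors, alphas, labels)) + bias


def predict(x, support_vectors, alphas, labels, bias, gamma):
    """Class label given by the sign of the decision value."""
    value = decision_value(x, support_vectors, alphas, labels, bias, gamma)
    return 1 if value >= 0 else -1


# Hypothetical toy model: one support vector per class.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]   # satisfies sum_i alpha_i * y_i = 0
labels = [1, -1]
bias = 0.0
gamma = 0.5

print(predict((0.1, 0.2), support_vectors, alphas, labels, bias, gamma))
print(predict((1.9, 2.1), support_vectors, alphas, labels, bias, gamma))
```

Only kernel evaluations against the support vectors are needed at test time, so the mapping Φ never has to be computed explicitly.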
III. EXPERIMENTAL RESULTS
We developed the graphical tool in several versions. The first
version is developed with PHP and a MySQL server. The
combination of PHP scripts with a MySQL database makes
this tool run fast and smoothly. It is convenient because users
do not have to install anything: all they need is a computer with
an Internet connection. The second version runs on mobile
devices. This version was built for iOS (e.g. iPhone, iPad),
using Objective-C (with the Xcode tool) and an SQLite database.
This version runs very fast and, once the application is
installed, a compact device can be used to identify plants
anywhere, without depending on an Internet connection.
Figure 5: Graphical tool for iPad
The developed graphical tool is used by both botanists and
non-botanists (e.g. foresters).
Concerning automatic plant identification based on the HOG
descriptor, we test our method with 32 different plants of the
Flavia data set [10] (cf. Fig. 6). We choose leaves with many
different types of shape and venation. For each type of leaf, we
pick 10 samples for the test set, and the others belong to the
training set.
Figure 6: Leaf samples in test database

For each image, a HOG descriptor vector is extracted. The cell
size is 8×8 pixels and the block size is 2×2 cells. The number of
bins is set to 9. As a result, there are in total 64 (8×8) cells and
165 (15×11) blocks, so the dimension of the overall HOG feature
is 5940 (9×2×2×165). Then we reduce the dimension of the
features to 100 using MMC because our experiments show that
only the first 100 eigenvalues are significant. We also analyze the
effect of data scaling before applying SVM. With data scaling,
the accuracy of our method is 84.7%, while without data
scaling, our method obtains 76.25% accuracy. The
experimental results show that many leaves with different
shapes are correctly identified. However, several classes are
not well recognized. Figure 7 illustrates two cases in which the
leaf of one class is misclassified as another class. The reason for
this problem is that HOG (an implicit feature) is computed over
the whole image; therefore, local information may be lost.
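The scaling method is not specified in the text. As one common choice, the sketch below rescales each feature linearly to the range [-1, 1] using the minimum and maximum observed on the training set, and then applies the same mapping to test samples. The function name `fit_min_max` and the toy data are hypothetical, and min-max scaling itself is an assumption made for illustration.

```python
def fit_min_max(train_rows, low=-1.0, high=1.0):
    """Learn per-feature min-max scaling on training data.

    Returns a function that maps a feature vector to [low, high] using the
    training minima and maxima (test values may fall slightly outside).
    """
    columns = list(zip(*train_rows))
    minima = [min(col) for col in columns]
    maxima = [max(col) for col in columns]

    def scale(row):
        scaled = []
        for value, f_min, f_max in zip(row, minima, maxima):
            if f_max == f_min:
                # Constant feature: carries no information, map to midpoint.
                scaled.append((low + high) / 2.0)
            else:
                scaled.append(low + (high - low) * (value - f_min) / (f_max - f_min))
        return scaled

    return scale


# Hypothetical training features with very different ranges.
train = [[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]]
scale = fit_min_max(train)
print(scale([5.0, 20.0]))
print(scale([12.5, 10.0]))
```

Fitting the ranges on the training set only keeps the test samples from influencing the transformation. With an RBF kernel, features with large numeric ranges would otherwise dominate the squared distance, which is one plausible reason for the accuracy gap between the scaled and unscaled runs.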
Figure 6: Example of bad identification results when using HOG as
descriptor
To compare the recognition performance of HOG and Hu, we
compute the 7 Hu moments and apply SVM. The training and
testing datasets are the same as in the experiment with the HOG
descriptor.
TABLE II. RESULT OF TWO METHODS
Species no.   Number of incorrect recognitions   Number of incorrect recognitions
              with HOG descriptor                with Hu descriptor
1             6                                  9
2             1                                  6
3             0                                  10
4             0                                  3
5             0                                  8
6             0                                  5
7             1                                  7
8             4                                  8
9             3                                  10
10            2                                  10
11            0                                  7
12            0                                  5
13            3                                  9
14            0                                  10
15            1                                  9
16            3                                  10
17            0                                  0
18            0                                  6
19            1                                  8
20            2                                  5
21            0                                  1
22            2                                  9
23            4                                  9
24            2                                  9
25            2                                  10
26            0                                  10
27            1                                  10
28            3                                  9
29            0                                  7
30            5                                  2
31            3                                  9
32            0                                  9
Figure 7: Example of bad identification results when using Hu as descriptor
TABLE III. SUMMARY OF RESULTS
Method                          Average accuracy
SVM+HOG without scale +MMC      76.25%
SVM+HOG with scale              84.6875%
Hu+SVM with scale               23.125%
Hu+SVM without scale            25.3125%
The results obtained with 32 species are detailed in Tab. II, while
the summary of the two experiments is presented in Tab. III. With
this database, the accuracy of HOG is much higher than that of
Hu. As can be seen in Fig. 7 and Tab. II, the Hu descriptor is not
robust when working with leaves of normal shape. The Hu
descriptor should be combined with features that are able to
describe the veins of the leaf.
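The average accuracies can be recomputed from the per-species error counts of Table II, given 10 test samples per species. The short check below uses hypothetical variable and function names. The HOG counts reproduce the 84.6875% figure and the Hu counts reproduce the 25.3125% figure of Table III.

```python
SAMPLES_PER_SPECIES = 10

# Per-species error counts copied from Table II (species 1 to 32).
hog_errors = [6, 1, 0, 0, 0, 0, 1, 4, 3, 2, 0, 0, 3, 0, 1, 3,
              0, 0, 1, 2, 0, 2, 4, 2, 2, 0, 1, 3, 0, 5, 3, 0]
hu_errors = [9, 6, 10, 3, 8, 5, 7, 8, 10, 10, 7, 5, 9, 10, 9, 10,
             0, 6, 8, 5, 1, 9, 9, 9, 10, 10, 10, 9, 7, 2, 9, 9]


def accuracy(errors, samples_per_class=SAMPLES_PER_SPECIES):
    """Overall accuracy in percent from per-class error counts."""
    total = samples_per_class * len(errors)
    return 100.0 * (total - sum(errors)) / total


print(accuracy(hog_errors))  # 84.6875
print(accuracy(hu_errors))   # 25.3125

# Species for which the Hu descriptor makes fewer errors than HOG.
hu_better = [i + 1 for i, (hog, hu) in enumerate(zip(hog_errors, hu_errors))
             if hu < hog]
print(hu_better)  # [30]
```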
IV. DISCUSSIONS
As presented in the previous sections, the graphical tool and
the automatic identification method developed in our system allow
users to identify a plant of interest. However, this framework
needs to be improved in the following directions.
The first direction is to improve the leaf identification
accuracy of the automatic method. Our experimental results show
that plant identification based on leaf information reaches about
80% accuracy on the Flavia dataset (76.25% without and 84.7% with
data scaling). However, images in this dataset are taken under
strictly controlled conditions (one mature leaf per image). In a
real application, users can take an image of a leaf under
different conditions. Plant identification methods based on leaf
information have to take these conditions into account.
The second direction concerns plant identification from multiple
features/modalities, since a plant can be identified based on
several components (e.g. bark, flowers, and seeds).
The third direction is to build a plant identification system for
smartphones. According to the specialized press, the graphics
tablet and smartphone market is one of the fastest growing
markets nowadays. Apple reported recently (January 2011)
that the number of dedicated applications for the iPad was
reaching 60,000, growing at a rate of 300 a day. For the iPhone,
the number has surpassed 340,000. The typical user of a graphics
tablet lives in a city and uses it for two hours per day, mostly
for leisure. The main applications used by the general public are
interactive games, multimedia and e-reading, but access to
knowledge seems to be an important factor in the decision to buy
a graphics tablet. Among the key success factors for these new
mobile devices, screen size and an intuitive touch interface are
decisive. In addition to their large, high-resolution screens,
most smartphones and new graphics tablets are now equipped with
integrated sensors such as a camera, accelerometers, a compass
and a GPS antenna. To the best of our knowledge, there are very
few plant identification applications developed for smartphones.
Leafsnap is one such plant identification application for
iPhone/iPad. This application requires users to provide a leaf
image of the plant of interest. The system returns the plant name,
plant photographs and information on the flowers, fruit, seeds and
bark. However, the plant identification accuracy of this
application is still limited, and to identify a plant, users
have to take an image of the leaf on a white background.
V. CONCLUSIONS AND FUTURE WORKS
In this paper, we present a plant identification system
consisting of two main components: a graphical tool and an
automatic plant identification method. The graphical tool has
been developed in several versions. Feedback from end users shows
that this tool is more effective than classical tools because it
uses graphical icons instead of technical terms and tolerates
missing information. Experimental results of the automatic
plant identification method on the Flavia dataset are promising.
However, this data set is relatively simple: one mature leaf per
image with a uniform background. In a real application, we have
to work with more difficult images (containing multiple leaves,
with complex backgrounds, under different lighting conditions).
The automatic plant identification method has to take these
factors into account.
ACKNOWLEDGMENT
The research leading to this paper was supported by the
National Project B2011-01-05 “Study and develop an object
detection and recognition system in smart and perceptive
environments”. We would like to thank the project and the people
involved in it.
REFERENCES
1. Bonnet, P., M. Arbonnier, and P. Grard. A graphical tool for the identification of West African savannas trees. in Smithsonian Botanical Symposium 2005 - "the future of floras: new frameworks, new technologies, new uses". 2005.
2. Ta-Te Lin, Y.-T.C., Wen-Chi Liao. Leaf boundary extraction and geometric modeling of vegetable seedlings. in 2000 ASAE Annual International Meeting. 2000. Milwaukee, Wisconsin, USA.
3. Qing-Ping Wang, J.-X.D., Chuan-Min Zhai, Recognition of Leaf Image Based on Ring Projection Wavelet Fractal Feature, in 6th International Conference on Intelligent Computing, ICIC 2010. 2010, Springer: Changsha, China. p. 240-246.
4. Rahmadhani M., Y.H. Shape and Vein Extraction on Plant Leaf Images Using Fourier and B-Spline Modeling. in AFITA 2010 International Conference. 2010. Bogor Indonesia: IPB (Bogor Agricultural University).
5. Yunyoung Nam, E.H.b., Dongyoon Kim, A similarity-based leaf image retrieval scheme: Joining shape and venation features. Computer Vision and Image Understanding, 2008. 110(2): p. 245-259.
6. Hanife Kebapci, B.Y., Gozde Unal, Plant Image Retrieval Using Color, Shape and Texture Features. Oxford University Press on behalf of The British Computer Society, 2010.
7. Xiao, X.-Y., et al., HOG-Based Approach for Leaf Classification, in Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. 2010, Springer Berlin / Heidelberg. p. 149-155.
8. Navneet Dalal, B.T. Histograms of Oriented Gradients for Human Detection. in Computer Vision and Pattern Recognition, 2005. 2005. San Diego, CA, USA.
9. Hu, M.-K., Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 1962. 8(2): p. 179-187.
10. Stephen Gang Wu, F.S.B., Eric You Xu, Yu-Xuan Wang, Yi-Fan Chang, Qiao-Liang Xiang, A Leaf Recognition Algorithm for Plant Classification Using Probabilistic Neural Network, in Signal Processing and Information Technology, 2007 IEEE International Symposium. 2007: Giza. p. 11-16.