Computer aided plant identification system

Ngoc-Hai Pham
International Research Institute MICA – UMI2954
Hanoi University of Science and Technology
[email protected]

Thi-Lan Le
International Research Institute MICA – UMI2954
[email protected]

Pierre Grard
Agricultural Research for Development - CIRAD
[email protected]

Van-Ngoc Nguyen
International Research Institute MICA – UMI 2954
[email protected]

Abstract- Human pressure on the environment has steadily increased throughout the last few decades, especially in Southeast Asia. This part of the world is also particularly rich in biodiversity, and appropriate biological knowledge is in high demand from a civil society that is asking for a "greener" life and is increasingly concerned with quality of life. Unfortunately, the plant recognition and identification skills of people in these countries are still limited. However, these skills can be improved with the aid of information technology. Recently, a number of works have been dedicated to plant information collection, plant information management and plant recognition, but the developed systems are still far from meeting user requirements. In this paper, we present our computer aided plant identification system. The developed system contains two main parts: a semi-automatic graphical tool and an automatic plant identification method based on leaf images. Feedback from end users shows that the graphical tool does not have the disadvantages of the classical tools (too technical, not sufficiently precise), while the experimental results of the automatic plant identification method based on leaf image analysis are promising.

Keywords- plant identification; leaf identification

I. INTRODUCTION

Human pressure on the environment has steadily increased throughout the last few decades, especially in Southeast Asia. This part of the world is also particularly rich in biodiversity, and appropriate biological knowledge is in high demand from a civil society that is asking for a "greener" life and is increasingly concerned with quality of life. Unfortunately, the plant recognition skills of people in these countries are still limited. However, we believe that these skills can be improved with the aid of information technology. Recently, a number of works have been dedicated to plant information collection, plant information management and plant recognition, but the developed systems are still far from meeting user requirements.

In this paper, we present our system for plant identification, which ranges from a semi-automatic graphical tool to an automatic plant identification method. The graphical tool uses graphical icons of plant features. It allows users to choose plant features freely and tolerates missing information.

The rest of this paper is organized as follows. In Section 2, we describe our plant identification system in detail. Implementation and experimental results are given in Section 3. We give some discussion in Section 4. Finally, conclusions and future works are presented in Section 5.

II. COMPUTER AIDED PLANT IDENTIFICATION SYSTEM

A. General description

Since the intended end user of the plant identification system is the general public, the system should be intuitive and easy to use, especially for non-botanists. The difficulties encountered by non-botanists when identifying plants using classical tools such as floras or handbooks (too technical, incomplete specimens, not sufficiently precise) led us to develop a new plant identification system. In the following sections, we describe our plant identification system, which consists of two parts: a graphical tool and automatic plant identification.

B. Graphical tool for plant identification

The main idea behind the graphical tool is that it has to address the four main problems encountered with the classical tools: they use technical jargon, do not allow users to choose features freely, do not allow missing information, and do not tolerate errors. To address the first problem, we design graphical icons corresponding to the jargon used by botanists. The graphical icons are expressive and language-independent. The second problem is resolved by a graphical interface that allows users to choose as many features as they want. The third and the fourth problems are handled by the identification method and by the way the results are presented.

The architecture of the graphical tool for plant identification is shown in Fig. 1. This tool has three main components: graphical interface, plant identification and result interface.

The graphical interface presents the characteristics of different parts of plants, such as leaf, venation and bark, as graphical icons. The designed icons represent the jargon used by botanists. Figure 2 illustrates the icons for 4 types of margin of the lamina with the corresponding technical terms. With this graphical interface, non-botanists can easily describe the characteristics of the plants of interest.

The plant identification process computes the similarity between the plants in the database and the user query. In our work, each plant is represented by a binary vector indicating the presence of the defined features: if the plant has a feature, the corresponding vector element is set to 1; otherwise, it is set to 0. Figure 3a illustrates several plants in the database. For example, Plant 1 has an 'entire' margin of the lamina and is a 'tree'. Similarly, the query is represented as a vector over the same features, in which the elements corresponding to the chosen features are set to 1. An example of a user query is shown in Fig. 3c. Because features may play different roles in plant identification, we represent their importance by a weight vector whose values are defined by botanists. An example of a weight vector is given in Fig. 3b.

978-1-4673-2088-7/13/$31.00 ©2013 IEEE

Figure 1. Architecture of the graphical tool consisting of three main

components: graphical interface, plant identification and result interface.

Figure 2. Margin of the lamina feature icons and the corresponding technical terms: entire, undulate, dentate, crenate.

The similarity $S$ between a plant and a query is defined as:

$$S = \frac{\sum_i w_i \, A_i}{\sum_i w_i} \times 100\% \tag{1}$$

where $w_i$ is the weight of the $i$-th feature and $A_i$ is the result of the AND operator between the $i$-th element of the plant vector and that of the query vector. The similarity between the plants (cf. Fig. 3a) and the query (cf. Fig. 3c) calculated by Eq. 1 is shown in Tab. I. This similarity calculation tolerates missing information.
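As an illustration, Eq. 1 can be sketched in a few lines of Python. This is a minimal sketch, not the implementation used in the tool: the function name `similarity` and the four-feature example values are hypothetical and are not the data of Fig. 3.

```python
import numpy as np

def similarity(plant, query, weights):
    """Weighted similarity of Eq. 1, returned as a percentage."""
    plant = np.asarray(plant, dtype=bool)
    query = np.asarray(query, dtype=bool)
    weights = np.asarray(weights, dtype=float)
    matched = plant & query  # A_i: feature present in both plant and query
    return 100.0 * weights[matched].sum() / weights.sum()

# Hypothetical 4-feature example (assumed values, not taken from Fig. 3).
weights = [2.0, 1.0, 1.0, 4.0]
plant = [1, 0, 1, 1]
query = [1, 0, 0, 1]  # the user did not provide the third feature
score = similarity(plant, query, weights)
print(score)  # 75.0
```

A feature that the user leaves unselected simply contributes nothing to the numerator, which is how the measure copes with an incomplete query.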

The result interface presents the identification results to the user. Plants in the database are sorted in decreasing order of similarity with the query. Presenting a ranked list rather than a single answer tolerates errors made in the feature selection step. Moreover, for each plant in the result list, users can see its images, botanical drawings and descriptive information such as distribution and control method.
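The ranking step can be sketched as follows. This is only an illustration: the function name `rank_plants` is hypothetical, and the scores are the example values listed in Table I, with "Sp. N" standing for the last species.

```python
def rank_plants(scores):
    """Return (name, similarity) pairs sorted by decreasing similarity."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Example similarity values (in percent) from Table I.
scores = {"Sp. 1": 0.0, "Sp. 2": 100.0, "Sp. 3": 0.0, "Sp. N": 37.5}
ranking = rank_plants(scores)
for name, value in ranking:
    print(f"{name}: {value:.1f}%")
```

Because the whole list is returned, a plant that loses a few points to a wrongly selected feature still appears near the top instead of being discarded.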

Readers are referred to [1] for a more detailed description of this graphical tool.

Figure 3. (a) Plants in the database; (b) the corresponding weight for each plant characteristic; (c) the sample created from a list of characteristics provided by users.

TABLE I. SIMILARITY BETWEEN THE SAMPLE DEFINED IN FIG. 3C AND THE PLANTS IN THE DATABASE PRESENTED IN FIG. 3A.

Plants    S (%)
Sp. 1     0
Sp. 2     100
Sp. 3     0
...       ...
Sp. N     37.5

C. Toward automatic plant identification method

As discussed in the previous section, the proposed graphical tool can help users in the identification process. However, the feature selection process is still manual, and users are not always willing to provide sufficient features to the system. Therefore, the identification result can be inaccurate.

Current achievements in image processing and computer vision lead us to believe that these issues in our graphical tool can be addressed. We can take advantage of image processing to extract characteristics from the plants we want to identify. This exploratory idea brings together researchers from different disciplines (signal processing specialists, specialists in interactive embedded systems and botanists).

There are two main possibilities for moving from the graphical tool to automatic plant identification. The first aims at learning plant characteristics from image features by applying machine learning techniques. The second tries to identify plants directly from images. The work presented in this section belongs to the latter. Plants differ from one another in their components, such as leaf, root and flower. Among these components, leaf characteristics play an important role in plant identification. Therefore, we exploit leaf information for plant identification.


Many works have addressed plant identification based on leaf image analysis, and we do not attempt an exhaustive survey here. Existing approaches investigate leaf features that are either implicit or explicit. The work presented in [2] calculates several leaf features such as compactness, roundness, elongation and roughness. Qing-Ping Wang et al. [3] use Hu invariant moments, which are normalized with respect to changes in scale. Rahmadhani et al. [4] use the Hough transform and Fourier descriptors. Other teams choose venation or texture features to identify leaves. Yunyoung et al. [5] use leaf arrangement and venation representation to decide the type of leaf: they construct a weighted graph G for the leaf venation and then analyze the leaf arrangement (e.g. alternate, opposite) and feather venation (e.g. pinnately veined). Rahmadhani et al. [4] also use B-splines to analyze venation features. Hanife Kebapci et al. [6] employ Gabor wavelets to extract plant texture.

Among image descriptors, the histogram of oriented gradients (HOG) has proved robust for object detection and object identification. When working with leaf images, HOG can represent the shape and appearance of the leaf [7]. In our work, we propose to employ HOG for plant identification based on leaf information. The plant identification method based on HOG consists of three steps. Firstly, HOG is computed for all images in the database. Since a large number of HOG descriptor elements are computed for each image, in the second step we use the Maximum Margin Criterion (MMC) to reduce the descriptor dimension. Finally, for leaf identification, we apply a Support Vector Machine (SVM).

The HOG descriptor was proposed in [8]. To compute HOG, the image is first divided into square cells, which are grouped into blocks. Then, the gradient and its direction are calculated for the pixels in each cell, and a histogram is created per cell. Each bin of this histogram contains the number of pixels with the same direction. The histogram of each block is created by accumulating the histograms of its cells. Finally, all histograms are arranged to form the HOG descriptor of the image.
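The steps above can be sketched with NumPy as follows. This is a simplified illustration rather than the implementation used in this work: it assumes unsigned orientations, gradient-magnitude-weighted votes and L2 block normalization in the spirit of [8], blocks that overlap with a stride of one cell, and the function name `hog_descriptor` and the 64×64 test image are hypothetical.

```python
import numpy as np

def hog_descriptor(image, cell=8, block=2, bins=9):
    """Simplified HOG: per-cell orientation histograms grouped into blocks."""
    image = np.asarray(image, dtype=float)
    gy, gx = np.gradient(image)  # derivatives along rows and columns
    magnitude = np.hypot(gx, gy)
    angle = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    bin_index = np.minimum((angle / (180.0 / bins)).astype(int), bins - 1)

    n_cy, n_cx = image.shape[0] // cell, image.shape[1] // cell
    cells = np.zeros((n_cy, n_cx, bins))
    for cy in range(n_cy):
        for cx in range(n_cx):
            rows = slice(cy * cell, (cy + 1) * cell)
            cols = slice(cx * cell, (cx + 1) * cell)
            # Each pixel votes for its orientation bin (assumed: weighted
            # by gradient magnitude rather than a plain pixel count).
            cells[cy, cx] = np.bincount(
                bin_index[rows, cols].ravel(),
                weights=magnitude[rows, cols].ravel(),
                minlength=bins,
            )

    features = []
    for by in range(n_cy - block + 1):
        for bx in range(n_cx - block + 1):
            v = cells[by:by + block, bx:bx + block].ravel()
            features.append(v / np.sqrt(np.sum(v ** 2) + 1e-6))  # L2 norm
    return np.concatenate(features)

rng = np.random.default_rng(0)
descriptor = hog_descriptor(rng.random((64, 64)))  # hypothetical input
print(descriptor.shape)  # (1764,) = 7 * 7 blocks * 2 * 2 cells * 9 bins
```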

Since the dimension of the HOG descriptor is relatively high (for an image with a resolution of 1600*1200, a cell size of 8*8 pixels and a block size of 2×2 cells, the HOG descriptor dimension is 5940) and not all elements of the HOG descriptor are relevant for leaf representation, we have to reduce the dimension of HOG before applying the classification method. In this paper, we propose to use MMC since it is efficient and robust.
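A minimal sketch of MMC-based dimension reduction is given below. It assumes the common formulation of the criterion, in which the projection matrix $W$ maximizes $\mathrm{tr}(W^T (S_b - S_w) W)$ and is therefore built from the leading eigenvectors of $S_b - S_w$; the function name `mmc_projection` and the synthetic data are hypothetical, and the actual implementation used in this work may differ.

```python
import numpy as np

def mmc_projection(X, y, n_components):
    """Projection matrix made of the top eigenvectors of S_b - S_w."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    S_b = np.zeros((d, d))
    S_w = np.zeros((d, d))
    for label in np.unique(y):
        Xc = X[y == label]
        prior = len(Xc) / len(X)
        diff = (Xc.mean(axis=0) - mean_all)[:, None]
        S_b += prior * (diff @ diff.T)                       # between-class
        S_w += prior * np.cov(Xc, rowvar=False, bias=True)   # within-class
    eigvals, eigvecs = np.linalg.eigh(S_b - S_w)  # symmetric matrix
    order = np.argsort(eigvals)[::-1][:n_components]  # largest first
    return eigvecs[:, order]

# Synthetic 3-class data in 5 dimensions (assumed, for illustration only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.1, size=(20, 5))
               for c in (0.0, 1.0, 2.0)])
y = np.repeat([0, 1, 2], 20)
W = mmc_projection(X, y, n_components=2)
Z = X @ W  # reduced features
print(Z.shape)  # (60, 2)
```

Unlike classical linear discriminant analysis, this formulation does not require inverting $S_w$, which is convenient when the descriptor dimension exceeds the number of training images.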

Figure 4. HOG computed from image of leaf

In this paper, we compare the performance of the HOG and Hu [9] descriptors for plant recognition based on leaf images. The Hu descriptor consists of 7 moment features that describe shapes. These features are invariant to rotation, translation and scaling.
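For reference, the seven Hu invariants can be computed from normalized central moments as sketched below. This is a plain NumPy illustration, not the code used in the experiments; the function name `hu_moments` and the L-shaped test mask are hypothetical.

```python
import numpy as np

def hu_moments(image):
    """Seven Hu moment invariants of a grayscale or binary image."""
    image = np.asarray(image, dtype=float)
    ys, xs = np.mgrid[:image.shape[0], :image.shape[1]]
    m00 = image.sum()
    xc, yc = (xs * image).sum() / m00, (ys * image).sum() / m00

    def eta(p, q):
        # Normalized central moment of order (p, q).
        mu = ((xs - xc) ** p * (ys - yc) ** q * image).sum()
        return mu / m00 ** (1 + (p + q) / 2.0)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    c, d = n30 - 3 * n12, 3 * n21 - n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        c ** 2 + d ** 2,
        a ** 2 + b ** 2,
        c * a * (a ** 2 - 3 * b ** 2) + d * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        d * a * (a ** 2 - 3 * b ** 2) - c * b * (3 * a ** 2 - b ** 2),
    ])

# Hypothetical L-shaped mask, then a shifted and a rotated copy.
shape = np.zeros((40, 40))
shape[5:30, 8:14] = 1.0
shape[24:30, 8:28] = 1.0
shifted = np.roll(shape, (4, 7), axis=(0, 1))
rotated = np.rot90(shape)
print(np.allclose(hu_moments(shape), hu_moments(shifted), atol=1e-12))
print(np.allclose(hu_moments(shape), hu_moments(rotated), atol=1e-12))
```

The descriptor has only seven values per image, which is far more compact than HOG but also carries no information about the interior of the leaf such as venation.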

For classification, we selected SVM because of its high accuracy, its ability to work with high-dimensional data, and its ability to generate non-linear as well as high-dimensional classifiers.

Let $(x_i, y_i)$, $i = 1, \dots, l$, with $y_i \in \{-1, 1\}$ and $x_i \in \mathbb{R}^d$, be the training data with labels $y$. The support vector machine (SVM) using the C-Support Vector Classification (C-SVC) algorithm finds the optimal hyperplane:

$$f(x) = w^T \Phi(x) + b \tag{1}$$

to separate the training data by solving the following

optimization problem:

$$\min \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{l} \xi_i \tag{2}$$

subject to

$$y_i \left( w^T \Phi(x_i) + b \right) \ge 1 - \xi_i \quad \text{and} \quad \xi_i \ge 0, \quad i = 1, \dots, l \tag{3}$$

The optimization problem (2) maximizes the hyperplane margin while minimizing the cost of errors. $\xi_i$, $i = 1, \dots, l$, are non-negative slack variables introduced to relax the constraints of the separable-data problem into those of the non-separable-data problem. For an error to occur, the corresponding $\xi_i$ must exceed unity (3), so $\sum_i \xi_i$ is an upper bound on the number of training errors. Hence an extra cost $C \sum_i \xi_i$ for errors is added to the objective function (2), where $C$ is a parameter chosen by the user.
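The role of the slack variables can be illustrated numerically. The sketch below assumes a linear classifier (that is, $\Phi$ is the identity) and a hand-picked hyperplane rather than an optimized one; the function name `primal_objective` and the four sample points are hypothetical.

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    """Objective (2) for a linear SVM, using the smallest feasible slacks."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)  # smallest xi_i satisfying (3)
    return 0.5 * w @ w + C * slack.sum(), slack

# Hypothetical data: the last point lies on the wrong side of the hyperplane.
X = np.array([[2.0, 0.0], [0.5, 0.0], [-2.0, 0.0], [0.3, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b, C = np.array([1.0, 0.0]), 0.0, 1.0

value, slack = primal_objective(w, b, X, y, C)
errors = int(np.sum(y * (X @ w + b) < 0))
print(slack)                 # slacks are 0, 0.5, 0 and 1.3
print(errors, slack.sum())   # 1 training error, bounded by sum of slacks 1.8
print(value)                 # approximately 2.3
```

The second point is correctly classified but falls inside the margin, so it has a slack between 0 and 1; only the misclassified fourth point has a slack above 1.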

The Lagrangian formulation of the primal problem is:

$$L_P = \frac{1}{2} \|w\|^2 + C \sum_i \xi_i - \sum_i \alpha_i \left\{ y_i \left( w^T \Phi(x_i) + b \right) - 1 + \xi_i \right\} - \sum_i \mu_i \xi_i \tag{4}$$

Applying the Karush-Kuhn-Tucker conditions to the primal problem yields the dual problem:

$$L_D = \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \Phi(x_i)^T \Phi(x_j) \tag{5}$$

Subject to:

$$0 \le \alpha_i \le C \quad \text{and} \quad \sum_i \alpha_i y_i = 0 \tag{6}$$

The solution is given by:

$$w = \sum_{i=1}^{N_S} \alpha_i y_i \Phi(x_i) \tag{7}$$

where $N_S$ is the number of support vectors.

Note that the data only appear in the training problem (4) and (5) in the form of dot products $\Phi(x_i)^T \Phi(x_j)$, which can be replaced by any kernel $K$ with $K(x_i, x_j) = \Phi(x_i)^T \Phi(x_j)$, where $\Phi$ is a mapping of the data to some other (possibly infinite-dimensional) Euclidean space. One example is the Radial Basis Function (RBF) kernel $K(x_i, x_j) = e^{-\gamma \|x_i - x_j\|^2}$.

In the test phase, an SVM is used by computing the sign of

$$f(x) = \sum_{i=1}^{N_S} \alpha_i y_i \Phi(s_i)^T \Phi(x) + b = \sum_{i=1}^{N_S} \alpha_i y_i K(s_i, x) + b \tag{8}$$

where the $s_i$ are the support vectors.
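The test-phase rule (8) with an RBF kernel can be sketched as follows. The support vectors, multipliers, bias and kernel parameter below are hand-picked hypothetical values that satisfy constraint (6); they are not the result of training, and the function names are assumptions made for this illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """K(a, b) = exp(-gamma * ||a - b||^2)."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def decision_function(x, support_vectors, alphas, labels, b, gamma):
    """f(x) of Eq. 8; the predicted class is the sign of the result."""
    k = np.array([rbf_kernel(s, x, gamma) for s in support_vectors])
    return float(np.sum(alphas * labels * k) + b)

# Hypothetical model with one support vector per class.
support_vectors = np.array([[1.0, 1.0], [-1.0, -1.0]])
alphas = np.array([1.0, 1.0])
labels = np.array([1.0, -1.0])  # sum(alphas * labels) == 0, as in (6)
b, gamma = 0.0, 0.5

f_pos = decision_function(np.array([0.8, 1.2]), support_vectors,
                          alphas, labels, b, gamma)
f_neg = decision_function(np.array([-1.0, -0.5]), support_vectors,
                          alphas, labels, b, gamma)
print(np.sign(f_pos), np.sign(f_neg))  # 1.0 -1.0
```

Only the support vectors enter the sum, so the cost of classifying a new leaf grows with $N_S$ rather than with the size of the training set.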

III. EXPERIMENTAL RESULTS

We developed the graphical tool in several versions. The first version is developed with PHP and a MySQL server. The combination of PHP scripts and a MySQL database makes this tool run fast and smoothly. It is convenient because users do not have to install anything: all they need is a computer with an Internet connection. The second version runs on mobile devices and was built for iOS (e.g. iPhone, iPad). For this version, we use Objective-C (with the Xcode tool) and an SQLite database. This version runs very fast and, once the application is installed, users can identify plants anywhere on a compact device without relying on an Internet connection.

Figure 5: Graphical tool for iPad

The developed graphical tool is used by both botanists and non-botanists (e.g. foresters).

For automatic plant identification based on the HOG descriptor, we test our method on 32 different plants of the Flavia data set [10] (cf. Fig. 6). We choose leaves with many different types of shape and venation. For each type of leaf, we pick 10 samples for the test set, and the remaining samples form the training set.
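The per-class split described above can be sketched as follows. The source does not state how the 10 test samples are chosen, so the random selection with a fixed seed is an assumption, as are the function name `split_per_class`, the image identifiers and the figure of 50 images per class used in the example.

```python
import random
from collections import defaultdict

def split_per_class(samples, n_test=10, seed=0):
    """Split (image_id, label) pairs into train and test lists per class."""
    by_label = defaultdict(list)
    for item in samples:
        by_label[item[1]].append(item)
    rng = random.Random(seed)  # assumed: random selection, fixed seed
    train, test = [], []
    for label in sorted(by_label):
        items = by_label[label][:]
        rng.shuffle(items)
        test.extend(items[:n_test])   # 10 samples of each class for testing
        train.extend(items[n_test:])  # the others are used for training
    return train, test

# Hypothetical data set: 32 classes with 50 images each.
samples = [(f"img_{label}_{k}", label) for label in range(32) for k in range(50)]
train, test = split_per_class(samples)
print(len(train), len(test))  # 1280 320
```

With 32 classes and 10 test samples each, the test set always contains 320 images regardless of how many images each class has in total.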

Figure 6: Leaf samples in test database

For each image, a HOG descriptor vector is extracted. The cell size is 8*8 pixels and the block size is 2×2 cells. The number of bins is set to 9. As a result, each cell contains 64 (8*8) pixels and there are 165 (15*11) blocks, so the dimension of the overall HOG feature is 5940 (9*2*2*165). We then reduce the dimension of the features to 100 using MMC, because our experiments show that only the first 100 eigenvalues are significant. We also analyze the effect of data scaling before applying SVM: with data scaling, the accuracy of our method is 84.7%, while without data scaling it is 76.25%. The experimental results show that many leaves with different shapes are correctly identified. However, several classes are not well recognized. Figure 7 illustrates two cases in which a leaf of one class is misclassified into another class. The reason is that HOG (an implicit feature) is computed over the whole image; therefore, local information may be lost.
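The arithmetic behind the descriptor dimension can be checked with a short sketch. It assumes blocks that overlap with a stride of one cell, which the text does not state explicitly; under that assumption, a 15*11 grid of blocks corresponds to a 16*12 grid of cells, that is, a 128*96 pixel window. The function name `hog_dimension` and the window size are therefore hypothetical.

```python
def hog_dimension(width, height, cell=8, block=2, bins=9):
    """Blocks per axis and descriptor length, assuming a one-cell stride."""
    cells_x, cells_y = width // cell, height // cell
    blocks_x, blocks_y = cells_x - block + 1, cells_y - block + 1
    return blocks_x, blocks_y, bins * block * block * blocks_x * blocks_y

blocks_x, blocks_y, dim = hog_dimension(128, 96)  # assumed window size
print(blocks_x, blocks_y, dim)  # 15 11 5940
```

Under the same assumption, a full 1600*1200 image would give a far longer descriptor, so the reported figure suggests that HOG is computed on a reduced window; the exact preprocessing is not given in the text.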

Figure 7: Example of bad identification results when using HOG as descriptor

In order to compare the recognition performance of HOG and Hu, we compute the 7 Hu moments and apply SVM. The training and testing datasets are the same as in the experiment with the HOG descriptor.

TABLE II. RESULT OF TWO METHODS (NUMBER OF INCORRECT RECOGNITIONS PER SPECIES)

Species no.   HOG descriptor   Hu descriptor
1             6                9
2             1                6
3             0                10
4             0                3
5             0                8
6             0                5
7             1                7
8             4                8
9             3                10
10            2                10
11            0                7
12            0                5
13            3                9
14            0                10
15            1                9
16            3                10
17            0                0
18            0                6
19            1                8
20            2                5
21            0                1
22            2                9
23            4                9
24            2                9
25            2                10
26            0                10
27            1                10
28            3                9
29            0                7
30            5                2
31            3                9
32            0                9
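The per-species error counts of Table II can be converted into overall accuracies with a few lines of Python. The sketch below only restates the table values and the test set size of 10 samples per species given above; the variable and function names are hypothetical.

```python
# Number of incorrect recognitions per species, copied from Table II.
hog_errors = [6, 1, 0, 0, 0, 0, 1, 4, 3, 2, 0, 0, 3, 0, 1, 3,
              0, 0, 1, 2, 0, 2, 4, 2, 2, 0, 1, 3, 0, 5, 3, 0]
hu_errors = [9, 6, 10, 3, 8, 5, 7, 8, 10, 10, 7, 5, 9, 10, 9, 10,
             0, 6, 8, 5, 1, 9, 9, 9, 10, 10, 10, 9, 7, 2, 9, 9]
TEST_PER_SPECIES = 10

def accuracy(errors, per_class=TEST_PER_SPECIES):
    """Overall accuracy in percent from per-class error counts."""
    total = per_class * len(errors)
    return 100.0 * (total - sum(errors)) / total

hog_accuracy = accuracy(hog_errors)
hu_accuracy = accuracy(hu_errors)
print(hog_accuracy)  # 84.6875
print(hu_accuracy)   # 25.3125
```

These values coincide with the HOG result with scaling and the Hu result without scaling, which indicates the configurations that the two columns of Table II correspond to.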

Figure 8: Example of bad identification results when using Hu as descriptor

TABLE III. SUMMARY OF RESULTS

Method                                Average accuracy
SVM + HOG + MMC, without scaling      76.25%
SVM + HOG, with scaling               84.6875%
Hu + SVM, with scaling                23.125%
Hu + SVM, without scaling             25.3125%

The results obtained for the 32 species are detailed in Tab. II, while the summary of the experiments is presented in Tab. III. On this database, the accuracy of HOG is much higher than that of Hu. As can be seen in Fig. 8 and Tab. II, the Hu descriptor is not robust when working with leaves of ordinary shape. The Hu descriptor should be combined with features that are able to describe the veins of the leaf.

IV. DISCUSSIONS

As presented in the previous sections, the graphical tool and the automatic identification method developed in our system allow users to identify a plant of interest. However, this framework needs to be improved in the following directions.

The first direction is to improve the leaf identification accuracy of the automatic method. Our experimental results show that plant identification based on leaf information reaches about 85% accuracy on the Flavia dataset. However, images in this dataset are taken in strictly controlled conditions (one mature leaf per image). In a real application, users can take an image of a leaf in different conditions, and plant identification methods based on leaf information have to take these conditions into account.

The second direction concerns plant identification based on multiple features or modalities, since a plant can be identified from several components (e.g. bark, flowers and seeds).

The third direction is to build a plant identification system for smartphones. According to the specialized press, the graphics tablet and smartphone market is one of the fastest growing markets nowadays. Apple recently announced (January 2011) that the number of dedicated applications for the iPad is reaching 60,000, with a growth rate of 300 a day, while the number for the iPhone has surpassed 340,000. The typical user of a graphics tablet lives in an urban area and uses the device for two hours per day, mostly as a hobby. The main applications used by the general public are interactive games, multimedia and e-reading, but access to knowledge seems to be an important factor in the decision to buy a graphics tablet. Among the key success factors of these new mobile devices, the size of the screen and the intuitive touch interface are decisive. In addition to their large, high-resolution screens, most smartphones and new graphics tablets are now equipped with integrated sensors such as a camera, accelerometers, a compass and a GPS antenna. To the best of our knowledge, there are very few plant identification applications developed for smartphones. Leafsnap is one such application for iPhone/iPad. It requires users to provide a leaf image of the plant of interest, and returns the plant name, plant photographs and information on the flowers, fruit, seeds and bark. However, the plant identification accuracy of this application is still limited, and in order to identify a plant, users have to take the image of the leaf on a white background.

V. CONCLUSIONS AND FUTURE WORKS

In this paper, we present a plant identification system consisting of two main components: a graphical tool and an automatic plant identification method. The graphical tool has been developed in several versions. Feedback from end users shows that this tool is more effective than the classical tools because it uses graphical icons instead of technical terms and tolerates missing information. Experimental results of the automatic plant identification method on the Flavia dataset are promising. However, this data set is relatively simple: one mature leaf per image with a uniform background. In a real application, we have to work with more difficult images (containing multiple leaves, with complex backgrounds, in different lighting conditions). The automatic plant identification method has to take these factors into account.

ACKNOWLEDGMENT

The research leading to this paper was supported by the National Project B2011-01-05 "Study and develop an object detection and recognition system in smart and perceptive environments". We would like to thank the project and the people involved in it.


REFERENCES

1. Bonnet, P., M. Arbonnier, and P. Grard. A graphical tool for the identification of West African savannas trees. in Smithsonian Botanical Symposium 2005 - "the future of floras: new frameworks, new technologies, new uses". 2005.

2. Ta-Te Lin, Y.-T.C., Wen-Chi Liao. Leaf boundary extraction and geometric modeling of vegetable seedlings. in 2000 ASAE Annual International Meeting. 2000. Milwaukee, Wisconsin, USA.

3. Qing-Ping Wang, J.-X.D., Chuan-Min Zhai, Recognition of Leaf Image Based on Ring Projection Wavelet Fractal Feature, in 6th International Conference on Intelligent Computing, ICIC 2010. 2010, Springer: Changsha, China. p. 240-246.

4. Rahmadhani M., Y.H. Shape and Vein Extraction on Plant Leaf Images Using Fourier and B-Spline Modeling. in AFITA 2010 International Conference. 2010. Bogor, Indonesia: IPB (Bogor Agricultural University).

5. Yunyoung Nam, E.H.b., Dongyoon Kim, A similarity-based leaf image retrieval scheme: Joining shape and venation features. Computer Vision and Image Understanding, 2008. 110(2): p. 245-259.

6. Hanife Kebapci, B.Y., Gozde Unal, Plant Image Retrieval Using Color, Shape and Texture Features. Oxford University Press on behalf of The British Computer Society, 2010.

7. Xiao, X.-Y., et al., HOG-Based Approach for Leaf Classification, in Advanced Intelligent Computing Theories and Applications. With Aspects of Artificial Intelligence. 2010, Springer Berlin / Heidelberg. p. 149-155.

8. Navneet Dalal, B.T. Histograms of Oriented Gradients for Human Detection. in Computer Vision and Pattern Recognition, 2005. 2005. San Diego, CA, USA.

9. Hu, M.-K., Visual pattern recognition by moment invariants. IRE Transactions on Information Theory, 1962. 8(2): p. 179-187.

10. Stephen Gang Wu, F.S.B., Eric You Xu, Yu-Xuan Wang, Yi-Fan Chang, Qiao-Liang Xiang, A Leaf Recognition Algorithm for Plant Classification Using Probabilistic Neural Network, in Signal Processing and Information Technology, 2007 IEEE International Symposium. 2007: Giza. p. 11-16.
