

arXiv:0806.1796v1 [cs.CV] 11 Jun 2008

Evaluation of Uncertain Image Classification and Segmentation Algorithms

Arnaud Martin^a, Hicham Laanaya^{a,b} and Andreas Arnold-Bos^a

^a ENSIETA E3I2 EA3876, 2 rue François Verny, 29806 Brest Cedex 09, France

^b Faculté des sciences de Rabat, Avenue Ibn Batouta, B.P. 1014 Rabat, Morocco

Each year, numerous segmentation and classification algorithms are invented or reused to solve problems where machine vision is needed. Generally, the efficiency of these algorithms is compared against the results given by one or many human experts. However, in many situations, the location of the real boundaries of the objects as well as their classes are not known with certainty by the human experts. Moreover, only one aspect of the segmentation and classification problem is generally evaluated. In our evaluation method, we take into account both the classification and segmentation results as well as the level of certainty given by the experts. As a concrete example of our method, we evaluate an automatic seabed characterization algorithm based on sonar images.

1. INTRODUCTION

Image classification and segmentation are two fundamental problems in image analysis. Segmenting an image consists in dividing the image into homogeneous zones delimited by boundaries, so as to separate the different entities visible in the image. Classification consists in labeling the various components visible in an image. A great many segmentation and classification methods have been proposed in the last thirty years [3]; enumerating them all is not the purpose of our paper. However, an important question to solve is how to benchmark these methods and evaluate their robustness with respect to a given real-life application.

A typical example of the use of classification and segmentation is encountered in satellite or sonar imaging, where an important use of the data is to classify the types of soils present in the images, for instance to build maps. As the number of images gathered during a mission is large, automatic recognition algorithms can relieve human operators. Since the swath of the sensor is wide, many types of soils can be encountered within a single image, and the classification must be done on a local neighborhood. This neighborhood can be either limited to a single pixel, or, more often, to a small tile of e.g. 16 × 16 or 32 × 32 pixels taken as the unit for the classification algorithm. The boundaries between the different patches corresponding to a category of soil are a form of segmentation, which is here an implicit byproduct of the classification. In other applications, segmentation can come first, so as to isolate entities which will be labeled later.

A difficulty raised in these applications is the lack of ground truth against which the result of the classification could be evaluated. The real reference classes must be estimated by human experts from the data themselves. However, the images are difficult to read since they are corrupted by many phenomena, and the estimation of the classes by the human expert will be highly subjective, with a varying level of uncertainty. In the case of automatic seabed classification, which we will use as our reference example throughout this paper, images are especially hard to interpret due to many imperfections [2]. To reconstruct the image, a huge number of parameters (geometry of the device, coordinates of the ship, movements of the sonar, etc.) are taken into account, but these data are polluted with a large amount of sensor noise. Moreover, other phenomena such as multipath signal propagation (caused by reflection on the bottom or the surface), speckle, and the presence of fauna and flora (e.g. shadows of fish on the sea bottom) all add to the difficulty of interpreting the image. Consequently, different experts can propose different classifications of the image. Thus, in order to evaluate automatic classification, we must take this difference and the uncertainty of each expert into account. Figure 1 exhibits the differences between the interpretations and certainties of two sonar experts trying to differentiate the type of sediment (rock, cobbles, sand, ripple, silt) or shadow when the information is invisible (each color corresponds to a kind of sediment and to the certainty of the expert for this sediment, expressed in terms of sure, moderately sure and not sure).

Figure 1. Segmentation given by two experts.

We propose in this article a new approach for image classification and segmentation evaluation, taking into account the information given by multiple experts and the certainty of that information. Classical evaluations of classification and segmentation do not take into account the uncertain and imprecise labels in the reference image provided by an expert. We think that this kind of label must be considered in the evaluation approach. In section 2 we show how to integrate the expert certainty into the confusion matrix and thus deduce a good classification rate and an error classification rate. Moreover, our thesis is that a global image classification evaluation must be made not only by evaluating the classification on the considered units (with the confusion matrix) but also by evaluating, at the same time, the induced segmentation. In section 3, we propose two new distance-based measures in order to evaluate well-segmented and mis-segmented pixels, taking into account both the location of the borders and the expert certainty. Note that another important criterion for evaluating classification/segmentation approaches is the complexity of the algorithms [1], but we do not consider it in this paper. Finally, our evaluation is illustrated in section 4 on real sonar images acquired in a real, uncertain environment.

2. CLASSIFICATION EVALUATION

Traditional classification systems can usually be described as a three-tiered process. First, significant features are extracted from the images to classify. These features differ widely depending on the application; they are generally described using a small set of abstract numerical measures. For example, the features may be the local luminance, the texture (described with measures such as the entropy, the co-occurrence matrices, etc.), or the contours (described by their length, their orientation, their relative position to other contours, etc.) [3]. Most of the time, a second stage is necessary to reduce these features, because they are too numerous. In the third stage of the algorithm, the numerical descriptors are fed to classification algorithms, which are application-independent, such as Support Vector Machines [4,5,6], neural networks [2,6,7,8], k-nearest neighbors [9], etc. The classification algorithm decides, depending on its inputs, which is the class of the image.

Hence, we have to evaluate these classification algorithms in order to compare their robustness in a given application. The classical approach is based on the confusion matrix and does not take uncertain labels into account. We propose here a new confusion matrix, together with good classification and error rates, that takes into account this kind of label as well as the inhomogeneous units defined below.

The evaluation method proposed in this section can be applied to the evaluation of a classification algorithm in any domain where uncertain labels are provided. We do not consider here the problem of learning from uncertain and imprecise labels [10,11,12]: the classification can be made by this kind of algorithm or by others.

2.1. Classical Evaluation

The results of one image classification can be observed and visually compared to the reality. But in order to evaluate a classification algorithm, many different configurations and tests must be considered, since classification algorithms can yield very variable results depending on the sample. Generally, the evaluation of classification algorithms is conducted with the confusion matrix, composed of the numbers $cm_{ij}$ of elements from class $i$ classified in class $j$. In order to obtain rates, which are easier to interpret, we can normalize this confusion matrix by:

$$\mathrm{Ncm}_{ij} = \frac{cm_{ij}}{\sum_{j=1}^{N} cm_{ij}} = \frac{cm_{ij}}{N_i}, \qquad (1)$$

with $N$ the number of considered classes and $N_i$ the number of elements from the true class $i$. From this normalized confusion matrix a good classification rate vector can be written as:

$$GCR_i = \mathrm{Ncm}_{ii}, \qquad (2)$$

and an error classification rate vector as:

$$ECR_i = \frac{1}{2}\left( \sum_{j=1,\, j\neq i}^{N} \mathrm{Ncm}_{ij} + \frac{\sum_{j=1,\, j\neq i}^{N} \mathrm{Ncm}_{ji}}{N-1} \right). \qquad (3)$$

This error classification rate is the mean of the two errors corresponding respectively to the elements from a given class $i$ falsely classified as elements of another class (first term), and to the elements classified in class $i$ but actually belonging to another class (second term). These errors are also called errors of the first and second kind. The first term needs no further normalization, because the confusion matrix is already normalized on the rows, but the second term must be normalized by the number of rows minus one (because of the $\mathrm{Ncm}_{ii}$ term corresponding to the good classification). Note that other error rates can be defined (see e.g. [10]).

We have seen that the evaluation of image classification algorithms must be made not only on one image but on the whole image database. As a consequence, we have to compute a non-normalized confusion matrix on each image and normalize the sum of the confusion matrices over all images of the database.
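As a minimal sketch of equations (1)-(3) (ours, not code from the paper), assuming `cm` is a NumPy array holding the non-normalized confusion matrix summed over the whole database:

```python
import numpy as np

def evaluation_rates(cm):
    """Normalized confusion matrix (eq. 1), good classification rate
    vector (eq. 2) and error classification rate vector (eq. 3)."""
    N = cm.shape[0]
    Ni = cm.sum(axis=1, keepdims=True)        # per-class totals N_i
    Ncm = cm / Ni                             # eq. (1)
    GCR = np.diag(Ncm)                        # eq. (2)
    off = Ncm - np.diag(np.diag(Ncm))         # off-diagonal entries of Ncm
    ECR = 0.5 * (off.sum(axis=1)              # errors of the first kind
                 + off.sum(axis=0) / (N - 1)) # errors of the second kind
    return Ncm, GCR, ECR
```

Fractional entries, such as those produced by the weighted updates of section 2.2 below, are handled transparently.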

2.2. Evaluation with expert information

Consider a general case where information is given by the expert for each pixel while the classification is made on a unit of n × n pixels. In such a case, more than one class can be present in an n × n tile (we call it a patch-worked tile, or inhomogeneous unit), whereas the classification algorithm can find only one of these classes. To take this situation into account, we consider that if the classification algorithm finds one of these classes on the tile, the algorithm is right in the proportion of this class found in the n × n tile and wrong in the proportion of the other classes in the tile. For instance, imagine the case where the expert considers a 16 × 16 tile and declares that 156 given pixels belong to class 1 and the 100 other pixels belong to class 3. If the classification algorithm finds that the tile belongs to class 1, the confusion matrix will be updated by $cm_{11} = cm_{11} + 156/256$ and $cm_{31} = cm_{31} + 100/256$. Hence the confusion matrix is no longer composed of integers, and $N_i$ is not an integer either, but the column sums are still integers.

Now consider the case where the expert gives the class with a certainty grade. For instance, the operator can be moderately sure of his choice when he labels one part of the image as belonging to a certain class, and be totally doubtful on another part of the image. In our classification evaluation we must not treat these two references equally. Indeed, classical confusion matrices imply that the reality is perfectly known; this, unfortunately, is not the case in many real applications. We propose to represent this difference of information by different weights corresponding to the certainty grades that are considered. For example, if three grades of certainty (sure, moderately sure and not sure) are considered, we can choose respectively the weights 2/3, 1/2 and 1/3. Such weights are easily integrated in the general sum of the confusion matrix. If one expert labels a tile as belonging to class 1 with a moderate certainty, and if the classification algorithm finds class 1, then with the weights given above the confusion matrix is updated as $cm_{11} = cm_{11} + 1/2$. If the classification algorithm finds class 2 on the considered tile, the update is $cm_{12} = cm_{12} + 1/2$. Hence the column sums are not integers anymore.
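The two updates above can be sketched as follows (ours; combining the tile proportion and the certainty weight per pixel is our assumption, as the paper presents the two mechanisms separately):

```python
# Example weights for the three certainty grades given in the text.
CERTAINTY_WEIGHT = {"sure": 2/3, "moderately sure": 1/2, "not sure": 1/3}

def update_cm(cm, expert_pixels, predicted_class):
    """Accumulate one n x n tile into the (non-normalized) confusion matrix.

    expert_pixels: list of (true_class, certainty) pairs, one per pixel of
    the tile, as labeled by the expert. With a single class labeled over
    the whole tile, this reduces to adding one certainty weight in one cell.
    """
    n_pixels = len(expert_pixels)
    for true_class, certainty in expert_pixels:
        w = CERTAINTY_WEIGHT[certainty]
        # Each pixel contributes its certainty weight, spread over the tile,
        # so an inhomogeneous tile splits its contribution between rows.
        cm[true_class, predicted_class] += w / n_pixels
```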

In order to take into account the referenced images provided by different experts, we can compare the classified image with all the expert-referenced images. Hence we obtain as many confusion matrices as experts, and we can simply combine them by addition.

By simply adding the non-normalized confusion matrices, we weight the obtained results by the image size or by the number of considered units.

Consequently, in order to obtain rates, we can normalize the obtained confusion matrix with equation (1) and calculate the good classification rate vector with equation (2) and the error classification rate vector with equation (3). Of course these rates are no longer percentages. For instance, the good classification rate is no longer the percentage of well-classified units, because the weights given by the inhomogeneous units or by the expert certainty are rational numbers.

In conclusion of this section: the interest of the newly obtained confusion matrix, good classification rate and error classification rate is that they give a good evaluation of the classification while taking into account the inhomogeneous units and the uncertainty of the experts. This approach can be applied to applications other than image classification, in fact to any domain where we try to classify uncertain elements.

3. SEGMENTATION EVALUATION

Segmentation can either be obtained as a byproduct of the classification, as shown above, or be used as the first step of an image processing pipeline. Many methods of image segmentation and edge detection have been proposed [14,15,13,16,17]. It is important to be able to benchmark these methods and to evaluate their robustness; but to do that, measures are needed so as to have an objective means of judging the quality of the segmentation. No perfect measure exists today, and the existing measures are not fully satisfactory; this is why one may imagine fusing segmentation evaluation approaches [18].

On the one hand, image classification methods are evaluated with the confusion matrix, from which good classification rates and error rates are usually calculated. Note that in order to establish the confusion matrix, the real classes of the considered units of the images need to be known. This gives only an evaluation of the classification approach on the considered units of the image, but does not give an evaluation of the produced segmentation.

On the other hand, segmentation evaluation cannot be made only by visual comparison between the initial image and the segmented image. Many evaluation approaches have been proposed for image segmentation [1,16,19,20,21]. We can consider two cases: either we do not have any a priori knowledge of the correct segmentation, or we do. In the first case, many effectiveness measures based on intra-region uniformity, inter-region contrast and region shape have been proposed [1]. The second case requires referenced images. In a real application, experts must manually provide the image segmentation via a visual inspection. [1] gives a review of the usual discrepancy measures based on different distances (sometimes expressed in terms of probability) between the segmented pixel and the referenced pixel.

Most of the time, only a measure of how many pixels are mis-segmented is given. We, on the contrary, propose in this article a combined study of one well-segmented pixel measure and one mis-segmented pixel measure. Indeed, most of the time, when a pixel is not mis-segmented, it is not necessarily well-segmented either. As a consequence, we can have few mis-segmented pixels but also few well-segmented pixels, which means that the segmentation is not good overall.

In order to calculate confusion matrices we need a priori knowledge of the class of each pixel, or at least of each considered unit of the image. Hence, experts have to provide referenced images, and we can consider ourselves to be in the second case of segmentation evaluation described above.

Before presenting our method of segmentation evaluation, we show how a deduced segmentation can easily be obtained from an image classification based on tiles. The proposed segmentation evaluation method then applies to any image segmentation and can take imprecise labels into account.

3.1. Deduced segmentation

Image classification provides an implicit image segmentation given by the difference of classes between two adjacent tiles. Hence a good image classification evaluation should take this segmentation into account as well.

First of all, we have to define the boundary pixels given by the image classification. We propose here a very simple approach: we take as boundary pixels the pixels which neighbor another class on the right and/or on the bottom. For instance, table 1 gives a dummy segmented image with two classes denoted × and •. The classification unit is here 4 × 4, and the boundary pixels are those whose right or bottom neighbor belongs to the other class.

Table 1
Example of an obtained segmentation on an image with two classes denoted × and •.

× × × × • • • •
× × × × • • • •
× × × × • • • •
× × × × • • • •
• • • • × × × ×
• • • • × × × ×
• • • • × × × ×
• • • • × × × ×

Many approaches can be considered in order to obtain boundaries without angular points. We can consider for instance an interpolation between the 4-connexity or 8-connexity points [22]. This is not the subject of this paper; the reader should keep in mind that our segmentation evaluation is general and can be applied to all image segmentations given by boundary pixels.
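As a minimal sketch (ours) of this boundary-pixel definition, assuming the classified image is a 2-D integer array of class labels:

```python
import numpy as np

def boundary_pixels(labels):
    """Boolean mask of boundary pixels: a pixel is a boundary pixel when its
    right and/or bottom neighbor carries a different class label."""
    b = np.zeros(labels.shape, dtype=bool)
    b[:, :-1] |= labels[:, :-1] != labels[:, 1:]   # right neighbor differs
    b[:-1, :] |= labels[:-1, :] != labels[1:, :]   # bottom neighbor differs
    return b
```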

3.2. Segmentation evaluation

We recall that in our case we have a priori knowledge of the correct, or approximately correct, segmentation given by the experts. In this case all evaluation approaches are based on different distances (or probabilities) between the segmented pixel and the referenced pixel [1,23,24], and most of the time only one measure of mis-segmented pixels is given. We think that this is not enough for a precise segmentation evaluation, since a pixel can be not mis-segmented and yet not well-segmented. As we mentioned before, we can have few mis-segmented pixels together with few well-segmented pixels, in which case the segmentation cannot be considered right. So we propose a linked study of two new measures: one well-segmented pixel measure and one mis-segmented pixel measure. Moreover, these two measures can take into account the uncertainty of the expert on the position and on the existence of the boundaries, provided this uncertainty is expressed as a weight.

3.2.1. Boundary good detection measure

The well-segmented pixel measure quantifies how well the boundary is detected, and the mis-segmented pixel measure tries to quantify how many boundaries detected by the algorithm to benchmark have no physical reality. First, we search for the minimal distance $d_{fe}$ between each boundary pixel $f$ found by the algorithm to benchmark and all the boundary pixels $e$ provided by the expert. Hence the pixel $e$ is a function of $f$, and we should denote it $e_f$, but in order to simplify notations it is referred to as $e$ in the rest of the paper. We take here the Euclidean distance, but any other distance can be envisaged. The certainty weight of the pixel $e$ given by the expert is denoted $W_e$. We define a well-detection criteria vector by:

$$DC_f = \exp\left( -(d_{fe}\, W_e)^2 \right) W_e. \qquad (4)$$

This criterion gives a Gaussian-like distribution of weights with a standard deviation given by the certainty weights, as shown in figure 2.

Figure 2. Distance weight for the well-detection criteria.

The boundary good detection measure is defined by the normalized well-detection criterion:

$$WDC = \left( \frac{\sum_f DC_f}{\max_f (DC_f)\, \sum_e W_e} \right)^{a}. \qquad (5)$$

The normalization is made in order to obtain a measure defined between 0 and 1. However, in real applications, this criterion remains small even for very good boundary detection, so we take $a = 1/6$ in order to accentuate small values.

This criterion is not completely satisfying because it only takes into account the distance from the found boundary to the contour provided by the expert. However, the reference boundary also has a local direction, which is another piece of information we want to use. A boundary found by the algorithm can cross a boundary given by the expert orthogonally: in this case some pixels of the found boundary are very near (in terms of distance) to pixels of the given boundary, but we do not want to call this a good detection. We propose two ways to take the direction of the boundaries into account.

In the first one, we count, for a given pixel $f$ of the found boundary, how many pixels from the found boundary are linked by the minimal distance to the same pixel $e$ of the referenced boundary. This number is denoted $n_{e_f}$; e.g. on figure 3 we have $n_{e_f} = 3$ for three different $f$. We redefine the well-detection boundary measure by:

$$WDC = \left( \frac{\sum_f DC_f / n_{e_f}}{\max_f (DC_f / n_{e_f})\, \sum_e W_e} \right)^{a}. \qquad (6)$$

Figure 3. Example of $n_{e_f}$ for three given $f$; the found boundary is represented by green squares and the referenced boundary by a black line.

The problem is that the number $n_{e_f}$ does not necessarily represent a number of pixels on the same boundary, and it only accounts well for the orthogonal direction. However, this measure gives the best evaluation of the proportion of found boundaries.
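To make the computation concrete, here is a sketch (ours) of measures (4) and (6), assuming the found and expert boundaries are given as arrays of pixel coordinates with one certainty weight per expert pixel; SciPy's cKDTree performs the nearest-expert-pixel search.

```python
import numpy as np
from collections import Counter
from scipy.spatial import cKDTree

def wdc(found_xy, expert_xy, expert_w, a=1/6):
    """Boundary good detection measure, eqs. (4)-(6).

    found_xy:  (F, 2) coordinates of boundary pixels found by the algorithm.
    expert_xy: (E, 2) coordinates of expert boundary pixels.
    expert_w:  (E,) certainty weights W_e.
    """
    d_fe, e_idx = cKDTree(expert_xy).query(found_xy)  # minimal distances d_fe
    w_e = expert_w[e_idx]
    dc = np.exp(-(d_fe * w_e) ** 2) * w_e             # eq. (4)
    counts = Counter(e_idx)                           # pixels f sharing one e
    n_ef = np.array([counts[i] for i in e_idx])       # n_{e_f}
    dcn = dc / n_ef
    return (dcn.sum() / (dcn.max() * expert_w.sum())) ** a   # eq. (6)
```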

The second method is based on the idea that the local direction of the boundary should also be taken into account: the direction of the detected boundary and the direction of the boundary given by the expert should be the same. Now, how does one compute the direction of the boundary? Let $I_r$ denote the reference boundary image given by the expert: $I_r(i,j) = 0$ if no boundary is detected at pixel $(i,j)$, and $I_r(i,j) = W_e$ otherwise, where $W_e$ is the weight of the boundary pixel $e$ at location $(i,j)$ given by the expert. The image $I_r$ can be seen as a discrete 2-D function on which the gradient $\vec{g}_r = [\partial I_r/\partial x;\; \partial I_r/\partial y]$ can be computed. The gradient has the property of being normal to the iso-value lines of $I_r$ and will therefore be normal to the boundaries given by the expert. Similarly, one can compute the gradient $\vec{g}_s$ of the found boundary image. Then, a measure of correspondence between the directions at pixel $(i,j)$ can be given by the absolute value of the normalized dot product between the two gradient vectors (the products here are taken term by term between the two matrices):

$$BD = \frac{|\vec{g}_r \cdot \vec{g}_s|}{\|\vec{g}_r\| \, \|\vec{g}_s\|}. \qquad (7)$$

However, as $I_r$ is mostly filled with zeros, the gradient will have a negligible value at most locations. The farther a pixel is from a boundary given by the operator, the lower the gradient at that pixel will be, thus yielding a huge imprecision on the local direction of the image. To solve this problem, we used the Gradient Vector Flow (GVF), first introduced by Xu and Prince [25]. For a boundary image $I$, the GVF is a vector field $\vec{f} = [u(x,y);\; v(x,y)]$ that is computed iteratively so as to minimize the following cost function over the whole boundary image:

$$U = \iint \left( \mu\,(u_x^2 + u_y^2 + v_x^2 + v_y^2) + \|\vec{g}\|^2\, \|\vec{g} - \vec{f}\|^2 \right) dx\, dy, \qquad (8)$$

where $\mu$ is a tunable weight, variables in indices denote partial derivation with respect to that variable, and $\vec{g}$ is the gradient of the image as defined previously. This cost function was devised so that on boundaries, where the gradient is high ($\|\vec{g}\| \to \infty$), the energy remains bounded: $\|\vec{g} - \vec{f}\|$ must tend to zero if one wishes the integrand to be minimized. Thus, on boundaries, the GVF is equal to the gradient field. On the other hand, for pixels far away from any boundary, the gradient will tend toward zero, and the integrand will be driven by the term $\mu\,(u_x^2 + u_y^2 + v_x^2 + v_y^2)$. To minimize it, the partial derivatives of the vector field $\vec{f}$ must be null, which means that the GVF extends the gradient by continuity to zones where it would normally be negligible. The GVF is computed both for the reference image and the image obtained through segmentation. The measure of correspondence between the boundary directions is similar to equation (7):

$$BD = \frac{|\vec{f}_r \cdot \vec{f}_s|}{\|\vec{f}_r\| \, \|\vec{f}_s\|}. \qquad (9)$$

On figure 4, note that the gradient is only strong on edges, whereas the GVF is strong everywhere, thus enabling the local directions to be seen.

Figure 4. Computing the direction of the boundaries: gradient (top), GVF (bottom).

Hence, we can replace $DC_f$ in equation (6) by $(DC \cdot BD)_f$, so that we obtain a new measure which takes into account the local direction of the found boundaries.
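The following sketch (ours) computes the GVF by gradient descent on the Euler equations of cost (8), in the spirit of Xu and Prince [25]; the step size `dt` and the iteration count are illustrative choices, not values from the paper.

```python
import numpy as np

def gvf(edge_map, mu=0.2, n_iter=200, dt=0.5):
    """Gradient Vector Flow of a boundary image.

    edge_map: 2-D array, e.g. the weighted boundary image I_r.
    Returns the field f = (u, v) extending the gradient of edge_map.
    """
    gy, gx = np.gradient(edge_map.astype(float))
    mag2 = gx**2 + gy**2                      # ||g||^2, drives the data term
    u, v = gx.copy(), gy.copy()               # initialize f with the gradient

    def laplacian(a):                         # 5-point stencil, edge-replicated
        p = np.pad(a, 1, mode="edge")
        return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4 * a

    for _ in range(n_iter):
        # Descent step: smoothness term mu*Laplacian, data term ||g||^2 (f - g)
        u += dt * (mu * laplacian(u) - mag2 * (u - gx))
        v += dt * (mu * laplacian(v) - mag2 * (v - gy))
    return u, v
```

The BD measure of equation (9) is then the absolute, term-by-term normalized dot product of the fields returned for the reference and the segmented boundary images.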


3.2.2. False detection boundary measure

The boundary false detection measure is based on the same principle as the well-detected boundary measure, but the Gaussian-like distribution of weights must be inverted. Hence we can define a false detection criterion by:

$$FDC_f = 1 - DC_f / W_e, \qquad (10)$$

where the pixels $f$ and $e$ are linked by the minimal distance $d_{fe}$. As a consequence, the false detection boundary measure can be defined from the normalized false detection criterion by:

$$FD = 1 - \exp\left( - \frac{\sum_f FDC_f\, n_{e_f}}{\max_f (FDC_f\, n_{e_f})\, \sum_e W_e} \right). \qquad (11)$$

(The minus sign in the exponential keeps $FD$ between 0 and 1.)

In order to take into account the local direction of the found boundaries as given by the GVF, we can replace $FDC_f$ in equation (11) by $(FDC \cdot (1 - BD))_f$, so that we obtain another new false detection criterion.

So far we have described the use of the measures FD and WDC for one image classified by the algorithm and one reference image provided by a single expert. In order to evaluate image segmentation algorithms on many images, we can use a weighted sum of these two measures, taking into account the image sizes, which can differ between the considered images.

In conclusion of this section, we have described two new measures, FD and WDC, which take into account the uncertainty of different experts on the seen boundaries. These two measures must be considered together.
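For completeness, here is a sketch (ours) of the false detection measure, in the same setting as the WDC sketch of section 3.2.1 (boundary pixels as coordinate arrays, one expert weight per pixel); the minus sign follows our reconstruction of equation (11).

```python
import numpy as np
from collections import Counter
from scipy.spatial import cKDTree

def fd(found_xy, expert_xy, expert_w):
    """False detection boundary measure, eqs. (10)-(11)."""
    d_fe, e_idx = cKDTree(expert_xy).query(found_xy)
    w_e = expert_w[e_idx]
    fdc = 1.0 - np.exp(-(d_fe * w_e) ** 2)       # eq. (10): 1 - DC_f / W_e
    counts = Counter(e_idx)
    n_ef = np.array([counts[i] for i in e_idx])  # n_{e_f}, as in eq. (6)
    # Degenerate case of a perfect detection (all FDC_f = 0) is not handled.
    r = (fdc * n_ef).sum() / ((fdc * n_ef).max() * expert_w.sum())
    return 1.0 - np.exp(-r)                      # eq. (11)
```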

4. ILLUSTRATION

We present here an illustration of our image classification and segmentation evaluation on real sonar images. Indeed, the underwater environment is a very uncertain one, and it is particularly important to classify the seabed for numerous applications such as Autonomous Underwater Vehicle navigation. In recent sonar works (e.g. [26,27]), the classification evaluation is made only by visual comparison of the original image and the classified image. That is not satisfactory for correctly evaluating image classification and segmentation. First we present our database, given by two different experts with different certainties. Then, one possible classification approach for sonar images is presented. Finally, the automatic classification and segmentation obtained by this approach are evaluated with our new evaluation method.

Note that this illustration is presented in order to show how our measures work on a single classifier. In order to evaluate a classifier, the results would have to be compared with those of another classifier or of other parametrizations of the evaluated classifier.

4.1. Database

Our database contains 42 sonar images provided by the GESMA (Groupe d'Etudes Sous-Marines de l'Atlantique). These images were obtained with a Klein 5400 lateral sonar with a resolution of 20 to 30 cm in azimuth and 3 cm in range. The sea-bottom depth was between 15 m and 40 m.

The experts have manually segmented these images, giving the kind of feature visible in a given part of the image: sediment (rock, cobble, sand, silt, or ripple, either horizontal, vertical or at 45 degrees), shadow, or other features (typically shipwrecks). All sediments are given with three certainty levels (sure, moderately sure or not sure), and the boundary between two sediments is also given with a certainty (sure, moderately sure or not sure). Hence, every pixel of every image is labeled as being either a certain type of sediment, a shadow, or a boundary, with one of the three certainty levels. Figure 1 gives an example of such a segmentation provided by the expert.

4.2. Classification approach

The classification approach is based on supervised classification. In order to train the classifier, we randomly divided the database into two parts. For the learning database we considered, on randomly chosen images only, the homogeneous 32 × 32 tiles with a sure or moderately sure certainty level, until we obtained approximately the same number of tiles in the learning and test databases. For the test database we considered 32 × 32 tiles with a recovering step of 4. On each tile we extracted features by a wavelet decomposition.


The discrete translation-invariant wavelet transform is based on the choice of the optimal translation for each decomposition level. Each decomposition level $d$ gives four new images. We choose here a decomposition level $d = 2$. For each image $I^i_d$ (the $i$th image of the decomposition $d$) we calculate three features. The energy is given by:

$$\frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} I^i_d(n, m), \qquad (12)$$

where $N$ and $M$ are respectively the numbers of pixels on the rows and on the columns. The entropy is estimated by:

$$-\frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} I^i_d(n, m) \ln\left( I^i_d(n, m) \right), \qquad (13)$$

and the mean is given by:

$$\frac{1}{NM} \sum_{n=1}^{N} \sum_{m=1}^{M} \left| I^i_d(n, m) \right|. \qquad (14)$$

Consequently we obtain 15 features (3 + 4 × 3). The chosen classifier is based on a Support Vector Machine. The algorithm used here is described in [28]. It is a one-vs-one multi-class approach, and we take a linear kernel with a constant C = 1.
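As a hedged sketch of this pipeline (ours, not the authors' code): PyWavelets' stationary wavelet transform stands in for the translation-invariant decomposition, the db2 wavelet is an arbitrary choice, and taking the tile itself plus the four sub-images of one level is our reading of the 3 + 4 × 3 = 15 count.

```python
import numpy as np
import pywt                      # PyWavelets
from sklearn.svm import SVC

def tile_features(tile, wavelet="db2", level=2):
    """15 features per tile: energy, entropy and mean (eqs. 12-14) of the
    tile itself and of four wavelet sub-images (our reading of 3 + 4*3)."""
    def three(img):
        a = np.abs(img) + 1e-12                  # keeps the logarithm defined
        return [img.mean(),                      # energy, eq. (12)
                -(a * np.log(a)).mean(),         # entropy estimate, eq. (13)
                a.mean()]                        # mean of moduli, eq. (14)
    coeffs = pywt.swt2(tile.astype(float), wavelet, level=level)
    cA, (cH, cV, cD) = coeffs[-1]                # four images of one level
                                                 # (ordering varies with the
                                                 # PyWavelets version)
    feats = three(tile)
    for sub in (cA, cH, cV, cD):
        feats += three(sub)
    return np.array(feats)                       # 3 + 4*3 = 15 features

# One-vs-one multi-class linear SVM with C = 1, as in the text.
clf = SVC(kernel="linear", C=1.0, decision_function_shape="ovo")
```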

We have considered only three classes for learning and tests:

- class 1: Rock and Cobble

- class 2: Ripple in all directions

- class 3: Sand and Silt

Hence shadow is not considered, and so the classification cannot be good on tiles with shadow. In order to take unknown classes into account, one solution is to add a reject class to the classifier. However, as we show further down, we can also take this class into account even if the classifier has no reject class.

The units of the classifier are 32 × 32 tiles with a recovering step of 4. Hence, we can classify 4 × 4 tiles, considering the 4 × 4 tile in the middle of each 32 × 32 tile.

4.3. Evaluation

Figure 5 shows the result of the classification of the same image as the one given in figure 1. Sand (in red) and rock (in blue) are quite well classified, but ripple (in yellow) is not well segmented. The dark blue corresponds to the part of the image that was not considered for the classification.

Figure 5. Automatically segmented image.

Just by looking at figure 5 we cannot say whether the classification is good or not, and any decision stays very subjective. Moreover, the classification algorithm could be good for this image and not for others. So we propose to use our measures. The weights used here for the certainty are respectively 2/3 for sure, 1/2 for moderately sure and 1/3 for not sure, but other weights can be preferred according to the application.

The normalized confusion matrix obtained for one random partition of the database is given by:

$$\begin{pmatrix} 40.51 & 5.77 & 53.72 \\ 19.65 & 18.79 & 61.56 \\ 3.51 & 1.15 & 95.34 \\ 45.96 & 12.47 & 41.57 \end{pmatrix} \qquad (15)$$

The last line means that shadow or other parts are classified in class 1, 2 or 3. We can note that a high proportion of rock or cobble (class 1) is classified as sand or silt (class 3), and most of the ripple (class 2) as well. Sand and silt, the most common kinds of sediment in our images, are very well classified. The vector of good classification rates, [40.51 18.79 95.34 0], and the vector of error classification rates, [41.26 43.84 28.47 50.00], summarize these results. Whereas we have good classification for sand and silt, we also have a lot of errors, because other sediments are classified as sand or silt.

These results are not significant enough to properly evaluate the obtained segmentation. Our proposed measures, given respectively by equations (6) and (11) and expressed as percentages, are 65.17 for the good detection criterion and 61.35 for the false alarm criterion; if we consider the direction based on the GVF, the proposed measures give 63.11 for the good detection criterion and 64.84 for the false alarm criterion.

To better illustrate these two last measures, we have proceeded to four more random partitions. We obtain a mean of 63.53 for the good detection criterion with a standard deviation of 3.37, and a mean of 60.53 for the false alarm criterion with a standard deviation of 7.72. If we consider the direction based on the GVF, we obtain a mean of 60.09 for the good detection criterion with a standard deviation of 3.13 and a mean of 52.62 for the false alarm criterion with a standard deviation of 8.04. The standard deviations show that the good detection criterion is more stable than the false alarm criterion. Our two measures can well evaluate the good detection and the false alarm. When we consider the direction based on the GVF, the criteria decrease because of the weights given by the directions. Here, the deduced segmentation depends on the size of the tile; in this case it could be better not to consider the direction based on the GVF.

In order to evaluate the classifier approach, all these measures have to be compared to the same measures calculated for other parameterizations or for other classification algorithms.

5. CONCLUSION

We have proposed new evaluation measures for image classification and segmentation in uncertain environments. These new evaluation measures can take uncertain labels into account. The proposed classification evaluation can be used for every kind of uncertain element classification, and our segmentation evaluation can be used for all image segmentation approaches. We have shown that a global image classification evaluation must be made by evaluating the classification and, at the same time, the produced segmentation. The proposed confusion matrix takes into account the uncertainty of the expert and also the inhomogeneous units (e.g. tiles in the case of local image classification). Moreover, we have defined good classification and error classification rates from our confusion matrix. The proposed segmentation evaluation considers good and false boundary detection measures where the subjectivity of the expert is accounted for by the given uncertainty on the boundaries.

The fusion of the information provided by various experts in our proposed evaluation approach is made after an individual evaluation, which means that we fuse the different measures calculated for each expert. This fusion is made using a simple sum: the uncertainty is considered directly in our measures. We can imagine fusing the information provided by the experts before the evaluation, in order to obtain an uncertain and/or imprecise reality (e.g. defining fuzzy zones around the boundaries according to the certainty given by the experts). The fusion could also be made with belief functions defined from the uncertainties; in this case we would have to redefine our proposed measures. For instance, the reality obtained by the fusion of experts could be used to improve the learning step of the classification.

REFERENCES

1. Y.J. Zhang, A survey on evaluation methods for image segmentation, Pattern Recognition, Vol. 29, No. 8 (1996), 1335-1346.

2. A. Martin, Comparative study of information fusion methods for sonar images classification, The Eighth International Conference on Information Fusion, Philadelphia, USA, 25-29 July 2005.

3. J.C. Russ, The Image Processing Handbook, CRC Press, 2002.

4. H. Laanaya, A. Martin, D. Aboutajdine and A. Khenchaf, A new dimensionality reduction method for seabed characterization: supervised curvilinear component analysis, IEEE OCEANS'05 EUROPE, Brest, France, 20-23 June 2005.

5. A. David and B. Lerner, Support vector machine-based image classification for genetic syndrome diagnosis, Pattern Recognition Letters, Vol. 26, Issue 8 (2005), 1029-1038.

6. M. Mavroforakis, H. Georgiou, N. Dimitropoulos, D. Cavouras and S. Theodoridis, Significance analysis of qualitative mammographic features, using linear classifiers, neural networks and support vector machines, European Journal of Radiology, Vol. 54 (2005), 80-89.

7. S.B. Park, J.W. Lee and S.K. Kim, Content-based image classification using a neural network, Pattern Recognition Letters, Vol. 25, Issue 3 (2004), 287-300.

8. Y.V. Venkatesh and S. Kumar Raja, On the classification of multispectral satellite images using the multilayer perceptron, Pattern Recognition, Vol. 36, No. 8 (2003), 2161-2175.

9. I. Leblond, M. Legris and B. Solaiman, Use of classification and segmentation of sidescan sonar images for long term registration, IEEE OCEANS'05 EUROPE, Brest, France, 20-23 June 2005.

10. N.J. Pizzi, Fuzzy pre-processing of gold standards as applied to biomedical classification, Artificial Intelligence in Medicine, Vol. 16 (1999), 171-182.

11. T. Denœux, A Neural Network Classifier Based on Dempster-Shafer Theory, IEEE Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, Vol. 30, Issue 2 (2000), 131-150.

12. P. Vannoorenberghe and Ph. Smets, Partially supervised learning by a credal EM approach, ECSQARU 2005, Barcelona, Spain, July 2005.

13. O. Alata and C. Ramananjarasoa, Unsupervised textured image segmentation using 2-D quarter plane autoregressive model with four prediction supports, Pattern Recognition Letters, Vol. 26, Issue 11 (2005), 1069-1081.

14. J.F. Canny, A computational approach to edge detection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 8 (1986).

15. P. Kovesi, Image Features From Phase Congruency, Videre: A Journal of Computer Vision Research, MIT Press, Vol. 1, No. 3, Summer 1999.

16. J.B. Mena and J.A. Malpica, Color image segmentation based on three levels of texture statistical evaluation, Applied Mathematics and Computation, Vol. 161 (2005), 1-17.

17. A. Bhalerao and R. Wilson, Unsupervised image segmentation combining region and boundary estimation, Image and Vision Computing, Vol. 19, Issue 6 (2001), 353-368.

18. H. Zhang, J.E. Fritts and S.A. Goldman, A Co-evaluation Framework for Improving Segmentation Evaluation, SPIE Defense and Security Symposium - Signal Processing, Sensor Fusion, and Target Recognition XIV (2005), 420-430.

19. Y.J. Zhang, Evaluation and comparison of different segmentation algorithms, Pattern Recognition Letters, Vol. 18, Issue 10 (1997), 963-974.

20. Y. Yitzhaky and E. Peli, A Method for Objective Edge Detection Evaluation and Detector Parameter Selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 25, No. 8 (2003), 1027-1033.

21. R. Román-Roldán, J.F. Gómez-Lopera, C. Atae-Allah, J. Martínez-Aroza and P.L. Luque-Escamilla, A measure of quality for evaluating methods of segmentation and edge detection, Pattern Recognition, Vol. 34, Issue 5 (2001), 969-980.

22. M. Bouet, C. Djeraba, A. Khenchaf and H. Briand, Shape Processing and Image Retrieval, Revue de l'Objet, special edition, Hermes Science Publications, Vol. 6, Issue 2 (2000), 145-170.

23. T. Peli and D. Malah, A Study of Edge Detection Algorithms, Computer Graphics and Image Processing, Vol. 20 (1982), 1-21.

24. T. Kanungo, M.Y. Jaisimha, J. Palmer and R.M. Haralick, A Methodology for Quantitative Performance Evaluation of Detection Algorithms, IEEE Transactions on Image Processing, Vol. 4 (1995), 1667-1673.

25. C. Xu and J.L. Prince, Snakes, Shapes, and Gradient Vector Flow, IEEE Transactions on Image Processing, Vol. 7, Issue 3 (1998), 359-369.

26. G. Le Chenadec and J.M. Boucher, Sonar Image Segmentation using the Angular Dependence of Backscattering Distributions, IEEE OCEANS'05 EUROPE, Brest, France, 20-23 June 2005.

27. M. Lianantonakis and Y.R. Petillot, Sidescan sonar segmentation using active contours and level set methods, IEEE OCEANS'05 EUROPE, Brest, France, 20-23 June 2005.

28. C.C. Chang and C.J. Lin, LIBSVM: a library for support vector machines, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm, 2001.