Submitted to Entropy. Pages 1 - 14.

OPEN ACCESS
Entropy, ISSN 1099-4300
www.mdpi.com/journal/entropy

Article

PROPOSAL FOR ACO ALGORITHM IMPLEMENTATION IN CLUSTERING, BASED ON THE TRAVELING SALESMAN PROBLEM

Jeffry Chavarría-Molina 1, Juan José Fallas-Monge 2,* and Javier Trejos-Zelaya 3

1 School of Mathematics, Costa Rica Institute of Technology, Cartago, Costa Rica.
2 School of Mathematics, Costa Rica Institute of Technology, Cartago, Costa Rica.
3 School of Mathematics, University of Costa Rica, San José, Costa Rica.

* Author to whom correspondence should be addressed; jfallas@itcr.ac.cr, Tel.: +(506) 2550-2034, Fax: +(506) 2550-2225.

Version November 29, 2014 submitted to Entropy. Typeset by LaTeX using class file mdpi.cls

Abstract: We propose an ant colony optimization approach for partitioning a set of objects. In order to minimize the intraclass variance of the partitioned classes, we construct ant-like solutions by a constructive approach that selects objects to be put in a class with a probability that depends on the distance between the object and the centroid of the class (the visibility) and on the pheromone trail, which also depends on those distances. We performed a simulation study in order to evaluate the method with a Monte Carlo experiment that controls some sensitive parameters of the clustering problem. After some tuning of the parameters, encouraging results were obtained in nearly all cases.

Keywords: clustering; ACO; ant colonies; intraclass variance; TSP; heuristics; algorithm; simulation.

MSC classifications: 91C20; 62H30; 90C59
JEL classifications: C610; C630


1. Introduction

Cluster analysis, or clustering, deals with finding homogeneous groups of objects, such that similar objects belong to the same class and it is possible to distinguish between objects in different classes. Cluster analysis can be defined as an optimization problem in which a given function, consisting of within-cluster similarity and between-cluster dissimilarity, needs to be optimized [16,24]. In the numerical case, there is a set of objects $\Omega = \{x_1, x_2, \ldots, x_n\}$ such that $x_i \in \mathbb{R}^p$ for all $i$, that is, the objects are described by $p$ numerical or quantitative variables. The most widely used criterion [4,8,13] is the minimization of the within sum-of-squares, also known as the within inertia or variance:

$$W(P) = \frac{1}{n}\sum_{k=1}^{K}\sum_{x_i \in C_k} \|x_i - g_k\|^2,$$

where $K$ is the number of classes or clusters (a number fixed a priori), $P = (C_1, C_2, \ldots, C_K)$ is a partition of $\Omega$, and $g_k$ is the barycenter or mean vector of $C_k$. Minimizing $W(P)$ is equivalent to maximizing the between sum-of-squares (between inertia or variance)

$$B(P) = \sum_{k=1}^{K} \frac{|C_k|}{n}\,\|g_k - g\|^2,$$

where $g$ is the overall barycenter and $|C_k|$ is the cardinality of class $C_k$, since the sum $I = W(P) + B(P)$ is a constant (the total inertia) [4,8,13].
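For concreteness, both criteria can be computed directly from a labeled data matrix. The following minimal NumPy sketch is ours, not the paper's code; `X` holds the $n$ objects row-wise and `labels` encodes the partition $P$:

```python
import numpy as np

def within_inertia(X, labels, K):
    """W(P): average squared distance of each object to the barycenter of its class."""
    n = X.shape[0]
    W = 0.0
    for k in range(K):
        Ck = X[labels == k]
        if len(Ck):
            gk = Ck.mean(axis=0)          # barycenter g_k
            W += ((Ck - gk) ** 2).sum()
    return W / n

def between_inertia(X, labels, K):
    """B(P): weighted squared distances of the class barycenters to the
    overall barycenter; W(P) + B(P) equals the (constant) total inertia."""
    n = X.shape[0]
    g = X.mean(axis=0)                    # overall barycenter g
    B = 0.0
    for k in range(K):
        Ck = X[labels == k]
        if len(Ck):
            B += (len(Ck) / n) * ((Ck.mean(axis=0) - g) ** 2).sum()
    return B
```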

The $W(P)$ function is not convex, thus $W(P)$ may have several local minima [18,19]. This feature causes traditional clustering algorithms, such as k-means, to find mostly local minima [21]. Furthermore, global optimization algorithms (such as linear programming, interval methods, branch and bound methods, etc.) are highly sensitive to relatively high dimensional data tables, for which their probability of finding the optimal clustering is very low; in those cases, the algorithms report solutions that differ significantly from the optimum clustering [2]. These features represent a challenge to find alternative optimization strategies, and combinatorial optimization heuristics are a viable option.

In recent years heuristic algorithms have been used to solve complex optimization problems, since their random nature is useful to efficiently avoid convergence to local minima [1]. Particular examples of optimization heuristics include simulated annealing, tabu search, genetic algorithms, particle swarm optimization and ant colony optimization.

The optimization algorithm based on ant colonies (ACO) belongs to a large group of methods based on swarm intelligence. It was proposed by Marco Dorigo in 1992 to solve several discrete optimization problems [9,16], and since then it has been applied to several combinatorial optimization problems. For example, [6] applies it to the traveling salesman problem (TSP) and the quadratic assignment problem; moreover, [9] shows the application of ACO to the job assignment problem and the vehicle routing problem.

Today it is not difficult to find studies and comparisons among data clustering techniques. Several papers deal with combinatorial optimization metaheuristics, and many of them are based on ant intelligence. For example, [15] analysed the relationship between the Lumer and Faieta algorithm (called the LF model of the standard ant clustering algorithm, SACA) and Kohonen's Self-Organizing Batch Map (Batch-SOM, an artificial neural network). In fact, Lumer and Faieta introduced the notion of short-term memory within each agent (a simulated ant) in [16], and improved a sorting and clustering method in a document retrieval interface, inspired by the behavior of real ants. They proposed a hybridization with a pre-processing phase and showed how the time complexity can be improved.

In [22], an algorithm based on ant colonies to study the clustering problem is proposed. Ants are associated with partitions that are modified during the iterations, according to a selection procedure in which objects attract other objects to their cluster with a probability of selection that depends on the visibility (proportional to the distance between the objects) and the pheromone trail (which depends on whether the objects have been classified together in previous partitions).

In the current paper a new proposal to implement the ACO heuristic in the clustering context is presented, based on the traveling salesman problem. It is a constructive method, in which each ant builds a partition following a strategy similar to that followed by the ants in the TSP originally proposed in [6].

In Section 2 the artificial ant concept is explained and the classical ACO algorithm is presented. Section 3 describes the proposed ACO algorithm. Section 4 describes the experiment performed. Sections 5 and 6 present the results and some concluding remarks.

2. Artificial ant colonies

In nature, the optimization developed by ants while they look for food consists basically of minimizing the distance between the nest and the food. For this reason the first application of ACO was to the TSP [6]. In that problem the agent should visit $n$ cities, all interconnected, visiting each city exactly once and then returning to the departure city, while minimizing the total distance.

In this paper the TSP idea is used to study the clustering optimization problem. Thus, it is necessary to introduce artificial ants, that is, agents in charge of finding a feasible solution in the search space. During this process an ant drops artificial pheromones so that other ants can rebuild the same solution. Pheromones should be volatile (disappearing in time on the trails that have not been intensified) and have to increase on the shortest trails as the number of iterations increases [7,9].

The pheromone update formula applied in the TSP is given by $\tau_{uv} = (1-\rho)\,\tau_{uv} + \rho\,\Delta\tau_{uv}$ [3,10,11], where $\tau_{uv}$ is the pheromone present on the trail from $u$ to $v$, $\rho$ is the evaporation rate, and

$$\Delta\tau_{uv} = \sum_{m=1}^{M} \Delta\tau^{m}_{uv},$$

where $M$ is the number of ants and $\Delta\tau^{m}_{uv}$ is the pheromone dropped by the $m$-th ant on the trail $(u,v)$, normally given by

$$\Delta\tau^{m}_{uv} = \begin{cases} Q/d_m & \text{if ant } m \text{ walks across } (u,v), \\ 0 & \text{otherwise,} \end{cases}$$

where $Q$ is a parameter to be fitted and $d_m$ represents the total distance walked by ant $m$.

An alternative way to deal with pheromones is to make local updates [7]; that is, every time an ant goes from node $u$ to node $v$, a local pheromone update is applied on the trail $(u,v)$ [10]. A possible local update formula is $\tau_{uv} = \tau_{uv} + Q/d_{uv}$, where $Q$ is a parameter to be fitted and $d_{uv}$ is the distance between $u$ and $v$. When all ants finish their trips, the pheromone is updated by applying the evaporation rate.
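Both rules can be stated compactly in code. The following NumPy sketch is ours, written for a symmetric TSP instance; the identifiers `tau`, `tours` and `lengths` are assumptions (each tour is a list of node indices, and `lengths[m]` is the total distance $d_m$ of ant $m$):

```python
import numpy as np

def local_update(tau, u, v, d_uv, Q):
    """Local rule: reinforce trail (u, v) as soon as an ant crosses it."""
    tau[u, v] += Q / d_uv
    tau[v, u] += Q / d_uv                 # keep the matrix symmetric

def global_update(tau, tours, lengths, rho, Q):
    """Global rule: tau <- (1 - rho) * tau + rho * sum_m Delta tau^m,
    where ant m deposits Q / d_m on every edge of its tour."""
    delta = np.zeros_like(tau)
    for tour, d_m in zip(tours, lengths):
        for u, v in zip(tour, tour[1:] + tour[:1]):   # edges of the closed tour
            delta[u, v] += Q / d_m
            delta[v, u] += Q / d_m
    tau *= 1.0 - rho
    tau += rho * delta
```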

On the other hand, each ant has to decide to which node it goes from the current node. In that choice three factors are fundamental: the visibility, the pheromone trail and a probabilistic factor. Thus, if $T_m$ represents the route built by ant $m$ while it is on node $u$, then the probability of going to node $v$ is given by

$$p^{m}_{uv} = \begin{cases} \dfrac{[\tau_{uv}]^{\alpha}\,[\eta_{uv}]^{\beta}}{\sum_{s \notin T_m} [\tau_{us}]^{\alpha}\,[\eta_{us}]^{\beta}} & \text{if } v \notin T_m, \\[2ex] 0 & \text{if } v \in T_m, \end{cases}$$

where $\eta_{uv}$ is the visibility, defined by $\eta_{uv} = 1/d_{uv}$, with $d_{uv}$ the distance from node $u$ to node $v$; $\tau_{uv}$ is the pheromone on the trail $(u,v)$; and $\alpha$ and $\beta$ are parameters to be fitted [3,7,9,17].
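In code, the transition rule amounts to a single weighted draw over the unvisited nodes; a sketch under the same naming assumptions as the previous listing, with `rng` a `numpy.random.Generator` (e.g. `np.random.default_rng()`):

```python
import numpy as np

def choose_next_node(tau, dist, u, visited, alpha, beta, rng):
    """Sample the next node with probability proportional to
    [tau_uv]^alpha * [eta_uv]^beta, over the nodes not yet visited."""
    candidates = [v for v in range(tau.shape[0]) if v not in visited]
    eta = 1.0 / dist[u, candidates]          # visibility eta_uv = 1/d_uv
    weights = tau[u, candidates] ** alpha * eta ** beta
    return rng.choice(candidates, p=weights / weights.sum())
```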

To stop the algorithm, [6] proposed using a maximum number of iterations. The disadvantage of this procedure is that it could stop the algorithm while it is still improving the solutions. Alternatively, [12] considered detecting a stagnation behavior in which all ants travel the same path. A stagnation process is present if a given percentage of the ants have the same distance in their paths; in that case it is almost certain that those ants are traveling the same path, or at least paths with the same cost value.

Algorithm 1 shows the classical ACO algorithm.

Algorithm 1 ACO algorithm
Require: Initial parameters.
1: Put M ants on the nodes, randomly.
2: Define a list T_m for ant m, with m = 1, 2, ..., M. Initially, the list only contains the initial node of ant m.
3: Counter ← 0.
4: while stop criterion is not satisfied do
5:   Counter ← Counter + 1
6:   for t ← 1 to total of nodes do
7:     for m ← 1 to M do
8:       Move ant m to a new position.
9:       Update T_m.
10:      Update the local pheromones (optional).
11:    end for
12:  end for
13:  Update the global pheromones.
14:  Keep the best solution of this iteration if it improves the best in memory.
15: end while
16: return The best solution built.

3. Description of the proposed ACO algorithm


The proposed method starts by defining a list of $M$ artificial ants $h_1, h_2, \ldots, h_M$ that will build a clustering of the data into $K$ classes (or clusters). At the beginning, the best ant in the colony, denoted by $h^*$, can be defined equal to $h_m$ for some $m = 1, 2, \ldots, M$, because at that moment there is no comparison parameter among them; thus the assignment can be random.

For ant $h_m$, with $m = 1, 2, \ldots, M$, $K$ random points in the space of individuals (a hyperrectangle that contains all individuals) are considered, denoted by $g^m_1, g^m_2, \ldots, g^m_K$. These points are interpreted as the initial centroids. $C^m_k$ denotes the class $k$, with centroid $g^m_k$, built by ant $m$. Also, $h_m$ has a tabu list $L_m$, a short-term memory that contains the objects already classified by $h_m$. In each iteration, in order to complete its tour, ant $m$ has to classify the objects not in $L_m$. When the iteration is done, all objects should be in $L_m$; this guarantees that the clustering process is complete.

During the clustering process, each ant randomly chooses an object that is not in its tabu list. Then the ant randomly selects a class in which to classify the object. If ant $m$ selects object $i$, the class is chosen with a probabilistic roulette (see [20]). The probability that $h_m$ assigns object $i$ to class $C^m_k$ is denoted by $p^m_{ik}$. To calculate this probability it is necessary to consider the following factors:

• Visibility: This factor is denoted by $\eta^m_{ik}$, and it consists of the visibility of $h_m$, located on object $x_i$, to "see" class $C^m_k$. The visibility is defined as the reciprocal of the distance from object $x_i$ to $g^m_k$, the centroid of class $C^m_k$. Thus, $\eta^m_{ik} := 1/d^m_{ik}$, where $d^m_{ik} = d^2(x_i, g^m_k) = \|x_i - g^m_k\|^2$. If the visibility which $h_m$ has of class $C^m_k$ is large, then the probability of classifying $x_i$ in class $k$ is also large.

• The pheromone trail: The pheromone trail perceived by $h_m$ on the arc from $x_i$ to $g^m_k$ is denoted by $\tau_{ik}$. It quantifies the pheromones dropped by all ants that have classified the same object $x_i$ in their respective class $k$. If $\tau_{ik}$ is large, then the probability of assigning object $x_i$ to class $k$ increases.

Equation (1) shows the formula used to calculate $p^m_{ik}$, considering the visibility and the pheromone trail, inspired by the corresponding formula used by the agent in the TSP:

$$p^{m}_{ik} := \frac{[\tau_{ik}]^{\alpha}\cdot[\eta^{m}_{ik}]^{\beta}}{\sum_{r=1}^{K}[\tau_{ir}]^{\alpha}\cdot[\eta^{m}_{ir}]^{\beta}}, \qquad (1)$$

where $\alpha$ and $\beta$ are parameters to be fitted.
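A minimal sketch of this roulette step for one ant and one object follows; the names are ours, with `tau_row` the $i$-th row of the global pheromone matrix and `centroids` the ant's current centroids $g^m_1, \ldots, g^m_K$, and `rng` a `numpy.random.Generator`:

```python
import numpy as np

def choose_class(tau_row, centroids, x_i, alpha, beta, rng):
    """Roulette selection of a class for object x_i, following Eq. (1)."""
    d = ((centroids - x_i) ** 2).sum(axis=1)   # squared distances d^m_ik
    eta = 1.0 / np.maximum(d, 1e-12)           # visibility; guard a zero distance
    weights = tau_row ** alpha * eta ** beta
    return rng.choice(len(tau_row), p=weights / weights.sum())
```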

On the other hand, when $h_m$ chooses class $C^m_k$ for object $x_i$, the ant registers index $i$ in its tabu list $L_m$. Furthermore, $h_m$ should carry out the following processes related to the assignment.

• Local pheromone update: Ant $h_m$ should drop a pheromone trail between object $x_i$ and class $C^m_k$. To do this, an auxiliary pheromone matrix is defined, denoted by $\Gamma_{\text{aux}}$, with size $n \times K$, such that entry $ik$ of $\Gamma_{\text{aux}}$ contains the pheromones between $x_i$ and class $k$. This matrix has the format presented in Table 1.


Table 1. Auxiliary pheromone matrix $\Gamma_{\text{aux}}$: an $n \times K$ matrix whose rows correspond to the objects $x_1, x_2, \ldots, x_n$ and whose columns correspond to the classes $C_1, C_2, \ldots, C_K$.

Ant $h_m$ will drop $\Delta\tau^m_{ik}$ pheromones, a quantity defined by $\Delta\tau^m_{ik} := Q/d^m_{ik}$, where $Q$ is a parameter to be fitted. The local pheromone update is then done by adding $\Delta\tau^m_{ik}$ to the current entry $ik$ of $\Gamma_{\text{aux}}$.

• Centroid update: The final step in this process is to update the centroid $g^m_k$ of class $C^m_k$. One possibility is to use its definition, $g^m_k := \frac{1}{|C^m_k|}\sum_{x \in C^m_k} x$. This option is not advisable because it involves several unnecessary calculations. In fact, it is possible to update $g^m_k$ recursively from its value in the previous iteration when object $x_i$ is transferred to class $C^m_k$. In [23] the following formula is proven and used to update the centroids more efficiently (see the sketch after this list): $g^m_k := \frac{1}{|C^m_k|}\left[(|C^m_k| - 1)\,g^m_k + x_i\right]$.
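The recursive formula reduces each centroid update to a single vector operation, $O(p)$ instead of $O(|C^m_k|\,p)$. A one-line sketch (our naming), assuming the class size already counts the newly assigned object:

```python
def updated_centroid(g_k, size_k, x_i):
    """Barycenter of class k after x_i joins it; size_k = |C_k| including x_i."""
    return ((size_k - 1) * g_k + x_i) / size_k
```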

After each ant has classified one object, it should randomly select a new object that is not in its tabu list and follow the process previously described. This is done $n$ times, so that all ants classify all objects.

When the process ends, each ant has a complete clustering of the objects with the respective barycenters. Also, matrix $\Gamma_{\text{aux}}$ contains the pheromones dropped by the ants: entry $ik$ of $\Gamma_{\text{aux}}$ contains the pheromone $\Delta\tau_{ik}$ dropped by all ants that classified object $i$ in their respective class $k$, that is,

$$\Delta\tau_{ik} = \sum_{m=1}^{M}\Delta\tau^{m}_{ik}.$$

The next step is to calculate the within inertia for each ant, considering the classification done by the ant and the respective barycenters. If one of the ants has a within inertia smaller than $W(h^*)$ (the best inertia so far in memory), then $h^*$ (the best ant in memory) is updated.

Global pheromones are stored in a matrix $\Gamma$ with the same structure as $\Gamma_{\text{aux}}$. At the beginning, this matrix is initialized with values close to zero (indicating pheromone absence). When the travels of all ants finish, entry $ik$ of $\Gamma$ is updated by $\Gamma_{ik} := (1-\rho)\,\Gamma_{ik} + \rho\,\Delta\tau_{ik}$, where $\rho$ is the pheromone evaporation rate.

Once the pheromone update is done, matrix $\Gamma_{\text{aux}}$ is re-initialized, to be used in the next iteration. Also, the tabu lists (one per ant) are re-initialized, to start a new classification process.


As the final step to conclude the current iteration, an intensification process carried out by the best ant (the ant with the lowest within inertia, denoted by $h^*$) is developed: $h^*$ repeats its path, dropping extra pheromones on the arcs it visited. The intensification follows the rule

$$\Gamma_{ik} := \begin{cases} \Gamma_{ik} + \dfrac{Q}{W(h^*)} & \text{if object } i \text{ is in class } k \text{ of } h^*, \\[1ex] \Gamma_{ik} & \text{otherwise,} \end{cases}$$

where $W(h^*)$ denotes the within inertia of the classification done by $h^*$. This ends the current iteration, and a new clustering process is started, considering the following information: the global pheromone matrix $\Gamma$; the barycenters of the ants, which will be used as the initial centroids for the new classes; and the best ant $h^*$.

Algorithm 2 presents a detailed pseudocode of the ACO algorithm based on the TSP. In order to accelerate convergence, the k-means algorithm is applied (see line 16 in Algorithm 2) to each ant every ApplyKMeansEach iterations (a parameter). The method is applied once all ants have built their respective classifications, and runs until the absolute difference between the current inertia and the previous inertia is less than 0.001. Algorithm 3 shows this hybrid k-means strategy.

Finally, Algorithm 2 stops when there has been no improvement for a given number of iterations (see line 4); this is controlled by the parameter MaxIterationsWithoutImprov.

4. Experimentation

To test Algorithm 2, twenty-four data tables were built with randomly generated normal variables, according to the following rules:

• For the number of objects $n$, the four possibilities $n \in \{105, 525, 1050, 2100\}$ were considered. The number of clusters $K$ was in $\{3, 7\}$.

• The first 16 data tables were built with $n \in \{105, 525\}$, $K \in \{3, 7\}$, and two levels (see the encoding in Table 2). In the first level all clusters have the same cardinality (denoted by Card(=)). In the second level the data tables have one large class (its size is the integer part of $n/2$) and the other classes have the same cardinality (denoted by Card(≠)). In the remaining 8 data tables, $n \in \{1050, 2100\}$, $K = 7$, and all classes have different cardinalities.

• Furthermore, in tables T1 to T16 two attributes were used. First, clusters were built with variables of variance equal to 1 (codified by SD(=)). Second, one class has variables with variance equal to 3 and the remaining $K - 1$ classes have variance equal to 1 (denoted by SD(≠)). Finally, data tables T17 to T24 were built with 7 classes and different variances.

Table 2 shows the data table encoding. The value in the third column indicates the reference $W(P)$ value, related to the controlled clustering constructed for each table (obtained experimentally).
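The construction of the controlled tables can be sketched as follows. The paper does not specify how the cluster centers were placed, so the uniform placement below, as well as all identifiers, are our assumptions; the cardinalities and variances follow the rules stated above:

```python
import numpy as np

def make_table(n, K, p=2, card_equal=True, sd_equal=True, seed=0):
    """Generate a controlled Gaussian data table in the spirit of T1-T24."""
    rng = np.random.default_rng(seed)
    if card_equal:
        sizes = [n // K] * K                       # Card(=): equal cardinalities
    else:
        big = n // 2                               # Card(≠): one large class
        sizes = [big] + [(n - big) // (K - 1)] * (K - 1)
    sizes[-1] += n - sum(sizes)                    # absorb integer rounding
    sds = [1.0] * K if sd_equal else [3.0] + [1.0] * (K - 1)   # SD(=) / SD(≠)
    centers = rng.uniform(-10.0, 10.0, size=(K, p))            # assumed placement
    X = np.vstack([rng.normal(c, s, size=(m, p))
                   for c, s, m in zip(centers, sds, sizes)])
    labels = np.repeat(np.arange(K), sizes)
    return X, labels
```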

4.1. Parameter analysis in ACO

As the parameter ApplyKMeansEach decreases, the performance of ACO increases (it was experimentally determined that ApplyKMeansEach = 1 gave the best results), but the runtime also increases.


Algorithm 2 ACO based on the TSP.
Require: n (number of individuals), p (number of variables), K (number of clusters), M (number of ants), ApplyKMeansEach, MaxIterationsWithoutImprov, and the parameters α, β, Q and ρ.
1: Build the initial colony with M ants: h_1, h_2, ..., h_M.
2: For m = 1, 2, ..., M define L_m = ∅, and randomly choose g^m_1, ..., g^m_K.
3: Counter ← 0
4: while IterationsWithoutImprov < MaxIterationsWithoutImprov do
5:   Counter ← Counter + 1
6:   for m := 1 to M do
7:     Ant h_m chooses a random individual x_i such that i ∉ L_m.
8:     Ant h_m chooses k := Roulette(p^m_ik), where p^m_ik := [τ_ik]^α [η^m_ik]^β / Σ_{r=1..K} [τ_ir]^α [η^m_ir]^β.
9:     Individual x_i and index i are assigned to C^m_k and L_m, respectively.
10:    Let ⟨Γaux⟩_ik := ⟨Γaux⟩_ik + Δτ^m_ik, where Δτ^m_ik = Q / d^m_ik.
11:    Let g^m_k := (1/|C^m_k|)[(|C^m_k| − 1) g^m_k + x_i].
12:  end for
13:  Let h* := BestAnt(h_1, ..., h_M, h*)
14:  Let ⟨Γ⟩_ik := (1 − ρ)⟨Γ⟩_ik + ρ⟨Γaux⟩_ik.
15:  Intensify the best trail: for all individuals classified in cluster k of h*, do ⟨Γ⟩_ik := ⟨Γ⟩_ik + Q / W(h*).
16:  if Counter is divisible by ApplyKMeansEach then
17:    for m := 1 to M do
18:      Apply k-means to h_m.
19:      Update h* if the k-means application yielded an improvement.
20:    end for
21:  end if
22: end while
23: return h*

Algorithm 3 k-means strategy applied to ACO.
Require: One ant h.
1: PreviousInertia ← −1.
2: while |PreviousInertia − W(h)| > 0.001 do
3:   PreviousInertia ← W(h)
4:   For h, build the clusters C_1, C_2, ..., C_K using the barycenters g_1, ..., g_K: assign each individual x_i to the class whose barycenter is closest to x_i.
5:   Recalculate the barycenters g_1, g_2, ..., g_K with g_k = (1/|C_k|) Σ_{x_i ∈ C_k} x_i, for all k = 1, 2, ..., K.
6: end while
7: return The updated ant h.
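To make the control flow of Algorithms 2 and 3 concrete, the following condensed Python sketch implements one reading of the method. It is not the authors' code and all identifiers are ours; in particular, the loop over the n objects inside each iteration follows the description in Section 3:

```python
import numpy as np

def within(X, labels, centroids):
    """Within inertia W of a partition (Section 1)."""
    return ((X - centroids[labels]) ** 2).sum() / len(X)

def kmeans_refine(X, centroids, tol=1e-3):
    """Algorithm 3: plain k-means until the inertia changes by less than tol."""
    prev = -1.0
    while True:
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for k in range(centroids.shape[0]):
            pts = X[labels == k]
            if len(pts):
                centroids[k] = pts.mean(axis=0)
        W = within(X, labels, centroids)
        if abs(prev - W) <= tol:
            return labels, centroids, W
        prev = W

def aco_clustering(X, K, M=10, alpha=0.25, beta=2.5, rho=0.5, Q=250,
                   apply_kmeans_each=1, max_no_improv=10, seed=0):
    """Sketch of the proposed ACO clustering (Algorithm 2)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    lo, hi = X.min(axis=0), X.max(axis=0)          # hyperrectangle of individuals
    gamma = np.full((n, K), 1e-6)                  # global pheromones, near zero
    cents = rng.uniform(lo, hi, size=(M, K, p))    # initial centroids per ant
    best_labels, best_W = None, np.inf
    counter, no_improv = 0, 0
    while no_improv < max_no_improv:
        counter += 1
        gamma_aux = np.zeros((n, K))               # auxiliary pheromone matrix
        labels = np.empty((M, n), dtype=int)
        for m in range(M):
            sizes = np.zeros(K, dtype=int)
            for i in rng.permutation(n):           # random object not yet in tabu list
                d = np.maximum(((cents[m] - X[i]) ** 2).sum(axis=1), 1e-12)
                w = gamma[i] ** alpha * (1.0 / d) ** beta
                k = rng.choice(K, p=w / w.sum())   # roulette, Eq. (1)
                labels[m, i] = k
                sizes[k] += 1
                gamma_aux[i, k] += Q / d[k]        # local pheromone deposit
                cents[m, k] += (X[i] - cents[m, k]) / sizes[k]  # recursive update
        iter_best_W, iter_best = np.inf, 0
        for m in range(M):
            if counter % apply_kmeans_each == 0:   # hybrid k-means step
                labels[m], cents[m], Wm = kmeans_refine(X, cents[m])
            else:
                Wm = within(X, labels[m], cents[m])
            if Wm < iter_best_W:
                iter_best_W, iter_best = Wm, m
        if iter_best_W < best_W:                   # update best ant in memory
            best_W, best_labels = iter_best_W, labels[iter_best].copy()
            no_improv = 0
        else:
            no_improv += 1
        gamma = (1.0 - rho) * gamma + rho * gamma_aux   # global update
        gamma[np.arange(n), best_labels] += Q / best_W  # intensification by h*
    return best_labels, best_W
```

With the fitted parameters of Table 3 (the defaults above), a run reduces to `labels, W = aco_clustering(X, K)`.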


Table 2. Data table encoding.

Code  Characteristics                   W(P)
T1    Table-105-K(3)Card(=)SD(=)        5.42201132
T2    Table-525-K(3)Card(=)SD(=)        5.99260746
T3    Table-105-K(7)Card(=)SD(=)        5.14604458
T4    Table-525-K(7)Card(=)SD(=)        5.33900849
T5    Table-105-K(3)Card(=)SD(≠)        13.15015083
T6    Table-525-K(3)Card(=)SD(≠)        15.80910254
T7    Table-105-K(7)Card(=)SD(≠)        9.89537855
T8    Table-525-K(7)Card(=)SD(≠)        8.26081280
T9    Table-105-K(3)Card(≠)SD(=)        5.00677923
T10   Table-525-K(3)Card(≠)SD(=)        5.67158163
T11   Table-105-K(7)Card(≠)SD(=)        5.54505587
T12   Table-525-K(7)Card(≠)SD(=)        5.64774256
T13   Table-105-K(3)Card(≠)SD(≠)        11.73403420
T14   Table-525-K(3)Card(≠)SD(≠)        13.81926250
T15   Table-105-K(7)Card(≠)SD(≠)        7.62467183
T16   Table-525-K(7)Card(≠)SD(≠)        7.45610263
T17   Table-1050-K(7)Card(=)SD(=)       5.24360410
T18   Table-2100-K(7)Card(=)SD(=)       5.28462947
T19   Table-1050-K(7)Card(≠)SD(=)       5.30949287
T20   Table-2100-K(7)Card(≠)SD(=)       5.24648236
T21   Table-1050-K(7)Card(=)SD(≠)       22.09188805
T22   Table-2100-K(7)Card(=)SD(≠)       22.56959210
T23   Table-1050-K(7)Card(≠)SD(≠)       14.94658261
T24   Table-2100-K(7)Card(≠)SD(≠)       15.57304438

The parameter MaxIterationsWithoutImprov causes a similar behavior, because as it increases the performance of ACO increases. This implies that neither of these two parameters needs to be fitted.

The ACO algorithm has four parameters that should be fitted, with the aim of achieving good performance. Parameters α and β control the relative weights assigned to the pheromone concentration and the ant visibility, respectively. Meanwhile, ρ represents the pheromone evaporation rate, used to update the pheromone matrix. Finally, parameter Q is a pheromone amplification constant.

To develop the parameter analysis, tables T15 and T16 were used; for each table and each parameter combination, 200 multistart runs were done. In [9] it is recommended to take ρ = 0.5. Also, [12] reports a discrete optimization experiment in which Q ∈ {1, 100, 10000} and ρ ∈ {0.3, 0.5, 0.7, 0.9, 0.999} were used, and Q = 100 and ρ = 0.5 were the best values obtained. In the current experiment a wider analysis was developed, using ρ ∈ {0.1, 0.2, ..., 0.9} and Q ∈ {50, 100, 150, ..., 500}.

On the other hand, a preliminary analysis of the positive numbers α and β was done (positive because they are weights), showing that values greater than 6 cause bad performance. For this reason, the parameter analysis took α, β ∈ {0, 0.5, 1, 1.5, ..., 6}.

In total 13 × 13 × 9 × 10 = 15210 combinations were run for each of the tables T15 and T16. The parameter analysis used M = 10 (the number of ants), ApplyKMeansEach = 1 and MaxIterationsWithoutImprov = 10. Figure 1 shows some examples of the 90 contour maps built with the performance percentages obtained with table T15 for the different parameter combinations. For example, Figure 1(a) shows the contour map for ρ = 0.1, Q = 50 and α, β ∈ {0, 0.5, 1, 1.5, ..., 6}. This analysis showed that ρ = 0.5 was the best option, because the best performance zone for ρ = 0.5 (the darker red zone in Figure 1(b)) is better than those obtained with the remaining ρ values.

Figure 1. Some examples of contour maps created with the performance percentages, for Q = 50, ρ ∈ {0.1, 0.5, 0.9}, and varying values of α and β. Analysis done with table T15. (a) Contour map for ρ = 0.1 and Q = 50. (b) Contour map for ρ = 0.5 and Q = 50. (c) Contour map for ρ = 0.9 and Q = 50. [Figure: each panel plots the variation of α (vertical axis) against the variation of β (horizontal axis), with contour bands of performance from 0.0 to 1.0.]

A particular behavior was observed in this experiment for the parameter Q. Very similar contour maps were obtained when ρ was fixed and Q varied from 50 to 500 (10 contour maps per ρ value), which suggested that Q was not an important parameter in this experiment. For example, the 10 contour maps created with ρ = 0.1 and Q ∈ {50, 100, 150, ..., 500} were very similar. To test this hypothesis, a linear regression model was used, and it permitted the conclusion that there was no significant difference among the 10 contour maps for a fixed value of ρ. The same behavior occurred for several values of ρ. Therefore, the parameter Q was fixed at 250 (the middle value), although the remaining 9 values could also have been used.

Next, an analysis for α and β was developed with tables T15 and T16, using ρ = 0.5, Q = 250, and α, β ∈ {0, 0.25, 0.5, 0.75, ..., 6}. Figure 2 shows the contour maps obtained in this process. This analysis was not enough to determine optimum values for α and β: Figures 2(a) and 2(b) only suggest that the best performance is probably obtained when β ∈ [1.5, 5] and α ∈ [0, 3]. For this reason, an extra analysis was developed with table T22 (n = 2100). Figure 3 shows the results, which permitted the conclusion that α = 0.25 was the best option. Finally, based on the results shown in Figures 2(a), 2(b) and 3, β was set to 2.5. Table 3 summarizes the selected parameters; these parameters were used to obtain the numerical results presented in Section 5.


Table 3. Selected parameters.

Parameter  Chosen value
α          0.25
β          2.5
ρ          0.5
Q          250

Figure 2. Contour maps created with the performance percentages, with the fixed values ρ = 0.5 and Q = 250. (a) Results obtained with table T15. (b) Results obtained with table T16. [Figure: α variation (vertical axis, 0 to 6) against β variation (horizontal axis, 0 to 6), with contour bands of performance from 0.0 to 1.0.]

Figure 3. Contour map created with the performance percentages, with the fixed values ρ = 0.5 and Q = 250, on table T22. [Figure: α variation (vertical axis, 0 to 2.5) against β variation (horizontal axis, 0 to 5), with contour bands of performance from 0.0 to 1.0.]

5. Results and discussion

Table 4 presents the numerical results obtained on tables T1 to T24. The average time represents how long, on average, the algorithm took to achieve the reference W(P) value over 500 multistarts. In all cases 100% performance was obtained in the 500 multistarts. For some tables other values of ApplyKMeansEach (different from 1) were used, because for those tables it was easier to determine the optimum clustering, which allowed lower average times to be reported. A similar behavior was observed for the parameter MaxIterationsWithoutImprov.

Table 4. Numerical results obtained on tables T1 to T24. In all tables 100% performance was obtained.

Code  Average time (s)  M   MaxIterationsWithoutImprov  ApplyKMeansEach
T1    0.001255          2   1                           2
T2    0.006234          2   1                           2
T3    0.035185          8   7                           1
T4    0.057330          4   4                           1
T5    0.006433          4   4                           1
T6    0.084180          7   6                           1
T7    0.272318          30  15                          1
T8    0.025142          3   1                           1
T9    0.002528          3   2                           1
T10   0.005464          1   2                           1
T11   0.037757          7   7                           1
T12   0.109706          6   5                           1
T13   0.002924          3   2                           1
T14   0.006724          2   1                           1
T15   0.049283          8   9                           1
T16   0.190796          9   5                           1
T17   0.154550          5   4                           1
T18   0.284170          5   4                           1
T19   5.634502          30  15                          1
T20   5.924223          20  10                          1
T21   0.235029          6   4                           1
T22   0.360169          6   4                           1
T23   0.312384          5   5                           1
T24   2.359212          10  8                           1

On the other hand, in order to evaluate the proposed algorithm on different, uncontrolled data sets, four classical data tables were used for an additional test (scholar notes, Amiard's fishes, Thomas' sociomatrix and Fisher's iris). Table 5 shows the average times reported by the algorithm in this analysis; for each table several K values were explored. In all cases 100% performance was obtained in the 500 multistarts, reaching the best known solution, except for Thomas' sociomatrix with K = 4 and K = 5, which had 99.8% and 99.6% performance, respectively.

Table 5. Numerical results obtained in the extra analysis of the ACO algorithm applied to classical data tables.

Data table           K  W(P)          Average time (s)
Scholar notes        2  28.19027778   0.000691
                     3  16.81481481   0.002354
                     4  10.46759259   0.005147
                     5  4.88888889    0.002763
Amiard's fishes      2  69368.27433   0.001399
                     3  32213.38171   0.001187
                     4  18281.38872   0.002794
                     5  14497.81048   0.027373
Thomas' sociomatrix  2  333.767483    0.009635
                     3  271.832639    0.050066
                     4  235.025694    0.392820
                     5  202.58125     2.046682
Fisher's iris        2  0.99916903    0.001251
                     3  0.52136424    0.004334
                     4  0.37817193    0.025762
                     5  0.31204311    0.038542


6. Conclusions

We have presented a new clustering method based on the ant colony optimization metaheuristic. The method is based on some features developed for ACO in the traveling salesman problem. The adaptation to the clustering problem takes into account the representation of clusters by barycenters; therefore the distance between objects and barycenters is used for defining both the visibility and the pheromone trail.

After parameter fitting, an extensive experiment was carried out in order to evaluate the method. It performed very well, attaining the desired global optimum in all tables T1 to T24 in a very short time. Furthermore, the method showed very good results when applied to classical data tables.

The experiment also revealed that the parameter Q does not play a relevant role in the ACO algorithm, but that the algorithm is very sensitive to the values assigned to the parameters α, β and ρ. The parameter fitting process was necessary to improve the algorithm's performance, and it led to the selection of α = 0.25, β = 2.5 and ρ = 0.5 as the best combination.

Acknowledgments

J. Chavarría and J. Fallas were supported by project 5402-1440-3901 of the Research Vice-Rectory, Costa Rica Institute of Technology. J. Trejos was supported by project 821-B1-122 of the Research Vice-Rectory, University of Costa Rica.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Babu, G.P.; Murty, M.N. Clustering with evolution strategies. Pattern Recognition, 1994, 27(2), 321-329.
2. Bagirov, A.M.; Mardaneh, K. Modified global k-means algorithm for clustering in gene expression data sets. In WISB'06: Proceedings of the 2006 Workshop on Intelligent Systems for Bioinformatics, Volume 73, 2006, pp. 23-28.
3. Barcos, L.; Rodriguez, V.M.; Alvarez, M.J.; Robusté, F. Routing design for less-than-truckload motor carriers using ant colony techniques. Business Economics Series, 2004, 14, Working Paper, 4-38.
4. Bock, H.-H. Automatische Klassifikation. Vandenhoeck & Ruprecht: Göttingen, Germany, 1974.
5. Bonabeau, E.; Dorigo, M.; Theraulaz, G. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press: New York, USA, 1999.
6. Bonabeau, E.; Dorigo, M.; Theraulaz, G. Swarm Intelligence: From Natural to Artificial Systems. Oxford University Press: New York, USA, 1999.
7. de los Cobos, S.; Goddard, J.; Gutierrez, M.A.; Martinez, A. Búsqueda y Exploración Estocástica. Universidad Autónoma Metropolitana: Mexico City, Mexico, 2010.
8. Diday, E.; Lemaire, J.; Pouget, J.; Testu, F. Éléments d'Analyse des Données. Dunod: Paris, France, 1982.
9. Dorigo, M.; Di Caro, G.; Gambardella, L.M. Ant algorithms for discrete optimization. Artificial Life, 1999, 5(2), 137-172.
10. Dorigo, M.; Gambardella, L.M. Ant colony system: a cooperative learning approach to the traveling salesman problem. IEEE Transactions on Evolutionary Computation, 1997, 1(1), 53-66.
11. Dorigo, M.; Birattari, M.; Stützle, T. Ant colony optimization: Artificial ants as a computational intelligence technique. IEEE Computational Intelligence Magazine, 2006, 1(4), 28-39.
12. Dorigo, M.; Maniezzo, V.; Colorni, A. The Ant System: Optimization by a colony of cooperating agents. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 1996, 26(1), 1-13.
13. Everitt, B.S. Cluster Analysis, 3rd edition. Edward Arnold: London, UK, 1993.
14. Handl, J.; Meyer, B. Improved ant-based clustering and sorting in a document retrieval interface. In Proceedings of PPSN VII, Seventh International Conference on Parallel Problem Solving from Nature; Merelo, J.J., et al., Eds.; Lecture Notes in Computer Science 2439, Springer: Berlin, Germany, 2002; pp. 913-923.
15. Herrmann, L.; Ultsch, A. The architecture of ant-based clustering to improve topographic mapping. In ANTS 2008; Dorigo, M., et al., Eds.; Lecture Notes in Computer Science 5217, Springer: Berlin, Germany, 2008; pp. 379-386.
16. Jafar, O.M.; Sivakumar, R. Ant-based clustering algorithms: a brief survey. International Journal of Computer Theory and Engineering, 2010, 2(5), 787-796.
17. Kennedy, J.; Eberhart, R.C. Intelligent Swarm Systems. Academic Press: New York, USA, 2000.
18. Ng, M.K.; Wong, J.C. Clustering categorical data sets using tabu search techniques. Pattern Recognition, 2002, 35(12), 2783-2790.
19. Sarkar, M.; Yegnanarayana, B.; Khemani, D. A clustering algorithm using an evolutionary programming-based approach. Pattern Recognition Letters, 1997, 18(10), 975-986.
20. Talbi, E.G. Metaheuristics: From Design to Implementation. John Wiley and Sons: Hoboken, New Jersey, USA, 2009.
21. Trejos, J.; Murillo, A.; Piza, E. Global stochastic optimization for partitioning. In Advances in Data Science and Classification; Rizzi, A., Vichi, M., Bock, H.-H., Eds.; Springer: Berlin, Germany, 1998; pp. 185-190.
22. Trejos, J.; Murillo, A.; Piza, E. Clustering by ant colony optimization. In Classification, Clustering, and Data Mining Applications; Banks, D., House, L., McMorris, F.R., Arabie, P., Gaul, W., Eds.; Springer: Berlin, Germany, 2004; pp. 25-32.
23. Trejos, J.; Castillo, W.; González, J. Análisis Multivariado de Datos: Métodos y Aplicaciones. Editorial de la Universidad de Costa Rica: San José, Costa Rica, 2014.
24. Xavier, A.E.; Xavier, V.L. Solving the minimum sum-of-squares clustering problem by hyperbolic smoothing and partition into boundary and gravitational regions. Pattern Recognition, 2011, 44, 70-77.

© November 29, 2014 by the authors; submitted to Entropy for possible open access publication under the terms and conditions of the Creative Commons Attribution license http://creativecommons.org/licenses/by/4.0/.