Evolution of multi-adaptive discretization
intervals for a rule-based genetic learning system
Jaume Bacardit and Josep Maria Garrell
Intelligent Systems Research Group
Enginyeria i Arquitectura La Salle,
Universitat Ramon Llull,
Psg. Bonanova 8, 08022-Barcelona,
Catalonia, Spain, Europe. {jbacardit,josepmg}@salleURL.edu
Abstract. Genetic Based Machine Learning (GBML) systems traditionally have evolved rules that only deal with discrete attributes. Therefore, some discretization process is needed in order to deal with real-valued attributes. There are several methods to discretize real-valued attributes into a finite number of intervals; however, none of them can efficiently solve all the possible problems. The alternative of a high number of simple uniform-width intervals usually expands the size of the search space without a clear performance gain. This paper proposes a rule representation which uses adaptive discrete intervals that split or merge through the evolution process, finding the correct discretization intervals at the same time as the learning process is done.
1 Introduction
The application of Genetic Algorithms (GA) [10, 8] to classification problems is usually known as Genetic Based Machine Learning (GBML), and traditionally it has been addressed from two different points of view: the Pittsburgh approach and the Michigan approach, early exemplified by LS-1 [20] and CS-1 [11], respectively.
The classical knowledge representation used in these systems is a set of rules where the antecedent is defined by a prefixed finite number of intervals to handle real-valued attributes. The performance of these systems is tied to the right choice of those intervals.
In this paper we use a rule representation with adaptive discrete intervals. These intervals are split and merged through the evolution process that drives the training stage. This approach avoids the higher computational cost of the approaches which work directly with real values, and finds a good discretization by expanding the search space with small intervals only when necessary. This representation was introduced in [1]; the work presented in this paper is its evolution, mainly focused on generalizing the approach and simplifying the tuning needed for each domain.
This rule representation is compared across different domains against the traditional discrete representation with fixed intervals. The number and size of the intervals for the fixed-intervals approach is obtained with two methods: (1) simple uniform-width intervals and (2) intervals obtained with the Fayyad & Irani method [7], a well-known discretization algorithm. The aim of this comparison is two-fold: to measure the accuracy performance and the computational cost.
The paper is structured as follows. Section 2 presents some related work. Then, we describe the framework of our classifier system in section 3. The adaptive intervals rule representation is explained in section 4. Next, section 5 describes the test suite used in the comparison. The results obtained are summarized in section 6. Finally, section 7 discusses the conclusions and some further work.
2 Related work
There are several approaches to handle real-valued attributes in the Genetic Based Machine Learning (GBML) field. Early approaches use discrete rules with a large number of prefixed uniform discretization intervals. However, this approach has the problem that the search space grows exponentially, slowing the evolutionary process without a clear accuracy improvement of the solution [2].
Lately, several alternatives to the discrete rules have been presented. There are rules composed of real-valued intervals (XCSR [22], [4], COGITO [18]). MOGUL [5] uses a fuzzy reasoning method, which generates sequentially: (1) fuzzy rules, and then (2) fuzzy membership functions. Recently, GALE [15] proposed a knowledge-independent method for learning other knowledge representations like instance sets or decision trees. All those alternatives present better performance, but usually they also have a higher computational cost [18].
A third approach is to use a heuristic discretization algorithm. Some of these methods work with information entropy [7], the χ² statistic [14] or multi-dimensional non-uniform discretization [13]. These algorithms are usually more accurate and faster than the uniform discretization. However, they suffer a lack of robustness across some domains [1].
3 Framework
In this section we describe the main features of our classifier system. GAssist (Genetic Algorithms based claSSIfier sySTem) [9] is a Pittsburgh-style classifier system based on GABIL [6]. Directly from GABIL we have borrowed the representation of the discrete rules (rules with conjunctive normal form (CNF) predicates), the semantically correct crossover operator and the fitness computation (squared accuracy).
Matching strategy: The matching process follows an "if ... then ... else if ... then ..." structure, usually called a Decision List [19].
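As an illustration, decision-list matching can be sketched as follows (the predicate encoding is hypothetical, not the actual GAssist data structures):

```python
def match_decision_list(rules, example, default_class):
    """Return the class of the first rule whose predicate accepts the
    example; fall back to a default class if no rule matches."""
    for predicate, cls in rules:
        if predicate(example):
            return cls
    return default_class

# Hypothetical two-attribute rules: each predicate tests interval membership.
rules = [
    (lambda e: e[0] < 0.5 and e[1] < 0.5, "A"),
    (lambda e: e[0] >= 0.5, "B"),
]
print(match_decision_list(rules, (0.2, 0.3), "C"))  # prints: A
```

The order of the rules matters: an example is classified by the first rule that fires, which is what gives the representation its "else if" semantics.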
Mutation operators: The system manipulates variable-length individuals, which makes the tuning of the classic gene-based mutation probability more difficult. In order to simplify this tuning, we define p_mut as the probability of mutating an individual. When an individual is selected for mutation (based on p_mut), a random gene is chosen inside its chromosome for mutation.
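A minimal sketch of this per-individual mutation scheme, assuming a simple list-of-genes chromosome:

```python
import random

def mutate_population(population, p_mut, mutate_gene, rng=random):
    """Apply at most one gene mutation per individual: each individual is
    selected with probability p_mut, then a single random gene is mutated."""
    for individual in population:
        if rng.random() < p_mut:
            gene = rng.randrange(len(individual))
            individual[gene] = mutate_gene(individual[gene])

rng = random.Random(42)
pop = [[0, 1, 0], [1, 1, 1, 0, 0]]  # variable-length binary individuals
mutate_population(pop, 0.6, lambda g: 1 - g, rng)
```

Note that p_mut is independent of chromosome length, which is the point: the expected number of mutations per individual stays constant as individuals grow or shrink.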
Control of the individuals' length: Dealing with variable-length individuals raises some serious considerations. One of the most important ones is the control of the size of the evolving individuals [21]. This control is achieved in GAssist using two different operators:
– Rule deletion: This operator deletes the rules of the individuals that do not match any training example. This rule deletion is done after the fitness computation and has two constraints: (a) the process is only activated after a predefined number of iterations, to prevent a massive diversity loss, and (b) the number of rules of an individual never goes below a lower threshold. This threshold is set to the number of classes of the domain.
– Selection bias using the individual size: Selection is guided as usual by the fitness (the accuracy). However, it also gives a certain degree of relevance to the size of the individuals, following a policy similar to that of multi-objective systems. We use tournament selection because its local behavior lets us implement this policy. The criterion of the tournament is given by an operator called "size-based comparison" [2]. This operator considers two individuals similar if their fitness difference is below a certain threshold (d_comp). Then, it selects the individual with the smaller number of rules.
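The "size-based comparison" criterion can be sketched like this (individuals reduced to (fitness, number_of_rules) pairs for illustration):

```python
def size_based_better(ind_a, ind_b, d_comp):
    """Tournament criterion: if the fitness gap is below d_comp the two
    individuals count as similar and the one with fewer rules wins;
    otherwise plain fitness decides."""
    (fit_a, size_a), (fit_b, size_b) = ind_a, ind_b
    if abs(fit_a - fit_b) < d_comp:
        return ind_a if size_a <= size_b else ind_b
    return ind_a if fit_a > fit_b else ind_b

# Similar fitness: the smaller rule set is preferred.
print(size_based_better((0.912, 12), (0.915, 20), d_comp=0.01))
```

This keeps the parsimony pressure local to each tournament, so fitness still dominates whenever the accuracy difference is meaningful.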
4 Discrete rules with adaptive intervals
This section describes the rule representation based on discrete rules with adaptive intervals. First we describe the problems that traditional discrete rules present. Then, we explain the proposed adaptive intervals rules and the changes introduced in order to enable the GA to use them.
4.1 Discrete rules and unnecessary search space growth
The traditional approach to solving problems with real-valued attributes using discrete rules relies on a discretization process. This discretization can be done using algorithms which determine the discretization intervals by analyzing the training information, or with a simple alternative such as a uniform-width intervals discretization.
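For reference, uniform-width discretization is just an even partition of the attribute's range; a generic sketch, not tied to any particular implementation:

```python
def uniform_intervals(lo, hi, n):
    """Split [lo, hi] into n uniform-width discretization intervals."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]

def interval_index(value, lo, hi, n):
    """Map a real value to the index of its uniform interval."""
    if value >= hi:
        return n - 1  # clamp the upper bound into the last interval
    return int((value - lo) * n / (hi - lo))

print(interval_index(0.49, 0.0, 1.0, 4))  # prints: 1
```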
In the latter method, the way to increase the accuracy of the solution is to increase the number of intervals. This brings a big problem, because the search space to explore grows exponentially as more intervals are added. The accuracy improvement expected from increasing the number of intervals sometimes does not materialize, because the GA spends too much time exploring areas of the search space which do not need to be explored.
If we find a correct and minimal set of intervals, the solution accuracy will probably increase without a huge increase of the computational cost.
4.2 Finding good and minimal intervals
Our aim is to find good discretization intervals without a great expansion of the search space. In order to achieve this goal we defined a rule representation [1] with discrete adaptive intervals where the discretization intervals are not fixed. These intervals are evolved through the iterations, merging and splitting among them.
To control the computational cost and the growth of the search space, we define the following constraints:
– A number of "low level" uniform and static intervals, called micro-intervals, is defined for each attribute.
– The adaptive intervals are built by joining together micro-intervals.
– When we split an interval, we select a random point in its micro-intervals to break it.
– When we merge two intervals, the value of the resulting interval is taken from the one which has more micro-intervals. If both have the same number of micro-intervals, the value is chosen randomly.
– The number and size of the initial intervals are selected randomly.
The adaptive intervals as well as the split and merge operators are shown in figure 1.
Fig. 1. Adaptive intervals representation and the split and merge operators.
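Under the constraints above, the split and merge operators can be sketched on a micro-interval representation. Here an attribute term is assumed to be a list of (micro_interval_count, value) pairs; the encoding is illustrative, not the paper's exact chromosome layout:

```python
import random

def split_interval(intervals, idx, rng=random):
    """Split interval idx at a random micro-interval boundary; both
    halves keep the parent's value."""
    n, value = intervals[idx]
    if n < 2:
        return  # a single micro-interval cannot be split
    cut = rng.randrange(1, n)
    intervals[idx:idx + 1] = [(cut, value), (n - cut, value)]

def merge_intervals(intervals, idx, rng=random):
    """Merge interval idx with its right neighbour; the merged value is
    taken from the interval with more micro-intervals (random on ties)."""
    (n1, v1), (n2, v2) = intervals[idx], intervals[idx + 1]
    if n1 > n2:
        value = v1
    elif n2 > n1:
        value = v2
    else:
        value = rng.choice((v1, v2))
    intervals[idx:idx + 2] = [(n1 + n2, value)]

rng = random.Random(0)
attr = [(3, 1), (5, 0)]  # 8 micro-intervals in two adaptive intervals
split_interval(attr, 1, rng)
merge_intervals(attr, 0, rng)
```

Both operators preserve the total number of micro-intervals, so the underlying "low level" discretization never changes; only the grouping evolves.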
To apply the split and merge operators we have added two special phases to the GA cycle, applied to the offspring population after the mutation phase. For each phase (split and merge) we have a probability (p_split or p_merge) of applying a split or merge operation to an individual. If an individual is selected for splitting or merging, a random point inside its chromosome is chosen to apply the operation.
Finally, this representation requires some changes in other parts of the GA:
– The crossover operator can only take place at the attribute boundaries.
– The "size-based comparison" operator uses the length (number of genes) of the individual instead of the number of rules, because now the size of a rule can change when the number of intervals that it contains changes. This change also makes the GA prefer individuals with fewer intervals in addition to fewer rules, further simplifying them.
4.3 Changes to the adaptive intervals rule representation
One of the main drawbacks of the initial approach was the sizing of the number of micro-intervals assigned to each attribute term of the rules. This parameter is difficult to tune because it is domain-specific.
In this paper we test another approach (multi-adaptive) which consists in evolving attribute terms with different numbers of micro-intervals in the same population. This enables the evolutionary process to select the correct number of micro-intervals for each attribute term of the rules. The number of micro-intervals of each attribute term is selected from a predefined set in the initialization stage.
The initialization phase has also changed. In our previous work the number and size of the intervals were uniform. We have changed this policy to a totally random initialization in order to gain diversity in the initial population.
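The multi-adaptive initialization can be sketched as follows; the predefined set is the one from table 2, while the particular partition scheme is an illustrative assumption:

```python
import random

MICRO_SET = [5, 6, 7, 8, 10, 15, 20, 25]  # predefined set (table 2)

def init_attribute_term(rng=random):
    """Pick a micro-interval count from the predefined set, then build a
    random partition of those micro-intervals into adaptive intervals,
    each with a random 0/1 value."""
    n_micro = rng.choice(MICRO_SET)
    cuts = sorted(rng.sample(range(1, n_micro), rng.randrange(0, n_micro - 1)))
    bounds = [0] + cuts + [n_micro]
    return [(b - a, rng.randrange(2)) for a, b in zip(bounds, bounds[1:])]

rng = random.Random(1)
term = init_attribute_term(rng)
```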
The last change introduced involves the split and merge operators. In the previous version these operators were integrated inside the mutation. This made the sizing of the probabilities very difficult because the three operators (split, merge and mutation) were coupled. Using an extra recombination stage in this version we eliminate this tight linkage.
5 Test suite
This section summarizes the tests done in order to evaluate the accuracy and efficiency of the method presented in this paper. We also compare it with some alternative methods. The tests were conducted using several machine learning problems, which we also describe.
5.1 Test problems
The test problems selected for this paper present different characteristics in order to give us a broad overview of the performance of the methods being compared. The first problem is a synthetic problem (Tao [15]) that has non-orthogonal class boundaries. We also use several problems provided by the University of California at Irvine (UCI) repository [3]. The problems selected are: Pima-indians-diabetes (pima), iris, glass and breast-cancer-wisconsin (breast). Finally we use three problems from our own private repository. The first two deal with the diagnosis of breast cancer based on biopsies (bps [17]) and mammograms (mamm [16]), whereas the last one is related to the prediction of student qualifications (lrn [9]). The characteristics of the problems are listed in table 1. The partition of the examples into the train and test sets was done using the stratified ten-fold cross-validation method [12].
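Stratified ten-fold cross-validation deals each class's examples evenly across the folds, so every fold keeps roughly the original class proportions; a minimal sketch:

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign example indices to k folds, dealing each class's examples
    round-robin so class proportions are preserved per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    i = 0
    for indices in by_class.values():
        for idx in indices:
            folds[i % k].append(idx)
            i += 1
    return folds

labels = ["a"] * 70 + ["b"] * 30
folds = stratified_folds(labels, k=10)  # each fold: 7 "a" and 3 "b"
```

Each fold is then used once as the test set while the remaining nine form the training set.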
5.2 Configurations of the GA to test
The main goal of the tests is to evaluate the performance of the adaptive intervals rule representation. In order to compare this method with the traditional
Table 1. Characteristics of the test problems.

Dataset  Examples  Real attributes  Discrete attributes  Classes
tao        1888         2                   -                2
pima        768         8                   -                2
iris        150         4                   -                3
glass       214         9                   -                6
breast      699         -                   9                2
bps        1027        24                   -                2
mamm        216        21                   -                2
lrn         648         4                   2                5
discrete representation, we use two discretization methods: the simple uniform-width intervals method and the Fayyad & Irani method [7].
We analyze the adaptive intervals approach with two types of runs. The first one assigns the same number of micro-intervals to all the attribute terms of the individuals; we call this type of run adaptive. In the second one, attributes with different numbers of micro-intervals coexist in the same population; we call this type multi-adaptive.
The GA parameters are shown in table 2. The reader can appreciate that the sizing of both p_split and p_merge is the same for all the problems except the tao problem. Giving the same value to p_merge and p_split produces solutions with too few rules and intervals, as well as less accurate results than those obtained with the configuration shown in table 2. This is an issue that needs further study.
Another important issue of the p_split and p_merge probabilities is that for some of the domains they are greater than 1. This means that for these domains at least one split and one merge operation will surely be applied to each individual of the population. Thus, p_split and p_merge become expected values instead of probabilities. This tuning produces a reduction of the number of iterations needed.
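One way to realise a p_split or p_merge value greater than 1 (the exact mechanism is not spelled out here, so this is an assumption) is to treat p as an expected operation count: apply ⌊p⌋ operations always, plus one more with probability equal to the fractional part:

```python
import random

def n_operations(p, rng=random):
    """Interpret p as an expected number of operations per individual:
    floor(p) guaranteed applications, plus one extra with probability
    equal to the fractional part of p."""
    base = int(p)
    return base + (1 if rng.random() < p - base else 0)

rng = random.Random(3)
counts = [n_operations(2.6, rng) for _ in range(10000)]
# the sample mean of counts approaches 2.6
```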
6 Results
In this section we present the results obtained. The aim of the tests was to compare the methods in three aspects: the accuracy and size of the solutions, as well as the computational cost. For each method and test problem we show the average and standard deviation values of: (1) the cross-validation accuracy, (2) the size of the best individual in number of rules and intervals per attribute and (3) the execution time in seconds. The tests were executed on an AMD Athlon 1700+ using the Linux operating system and the C++ language.
The results were also analyzed using the two-sided t-test [23] to determine whether the two adaptive methods outperform the other ones with a significance level of 1%. Finally, for each configuration, test and fold, 15 runs using different random seeds were done. Results are shown in table 3. The column titled t-test shows a mark beside the Uniform or Fayyad & Irani method if it was outperformed by the
Table 2. Common and problem-specific parameters of the GA.

Parameter                                            Value
Crossover probability                                0.6
Iter. of rule deletion activation                    30
Iter. of size comparison activation                  30
Sets of micro-intervals in the multi-adaptive test   5, 6, 7, 8, 10, 15, 20, 25
Tournament size                                      3
Population size                                      300
Probability of mutating an individual                0.6

Code      Parameter
#iter     Number of GA iterations
d_interv  Number of intervals in the uniform-width discrete rules
a_interv  Number of micro-intervals in the adaptive test
d_comp    Distance parameter in the "size-based comparison" operator
p_split   Probability of splitting an individual (one of its intervals)
p_merge   Probability of merging an individual (one of its intervals)

Problem  #iter  d_interv  a_interv  d_comp  p_merge  p_split
tao       600      12        48     0.001     1.3      2.6
pima      500       4         8     0.01      0.8      0.8
iris      400      10        10     0.02      0.5      0.5
glass     750       4         8     0.015     1.5      1.5
breast    325       5        10     0.01      3.2      3.2
bps       500       4        10     0.015     1.7      1.7
mamm      500       2         5     0.01      1        1
lrn       700       5        10     0.01      1.2      1.2
adaptive methods. The adaptive methods were never outperformed in the tests done, showing good robustness.
The results are summarized using the ranking in table 4. The ranking for each problem and method is based on the accuracy. The global rankings are computed by averaging the problem rankings.
Table 3 shows that in two of the tests the best performing method was the Fayyad & Irani interval discretization technique. However, in the rest of the tests its performance is lower, showing a lack of robustness across different domains. The two adaptive tests achieved the best results of the ranking. Nevertheless, the goal of improving the rule representation with the multi-adaptive configuration has not been achieved: it is only better than the original adaptive configuration in three of the eight test problems. The computational cost is clearly the main drawback of the adaptive intervals representation. The Fayyad & Irani method is on average 2.62 times faster than it.
7 Conclusions and further work
This paper focused on an adaptive rule representation as a robust method for finding a good discretization. The main contribution is the use of adaptive discrete intervals, which can split or merge through the evolution process, reducing the search space where possible.
The use of a heuristic discretization method (like the Fayyad & Irani one) outperforms the adaptive intervals representation in some test problems. Never-
Table 3. Mean and deviation of the accuracy (percentage of correctly classified examples), number of rules, intervals per attribute and execution time for each method tested. Bold entries show the method with the best results for each test problem. A mark denotes a significant out-performance based on a t-test.

Problem  Configuration   Accuracy    Number of Rules  Intervals per Attribute  Time (s)    t-test
tao      Uniform         93.7±1.2     8.8±1.6          8.3±0.0                  36.0±3.5
         Fayyad          87.8±1.1     3.1±0.3          3.4±0.1                  24.2±1.4
         Adaptive        94.6±1.3    22.5±5.6          7.7±0.4                  96.6±14.7
         Multi-Adaptive  94.3±1.0    19.5±4.9          6.0±0.6                  94.5±13.9
pima     Uniform         73.8±4.1     6.3±2.2          3.7±0.0                  23.2±2.8
         Fayyad          73.6±3.1     6.6±2.6          2.3±0.2                  26.4±3.0
         Adaptive        74.8±3.5     6.2±2.6          2.0±0.4                  56.2±9.4
         Multi-Adaptive  74.4±3.1     5.8±2.2          1.9±0.4                  59.7±8.9
iris     Uniform         92.9±2.7     3.8±1.1          8.2±0.0                   5.2±0.7
         Fayyad          94.2±3.0     3.2±0.6          2.8±0.1                   5.5±0.1
         Adaptive        94.9±2.3     3.3±0.5          1.3±0.2                   9.2±0.4
         Multi-Adaptive  96.2±2.2     3.6±0.9          1.3±0.2                   9.0±0.8
glass    Uniform         60.5±8.9     8.7±1.8          3.7±0.0                  13.9±1.5
         Fayyad          65.7±6.1     8.1±1.4          2.4±0.1                  14.0±1.1
         Adaptive        64.6±4.7     5.9±1.7          1.7±0.2                  35.1±5.2
         Multi-Adaptive  65.2±4.1     6.7±2.0          1.8±0.2                  38.4±5.0
breast   Uniform         94.8±2.6     4.8±2.5          4.6±0.0                   6.5±1.0
         Fayyad          95.2±1.8     4.1±0.8          3.6±0.1                   5.8±0.4
         Adaptive        95.4±2.3     2.7±1.0          1.8±0.2                  15.7±2.1
         Multi-Adaptive  95.3±2.3     2.6±0.9          1.7±0.2                  17.4±1.5
bps      Uniform         77.6±3.3    15.0±7.0          3.9±0.0                  50.8±9.0
         Fayyad          80.0±3.1     7.1±3.8          2.4±0.1                  37.7±6.0
         Adaptive        80.3±3.5     4.7±3.0          2.1±0.4                 106.6±21.1
         Multi-Adaptive  80.1±3.3     5.1±2.0          2.0±0.3                 115.9±20.5
mamm     Uniform         63.2±9.9     2.6±0.5          2.0±0.0                   7.8±1.0
         Fayyad          65.3±11.1    2.3±0.5          2.0±0.1                   8.5±0.7
         Adaptive        65.8±5.3     4.4±1.7          1.8±0.2                  27.6±4.9
         Multi-Adaptive  65.0±6.1     4.4±1.9          1.9±0.2                  27.4±5.5
lrn      Uniform         64.7±4.9    17.8±5.1          4.9±0.0                  29.2±4.0
         Fayyad          67.5±5.1    14.3±5.0          4.4±0.1                  26.5±3.4
         Adaptive        66.1±4.6    14.0±4.6          3.6±0.3                  58.9±7.9
         Multi-Adaptive  66.7±4.1    11.6±4.1          3.4±0.2                  53.9±7.2
Table 4. Performance ranking of the tested methods. Lower number means better ranking.

Problem     Fixed  Fayyad  Adaptive  Multi-Adaptive
tao           3      4        1            2
pima          3      4        1            2
iris          4      3        2            1
glass         4      1        3            2
breast        4      3        1            2
bps           4      3        1            2
mamm          4      2        1            3
lrn           4      1        3            2
Average      3.25   2.625    1.625        2
Final rank    4      3        1            2
theless, the performance increase is not significant. On the other hand, when the adaptive intervals outperform the other methods, the performance increase is higher, showing a better degree of robustness.
The overhead of evolving discretization intervals and rules at the same time is quite significant, and is the representation's main drawback. Beside the cost of the representation itself (our implementation uses twice the memory of the discrete representation for the same number of intervals), the main difference is the significant reduction of the search space achieved by a heuristic discretization.
Some further work should use the knowledge provided by the discretization techniques in order to reduce the computational cost of the adaptive intervals representation. This should be achieved without losing robustness. Another important point of further study is how the values of p_split and p_merge affect the behavior of the system, in order to simplify the tuning needed for each domain.
Finally, it would also be interesting to compare the adaptive intervals rule representation with some representation dealing directly with real-valued attributes, like the ones described in the related work section. This comparison should follow the same criteria used here: comparing both the accuracy and the computational cost.
Acknowledgments
The authors acknowledge the support provided under grant numbers 2001FI 00514, CICYT/Tel08-0408-02 and FIS00/0033-02. The results of this work were partially obtained using equipment cofunded by the Direcció General de Recerca de la Generalitat de Catalunya (D.O.G.C. 30/12/1997). Finally we would like to thank Enginyeria i Arquitectura La Salle for their support to our research group.
References
1. Jaume Bacardit and Josep M. Garrell. Evolution of adaptive discretization intervals for a rule-based genetic learning system. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002) (to appear), 2002.
2. Jaume Bacardit and Josep M. Garrell. Métodos de generalización para sistemas clasificadores de Pittsburgh. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 486-493, 2002.
3. C. Blake, E. Keogh, and C. Merz. UCI repository of machine learning databases (www.ics.uci.edu/~mlearn/MLRepository.html), 1998.
4. A. L. Corcoran and S. Sen. Using real-valued genetic algorithms to evolve rule sets for classification. In Proceedings of the IEEE Conference on Evolutionary Computation, pages 120-124, 1994.
5. O. Cordón, M. del Jesus, and F. Herrera. Genetic learning of fuzzy rule-based classification systems co-operating with fuzzy reasoning methods. International Journal of Intelligent Systems, 13(10/11):1025-1053, 1998.
6. Kenneth A. DeJong and William M. Spears. Learning concept classification rules using genetic algorithms. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 651-656, 1991.
7. Usama M. Fayyad and Keki B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI, pages 1022-1029, 1993.
8. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Inc., 1989.
9. Elisabet Golobardes, Xavier Llorà, Josep Maria Garrell, David Vernet, and Jaume Bacardit. Genetic classifier system as a heuristic weighting method for a case-based classifier system. Butlletí de l'Associació Catalana d'Intel·ligència Artificial, 22:132-141, 2000.
10. John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
11. John H. Holland. Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In Machine Learning, an Artificial Intelligence Approach. Volume II, pages 593-623. 1986.
12. Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, pages 1137-1145, 1995.
13. Alexander V. Kozlov and Daphne Koller. Nonuniform dynamic discretization in hybrid networks. In Proceedings of the 13th Annual Conference on Uncertainty in AI (UAI), pages 314-325, 1997.
14. H. Liu and R. Setiono. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, pages 388-391. IEEE Computer Society, 1995.
15. Xavier Llorà and Josep M. Garrell. Knowledge-independent data mining with fine-grained parallel evolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 461-468. Morgan Kaufmann, 2001.
16. J. Martí, X. Cufí, J. Regincós, et al. Shape-based feature selection for microcalcification evaluation. In Imaging Conference on Image Processing, 3338:1215-1224, 1998.
17. E. Martínez Marroquín, C. Vos, et al. Morphological analysis of mammary biopsy images. In Proceedings of the IEEE International Conference on Image Processing (ICIP'96), pages 943-947, 1996.
18. José C. Riquelme and Jesús S. Aguilar. Codificación indexada de atributos continuos para algoritmos evolutivos en aprendizaje supervisado. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 161-167, 2002.
19. Ronald L. Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.
20. Stephen F. Smith. Flexible learning of problem solving heuristics through adaptive search. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (IJCAI-83), pages 421-425, Los Altos, CA, 1983. Morgan Kaufmann.
21. Terence Soule and James A. Foster. Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation, 6(4):293-309, Winter 1998.
22. Stewart W. Wilson. Get real! XCS with continuous-valued inputs. In L. Booker, Stephanie Forrest, M. Mitchell, and Rick L. Riolo, editors, Festschrift in Honor of John H. Holland, pages 111-121. Center for the Study of Complex Systems, 1999.
23. Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.