Evolution of multi-adaptive discretization
intervals for a rule-based genetic learning system
Jaume Bacardit and Josep Maria Garrell
Intelligent Systems Research Group
Enginyeria i Arquitectura La Salle,
Universitat Ramon Llull,
Psg. Bonanova 8, 08022-Barcelona,
Catalonia, Spain, Europe. {jbacardit,josepmg}@salleURL.edu
Abstract. Genetic Based Machine Learning (GBML) systems traditionally have evolved rules that only deal with discrete attributes. Therefore, some discretization process is needed in order to deal with real-valued attributes. There are several methods to discretize real-valued attributes into a finite number of intervals; however, none of them can efficiently solve all the possible problems. The alternative of a high number of simple uniform-width intervals usually expands the size of the search space without a clear performance gain. This paper proposes a rule representation which uses adaptive discrete intervals that split or merge through the evolution process, finding the correct discretization intervals at the same time as the learning process is done.
1 Introduction
The application of Genetic Algorithms (GA) [10, 8] to classification problems is usually known as Genetic Based Machine Learning (GBML), and traditionally it has been addressed from two different points of view: the Pittsburgh approach and the Michigan approach, early exemplified by LS-1 [20] and CS-1 [11], respectively.
The classical knowledge representation used in these systems is a set of rules where the antecedent is defined by a prefixed finite number of intervals to handle real-valued attributes. The performance of these systems is tied to the right choice of those intervals.
In this paper we use a rule representation with adaptive discrete intervals. These intervals are split and merged through the evolution process that drives the training stage. This approach avoids the higher computational cost of the approaches which work directly with real values, and finds a good discretization by expanding the search space with small intervals only when necessary. This representation was introduced in [1]; the work presented in this paper is its evolution, mainly focused on generalizing the approach and simplifying the tuning needed for each domain.
This rule representation is compared across different domains against the traditional discrete representation with fixed intervals. The number and size of the intervals for the fixed-intervals approach is obtained with two methods: (1) simple uniform-width intervals and (2) intervals obtained with the Fayyad & Irani method [7], a well-known discretization algorithm. The aim of this comparison is two-fold: to measure the accuracy performance and the computational cost.
The paper is structured as follows. Section 2 presents some related work. Then, we describe the framework of our classifier system in section 3. The adaptive intervals rule representation is explained in section 4. Next, section 5 describes the test suite used in the comparison. The results obtained are summarized in section 6. Finally, section 7 discusses the conclusions and some further work.
2 Related work
There are several approaches to handle real-valued attributes in the Genetic Based Machine Learning (GBML) field. Early approaches use discrete rules with a large number of prefixed uniform discretization intervals. However, this approach has the problem that the search space grows exponentially, slowing the evolutionary process without a clear accuracy improvement of the solution [2].
Lately, several alternatives to the discrete rules have been presented. There are rules composed of real-valued intervals (XCSR [22], [4], COGITO [18]). MOGUL [5] uses a fuzzy reasoning method, which generates sequentially: (1) fuzzy rules, and then (2) fuzzy membership functions. Recently, GALE [15] proposed a knowledge-independent method for learning other knowledge representations like instance sets or decision trees. All those alternatives present better performance, but usually they also have a higher computational cost [18].
A third approach is to use a heuristic discretization algorithm. Some of these methods work with information entropy [7], the χ² statistic [14] or multi-dimensional non-uniform discretization [13]. These algorithms are usually more accurate and faster than the uniform discretization. However, they suffer a lack of robustness across some domains [1].
3 Framework
In this section we describe the main features of our classifier system. GAssist (Genetic Algorithms based claSSIfier sySTem) [9] is a Pittsburgh-style classifier system based on GABIL [6]. Directly from GABIL we have borrowed the representation of the discrete rules (rules with conjunctive normal form (CNF) predicates), the semantically correct crossover operator and the fitness computation (squared accuracy).
Matching strategy: The matching process follows an "if ... then ... else if ... then ..." structure, usually called a Decision List [19].
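As an illustration, decision-list matching can be sketched as follows (the predicate encoding is hypothetical, not the actual GAssist data structures):

```python
def match_decision_list(rules, example, default_class):
    """Return the class of the first rule whose predicate accepts the
    example; fall back to a default class if no rule matches."""
    for predicate, cls in rules:
        if predicate(example):
            return cls
    return default_class

# Hypothetical two-attribute rules: each predicate tests interval membership.
rules = [
    (lambda e: e[0] < 0.5 and e[1] < 0.5, "A"),
    (lambda e: e[0] >= 0.5, "B"),
]
print(match_decision_list(rules, (0.2, 0.3), "C"))  # prints: A
```

The order of the rules matters: an example is classified by the first rule that fires, which is what gives the representation its "else if" semantics.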
Mutation operators: The system manipulates variable-length individuals, which makes the tuning of the classic gene-based mutation probability more difficult. In order to simplify this tuning, we define p_mut as the probability of mutating an individual. When an individual is selected for mutation (based on p_mut), a random gene is chosen inside its chromosome for mutation.
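A minimal sketch of this per-individual mutation scheme, assuming a simple list-of-genes chromosome:

```python
import random

def mutate_population(population, p_mut, mutate_gene, rng=random):
    """Apply at most one gene mutation per individual: each individual is
    selected with probability p_mut, then a single random gene is mutated."""
    for individual in population:
        if rng.random() < p_mut:
            gene = rng.randrange(len(individual))
            individual[gene] = mutate_gene(individual[gene])

rng = random.Random(42)
pop = [[0, 1, 0], [1, 1, 1, 0, 0]]  # variable-length binary individuals
mutate_population(pop, 0.6, lambda g: 1 - g, rng)
```

Note that p_mut is independent of chromosome length, which is the point: the expected number of mutations per individual stays constant as individuals grow or shrink.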
Control of the individuals' length: Dealing with variable-length individuals raises some serious considerations. One of the most important ones is the control of the size of the evolving individuals [21]. This control is achieved in GAssist using two different operators:
– Rule deletion: This operator deletes the rules of the individuals that do not match any training example. This rule deletion is done after the fitness computation and has two constraints: (a) the process is only activated after a predefined number of iterations, to prevent a massive diversity loss, and (b) the number of rules of an individual never goes below a lower threshold. This threshold is set to the number of classes of the domain.
– Selection bias using the individual size: Selection is guided as usual by the fitness (the accuracy). However, it also gives a certain degree of relevance to the size of the individuals, following a policy similar to that of multi-objective systems. We use tournament selection because its local behavior lets us implement this policy. The criterion of the tournament is given by an operator called "size-based comparison" [2]. This operator considers two individuals similar if their fitness difference is below a certain threshold (d_comp). Then, it selects the individual with the smaller number of rules.
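The "size-based comparison" criterion can be sketched like this (individuals reduced to (fitness, number_of_rules) pairs for illustration):

```python
def size_based_better(ind_a, ind_b, d_comp):
    """Tournament criterion: if the fitness gap is below d_comp the two
    individuals count as similar and the one with fewer rules wins;
    otherwise plain fitness decides."""
    (fit_a, size_a), (fit_b, size_b) = ind_a, ind_b
    if abs(fit_a - fit_b) < d_comp:
        return ind_a if size_a <= size_b else ind_b
    return ind_a if fit_a > fit_b else ind_b

# Similar fitness: the smaller rule set is preferred.
print(size_based_better((0.912, 12), (0.915, 20), d_comp=0.01))
```

This keeps the parsimony pressure local to each tournament, so fitness still dominates whenever the accuracy difference is meaningful.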
4 Discrete rules with adaptive intervals
This section describes the rule representation based on discrete rules with adaptive intervals. First we describe the problems that traditional discrete rules present. Then, we explain the proposed adaptive intervals rules and the changes introduced in order to enable the GA to use them.
4.1 Discrete rules and unnecessary search space growth
The traditional approach to solving problems with real-valued attributes using discrete rules relies on a discretization process. This discretization can be done using algorithms which determine the discretization intervals by analyzing the training information, or with a simple alternative such as a uniform-width intervals discretization.
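For reference, uniform-width discretization is just an even partition of the attribute's range; a generic sketch, not tied to any particular implementation:

```python
def uniform_intervals(lo, hi, n):
    """Split [lo, hi] into n uniform-width discretization intervals."""
    width = (hi - lo) / n
    return [(lo + i * width, lo + (i + 1) * width) for i in range(n)]

def interval_index(value, lo, hi, n):
    """Map a real value to the index of its uniform interval."""
    if value >= hi:
        return n - 1  # clamp the upper bound into the last interval
    return int((value - lo) * n / (hi - lo))

print(interval_index(0.49, 0.0, 1.0, 4))  # prints: 1
```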
In the latter method, the way to increase the accuracy of the solution is to increase the number of intervals. This brings a big problem, because the search space to explore grows exponentially as more intervals are added. The accuracy improvement expected from increasing the number of intervals sometimes does not materialize, because the GA spends too much time exploring areas of the search space which do not need to be explored.
If we find a correct and minimal set of intervals, the solution accuracy will probably increase without a huge increase of the computational cost.
4.2 Finding good and minimal intervals
Our aim is to find good discretization intervals without a great expansion of the search space. In order to achieve this goal we defined a rule representation [1] with discrete adaptive intervals where the discretization intervals are not fixed. These intervals are evolved through the iterations, merging and splitting among them.
To control the computational cost and the growth of the search space, we define the following constraints:
– A number of "low level" uniform and static intervals, called micro-intervals, is defined for each attribute.
– The adaptive intervals are built by joining together micro-intervals.
– When we split an interval, we select a random point in its micro-intervals to break it.
– When we merge two intervals, the value of the resulting interval is taken from the one which has more micro-intervals. If both have the same number of micro-intervals, the value is chosen randomly.
– The number and size of the initial intervals are selected randomly.
The adaptive intervals as well as the split and merge operators are shown in figure 1.
Fig. 1. Adaptive intervals representation and the split and merge operators.
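Under the constraints above, the split and merge operators can be sketched on a micro-interval representation. Here an attribute term is assumed to be a list of (micro_interval_count, value) pairs; the encoding is illustrative, not the paper's exact chromosome layout:

```python
import random

def split_interval(intervals, idx, rng=random):
    """Split interval idx at a random micro-interval boundary; both
    halves keep the parent's value."""
    n, value = intervals[idx]
    if n < 2:
        return  # a single micro-interval cannot be split
    cut = rng.randrange(1, n)
    intervals[idx:idx + 1] = [(cut, value), (n - cut, value)]

def merge_intervals(intervals, idx, rng=random):
    """Merge interval idx with its right neighbour; the merged value is
    taken from the interval with more micro-intervals (random on ties)."""
    (n1, v1), (n2, v2) = intervals[idx], intervals[idx + 1]
    if n1 > n2:
        value = v1
    elif n2 > n1:
        value = v2
    else:
        value = rng.choice((v1, v2))
    intervals[idx:idx + 2] = [(n1 + n2, value)]

rng = random.Random(0)
attr = [(3, 1), (5, 0)]  # 8 micro-intervals in two adaptive intervals
split_interval(attr, 1, rng)
merge_intervals(attr, 0, rng)
```

Both operators preserve the total number of micro-intervals, so the underlying "low level" discretization never changes; only the grouping evolves.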
To apply the split and merge operators we have added two special phases to the GA cycle, applied to the offspring population after the mutation phase. For each phase (split and merge) we have a probability (p_split or p_merge) of applying a split or merge operation to an individual. If an individual is selected for splitting or merging, a random point inside its chromosome is chosen to apply the operation.
Finally, this representation requires some changes in other parts of the GA:
– The crossover operator can only take place at the attribute boundaries.
– The "size-based comparison" operator uses the length (number of genes) of the individual instead of the number of rules, because now the size of a rule can change when the number of intervals that it contains changes. This change also makes the GA prefer individuals with fewer intervals in addition to fewer rules, further simplifying them.
4.3 Changes to the adaptive intervals rule representation
One of the main drawbacks of the initial approach was the sizing of the number of micro-intervals assigned to each attribute term of the rules. This parameter is difficult to tune because it is domain-specific.
In this paper we test another approach (multi-adaptive) which consists in evolving attribute terms with different numbers of micro-intervals in the same population. This enables the evolutionary process to select the correct number of micro-intervals for each attribute term of the rules. The number of micro-intervals of each attribute term is selected from a predefined set in the initialization stage.
The initialization phase has also changed. In our previous work the number and size of the intervals were uniform. We have changed this policy to a totally random initialization in order to gain diversity in the initial population.
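The multi-adaptive initialization can be sketched as follows; the predefined set is the one from table 2, while the particular partition scheme is an illustrative assumption:

```python
import random

MICRO_SET = [5, 6, 7, 8, 10, 15, 20, 25]  # predefined set (table 2)

def init_attribute_term(rng=random):
    """Pick a micro-interval count from the predefined set, then build a
    random partition of those micro-intervals into adaptive intervals,
    each with a random 0/1 value."""
    n_micro = rng.choice(MICRO_SET)
    cuts = sorted(rng.sample(range(1, n_micro), rng.randrange(0, n_micro - 1)))
    bounds = [0] + cuts + [n_micro]
    return [(b - a, rng.randrange(2)) for a, b in zip(bounds, bounds[1:])]

rng = random.Random(1)
term = init_attribute_term(rng)
```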
The last change introduced involves the split and merge operators. In the previous version these operators were integrated inside the mutation. This made the sizing of the probabilities very difficult because the three operators (split, merge and mutation) were coupled. Using an extra recombination stage in this version we eliminate this tight linkage.
5 Test suite
This section summarizes the tests done in order to evaluate the accuracy and efficiency of the method presented in this paper. We also compare it with some alternative methods. The tests were conducted using several machine learning problems, which we also describe.
5.1 Test problems
The test problems selected for this paper present different characteristics in order to give us a broad overview of the performance of the methods being compared. The first problem is a synthetic problem (Tao [15]) that has non-orthogonal class boundaries. We also use several problems provided by the University of California at Irvine (UCI) repository [3]. The problems selected are: Pima-indians-diabetes (pima), iris, glass and breast-cancer-wisconsin (breast). Finally we use three problems from our own private repository. The first two deal with the diagnosis of breast cancer based on biopsies (bps [17]) and mammograms (mamm [16]), whereas the last one is related to the prediction of student qualifications (lrn [9]). The characteristics of the problems are listed in table 1. The partition of the examples into the train and test sets was done using the stratified ten-fold cross-validation method [12].
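Stratified ten-fold cross-validation deals each class's examples evenly across the folds, so every fold keeps roughly the original class proportions; a minimal sketch:

```python
from collections import defaultdict

def stratified_folds(labels, k=10):
    """Assign example indices to k folds, dealing each class's examples
    round-robin so class proportions are preserved per fold."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    i = 0
    for indices in by_class.values():
        for idx in indices:
            folds[i % k].append(idx)
            i += 1
    return folds

labels = ["a"] * 70 + ["b"] * 30
folds = stratified_folds(labels, k=10)  # each fold: 7 "a" and 3 "b"
```

Each fold is then used once as the test set while the remaining nine form the training set.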
5.2 Configurations of the GA to test
The main goal of the tests is to evaluate the performance of the adaptive intervals rule representation. In order to compare this method with the traditional
Table 1. Characteristics of the test problems.

Dataset  Examples  Real attributes  Discrete attributes  Classes
tao        1888         2                   -                2
pima        768         8                   -                2
iris        150         4                   -                3
glass       214         9                   -                6
breast      699         -                   9                2
bps        1027        24                   -                2
mamm        216        21                   -                2
lrn         648         4                   2                5
discrete representation, we use two discretization methods: the simple uniform-width intervals method and the Fayyad & Irani method [7].
We analyze the adaptive intervals approach with two types of runs. The first one assigns the same number of micro-intervals to all the attribute terms of the individuals; we call this type of run adaptive. In the second one, attributes with different numbers of micro-intervals coexist in the same population; we call this type multi-adaptive.
The GA parameters are shown in table 2. The reader can appreciate that the sizing of both p_split and p_merge is the same for all the problems except the tao problem. Giving the same value to p_merge and p_split produces solutions with too few rules and intervals, as well as less accurate results than those obtained with the configuration shown in table 2. This is an issue that needs further study.
Another important issue of the p_split and p_merge probabilities is that for some of the domains they are greater than 1. This means that for these domains at least one split and one merge operation will surely be applied to each individual of the population. Thus, p_split and p_merge become expected values instead of probabilities. This tuning produces a reduction of the number of iterations needed.
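One way to realise a p_split or p_merge value greater than 1 (the exact mechanism is not spelled out here, so this is an assumption) is to treat p as an expected operation count: apply ⌊p⌋ operations always, plus one more with probability equal to the fractional part:

```python
import random

def n_operations(p, rng=random):
    """Interpret p as an expected number of operations per individual:
    floor(p) guaranteed applications, plus one extra with probability
    equal to the fractional part of p."""
    base = int(p)
    return base + (1 if rng.random() < p - base else 0)

rng = random.Random(3)
counts = [n_operations(2.6, rng) for _ in range(10000)]
# the sample mean of counts approaches 2.6
```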
6 Results
In this section we present the results obtained. The aim of the tests was to compare the methods in three aspects: the accuracy and size of the solutions, as well as the computational cost. For each method and test problem we show the average and standard deviation values of: (1) the cross-validation accuracy, (2) the size of the best individual in number of rules and intervals per attribute and (3) the execution time in seconds. The tests were executed on an AMD Athlon 1700+ using the Linux operating system and the C++ language.
The results were also analyzed using the two-sided t-test [23] to determine whether the two adaptive methods outperform the other ones with a significance level of 1%. Finally, for each configuration, test and fold, 15 runs using different random seeds were done. Results are shown in table 3. The column titled t-test shows a mark beside the Uniform or Fayyad & Irani method if it was outperformed by the
Table 2. Common and problem-specific parameters of the GA.

Parameter                                            Value
Crossover probability                                0.6
Iter. of rule deletion activation                    30
Iter. of size comparison activation                  30
Sets of micro-intervals in the multi-adaptive test   5, 6, 7, 8, 10, 15, 20, 25
Tournament size                                      3
Population size                                      300
Probability of mutating an individual                0.6

Code      Parameter
#iter     Number of GA iterations
d_interv  Number of intervals in the uniform-width discrete rules
a_interv  Number of micro-intervals in the adaptive test
d_comp    Distance parameter in the "size-based comparison" operator
p_split   Probability of splitting an individual (one of its intervals)
p_merge   Probability of merging an individual (one of its intervals)

Problem  #iter  d_interv  a_interv  d_comp  p_merge  p_split
tao       600      12        48     0.001     1.3      2.6
pima      500       4         8     0.01      0.8      0.8
iris      400      10        10     0.02      0.5      0.5
glass     750       4         8     0.015     1.5      1.5
breast    325       5        10     0.01      3.2      3.2
bps       500       4        10     0.015     1.7      1.7
mamm      500       2         5     0.01      1        1
lrn       700       5        10     0.01      1.2      1.2
adaptive methods. The adaptive methods were never outperformed in the tests done, showing good robustness.
The results are summarized using the ranking in table 4. The ranking for each problem and method is based on the accuracy. The global rankings are computed by averaging the problem rankings.
Table 3 shows that in two of the tests the best performing method was the Fayyad & Irani interval discretization technique. However, in the rest of the tests its performance is lower, showing a lack of robustness across different domains. The two adaptive tests achieved the best results of the ranking. Nevertheless, the goal of improving the rule representation with the multi-adaptive configuration has not been achieved: it is only better than the original adaptive configuration in three of the eight test problems. The computational cost is clearly the main drawback of the adaptive intervals representation. The Fayyad & Irani method is on average 2.62 times faster than it.
7 Conclusions and further work
This paper focused on an adaptive rule representation as a robust method for finding a good discretization. The main contribution is the use of adaptive discrete intervals, which can split or merge through the evolution process, reducing the search space where possible.
The use of a heuristic discretization method (like the Fayyad & Irani one) outperforms the adaptive intervals representation in some test problems. Never-
Table 3. Mean and deviation of the accuracy (percentage of correctly classified examples), number of rules, intervals per attribute and execution time for each method tested. Bold entries show the method with the best results for each test problem. A mark denotes a significant out-performance based on a t-test.

Problem  Configuration   Accuracy    Number of Rules  Intervals per Attribute  Time (s)    t-test
tao      Uniform         93.7±1.2     8.8±1.6          8.3±0.0                  36.0±3.5
         Fayyad          87.8±1.1     3.1±0.3          3.4±0.1                  24.2±1.4
         Adaptive        94.6±1.3    22.5±5.6          7.7±0.4                  96.6±14.7
         Multi-Adaptive  94.3±1.0    19.5±4.9          6.0±0.6                  94.5±13.9
pima     Uniform         73.8±4.1     6.3±2.2          3.7±0.0                  23.2±2.8
         Fayyad          73.6±3.1     6.6±2.6          2.3±0.2                  26.4±3.0
         Adaptive        74.8±3.5     6.2±2.6          2.0±0.4                  56.2±9.4
         Multi-Adaptive  74.4±3.1     5.8±2.2          1.9±0.4                  59.7±8.9
iris     Uniform         92.9±2.7     3.8±1.1          8.2±0.0                   5.2±0.7
         Fayyad          94.2±3.0     3.2±0.6          2.8±0.1                   5.5±0.1
         Adaptive        94.9±2.3     3.3±0.5          1.3±0.2                   9.2±0.4
         Multi-Adaptive  96.2±2.2     3.6±0.9          1.3±0.2                   9.0±0.8
glass    Uniform         60.5±8.9     8.7±1.8          3.7±0.0                  13.9±1.5
         Fayyad          65.7±6.1     8.1±1.4          2.4±0.1                  14.0±1.1
         Adaptive        64.6±4.7     5.9±1.7          1.7±0.2                  35.1±5.2
         Multi-Adaptive  65.2±4.1     6.7±2.0          1.8±0.2                  38.4±5.0
breast   Uniform         94.8±2.6     4.8±2.5          4.6±0.0                   6.5±1.0
         Fayyad          95.2±1.8     4.1±0.8          3.6±0.1                   5.8±0.4
         Adaptive        95.4±2.3     2.7±1.0          1.8±0.2                  15.7±2.1
         Multi-Adaptive  95.3±2.3     2.6±0.9          1.7±0.2                  17.4±1.5
bps      Uniform         77.6±3.3    15.0±7.0          3.9±0.0                  50.8±9.0
         Fayyad          80.0±3.1     7.1±3.8          2.4±0.1                  37.7±6.0
         Adaptive        80.3±3.5     4.7±3.0          2.1±0.4                 106.6±21.1
         Multi-Adaptive  80.1±3.3     5.1±2.0          2.0±0.3                 115.9±20.5
mamm     Uniform         63.2±9.9     2.6±0.5          2.0±0.0                   7.8±1.0
         Fayyad          65.3±11.1    2.3±0.5          2.0±0.1                   8.5±0.7
         Adaptive        65.8±5.3     4.4±1.7          1.8±0.2                  27.6±4.9
         Multi-Adaptive  65.0±6.1     4.4±1.9          1.9±0.2                  27.4±5.5
lrn      Uniform         64.7±4.9    17.8±5.1          4.9±0.0                  29.2±4.0
         Fayyad          67.5±5.1    14.3±5.0          4.4±0.1                  26.5±3.4
         Adaptive        66.1±4.6    14.0±4.6          3.6±0.3                  58.9±7.9
         Multi-Adaptive  66.7±4.1    11.6±4.1          3.4±0.2                  53.9±7.2
Table 4. Performance ranking of the tested methods. Lower number means better ranking.

Problem     Fixed  Fayyad  Adaptive  Multi-Adaptive
tao           3      4        1            2
pima          3      4        1            2
iris          4      3        2            1
glass         4      1        3            2
breast        4      3        1            2
bps           4      3        1            2
mamm          4      2        1            3
lrn           4      1        3            2
Average      3.25   2.625    1.625        2
Final rank    4      3        1            2
theless, the performance increase is not significant. On the other hand, when the adaptive intervals outperform the other methods, the performance increase is higher, showing a better degree of robustness.
The overhead of evolving discretization intervals and rules at the same time is quite significant, and is the representation's main drawback. Beside the cost of the representation itself (our implementation uses twice the memory of the discrete representation for the same number of intervals), the main difference is the significant reduction of the search space achieved by a heuristic discretization.
Some further work should use the knowledge provided by the discretization techniques in order to reduce the computational cost of the adaptive intervals representation. This should be achieved without losing robustness. Another important point of further study is how the values of p_split and p_merge affect the behavior of the system, in order to simplify the tuning needed for each domain.
Finally, it would also be interesting to compare the adaptive intervals rule representation with some representation dealing directly with real-valued attributes, like the ones described in the related work section. This comparison should follow the same criteria used here: comparing both the accuracy and the computational cost.
Acknowledgments
The authors acknowledge the support provided under grant numbers 2001FI 00514, CICYT/Tel08-0408-02 and FIS00/0033-02. The results of this work were partially obtained using equipment cofunded by the Direcció General de Recerca de la Generalitat de Catalunya (D.O.G.C. 30/12/1997). Finally we would like to thank Enginyeria i Arquitectura La Salle for their support to our research group.
References
1. Jaume Bacardit and Josep M. Garrell. Evolution of adaptive discretization intervals for a rule-based genetic learning system. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002) (to appear), 2002.
2. Jaume Bacardit and Josep M. Garrell. Métodos de generalización para sistemas clasificadores de Pittsburgh. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 486-493, 2002.
3. C. Blake, E. Keogh, and C. Merz. UCI repository of machine learning databases (www.ics.uci.edu/~mlearn/MLRepository.html), 1998.
4. A. L. Corcoran and S. Sen. Using real-valued genetic algorithms to evolve rule sets for classification. In Proceedings of the IEEE Conference on Evolutionary Computation, pages 120-124, 1994.
5. O. Cordón, M. del Jesus, and F. Herrera. Genetic learning of fuzzy rule-based classification systems co-operating with fuzzy reasoning methods. International Journal of Intelligent Systems, 13(10/11):1025-1053, 1998.
6. Kenneth A. DeJong and William M. Spears. Learning concept classification rules using genetic algorithms. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 651-656, 1991.
7. Usama M. Fayyad and Keki B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI, pages 1022-1029, 1993.
8. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Inc., 1989.
9. Elisabet Golobardes, Xavier Llorà, Josep Maria Garrell, David Vernet, and Jaume Bacardit. Genetic classifier system as a heuristic weighting method for a case-based classifier system. Butlletí de l'Associació Catalana d'Intel·ligència Artificial, 22:132-141, 2000.
10. John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
11. John H. Holland. Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In Machine Learning, an Artificial Intelligence Approach. Volume II, pages 593-623. 1986.
12. Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, pages 1137-1145, 1995.
13. Alexander V. Kozlov and Daphne Koller. Nonuniform dynamic discretization in hybrid networks. In Proceedings of the 13th Annual Conference on Uncertainty in AI (UAI), pages 314-325, 1997.
14. H. Liu and R. Setiono. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, pages 388-391. IEEE Computer Society, 1995.
15. Xavier Llorà and Josep M. Garrell. Knowledge-independent data mining with fine-grained parallel evolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 461-468. Morgan Kaufmann, 2001.
16. J. Martí, X. Cufí, J. Regincós, et al. Shape-based feature selection for microcalcification evaluation. In Imaging Conference on Image Processing, 3338:1215-1224, 1998.
17. E. Martínez Marroquín, C. Vos, et al. Morphological analysis of mammary biopsy images. In Proceedings of the IEEE International Conference on Image Processing (ICIP'96), pages 943-947, 1996.
18. José C. Riquelme and Jesús S. Aguilar. Codificación indexada de atributos continuos para algoritmos evolutivos en aprendizaje supervisado. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 161-167, 2002.
19. Ronald L. Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.
20. Stephen F. Smith. Flexible learning of problem solving heuristics through adaptive search. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (IJCAI-83), pages 421-425, Los Altos, CA, 1983. Morgan Kaufmann.
21. Terence Soule and James A. Foster. Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation, 6(4):293-309, Winter 1998.
22. Stewart W. Wilson. Get real! XCS with continuous-valued inputs. In L. Booker, Stephanie Forrest, M. Mitchell, and Rick L. Riolo, editors, Festschrift in Honor of John H. Holland, pages 111-121. Center for the Study of Complex Systems, 1999.
23. Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.