Evolution of multi-adaptive discretization intervals for a rule-based genetic learning system

Jaume Bacardit and Josep Maria Garrell

Intelligent Systems Research Group
Enginyeria i Arquitectura La Salle, Universitat Ramon Llull,
Psg. Bonanova 8, 08022-Barcelona, Catalonia, Spain, Europe.
{jbacardit,josepmg}@salleURL.edu

Abstract. Genetic Based Machine Learning (GBML) systems traditionally have evolved rules that only deal with discrete attributes. Therefore, some discretization process is needed in order to deal with real-valued attributes. There are several methods to discretize real-valued attributes into a finite number of intervals; however, none of them can efficiently solve all the possible problems. The alternative of a high number of simple uniform-width intervals usually expands the size of the search space without a clear performance gain. This paper proposes a rule representation which uses adaptive discrete intervals that split or merge through the evolution process, finding the correct discretization intervals at the same time as the learning process is done.

1 Introduction

The application of Genetic Algorithms (GA) [10, 8] to classification problems is usually known as Genetic Based Machine Learning (GBML), and traditionally it has been addressed from two different points of view: the Pittsburgh approach and the Michigan approach, early exemplified by LS-1 [20] and CS-1 [11], respectively.

The classical knowledge representation used in these systems is a set of rules where the antecedent is defined by a prefixed finite number of intervals to handle real-valued attributes. The performance of these systems is tied to the correct choice of the intervals.

In this paper we use a rule representation with adaptive discrete intervals. These intervals are split and merged through the evolution process that drives the training stage. This approach avoids the higher computational cost of the approaches which work directly with real values, and finds a good discretization by expanding the search space with small intervals only when necessary. This representation was introduced in [1], and the work presented in this paper is its evolution, mainly focused on generalizing the approach and simplifying the tuning needed for each domain.

This rule representation is compared across different domains against the traditional discrete representation with fixed intervals. The number and size of the intervals for the fixed-intervals approach is obtained with two methods: (1) simple uniform-width intervals and (2) intervals obtained with the Fayyad & Irani method [7], a well-known discretization algorithm. The aim of this comparison is two-fold: to measure the accuracy performance and the computational cost.

The paper is structured as follows. Section 2 presents some related work. Then, we describe the framework of our classifier system in section 3. The adaptive intervals rule representation is explained in section 4. Next, section 5 describes the test suite used in the comparison. The results obtained are summarized in section 6. Finally, section 7 discusses the conclusions and some further work.

2 Related work

There are several approaches to handle real-valued attributes in the Genetic Based Machine Learning (GBML) field. Early approaches use discrete rules with a large number of prefixed uniform discretization intervals. However, this approach has the problem that the search space grows exponentially, slowing the evolutionary process without a clear accuracy improvement of the solution [2].

Lately, several alternatives to the discrete rules have been presented. There are rules composed of real-valued intervals (XCSR [22], [4], COGITO [18]). MOGUL [5] uses a fuzzy reasoning method, which generates sequentially: (1) fuzzy rules, and then (2) fuzzy membership functions. Recently, GALE [15] proposed a knowledge-independent method for learning other knowledge representations like instance sets or decision trees. All those alternatives present better performance, but usually they also have a higher computational cost [18].

A third approach is to use a heuristic discretization algorithm. Some of these methods work with information entropy [7], the χ2 statistic [14] or multi-dimensional non-uniform discretization [13]. These algorithms are usually more accurate and faster than the uniform discretization. However, they suffer a lack of robustness across some domains [1].

3 Framework

In this section we describe the main features of our classifier system. GAssist (Genetic Algorithms based claSSIfier sySTem) [9] is a Pittsburgh-style classifier system based on GABIL [6]. Directly from GABIL we have borrowed the representation of the discrete rules (rules with conjunctive normal form (CNF) predicates), the semantically correct crossover operator and the fitness computation (squared accuracy).

Matching strategy: The matching process follows an "if ... then ... else if ... then ..." structure, usually called a Decision List [19].
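The decision-list semantics over GABIL-style CNF antecedents can be sketched as follows. This is an illustrative sketch, not the authors' code: the rule encoding (one bit per discretization interval, per attribute) follows the description in the text, but the data structures and names are our own.

```python
# Illustrative sketch of GABIL-style CNF rules evaluated as a decision
# list: rules are tried in order and the first one whose antecedent
# matches the example assigns its class.

def matches(rule, example):
    """A CNF antecedent matches if, for every attribute, the bit of the
    interval the example's value falls into is set to 1."""
    return all(bits[interval] == 1
               for bits, interval in zip(rule["antecedent"], example))

def classify(rule_list, example, default_class=None):
    """Decision-list semantics: if ... then ... else if ... then ..."""
    for rule in rule_list:
        if matches(rule, example):
            return rule["class"]
    return default_class

# Example: 2 attributes with 3 intervals each; an example is given as
# the interval index each attribute value was discretized into.
rules = [
    {"antecedent": [[1, 0, 0], [0, 1, 1]], "class": "A"},
    {"antecedent": [[1, 1, 1], [1, 1, 1]], "class": "B"},  # catch-all rule
]
print(classify(rules, (0, 2)))  # first rule matches -> "A"
print(classify(rules, (1, 0)))  # falls through to second rule -> "B"
```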

Mutation operators: The system manipulates variable-length individuals, which makes the tuning of the classic gene-based mutation probability more difficult. In order to simplify this tuning, we define p_mut as the probability of mutating an individual. When an individual is selected for mutation (based on p_mut), a random gene is chosen inside its chromosome for mutation.
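A minimal sketch of this individual-level mutation, assuming a flat binary chromosome (the representation details are ours; the default p_mut of 0.6 is the value listed in table 2):

```python
import random

# Individual-level mutation as described in the text: with probability
# p_mut the individual is selected, and then ONE randomly chosen gene
# is mutated, instead of testing every gene separately.

def mutate_individual(chromosome, p_mut=0.6, rng=random):
    """chromosome: flat list of binary genes; returns a (possibly) mutated copy."""
    child = list(chromosome)
    if rng.random() < p_mut:              # select the individual for mutation
        gene = rng.randrange(len(child))  # then pick a single random gene
        child[gene] = 1 - child[gene]     # and flip it
    return child
```

This makes the expected number of mutations per individual independent of the chromosome length, which is the point of the simplification.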

Control of the individuals' length: Dealing with variable-length individuals raises some serious considerations. One of the most important ones is the control of the size of the evolving individuals [21]. This control is achieved in GAssist using two different operators:

- Rule deletion: This operator deletes the rules of the individuals that do not match any training example. This rule deletion is done after the fitness computation and has two constraints: (a) the process is only activated after a predefined number of iterations, to prevent a massive diversity loss, and (b) the number of rules of an individual never goes below a lower threshold. This threshold is set to the number of classes of the domain.

- Selection bias using the individual size: Selection is guided as usual by the fitness (the accuracy). However, it also gives a certain degree of relevance to the size of the individuals, following a policy similar to multi-objective systems. We use tournament selection because its local behavior lets us implement this policy. The criterion of the tournament is given by an operator called "size-based comparison" [2]. This operator considers two individuals similar if their fitness difference is below a certain threshold (d_comp). Then, it selects the individual with the fewer number of rules.
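The size-based tournament criterion above can be sketched as follows (a sketch under our own data layout: individuals are simply (fitness, n_rules) pairs; the tournament size of 3 is the value from table 2):

```python
import random

# "Size-based comparison" tournament criterion: two individuals are
# considered similar if their fitness difference is below d_comp; in
# that case the one with fewer rules wins, otherwise the fitter one.

def better(a, b, d_comp=0.01):
    fit_a, size_a = a
    fit_b, size_b = b
    if abs(fit_a - fit_b) < d_comp:   # similar fitness: prefer fewer rules
        return a if size_a <= size_b else b
    return a if fit_a > fit_b else b  # otherwise plain fitness comparison

def tournament(population, k=3, d_comp=0.01, rng=random):
    """Tournament selection of size k using the size-based criterion."""
    contenders = rng.sample(population, k)
    winner = contenders[0]
    for ind in contenders[1:]:
        winner = better(winner, ind, d_comp)
    return winner
```

Because the bias only activates when fitnesses are within d_comp, it steers the search toward compact rule sets without overriding accuracy.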

4 Discrete rules with adaptive intervals

This section describes the rule representation based on discrete rules with adaptive intervals. First we describe the problems that traditional discrete rules present. Then, we explain the proposed adaptive intervals rules and the changes introduced in order to enable the GA to use them.

4.1 Discrete rules and unnecessary search space growth

The traditional approach to solving problems with real-valued attributes using discrete rules relies on a discretization process. This discretization can be done using algorithms which determine the discretization intervals by analyzing the training information, or we can use a simple alternative such as a uniform-width intervals discretization.

In the latter method, the way to increase the accuracy of the solution is to increase the number of intervals. This solution brings a big problem, because the search space to explore grows exponentially when more intervals are added. The accuracy improvement expected from increasing the number of intervals sometimes does not materialize, because the GA spends too much time exploring areas of the search space which do not need to be explored.

If we find a correct and minimal set of intervals, the solution accuracy will probably increase without a huge increase of the computational cost.
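The exponential growth argument is easy to make concrete. With CNF antecedents of the GABIL kind used here, each attribute contributes one bit per interval, so the number of possible antecedent configurations doubles with every added bit (the function below is our illustration, not part of the system):

```python
# Growth of the antecedent space: a rule antecedent over d attributes
# with n uniform intervals each has d*n bits, hence 2**(d*n) possible
# configurations.

def antecedent_space(n_attributes, n_intervals):
    bits = n_attributes * n_intervals
    return bits, 2 ** bits

for n in (2, 4, 8, 16):
    bits, size = antecedent_space(n_attributes=3, n_intervals=n)
    print(f"{n:2d} intervals/attr -> {bits:2d} bits, {size} antecedents")
```

Doubling the intervals per attribute squares, not doubles, the per-attribute antecedent space, which is why finer uniform discretizations slow the GA without a matching accuracy gain.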

4.2 Finding good and minimal intervals

Our aim is to find good discretization intervals without a great expansion of the search space. In order to achieve this goal, we defined a rule representation [1] with discrete adaptive intervals where the discretization intervals are not fixed. These intervals are evolved through the iterations, merging and splitting between them.

To control the computational cost and the growth of the search space, we define the following constraints:

- A number of "low level" uniform and static intervals, called micro-intervals, is defined for each attribute.
- The adaptive intervals are built by joining together micro-intervals.
- When we split an interval, we select a random point in its micro-intervals to break it.
- When we merge two intervals, the value of the resulting interval is taken from the one which has more micro-intervals. If both have the same number of micro-intervals, the value is chosen randomly.
- The number and size of the initial intervals are selected randomly.

The adaptive intervals as well as the split and merge operators are shown in figure 1.

Fig. 1. Adaptive intervals representation and the split and merge operators. (The figure depicts a rule set of rules with a class and per-attribute interval values, each interval grouping several micro-intervals; the split operator cuts an interval at a random micro-interval cut point, and the merge operator joins an interval with a selected neighbour.)

To apply the split and merge operators we have added to the GA cycle two special phases, applied to the offspring population after the mutation phase. For each phase (split and merge) we have a probability (p_split or p_merge) of applying a split or merge operation to an individual. If an individual is selected for splitting or merging, a random point inside its chromosome is chosen to apply the operation.
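The split and merge operators on a single attribute term can be sketched as follows. This is a sketch under our own data layout (an interval is a (value, n_micro) pair: its predicate bit and how many micro-intervals it spans); the rules for the cut point and the surviving value follow the constraints listed above.

```python
import random

# Split and merge over adaptive intervals built from micro-intervals.
# The total number of micro-intervals is conserved by both operators.

def split(intervals, idx, rng=random):
    """Break interval idx at a random micro-interval cut point."""
    value, n = intervals[idx]
    if n < 2:
        return intervals                      # a single micro-interval cannot split
    cut = rng.randrange(1, n)                 # random cut point inside the interval
    return intervals[:idx] + [(value, cut), (value, n - cut)] + intervals[idx + 1:]

def merge(intervals, idx, rng=random):
    """Merge interval idx with its neighbour idx+1; the value comes from
    the interval with more micro-intervals (random tie-break)."""
    (v1, n1), (v2, n2) = intervals[idx], intervals[idx + 1]
    if n1 > n2:
        v = v1
    elif n2 > n1:
        v = v2
    else:
        v = rng.choice((v1, v2))
    return intervals[:idx] + [(v, n1 + n2)] + intervals[idx + 2:]
```

Since every interval boundary must coincide with a micro-interval boundary, the operators explore only a coarse, bounded lattice of discretizations rather than arbitrary real-valued cut points.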

Finally, this representation requires some changes in other parts of the GA:

- The crossover operator can only take place at the attribute boundaries.
- The "size-based comparison" operator uses the length (number of genes) of the individual instead of the number of rules, because now the size of a rule can change when the number of intervals it contains changes. This change also makes the GA prefer individuals with fewer intervals in addition to fewer rules, further simplifying them.

4.3 Changes to the adaptive intervals rule representation

One of the main drawbacks of the initial approach was the sizing of the number of micro-intervals assigned to each attribute term of the rules. This parameter is difficult to tune because it is domain-specific.

In this paper we test another approach (multi-adaptive) which consists in evolving attribute terms with different numbers of micro-intervals in the same population. This enables the evolutionary process to select the correct number of micro-intervals for each attribute term of the rules. The number of micro-intervals of each attribute term is selected from a predefined set in the initialization stage.

The initialization phase has also changed. In our previous work the number and size of the intervals were uniform. We have changed this policy to a totally random initialization in order to gain diversity in the initial population.

The last change introduced involves the split and merge operators. In the previous version these operators were integrated inside the mutation. This made the sizing of the probabilities very difficult because the three operators (split, merge and mutation) were coupled. Using an extra recombination stage in this version, we eliminate this tight linkage.
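The multi-adaptive initialization can be sketched as follows. The set of micro-interval counts is the one listed in table 2; the 0.5 boundary probability used to produce intervals of random number and size is our own assumption, since the paper only specifies that the initialization is totally random.

```python
import random

# Multi-adaptive initialization of one attribute term: draw the number
# of micro-intervals from a predefined set, then build initial intervals
# with random sizes and random predicate values.

MICRO_SETS = (5, 6, 7, 8, 10, 15, 20, 25)  # sets used in the multi-adaptive runs

def init_attribute_term(rng=random):
    n_micro = rng.choice(MICRO_SETS)
    # Keep each internal micro-interval boundary with probability 0.5
    # (our assumption), giving intervals of random number and size.
    sizes, current = [], 1
    for _ in range(n_micro - 1):
        if rng.random() < 0.5:
            sizes.append(current)
            current = 1
        else:
            current += 1
    sizes.append(current)
    return [(rng.randint(0, 1), n) for n in sizes]  # (value, n_micro) pairs
```

Because terms with different micro-interval counts coexist in one population, selection itself decides which granularity suits each attribute.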

5 Test suite

This section summarizes the tests done in order to evaluate the accuracy and efficiency of the method presented in this paper. We also compare it with some alternative methods. The tests were conducted using several machine learning problems, which we also describe.

5.1 Test problems

The test problems selected for this paper present different characteristics in order to give us a broad overview of the performance of the methods being compared. The first problem is a synthetic problem (Tao [15]) that has non-orthogonal class boundaries. We also use several problems provided by the University of California at Irvine (UCI) repository [3]. The problems selected are: Pima-indians-diabetes (pima), iris, glass and breast-cancer-wisconsin (breast). Finally, we use three problems from our own private repository. The first two deal with the diagnosis of breast cancer based on biopsies (bps [17]) and mammograms (mamm [16]), whereas the last one is related to the prediction of student qualifications (lrn [9]). The characteristics of the problems are listed in table 1. The partition of the examples into training and test sets was done using the stratified ten-fold cross-validation method [12].
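Stratified ten-fold cross-validation [12] splits the examples per class so that every fold keeps roughly the class proportions of the full dataset; a minimal sketch (our own helper, not the authors' code):

```python
import random
from collections import defaultdict

# Stratified k-fold partition: shuffle the indices of each class and
# deal them out round-robin, so every fold preserves the class mix.
# Each fold serves once as the test set; the rest form the training set.

def stratified_folds(labels, k=10, rng=random):
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds  # folds[i] = test indices of fold i
```

Stratification matters for the smaller, imbalanced datasets (e.g. glass with 214 examples and 6 classes), where a plain random split could leave a fold with no examples of some class.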

5.2 Configurations of the GA to test

The main goal of the tests is to evaluate the performance of the adaptive intervals rule representation. In order to compare this method with the traditional discrete representation, we use two discretization methods: the simple uniform-width intervals method and the Fayyad & Irani method [7].

Table 1. Characteristics of the test problems.

Dataset  Examples  Real attributes  Discrete attributes  Classes
tao      1888      2                -                    2
pima     768       8                -                    2
iris     150       4                -                    3
glass    214       9                -                    6
breast   699       -                9                    2
bps      1027      24               -                    2
mamm     216       21               -                    2
lrn      648       4                2                    5

We analyze the adaptive intervals approach with two types of runs. The first one assigns the same number of micro-intervals to all the attribute terms of the individuals. We call this type of run adaptive. In the second one, attributes with different numbers of micro-intervals coexist in the same population. We call this type multi-adaptive.

The GA parameters are shown in table 2. The reader can appreciate that the sizing of both p_split and p_merge is the same for all the problems except the tao problem. Giving the same value to p_merge and p_split produces solutions with too few rules and intervals, as well as less accurate results than those obtained with the configuration shown in table 2. This is an issue that needs further study.

Another important issue of the p_split and p_merge probabilities for some of the domains is that they are greater than 1. This means that for these domains at least one split and one merge operation will surely be applied to each individual of the population. Thus, p_split and p_merge become expected values instead of probabilities. This tuning produces a reduction of the number of iterations needed.
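One natural way to implement a p_split (or p_merge) greater than 1 as an expected number of operations per individual is to apply the integer part deterministically and the fractional part as a probability; the paper states the reinterpretation, and this decomposition is our assumption about one way to realize it:

```python
import random

# Read p as an expected number of operations per individual:
# e.g. p_split = 2.6 -> at least 2 splits, plus one more with
# probability 0.6, so the mean number of splits is 2.6.

def n_operations(p, rng=random):
    n = int(p)
    if rng.random() < p - n:
        n += 1
    return n
```

Averaging n_operations(2.6) over many individuals approaches 2.6 operations per individual per generation.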

6 Results

In this section we present the results obtained. The aim of the tests was to compare the methods presented in the paper in three aspects: accuracy and size of the solutions, as well as the computational cost. For each method and test problem we show the average and standard deviation values of: (1) the cross-validation accuracy, (2) the size of the best individual in number of rules and intervals per attribute, and (3) the execution time in seconds. The tests were executed on an AMD Athlon 1700+ using the Linux operating system and the C++ language.

The results were also analyzed using the two-sided t-test [23] to determine if the two adaptive methods outperform the other ones with a significance level of 1%. Finally, for each configuration, test and fold, 15 runs using different random seeds were done. Results are shown in table 3. The t-test column shows a mark beside the Uniform or Fayyad & Irani method if it was outperformed by the

Table 2. Common and problem-specific parameters of the GA.

Parameter                                           Value
Crossover probability                               0.6
Iter. of rule deletion activation                   30
Iter. of size comparison activation                 30
Sets of micro-intervals in the multi-adaptive test  5,6,7,8,10,15,20,25
Tournament size                                     3
Population size                                     300
Probability of mutating an individual               0.6

Code      Parameter
#iter     Number of GA iterations
d_interv  Number of intervals in the uniform-width discrete rules
a_interv  Number of micro-intervals in the adaptive test
d_comp    Distance parameter in the "size-based comparison" operator
p_split   Probability of splitting an individual (one of its intervals)
p_merge   Probability of merging an individual (one of its intervals)

Problem  #iter  d_interv  a_interv  d_comp  p_merge  p_split
tao      600    12        48        0.001   1.3      2.6
pima     500    4         8         0.01    0.8      0.8
iris     400    10        10        0.02    0.5      0.5
glass    750    4         8         0.015   1.5      1.5
breast   325    5         10        0.01    3.2      3.2
bps      500    4         10        0.015   1.7      1.7
mamm     500    2         5         0.01    1        1
lrn      700    5         10        0.01    1.2      1.2

adaptive methods. The adaptive methods were never outperformed in the tests done, showing good robustness.

The results are summarized using the ranking in table 4. The ranking for each problem and method is based on the accuracy. The global rankings are computed by averaging the problem rankings.

Table 3 shows that in two of the tests the best-performing method was the Fayyad & Irani interval discretization technique. However, in the rest of the tests its performance is lower, showing a lack of robustness across different domains. The two adaptive tests achieved the best results of the ranking. Nevertheless, the goal of improving the rule representation with the multi-adaptive configuration has not been achieved: it is only better than the original adaptive configuration in three of the eight test problems. The computational cost is clearly the main drawback of the adaptive intervals representation. The Fayyad & Irani method is on average 2.62 times faster.

Table 3. Mean and deviation of the accuracy (percentage of correctly classified examples), number of rules, intervals per attribute and execution time for each method tested. Bold entries show the method with the best results for each test problem. A mark denotes a significant out-performance based on a t-test.

Problem  Configuration   Accuracy   Number of rules  Intervals per attribute  Time (s)
tao      Uniform         93.7±1.2   8.8±1.6          8.3±0.0                  36.0±3.5
         Fayyad          87.8±1.1   3.1±0.3          3.4±0.1                  24.2±1.4
         Adaptive        94.6±1.3   22.5±5.6         7.7±0.4                  96.6±14.7
         Multi-Adaptive  94.3±1.0   19.5±4.9         6.0±0.6                  94.5±13.9
pima     Uniform         73.8±4.1   6.3±2.2          3.7±0.0                  23.2±2.8
         Fayyad          73.6±3.1   6.6±2.6          2.3±0.2                  26.4±3.0
         Adaptive        74.8±3.5   6.2±2.6          2.0±0.4                  56.2±9.4
         Multi-Adaptive  74.4±3.1   5.8±2.2          1.9±0.4                  59.7±8.9
iris     Uniform         92.9±2.7   3.8±1.1          8.2±0.0                  5.2±0.7
         Fayyad          94.2±3.0   3.2±0.6          2.8±0.1                  5.5±0.1
         Adaptive        94.9±2.3   3.3±0.5          1.3±0.2                  9.2±0.4
         Multi-Adaptive  96.2±2.2   3.6±0.9          1.3±0.2                  9.0±0.8
glass    Uniform         60.5±8.9   8.7±1.8          3.7±0.0                  13.9±1.5
         Fayyad          65.7±6.1   8.1±1.4          2.4±0.1                  14.0±1.1
         Adaptive        64.6±4.7   5.9±1.7          1.7±0.2                  35.1±5.2
         Multi-Adaptive  65.2±4.1   6.7±2.0          1.8±0.2                  38.4±5.0
breast   Uniform         94.8±2.6   4.8±2.5          4.6±0.0                  6.5±1.0
         Fayyad          95.2±1.8   4.1±0.8          3.6±0.1                  5.8±0.4
         Adaptive        95.4±2.3   2.7±1.0          1.8±0.2                  15.7±2.1
         Multi-Adaptive  95.3±2.3   2.6±0.9          1.7±0.2                  17.4±1.5
bps      Uniform         77.6±3.3   15.0±7.0         3.9±0.0                  50.8±9.0
         Fayyad          80.0±3.1   7.1±3.8          2.4±0.1                  37.7±6.0
         Adaptive        80.3±3.5   4.7±3.0          2.1±0.4                  106.6±21.1
         Multi-Adaptive  80.1±3.3   5.1±2.0          2.0±0.3                  115.9±20.5
mamm     Uniform         63.2±9.9   2.6±0.5          2.0±0.0                  7.8±1.0
         Fayyad          65.3±11.1  2.3±0.5          2.0±0.1                  8.5±0.7
         Adaptive        65.8±5.3   4.4±1.7          1.8±0.2                  27.6±4.9
         Multi-Adaptive  65.0±6.1   4.4±1.9          1.9±0.2                  27.4±5.5
lrn      Uniform         64.7±4.9   17.8±5.1         4.9±0.0                  29.2±4.0
         Fayyad          67.5±5.1   14.3±5.0         4.4±0.1                  26.5±3.4
         Adaptive        66.1±4.6   14.0±4.6         3.6±0.3                  58.9±7.9
         Multi-Adaptive  66.7±4.1   11.6±4.1         3.4±0.2                  53.9±7.2

Table 4. Performance ranking of the tested methods. Lower number means better ranking.

Problem     Fixed  Fayyad  Adaptive  Multi-Adaptive
tao         3      4       1         2
pima        3      4       1         2
iris        4      3       2         1
glass       4      1       3         2
breast      4      3       1         2
bps         4      3       1         2
mamm        4      2       1         3
lrn         4      1       3         2
Average     3.25   2.625   1.625     2
Final rank  4      3       1         2

7 Conclusions and further work

This paper focused on an adaptive rule representation as a robust method for finding a good discretization. The main contribution is the use of adaptive discrete intervals, which can split or merge through the evolution process, reducing the search space where possible.

The use of a heuristic discretization method (like the Fayyad & Irani one) outperforms the adaptive intervals representation in some test problems. Nevertheless, the performance increase is not significant. On the other hand, when the adaptive intervals outperform the other methods, the performance increase is higher, showing a better degree of robustness.

The overhead of evolving discretization intervals and rules at the same time is quite significant, and it is the main drawback of the representation. Besides the cost of the representation itself (our implementation uses twice the memory of the discrete representation for the same number of intervals), the main difference is the significant reduction of the search space achieved by a heuristic discretization.

Some further work should use the knowledge provided by the discretization techniques in order to reduce the computational cost of the adaptive intervals representation. This should be achieved without losing robustness. Another important point of further study is how the values of p_split and p_merge affect the behavior of the system, in order to simplify the tuning needed for each domain.

Finally, it would also be interesting to compare the adaptive intervals rule representation with some representation dealing directly with real-valued attributes, like the ones described in the related work section. This comparison should follow the same criteria used here: comparing both the accuracy and the computational cost.

Acknowledgments

The authors acknowledge the support provided under grant numbers 2001FI 00514, CICYT/Tel08-0408-02 and FIS00/0033-02. The results of this work were partially obtained using equipment cofunded by the Direcció General de Recerca de la Generalitat de Catalunya (D.O.G.C. 30/12/1997). Finally, we would like to thank Enginyeria i Arquitectura La Salle for their support of our research group.

References

1. Jaume Bacardit and Josep M. Garrell. Evolution of adaptive discretization intervals for a rule-based genetic learning system. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2002) (to appear), 2002.
2. Jaume Bacardit and Josep M. Garrell. Métodos de generalización para sistemas clasificadores de Pittsburgh. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 486-493, 2002.
3. C. Blake, E. Keogh, and C. Merz. UCI repository of machine learning databases (www.ics.uci.edu/~mlearn/MLRepository.html), 1998.
4. A. L. Corcoran and S. Sen. Using real-valued genetic algorithms to evolve rule sets for classification. In Proceedings of the IEEE Conference on Evolutionary Computation, pages 120-124, 1994.
5. O. Cordón, M. del Jesus, and F. Herrera. Genetic learning of fuzzy rule-based classification systems co-operating with fuzzy reasoning methods. International Journal of Intelligent Systems, 13(10/11):1025-1053, 1998.
6. Kenneth A. DeJong and William M. Spears. Learning concept classification rules using genetic algorithms. In Proceedings of the International Joint Conference on Artificial Intelligence, pages 651-656, 1991.
7. Usama M. Fayyad and Keki B. Irani. Multi-interval discretization of continuous-valued attributes for classification learning. In IJCAI, pages 1022-1029, 1993.
8. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Inc., 1989.
9. Elisabet Golobardes, Xavier Llorà, Josep Maria Garrell, David Vernet, and Jaume Bacardit. Genetic classifier system as a heuristic weighting method for a case-based classifier system. Butlletí de l'Associació Catalana d'Intel·ligència Artificial, 22:132-141, 2000.
10. John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press, 1975.
11. John H. Holland. Escaping brittleness: The possibilities of general-purpose learning algorithms applied to parallel rule-based systems. In Machine Learning, an Artificial Intelligence Approach. Volume II, pages 593-623. 1986.
12. Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In IJCAI, pages 1137-1145, 1995.
13. Alexander V. Kozlov and Daphne Koller. Nonuniform dynamic discretization in hybrid networks. In Proceedings of the 13th Annual Conference on Uncertainty in AI (UAI), pages 314-325, 1997.
14. H. Liu and R. Setiono. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the 7th IEEE International Conference on Tools with Artificial Intelligence, pages 388-391. IEEE Computer Society, 1995.
15. Xavier Llorà and Josep M. Garrell. Knowledge-independent data mining with fine-grained parallel evolutionary algorithms. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pages 461-468. Morgan Kaufmann, 2001.
16. J. Martí, X. Cufí, J. Regincós, et al. Shape-based feature selection for microcalcification evaluation. In Imaging Conference on Image Processing, 3338:1215-1224, 1998.
17. E. Martínez Marroquín, C. Vos, et al. Morphological analysis of mammary biopsy images. In Proceedings of the IEEE International Conference on Image Processing (ICIP'96), pages 943-947, 1996.
18. José C. Riquelme and Jesús S. Aguilar. Codificación indexada de atributos continuos para algoritmos evolutivos en aprendizaje supervisado. In Proceedings of the "Primer Congreso Iberoamericano de Algoritmos Evolutivos y Bioinspirados (AEB'02)", pages 161-167, 2002.
19. Ronald L. Rivest. Learning decision lists. Machine Learning, 2(3):229-246, 1987.
20. Stephen F. Smith. Flexible learning of problem solving heuristics through adaptive search. In Proceedings of the 8th International Joint Conference on Artificial Intelligence (IJCAI-83), pages 421-425, Los Altos, CA, 1983. Morgan Kaufmann.
21. Terence Soule and James A. Foster. Effects of code growth and parsimony pressure on populations in genetic programming. Evolutionary Computation, 6(4):293-309, Winter 1998.
22. Stewart W. Wilson. Get real! XCS with continuous-valued inputs. In L. Booker, Stephanie Forrest, M. Mitchell, and Rick L. Riolo, editors, Festschrift in Honor of John H. Holland, pages 111-121. Center for the Study of Complex Systems, 1999.
23. Ian H. Witten and Eibe Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann, 2000.