
Project 3: Automatic Programming, Weka & R

Arnel Curkic, Amund Lågbu and Cato A. Goffeng

April 11, 2014


Abstract

In this project we have tested different machine learning techniques. We have, with help from our lecturer, tested how well ADATE could classify the instances in a poker data set. In WEKA we have tested the SMO, NaiveBayes, KStar and random forest algorithms. In R we have tried neural networks as well as naive Bayesian, random forest and SVM algorithms.

It turned out that ADATE performed best when classifying the poker instances, with an error percentage of only 0.5%.

Finally, the project also contains descriptions of algorithms, machine learning techniques and our own proposal for how a genetic algorithm could classify poker instances, using techniques from natural evolution.


Contents

I  The automatic programming option

1  Introduction to artificial evolution & automatic programming
   1.1  Artificial evolution
        1.1.1  Genetic algorithms
        1.1.2  Evolution strategies
        1.1.3  Genetic programming
   1.2  Automatic programming

2  Evolution in nature and in computer science
   2.1  Evolutionary mechanisms
        2.1.1  Selection
        2.1.2  Mutations
        2.1.3  Genetic drift
        2.1.4  Evolution within genes
        2.1.5  Mass extinctions
   2.2  What others have done
        2.2.1  Selection
        2.2.2  Mutations
        2.2.3  Genetic drift
        2.2.4  Neutral evolution
        2.2.5  Genetic linkage
        2.2.6  Mass extinctions
   2.3  How evolution could have been implemented in a program classifying poker instances

3  Data set selection

4  Specification file for ADATE, how the data is coded and description of user callbacks
   4.1  Creation of a specification file
   4.2  How the data is coded
   4.3  Relevant user callbacks

5  Log, trace and validation, as well as an analysis of possible overfitting
   5.1  The log file
   5.2  The trace file
   5.3  The validation file
   5.4  Analysis of possible overfitting

6  The program with the best validation value
   6.1  Interpretation of the program

7  Test of the ADATE program

II  The WEKA alternative

8  Available machine learning algorithms in WEKA

9  Algorithm selection and descriptions
   9.1  SMO
        9.1.1  Support vector machines generally
        9.1.2  WEKA's implementation of an SVM
        9.1.3  SMO options in WEKA
        9.1.4  Two simple tests of the SMO algorithm
   9.2  NaiveBayes
        9.2.1  Bayesian classifiers in general
        9.2.2  WEKA's naive bayesian classifier
        9.2.3  Naive bayesian options in WEKA
        9.2.4  Three tests of the naive bayesian algorithm
   9.3  KStar
        9.3.1  Instance-based learners
        9.3.2  KStar options
        9.3.3  Weka test runs

10  WEKA tests
    10.1  SMO tests
    10.2  KStar tests
    10.3  Naive Bayes tests
    10.4  Random Forest tests
          10.4.1  Initial test
          10.4.2  Advanced tests

11  Test of tools and comparison of results
    11.1  Cubist test
    11.2  C5.0 tests
    11.3  Comparison of techniques

12  Two more tasks
    12.1  How sensible the techniques are to missing values
    12.2  Future improvements

III  R

13  R - Intro

14  R's SVM
    14.1  An initial test
    14.2  Advanced tests
          14.2.1  Test of cost
          14.2.2  Test of kernel type
          14.2.3  Test of scale
          14.2.4  Combined test

15  R's naiveBayes
    15.1  About naiveBayes
    15.2  An initial test
    15.3  Advanced tests
    15.4  Tests of caret's naive Bayesian classifier

16  R's random forest
    16.1  About the random forest algorithm
    16.2  Initial tests
    16.3  Advanced tests

17  R's neural networks
    17.1  Simple tests of the ANNs in both packages
    17.2  Advanced tests

18  Comparison with Weka
    18.1  Naive Bayesian classifiers
    18.2  Random forest classifiers
    18.3  SVM classifiers

A  Automatic programming
   A.1  ADATE file outputs
        A.1.1  Output of individual in the .log-file
        A.1.2  pe1g0s0-grid from the trace-file
        A.1.3  Program with the best validation results
   A.2  ADATE screen dumps

B  WEKA
   B.1  WEKA text output
        B.1.1  The first initial test of the SMO algorithm
        B.1.2  The second initial test of the SMO algorithm
        B.1.3  The first test of the naiveBayes algorithm
        B.1.4  The second test of naiveBayes
        B.1.5  The third test of naiveBayes
   B.2  WEKA - code
        B.2.1  Java program removing values
   B.3  WEKA screen dumps

C  R
   C.1  R scripts
        C.1.1  First script to test the SVM included in package e1071
        C.1.2  Script to test RSNNS's mlp
        C.1.3  Final test of R's SVM (e1071)
   C.2  Java code
        C.2.1  Code written to change data in csv-files


List of Figures

1.1   A basic illustration of a genetic algorithm [41].
2.1   How we would like to create an algorithm containing several GAs for classifying poker instances.
7.1   Test of ADATE's best program with the complete data set containing 1,025,010 instances.
11.1  A test of the data mining tool Cubist.
11.2  An example of rules generated by Cubist.
11.3  Option selections when running C5.0 in terminal.
14.1  Confusion matrix for the first test of R's SVM.
14.2  The first SVM's predictions.
14.3  Confusion matrices showing results for testing and training of an SVM with cost value set to 32.
14.4  Confidence interval calculated on the basis of the best scale test.
14.5  Confidence interval calculated on the basis of the best kernel test.
14.6  Confidence interval calculated on the basis of the best cost test.
15.1  The initial test of R's naiveBayes.
16.1  Upper and lower limit for the confidence interval calculated with the results of the second test run.
17.1  A simple neural network created with the ANN included in R's neuralnet-package.
17.2  A test of the RSNNS' plotIterativeError-function.
A.1   Generation of a .spec-file from a C5.0 .names- and .data-file.
A.2   ADATE-ML code in the generated .spec-file.
A.3   Creation of a poker.spec.sml-file.
B.1   Option selection for the first test of WEKA's SMO algorithm.
B.2   Option selection for the first test of the naiveBayes algorithm.
B.3   A screen shot of the test run with winnowing in C5.0.
B.4   A screen shot of the test run with winnowing and costs-file in C5.0.


List of Tables

10.1  SMO tests
10.2  KStar tests
10.3  Naive Bayes tests
10.4  Random Forest tests
11.1  C5.0 tests
12.1  WEKA - missing values
14.1  SVM tests with cost
14.2  SVM tests with different kernels
14.3  SVM tests with scale on and off
15.1  naiveBayes tests
15.2  Naive Bayesian algorithm with cross-validations
16.1  Tests of the random forest algorithm
17.1  Tests of RSNNS


Part I

The automatic programming option


Chapter 1

Introduction to artificial evolution & automatic programming

1.1 Artificial evolution

Modeling evolution on a computer may, according to Michael Negnevitsky, create intelligent behavior. He states that evolution is usually simulated based on a simple set of rules, and that it results in a series of optimization algorithms [41].

As in real life, evolution is a slow process. It models natural selection and/or genetics. An umbrella term for this kind of algorithm is "evolutionary computation". Evolutionary computation includes, among others, genetic algorithms, evolution strategies and genetic programming. All these techniques simulate evolution through selection processes [41]. Short introductions to each of them follow in the subsections below.

1.1.1 Genetic algorithms

A genetic algorithm simulates natural evolution in a general way. The following basic steps are included:

1. Creating a population of individuals [41].

2. Evaluating the fitness of the individuals [41].

3. Generating a new population through genetic operations [41].

4. Repeating the process until the fitness is better than a pre-set threshold [39].

The illustration shown in figure 1.1 presents the different parts of a genetic algorithm. The genetic operations described in this figure are crossover, which is an exchange of parts taken from two chromosomes, and mutation, which implies that random values may be changed in, for example, a bit string; mutation is a probability-driven process [41]. A minimal sketch of such a loop is shown below.
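To make the four steps concrete, the following is a minimal sketch of a genetic algorithm over bit strings in Java. The "count the ones" fitness function and all parameter values are placeholders chosen for the example; they are not taken from this project.

import java.util.Random;

// A minimal, illustrative genetic algorithm over bit strings.
public class SimpleGA {
    static final int POP_SIZE = 50, GENOME_LEN = 20, GENERATIONS = 100;
    static final double MUTATION_RATE = 0.01;
    static final Random rnd = new Random();

    // Placeholder fitness: the number of 1-bits ("OneMax").
    static int fitness(boolean[] g) {
        int f = 0;
        for (boolean bit : g) if (bit) f++;
        return f;
    }

    static boolean[] tournament(boolean[][] pop) {
        boolean[] a = pop[rnd.nextInt(pop.length)], b = pop[rnd.nextInt(pop.length)];
        return fitness(a) >= fitness(b) ? a : b;          // step 2: fitness-based selection
    }

    static boolean[] crossover(boolean[] p1, boolean[] p2) {
        int cut = rnd.nextInt(p1.length);                  // one-point crossover
        boolean[] child = new boolean[p1.length];
        for (int i = 0; i < p1.length; i++) child[i] = (i < cut) ? p1[i] : p2[i];
        return child;
    }

    static void mutate(boolean[] g) {                      // probability-driven bit flips
        for (int i = 0; i < g.length; i++)
            if (rnd.nextDouble() < MUTATION_RATE) g[i] = !g[i];
    }

    public static void main(String[] args) {
        // Step 1: create an initial population of random individuals.
        boolean[][] pop = new boolean[POP_SIZE][GENOME_LEN];
        for (boolean[] g : pop)
            for (int i = 0; i < GENOME_LEN; i++) g[i] = rnd.nextBoolean();

        // Steps 2-4: repeat selection, crossover and mutation for a fixed number of generations.
        for (int gen = 0; gen < GENERATIONS; gen++) {
            boolean[][] next = new boolean[POP_SIZE][];
            for (int k = 0; k < POP_SIZE; k++) {
                boolean[] child = crossover(tournament(pop), tournament(pop));
                mutate(child);
                next[k] = child;
            }
            pop = next;
        }
    }
}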


Figure 1.1: A basic illustration of a genetic algorithm [41].


1.1.2 Evolution strategies

According to Negnevitsky [41], evolution strategies are much like genetic algorithms, but they only have mutation as a genetic operation.

Hansen, Arnold and Auger give a more detailed description of the technique. They state that evolution strategies are search paradigms inspired by biological evolution. They further mention that this family of algorithms has a specific method for addressing problems: evolution strategies implement a repeated process of stochastic variation followed by selection [42]. Stochastic variation implies the presence of a random variable [44], as in mutations.

An evolution strategy iterates in the following way: In each iteration new offspring are generated. Then the new generation of offspring is evaluated. The best offspring then become parents for the next generation [42]. A small sketch of this mutation-and-selection loop is given below.
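As an illustration of the mutation-only loop just described, here is a minimal (1+1) evolution strategy in Java. The sphere objective, the step size and the iteration count are assumptions made for the example.

import java.util.Random;

// A minimal (1+1) evolution strategy: mutation is the only genetic operation.
public class OnePlusOneES {
    // Placeholder objective to minimise: the sphere function (sum of squares).
    static double sphere(double[] x) {
        double s = 0;
        for (double v : x) s += v * v;
        return s;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        int dim = 5;
        double sigma = 0.3;                          // mutation step size
        double[] parent = new double[dim];
        for (int i = 0; i < dim; i++) parent[i] = rnd.nextDouble() * 10 - 5;
        double parentFitness = sphere(parent);

        for (int iter = 0; iter < 10000; iter++) {
            // Generate one offspring by Gaussian mutation of the parent.
            double[] child = parent.clone();
            for (int i = 0; i < dim; i++) child[i] += sigma * rnd.nextGaussian();

            // Evaluate the offspring; the better of parent and offspring survives.
            double childFitness = sphere(child);
            if (childFitness <= parentFitness) {
                parent = child;
                parentFitness = childFitness;
            }
        }
        System.out.println("best fitness = " + parentFitness);
    }
}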

1.1.3 Genetic programming

Genetic programming is also quite similar to genetic algorithms (GA), described in subsection 1.1.1. But the goals of this technique may be quite different from a GA's goals. A GA may manipulate bit strings, but genetic programming is used to create entire programs.

1.2 Automatic programming

According to Charles Rich and Richard C. Waters [55], automatic programming has been one of the main goals in computer science since the first programs of its kind were invented in the 1950s. However, the automatic programs got more sophisticated in the 1960s. Then the black box view of how automatic programming should function emerged: it was thought that these automatic programming tools should take the user's needs as input, and output a program that could fulfill these needs [5].

Some research areas for automatic programming have been [5]:

1. Natural language.

2. Formal specifications.

3. Input/output examples.

4. Traces.

One big drawback of automatic programming is, according to Akourdas and Bringsjord, that it is difficult to create such programs. Therefore this field's state-of-the-art examples have not changed much in the last 30-40 years [5].

There are two main methods in the field of automatic programming: deductive and inductive methods [5]. Deductive methods create programs which satisfy given specifications [55]. Inductive methods synthesize logic programs rather than functional programs, according to Akourdas and Bringsjord [5].


Chapter 2

Evolution in nature and in computer science

In this chapter we have decided to explore different mechanisms that may cause evolution. In the first section we describe evolutionary mechanisms. In the next we examine whether some of the techniques have already been implemented in computer science by others. In the last section we discuss how, and whether, any of the evolutionary mechanisms could have been used to create a program capable of classifying poker hands.

2.1 Evolutionary mechanisms

According to Freddy Bugge Christiansen and Tom Fenchell, evolution is a process where living organisms' gene pools change over time. It is a process where natural selection makes some organisms adapted to their environment. Not all adaptations are obvious, since some are sexual adaptations where males (usually) compete for females [20].

2.1.1 Selection

Selection may be carried out in various ways. Arora and Kania mention three main categories [30]:

• Directional selection

• Stabilizing selection

• Disruptive selection

Directional selection happens when the most adapted phenotypes are the most likely to reproduce and survive within a population [30].

Stabilizing selection happens when the average phenotypes are the ones that are most likely to survive and/or reproduce. Kania and Arora mention an example with wintering sparrows: in the specific sparrow population, the birds of average size are the most likely to survive through a harsh winter [30].


Disruptive selection is the opposite of stabilizing selection. This form of selection happens when the extremes are the most likely to reproduce (or survive) [30].

2.1.2 Mutations

Christiansen and Fenchell state that mutations may cause evolution in a population. However, they point out that mutations generally have a negative effect on the species' fitness. Only in some cases are mutations beneficial for the individuals [20].

The authors also mention a theory of neutral, or nearly neutral, evolution developed by Kimura. According to this theory, most of the mutations that are preserved in a population are neutral. They have no positive or negative influence on the individuals' fitness. Instead they generate a gray zone of neutral mutations, which will vary in breadth depending on the population size [20].

According to Christiansen and Fenchell, some factors may cause an increased mutation rate within a population. One of these factors is inbreeding. In humans, inbreeding may increase the mutation rate and the occurrence of disorders, and affect individuals' resistance to diseases. But for some species of plants, self-pollination is a successful evolution strategy. These plants breed with themselves, often in combination with cross-pollination [20].

Christiansen and Fenchell note that favorable mutations may spread rapidly within a population, especially if the environment changes. They mention a British butterfly species, naturally occurring with black and white patterns, that changed appearance in areas with a high degree of pollution. In these areas the butterflies became black. The reason was that mutations causing black color were more favorable than white colors, since the pollution killed bright lichen on the trees where the butterflies lived. Bright butterflies were therefore more vulnerable to predators in polluted areas [20].

2.1.3 Genetic drift

Another evolutionary mechanism is random genetic drift. This is a process where the genes within a population change over time. The genes of one generation therefore differ a little from those of the generation before [20].

According to this theory, the genetic variation in a population may decrease over time. Genetic drift is faster in a small population than in a large one [20].

2.1.4 Evolution within genes

A counterpart to the theory describing genetic drift (ref. 2.1.3) was put forward in the 1960s. Researchers studying amino acids found that molecular variation within a population was much higher than previously thought. They found that regular mutation and selection couldn't explain the number of variations. This observation was shortly after explained with neutral evolution, described in subsection 2.1.2.

Another evolutionary mechanism that takes place within the genes is called genetic linkage. This effect may occur when chromosomes are attached to each other, causing transmission of multiple properties [20].


2.1.5 Mass extinctions

In the book Vertebrate Paleontology, Michael J. Benton describes the history and evolution of vertebrate animals. He explains how new species emerged, while others became extinct - sometimes under terrible circumstances.

During the last hundreds of millions of years, several mass extinctions have occurred, wiping out the majority of animal species alive. One of these, and probably the biggest of them all, took place at the end of the Permian era, before the age of the dinosaurs. Benton describes several possible scenarios causing this mass extinction. The most likely is the Siberian Traps, a lava eruption in modern-day Russia lasting perhaps 500,000 years. This eruption caused oxygen depletion and increased CO2 levels in the atmosphere, making the climate change drastically [9].

Common to all extinctions seems to be that the species that could handle a new environment could thrive and possibly evolve into new species in the aftermath. Benton for example describes Lystrosaurus, a dicynodont (a mammal-like reptile, which dominated in the Late Permian period), spreading worldwide in the early Triassic [9].

Species with good-enough properties may therefore fill the gap after the newly extinct species, taking their place in the ecosystem. The general theory is therefore that mass extinctions are followed by a period of rapid adaptation and diversification of species [43], making evolution leap forward.

2.2 What others have done

2.2.1 Selection

When it comes to selection, all of the mechanisms mentioned in subsection 2.1.1 have been tried before.

For example, Kuo and Hwang describe a genetic algorithm with disruptive selection. The algorithm's fitness function favors not only superior individuals, but also inferior individuals. According to the researchers, their approach speeds up the genetic algorithm (GA) [25].

Stabilizing selection was suggested by Rattray in 1995. He used stabilizing selection in a GA trying to solve the subset sum problem, which is an optimization problem [51].

Lastly, directional selection has for example been tested by Just Winfried & Zhu Fang [19].

2.2.2 Mutations

Marsili Libelli and Alba describe a genetic algorithm with adaptive mutation implemented. They suggest implementing mutation not as a constant value, but as a function. Mutation may therefore happen more often in individuals that are less fit than others [3].

According to Christiansen and Fenchell, inbreeding may occur in populations, generating more mutations in that specific group [20] (ref. subsection 2.1.2). Wibowo and Jamieson have published an article where they describe a mechanism for recording ancestry, avoiding inbreeding [27].


Even the self-fertilizing strategy of plants has been tried out before. Wang, Liu and Yu have published an article where they suggest using self-fertilization as a technique in a genetic algorithm. Their results show that the algorithm may be a promising technique for solving problems such as the described university timetable problem [61].

2.2.3 Genetic drift

Genetic drift is described in subsection 2.1.3, and is a force changing the genes in a population over time, decreasing variation. Dick and Whigham [16] describe a technique which makes it possible to identify genetic drift in evolutionary algorithms.

2.2.4 Neutral evolution

Galván-López et al. have described the different aspects of neutrality in evolutionary algorithms. Although neutrality has been tested a lot in the evolutionary computation community in the past years, implementation of neutrality has proven to be useful in some cases and unnecessary in others. As an example, a study published by Miller in 2000 concluded that neutrality could improve performance when evolving boolean functions. Five years later Collins published an article which claimed that Miller's findings were wrong [35].

Neutrality has also been tested a lot with the use of different types of fitness landscapes [35].

An algorithm used to explore a neutral landscape is the Neutral Random Walk algorithm. This algorithm starts by choosing a random solution; then all neighbors of the solution are generated. A neutral neighbor is then selected if it increases the distance to the starting point. The process is repeated until the distance from the starting point can no longer be increased by moving to a neighbor [35]. A small sketch of such a walk is given below.
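The following is a minimal sketch of a neutral random walk over bit strings in Java, under the assumption that "neutral" means equal fitness and that distance is measured as Hamming distance from the starting solution. The plateau-shaped fitness function is a placeholder.

import java.util.*;

// Sketch of a neutral random walk over bit strings (illustrative assumptions only).
public class NeutralRandomWalk {
    static final Random rnd = new Random();

    // Placeholder fitness: a plateau, every solution with at least 5 ones scores 1.
    static int fitness(boolean[] x) {
        int ones = 0;
        for (boolean b : x) if (b) ones++;
        return ones >= 5 ? 1 : 0;
    }

    static int hamming(boolean[] a, boolean[] b) {
        int d = 0;
        for (int i = 0; i < a.length; i++) if (a[i] != b[i]) d++;
        return d;
    }

    public static void main(String[] args) {
        int n = 16;
        boolean[] start = new boolean[n];
        for (int i = 0; i < n; i++) start[i] = rnd.nextBoolean();
        boolean[] current = start.clone();

        boolean moved = true;
        while (moved) {
            moved = false;
            // Generate all one-bit-flip neighbors in random order.
            List<Integer> order = new ArrayList<>();
            for (int i = 0; i < n; i++) order.add(i);
            Collections.shuffle(order, rnd);
            for (int i : order) {
                boolean[] neighbor = current.clone();
                neighbor[i] = !neighbor[i];
                // Accept a neutral neighbor that is farther from the starting point.
                if (fitness(neighbor) == fitness(current)
                        && hamming(neighbor, start) > hamming(current, start)) {
                    current = neighbor;
                    moved = true;
                    break;
                }
            }
        }
        System.out.println("walk ended at Hamming distance " + hamming(current, start));
    }
}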

One system that has implemented neutral walks to avoid complex mutations is the ADATE system, described in this part of the project. In the ADATE system, neutral mutations make it possible to explore fitness plateaus and find spots from which it is easy to reach an even higher fitness plateau [45].

2.2.5 Genetic linkage

Ying-Ping [63] has proposed a method to implement genetic linkage in GAs. This algorithm copies genetic linkage in nature, where two or more chromosomes are attached to each other when crossover happens.

2.2.6 Mass extinctions

According to Jaworski et al. [28], imitations of natural extinction events may be useful when creating evolutionary algorithms. The researchers from Gdansk University of Technology state that extinction techniques may be used to find new local optima in case of a fitness function change. They note that a drastic change in the fitness function may be made to cause an extinction reaction.

Other researchers have also used mass extinction techniques in their work, with good results. Krink and Thomsen state that mass extinctions in nature seem to follow a power law - the bigger the extinction impact, the more species die.


Krink and Thomsen have used this knowledge of extinction in their work and created self-organized criticality models which outperform basic evolutionary algorithms - in both speed and adaptation [31].

2.3 How evolution could have been implemented in a program classifying poker instances

Before we started to explore some of the evolutionary mechanisms found in nature, we didn't expect that so many of them had already been implemented and tested in computer science. We were surprised by the fact that mechanisms such as self-fertilizing strategies were already implemented with claimed good results, since this kind of algorithm approves crossovers involving the same individual.

However, if we had more time and knowledge of evolutionary computation, we would like to implement our own genetic algorithm - possibly able to classify poker instances. But since we lack experience in creating such algorithms, we will only point out the main features in this section.

We envisage a genetic algorithm structured like the illustration in subsection 1.1.1, but with multiple algorithms running in parallel instead of just one. The goal of the genetic algorithm would be to create an algorithm able to classify poker hands, which come in ten different classes. We would therefore, in this case, start with ten genetic algorithms, each with a fitness function rewarding individuals able to classify the most instances of one of the classes. The selection process for the crossover could therefore be a directional selection, since we reward individuals adapted to the current fitness landscape.

When a threshold is reached (enough instances have been classified, or a stop criterion is triggered) we could stop the training of the genetic algorithms, one at a time. Then we could pair them together, and combine each of the GAs' fitness landscapes - creating new fitness functions for each GA. This change of fitness functions may resemble the extinctions mentioned by Jaworski et al. in subsection 2.2.6.

Hopefully the mass extinctions may trigger the creation of new individuals able to classify two classes instead of just one. Lastly, we could repeat the process until we end up with individuals able to classify all poker hands. However, we don't know if such an approach could work in practice. An illustration of the idea is shown in figure 2.1, and a minimal sketch of how two per-class fitness functions could be merged is given below.
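The sketch below only illustrates the merging step of this proposal, under the assumption that each GA scores an individual by how many instances of "its" poker class the individual classifies correctly. The interface and all names are hypothetical; this is not code from the project.

import java.util.function.ToIntFunction;

// Hypothetical sketch of per-class fitness functions and the "extinction" merge step.
public class MergedFitness {
    // An individual is assumed to expose a predict(instance) method elsewhere.
    interface Individual { int predict(int[] instance); }

    // Fitness specialised on one target class: count correct hits on that class only.
    static ToIntFunction<Individual> fitnessForClass(int targetClass, int[][] data, int[] labels) {
        return ind -> {
            int correct = 0;
            for (int i = 0; i < data.length; i++)
                if (labels[i] == targetClass && ind.predict(data[i]) == targetClass) correct++;
            return correct;
        };
    }

    // The merged fitness rewards individuals that cover both classes at once.
    static ToIntFunction<Individual> merge(ToIntFunction<Individual> f1, ToIntFunction<Individual> f2) {
        return ind -> f1.applyAsInt(ind) + f2.applyAsInt(ind);
    }
}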


Figure 2.1: How we would like to create an algorithm containing several GAs for classifying poker instances.


Chapter 3

Data set selection

In the first project we created decision trees and rules with the iris data set, which contains 150 instances in total. The data set we used in the previous project contained, in comparison, 1,025,010 instances. In the second project we also used C5.0 to create a couple of decision trees with the large data set of poker hands.

We experienced that we had to use much more time creating the decision trees with the large data set than with the small one. We also experienced that creating neural networks was a time-consuming task; some of our networks took 4-5 hours to create.

In this project we will use automatic programming to classify instances. According to Negnevitsky [41], this sort of programming is a slow process. We will therefore reduce the number of instances in our poker data set, to make it possible to use automatic programming within a reasonable amount of time.

This down-scaling will be done with the same Java code we used in project 2 to oversample and downscale the data set. We will also try a data set where the percentage of each class is the same as in the original, but where the total number of instances is lower than in the original set. A sketch of such a stratified down-scaling is shown below.
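The project-2 Java code is not reproduced here; the following sketch only illustrates the idea of down-scaling while keeping the class proportions. The file names, the sampling fraction and the assumption that the class label is the last comma-separated field are all hypothetical.

import java.io.*;
import java.util.*;

// Illustrative stratified down-scaling: keep a fixed fraction of the instances of every class.
public class DownscalePoker {
    public static void main(String[] args) throws IOException {
        double fraction = 0.025;                          // keep 2.5% of each class (assumed value)
        Map<String, List<String>> byClass = new HashMap<>();

        // Group instances by their class label (assumed to be the last field).
        try (BufferedReader in = new BufferedReader(new FileReader("poker-hand.data"))) {
            String line;
            while ((line = in.readLine()) != null) {
                String label = line.substring(line.lastIndexOf(',') + 1).trim();
                byClass.computeIfAbsent(label, k -> new ArrayList<>()).add(line);
            }
        }

        // Write out the same fraction of every class, so the proportions are preserved.
        Random rnd = new Random(42);
        try (PrintWriter out = new PrintWriter(new FileWriter("poker-hand-small.data"))) {
            for (List<String> instances : byClass.values()) {
                Collections.shuffle(instances, rnd);
                int keep = (int) Math.round(instances.size() * fraction);
                for (int i = 0; i < keep; i++) out.println(instances.get(i));
            }
        }
    }
}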


Chapter 4

Specification file for ADATE, how the data is coded and description of user callbacks

In this chapter we have tested the automatic programming system ADATE, developed by Østfold University College's Roland Olsson. We have followed the ADATE user manual written by Geir Vattekar to accomplish our task [60].

The files created and the modifications done in this part are tests of how the ADATE system works.

4.1 Creation of a specification file

As a first step towards a specification file, we transformed the C5.0 files to a .spec file using the "c5conv" program. This process is shown in figure A.1 in section A.2.

The generated .spec file contains two parts - the ADATE-ML code before the "%%" delimiter, and then the plain ML part of the specification [60]. The ADATE-ML code is shown in figure A.2 in section A.2.

4.2 How the data is coded

The data set used for this first test of the ADATE system is the smallest of the poker hand data sets available at the UCI Machine Learning Repository [52]. The data set is not converted to binary numbers, since we wanted to test if it was possible to use larger numbers. The .names and .data files have been generated in the same way as for the C5.0 decision tree test in project 2.

4.3 Relevant user callbacks

According to Geir Vattekar there are ten types of user callbacks in the ML part of the specification file: Inputs, Validation_inputs, Abstract_types, Funs_to_use, Reject_funs, Restore_transform, Grade, Output_eval_fun, Max_output_genus_card and finally Max_time_limit & Time_limit_base [60].


In the ML part of our file "poker.spec" (after the delimiter), a number of user callbacks are declared.

The first and biggest part of the ML section in the specification file is the training and validation inputs & outputs. Then other user callbacks are listed. One of the first of these is Funs_to_use:

val Funs_to_use = [
  "false", "true",
  "realLess", "realAdd", "realSubtract", "realMultiply",
  "realDivide", "sigmoid", "tor", "rconstLess",
  "class0", "class1", "class2", "class3", "class4",
  "class5", "class6", "class7", "class8", "class9"
  ]

According to Vattekar this is the file's function set. These are functions that ADATE may try to use in the synthesized code [60].

In the specification file the list Reject_funs is empty:

val Reject_funs = []

This list could contain functions which ADATE could call during synthesis of expressions. The purpose of this list is to make sure that semantically equivalent expressions are not used [60].

Another user callback is the grade, which is organized in a structure:

structure Grade : GRADE =
struct
  type grade = unit
  val zero = ()
  val op+ = fn ( _, _ ) => ()
  val comparisons = [ fn _ => EQUAL ]
  val toString = fn _ => ""
  val fromString = fn _ => SOME ()
  val pack = fn _ => ""
  val unpack = fn _ => ()
  val post_process = fn _ => ()
  val toRealOpt = NONE
end

The ten values in the grade structure are used to calculate an individual's evaluation value [60].

The user callback Max_output_genus_card defines how many individuals should exist in an output genus [60]. In our specification file the value is set to 4:

val Max_output_genus_card = 4


Geir Vattekar states that the value is a heuristic choice, and that the effects of changing it aren't well understood [60]. We will therefore test how changes of this value affect the prediction model later in this part of the project.

Other user callbacks are Max_time_limit and Time_limit_base. In our file they are written as follows:

val Max_time_limit = 1024
val Time_limit_base = 1024.0

These values determine how long an individual is allowed to execute. Max_time_limit is the maximum time complexity an individual may use on any of the examples [60].

We could have mentioned more user callbacks, but have chosen to draw the line here.


Chapter 5

Log, trace and validation, as well as an analysis of possible overfitting

At this point in the project, our lecturer Roland Olsson has tested the poker data set on the college's cluster. From now on, the group will concentrate our work on the files from that run, and not the ones generated in chapter 4.

Some information appears in a similar form in all the files, such as the basic description of each individual (each program generated) [60]. An output line for an individual in the log file is shown below:

3072 0 3768 13063366 723545 [ 249.201223776, 246.738259839, 263.766710701 ] 15D8A11E7FF2EF7 ~0.491199440649 2

Each part of the line above has a specific meaning, and together the parts describe the individual's performance. The descriptions of the line written below are obtained from the ADATE manual [60].

3072 is the number of instances which were correctly solved by the individual program.

The next number in the line is the number of instances which took too much time or memory to solve - in this case 0.

The third number is the number of instances which weren't solved correctly - 3768 in the case above.

Since the example above doesn't have the same number of fields in front of the list as the example shown in the ADATE manual, we don't know the meaning of the next two numbers.

The list in the middle of the line contains syntactic complexity measures in bits.

The hexadecimal number after the list is a semantic fingerprint used to identify individuals semantically equal to the current individual.

The decimal number after the hexadecimal number is a syntactic fingerprint identifying the current individual.

The last number 2 is the number of super-combinator functions.


5.1 The log file

According to Geir Vattekar, the log file prints information about the individuals. These individuals are generated during a run and inserted into the kingdom [60]. In this section we will try to point out and describe important parts of the log file generated on the college's cluster.

The log file contains a huge number of individuals generated during the run. The information about one of the individuals is presented in subsection A.1.1 of the appendix. Some information from that excerpt is reprinted below:

...

----------------------------------------------------------------------
Individual id = 0_32D867   Trace info = embedding for 0_1F03C7 0

...

Ancestor fps = [ ~0.491199440649, ... , 5.16632293668E~4 ]

...

3072 0 3768 13063366 723545 [ 249.201223776, 246.738259839, 263.766710701 ] 15D8A11E7FF2EF7 ~0.491199440649 2

Time limit = 65536

Test eval value =
2338 0 3595 0 536248 [ 249.201223776, 246.738259839, 263.766710701 ] 19B569B3B8B1CE5 ~0.491199440649 2

fun f Xs =
  case Xs of
    nill => ( raise NA_A05C3 )
  | cons( VA05C4 as card( VA05C5, VA05C6 ), VA05C7 ) =>
    case VA05C7 of
      nill => flush
    | cons( VA05C8 as card( VA05C9, VA05CA ), VA05CB ) =>
      case ( VA05C9 = VA05C5 ) of

      ...

      | true => f( cons( VA05C4, VA05CB ) )

...

Creation time = 665305.534982

...

The information above is an excerpt of the information given about an individual in the .log-file. The different fields are explained in the bullet list below (the ADATE manual has been used to interpret the information [60]).

• Individual id: The individual’s unique identifier, in this case ’0_32D867’.

• Ancestor fps: A list containing the individual’s ancestors in the kingdom.

• Information about training: This is basic information about the individual's results on the training set, given in the same format as explained at the beginning of this chapter.

• Time limit: This is the time limit of the individual, in this case 65536.

• Test eval value: Contains basic information about the test.

• fun f: This is the program’s ADATE-ML code.

• Creation time: This is the creation time in CPU time seconds. In our case the creation time may be misleading, since ADATE was run on a cluster.

5.2 The trace file

The .trace-file has information about the kingdom. It also displays the time it took to perform tasks on the individuals, the time complexity grid and which individuals are queued for breeding [60].

According to Geir Vattekar, the statistics in the file are mostly for advanced users, as well as for debug information. But three time measures are mentioned in the ADATE manual: global time, no of evaluations & cumulative eval time [60].

In the .trace file these values are written as follows:

...
Global time = 19042.134087
...
No of evaluations = 417
Cumulative eval time = 19.013177
...

The global time is how long it has taken to generate the kingdom. 'No of evaluations' is the total number of performed evaluations. Lastly, 'cumulative eval time' is the cumulative time that has been used for evaluation value calculations [60].

After the initial statistics, grids are listed. These are time-complexity grids with the kingdom's individuals. Geir Vattekar recommends using the grid pe1g0s0, since this grid gives the highest priority to the number of correct answers [60]. The pe1g0s0 grid from the .trace file is listed in subsection A.1.2.

An excerpt of the pe1g0s0 information is shown below.

Grid for pe1g0s0:

Column for time limit 65536


0 0 0 9223372036854775806 27360 [ 5, 5, 4.975 ] A09D39708D7132 5.16632293668E~4 1
...
5419 0 186 9223372036854775807 5560992 [ 573.808022472, 552.984478204, 582.235077609 ] 23C88ACA4346337 0.0183619111125

The information in the pe1g0s0 grid seems different from the one described in the ADATE manual. Some of the content is, however, described below regardless of the differences.

The grid's identifier, in this case pe1g0s0, consists of three parts. The first part, 'pe1', specifies which program evaluation function has been used. The second part, 'g0', specifies the grade measure. The last part specifies the syntactic complexity measure [60].

The line 'Column for time limit', which in our case is 65536, specifies the order in which the individuals will be listed in the time-size grid in the trace file.

The basic information about each individual is then shown. Lastly, the grid's individual with the highest training evaluation value is printed.

5.3 The validation file

In our folder there are no files with '.validation' as file ending. Instead there are test files. These files contain the same information as the validation files described by Geir Vattekar in the ADATE manual [60]. We are therefore treating these test files as the described validation files.

The validation files are mainly used to test the generalizing ability of the classifiers. The individuals presented in the file are sorted by training evaluation value, and the best individual is presented last in the file [60].

In our case a test file contains the described data. The excerpt below is copied from that specific file:

...
Time = 1731236389.47
...
5882 0 51 0 4158530 [ 548.207836085, 536.341822949, 557.318491885 ] 5EEFA8BEE9D372D 0.276020310308 1

The line at the bottom shows the individual which performed best.

5.4 Analysis of possible overfitting

We have searched the log file for the individual performing best (shown in section 5.3). The following information was given about the specific individual in the log:

Individual id = 0_664571   Trace info = embedding for 0_664561 0
Individual fp = 0.276020310308

Ancestor fps = [ 0.276020310308, ... , ~0.0508444643506 ]


Max cost limit chosen = 0

Max cost limit done = 0

6725 0 115 939327 5499844 [ 548.207836085, 536.341822949, 557.318491885 ] 100BC42B708B90C 0.276020310308 1

Time limit = 65536
Test eval value =
5882 0 51 0 4158530 [ 548.207836085, 536.341822949, 557.318491885 ] 5EEFA8BEE9D372D 0.276020310308 1

6725 trainAndTest 5882 548.207836085 0.276020310308

fun f Xs =
  ...
end

Local trf history =

REQ

Top poses = [ [ ] ]
Bottom poses = [ ]
Bottom labels =
Synted exp = ___ga
Not activated symbols = [ ]

CASE-DIST [ 2, 0, 2 ] [ 2, 0 ]................

Global trf history =

Creation time = 217092.15828

We can't see any signs of overfitting, since the individual performed better on the validation set than on the training data set. Of the 6,840 instances in the training set, the individual classified 6,725 correctly and 115 incorrectly. When tested, the individual classified 5,882 correctly and 51 incorrectly. The error percentage for the training set was 1.7% (115 of 6,840). The percentage for the validation/test set was 0.9% (51 of 5,933).


Chapter 6

The program with the best validation value

In section 5.4 we presented an excerpt of the program with the best validation (test) results. The complete program is presented in subsection A.1.3 of the appendix. In this chapter we have tried to explain how it works, by decoding the content of the f-function. This work is shown in section 6.1.

6.1 Interpretation of the program

The program that gives the best validation results is an advanced recursive program/function. The program is too complex, and we did not manage to give a good interpretation of it.


Chapter 7

Test of the ADATE program

In this chapter we present a test of the best program generated by ADATE to classify poker hands. The program was modified by our lecturer Roland Olsson to make it possible to test it with the complete poker data set, containing 1,025,010 instances.

Figure 7.1: Test of ADATE's best program with the complete data set containing 1,025,010 instances.

Figure 7.1 shows the command used to run the program and the results of the test. The test showed that the ADATE program managed to classify 1,019,926 instances correctly, which corresponds to an error percentage of approximately 0.5% (5,084 misclassified out of 1,025,010). This error percentage is lower than for the deep neural network trained with the same data set in project 2 (which managed 0.7%).


Part II

The WEKA alternative


Chapter 8

Available machine learning algorithms in WEKA

There are several different classification schemes in WEKA. These are organized into eight classes: bayes, functions, trees, lazy, rules, meta, multi-instance and miscellaneous [47]. In total there are 76 classification and regression algorithms in WEKA [6].

The manual describing WEKA version 3.6.10 lists a number of different classifiers available in the system. These are, among others:

1. bayes.NaiveBayes: A bayesian learner [18].

2. functions.SMO: This is WEKA's implementation of a support vector machine [18].

3. lazy.KStar: This is an instance based learner [18].

4. trees.J48: WEKA's implementation of the C4.5 algorithm. This is a decision tree learner [18], which we tried in project 1.


Chapter 9

Algorithm selection and descriptions

We have chosen three of the four algorithms mentioned in chapter 8: the support vector machine, the bayesian learner and the KStar learner. Brief descriptions of each of the algorithms follow in this chapter.

9.1 SMO

9.1.1 Support vector machines generally

Support vector machines (SVM) are classifiers which output optimal hyperplanes to categorize examples [15]. A hyperplane is a subspace with one dimension fewer than the space containing it: in a one-dimensional space a hyperplane is a point, in a two-dimensional space a hyperplane is a line, and so on [7].

In a two-dimensional space, a hyperplane may be used to separate different classes of points from each other; a hyperplane in the form of a line may fulfill this purpose. To make sure that the SVM outputs an optimal hyperplane, we may define a criterion to estimate the worth of different hyperplanes categorizing the data [15].

Support vectors are the instances which are located closest to the hyperplane [15]. The hyperplane and the maximum-margin criterion can be written compactly as shown below.
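As a standard way of stating this (the notation is ours, not taken from the cited sources): for training instances $\mathbf{x}_i$ with labels $y_i \in \{-1,+1\}$, the separating hyperplane and the maximum-margin training problem are

\[
  \mathbf{w}^{T}\mathbf{x} + b = 0, \qquad
  \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert \mathbf{w}\rVert^{2}
  \quad \text{subject to} \quad
  y_i\,(\mathbf{w}^{T}\mathbf{x}_i + b) \ge 1 \ \text{for all } i .
\]

The support vectors are exactly the instances for which the constraint holds with equality, $y_i(\mathbf{w}^{T}\mathbf{x}_i + b) = 1$.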

9.1.2 WEKA's implementation of an SVM

WEKA's class SMO is used to train support vector machines. The specific algorithm implements John Platt's sequential minimal optimization algorithm [59]. This algorithm is described in the article "Fast Training of Support Vector Machines Using Sequential Minimal Optimization" by John C. Platt himself [48].

The SMO algorithm was developed to be an efficient alternative to the standard SVM training algorithms of its time. While some SVM algorithms require a lot of computation time - scaling between linear and cubic - the SMO algorithm was designed to work faster. According to Platt, the algorithm scales between quadratic and linear, and breaks large problems into a series of smaller problems [48].

The SMO algorithm has three major components: a heuristic method to find which multipliers to optimize, an analytic method to solve for the Lagrange multipliers, and finally different acceleration techniques [48]. The Lagrange multiplier method is used for finding optima in spaces with more than two dimensions. This technique uses partial derivatives to arrive at a solution [23].

9.1.3 SMO options in WEKA

WEKA has implemented a series of different options to adjust its SMO algorithm. The options are as follows (the options specific to PolyKernel may be different from the ones listed below) [59]; a usage sketch with some of these options set through WEKA's Java API follows the list:

1. -D: Toggles debug mode, which makes the algorithm output additional info while running [59].

2. -no-checks: Turns off checks for missing values, etc [59].

3. -C: This is the complexity constant [59]. This constant controls the number of support vectors (ref. 9.1.1) which may be used when training the current SVM [21].

4. -N: Selects normalization, standardization, or neither of them [59]. Normalization/standardization may be performed on both the input and the kernel level. Normalization may imply normalizing each feature to a unit vector. Standardization implies scaling different ranges of possible values to a common range [8].

5. -L: This is the tolerance parameter [59]. The tolerance parameter is used to specify the maximum gradient of the quadratic function; the training is terminated when the gradient value is less than or equal to this value [26].

6. -P: The epsilon for round-off error [59]. The epsilon value is used to separate small errors from large errors [40].

7. -M: Is used to fit logistic models to SVM outputs [59].

8. -V: Specifies the number of folds for the internal cross-validation. The default number is -1, which means that cross-validation is not active [59].

9. -W: This parameter is the random number seed [59].

10. -K: Specifies which kernel to use [59].
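As an illustration of how some of these options can be set programmatically, here is a small sketch using WEKA's Java API. The file name and the chosen option values are assumptions made for the example, not the exact settings used in our tests.

import weka.classifiers.Evaluation;
import weka.classifiers.functions.SMO;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

import java.util.Random;

// Sketch: training WEKA's SMO from Java with some of the options listed above.
public class SmoExample {
    public static void main(String[] args) throws Exception {
        Instances data = new DataSource("poker-small.arff").getDataSet();   // assumed file name
        data.setClassIndex(data.numAttributes() - 1);                       // class is the last attribute

        SMO smo = new SMO();
        // -C 1.0 (complexity constant), -N 1 (standardize), -no-checks (skip missing-value checks).
        smo.setOptions(new String[] { "-C", "1.0", "-N", "1", "-no-checks" });

        // 10-fold cross-validation, as in the tests described in this chapter.
        Evaluation eval = new Evaluation(data);
        eval.crossValidateModel(smo, data, 10, new Random(1));
        System.out.println(eval.toSummaryString());
        System.out.println(eval.toMatrixString());
    }
}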

9.1.4 Two simple tests of the SMO algorithm

To check if WEKA's SMO algorithm works, we performed two simple tests with .arff files containing approximately 25,000 instances of poker data. The tests didn't include all 1,025,010 instances used in project 2; why we chose to use smaller data sets is explained in chapter 3. Since these were only simple tests, we will not consider the number of instances of the different classes, or whether the data sets should be oversampled and scaled down, as we did in project 2.


The difference between the tests is that the first was done with only eleven attributes - ten inputs and one target class. In the next test we used 85 inputs and one target class. This was done to check if it is necessary to convert the input data before creating support vector machines in WEKA. The Java code for creating the .arff file containing 85 inputs is not shown in this project; the code was developed during the second project, and is shown in that document.

For both tests we decided to turn off the check for missing values, etc., since we know that the data sets have no missing values. We turned on cross-validation - mainly to test the functionality (we chose a 10-fold cross-validation). We also chose to standardize the data in the first test (we did the same operation in the second test, to make it possible to compare the results). Lastly, we turned on debug mode to get as much output as possible, since these tests were the first for this algorithm. The adjustments are shown in figure B.1 in section B.3.

The information output after the first test is shown in subsection B.1.1. This information states that the error percentage in the first test was more than 53 percent.

The model building of the second test took much more time than the same procedure during the first. In the first test the model was created in 197 seconds. The second time the creation took 23,047 seconds, or 6 hours 24 minutes and 7 seconds.

The creation of the whole predictive model, with 85 inputs and 10-fold cross-validation, took approximately 25 hours. The results are shown in subsection B.1.2. The error percentage for the second test was nearly 54 percent - a little higher than for the first test, although it took much more time to create the second SVM. However, the second SVM tried to classify more classes than the first: it tried to classify the instances into five classes, while the first only classified the instances into two classes (this information is presented in the confusion matrices in subsections B.1.1 and B.1.2).

One reason for the poor results may be that we didn't adjust the complexity constant, which could increase the number of possible support vectors. Different values for this constant will be tested later in this part of the project.

9.2 NaiveBayes

9.2.1 Bayesian classifiers in general

The bayesian classifiers are all built on the idea that members of a class share similar values for some features. These classes are called natural kinds [50].

A Bayesian classifier may be used to predict the class of an instance if the class is not known, but some of the features are. To accomplish this, the classifier builds a probabilistic model, and uses Bayes' rule to make predictions [50].

Bayes' rule makes sure that the probabilistic model is updated when new evidence is presented. The rule specifies how, and which parts of the model, should be updated in proportion to the new evidence [49].
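In its standard textbook form (our notation, not taken from [49]), the rule relates the posterior probability of a class C given evidence x to the likelihood, the prior and the evidence:

$$P(C \mid x) = \frac{P(x \mid C)\,P(C)}{P(x)}$$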


9.2.2 WEKA's naive Bayesian classifier

WEKA has implemented several Bayesian classifiers - one of them is the naive Bayesian classifier called NaiveBayes. This classifier is implemented as described by John & Langley in the article "Estimating Continuous Distributions in Bayesian Classifiers" - published in 1995 [58].

According to the authors the naive Bayesian classifier is a simple approach with clear semantics. It is suitable for predicting the classes of test instances if the training instances contain class information [29].

The technique is called "naive" since it relies on a couple of simplifying assumptions. It assumes that the predictive attributes are conditionally independent, given a class. It also assumes that no latent or hidden attributes influence the predictive model [29].
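Under the conditional independence assumption the posterior used for classification factorizes into one term per attribute. This is the standard naive Bayes formulation (our notation, not the article's):

$$P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i=1}^{n} P(x_i \mid C)$$

The predicted class is the one maximizing this product.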

9.2.3 Naive Bayesian options in WEKA

In WEKA the naive Bayesian classifier may be adjusted with three different options [58]:

1. -K: This option makes it possible to use a kernel density estimator instead of a normal distribution for numeric attributes [58]. A kernel density estimator may be seen as a smooth line following the different bins in a histogram. This line is drawn based on the observations of the data, and takes uncertainty into account [14]. A standard formulation is sketched after this list.

2. -D: Enables supervised discretization to process numeric attributes [58]. Discretization is a process where the range of an attribute is divided into regions. These regions may then replace data values in the predictive model. Supervised discretization exploits class information, whereas unsupervised discretization does not. Evaluations show that implementation of discretization techniques may improve the classifier's predictive accuracy, and speed up tree-based methods [2].

3. -O: Enables displaying with an old format, which may be good for viewing many classes at once [58].
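As referenced in the first item above, a kernel density estimator built from n observed values x_1, ..., x_n of an attribute is usually written as follows (textbook formulation with a kernel K and bandwidth h; not taken from WEKA's documentation):

$$\hat{f}(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$

The estimate is a sum of small "bumps" centred on the observations, which gives the smooth line described above.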

9.2.4 Three tests of the naive Bayesian algorithm

We have tested WEKA's naive Bayesian classifier. In the tests we have used the .arff-file containing 25.010 data instances, but with only ten attributes. All tests were performed with 10-fold cross-validation.

In the first test the kernel density estimator and supervised discretization are disabled. However, supervised discretization is enabled in the second test. In the third test the kernel density estimator is enabled.

The first test finished in two or three seconds, and outputted the information shown in subsection B.1.3. The test information shows that all the instances were classified as class 1, resulting in an accuracy of about 50%. Figure B.2, displaying the adjustments made to the Bayesian classifier, is presented in section B.3.

The second test performed almost identically to the first test. It used a couple of seconds to create a predictive model, which classified all the instances as class 1. The information output is presented in subsection B.1.4.


Like the two previous tests, the third test (shown in subsection B.1.5) finished in a couple of seconds. But unlike the others it tried to classify instances of class 2. Although it managed to classify 758 of them correctly, the algorithm performed all in all a little poorer than the others. The third test's error percentage was approximately 51%.

9.3 KStar

KStar is an instance-based classifier; that is, the class of a test instance is based upon the classes of those training instances similar to it, as determined by some similarity function [57].

9.3.1 Instance-based learners

Instance-based learners (IBL) like KStar classify an instance by comparing it to a database of pre-classified examples/instances. The assumption is that similar instances will also have similar classifications. For instance-based learners the distance function determines how similar two instances are, whereas the classification function specifies how the instance similarities yield a final classification for the new instance. In addition to those two functions, the IBL algorithms have a concept description updater, which determines if new instances should be placed in the database and which of the instances in the database should be used for classification. This depends on the IBL: a simple IBL algorithm always moves the instances to the database, but more complex algorithms filter the instances to help improve tolerance to noisy data and to reduce the storage requirements [57].
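To illustrate the instance-based idea in isolation, the sketch below uses the plain k-nearest-neighbour classifier from R's class package (which is also loaded in the R part of this project). It is not KStar - KStar uses an entropy-based distance - and the file name and column layout are assumptions:

# A minimal instance-based classifier: k-nearest neighbour with class::knn.
# Illustration of the IBL idea only, not WEKA's KStar.
library(class)

poker <- read.csv("poker.csv")                 # assumed: 10 inputs + class "hand"
idx   <- sample(nrow(poker), floor(0.7 * nrow(poker)))
train <- poker[idx, ]
test  <- poker[-idx, ]

# The "database" is simply the training instances; Euclidean distance plays the
# role of the similarity function, and a majority vote among the k nearest
# neighbours is the classification function.
pred <- knn(train[, 1:10], test[, 1:10], cl = factor(train$hand), k = 5)
table(pred, test$hand)                         # confusion matrix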

9.3.2 KStar options

Weka has some options that can be changed; the list below shows these options and what they are used for. An option which is not listed below but present in Weka is an option for debug information during the training, which can be set to true or false.

1. -B (num): Manual blend setting (default is 20%)

2. -E: Enable entropic auto-blend setting (symbolic class only)

3. -M (char): Specify the missing value treatment mode (default a). Valid options are: a(verage), d(elete), m(axdiff), n(ormal). As mentioned before this will not be taken into consideration since we know that there are no missing values in the data set.

9.3.3 Weka test runs

To test the KStar classifier in Weka we used the data set with 25 010 instances and 10 attributes. We tested the KStar classifier by running a 10-fold cross-validation. The test managed to correctly classify 13 119 (52.45 %) instances, and incorrectly classify 11 891 (47.545 %) instances. In a second test we increased the number of folds from 10 to 20, but we gained similar results as the one with 10 folds.


In a third test we used the percentage split option in Weka, where 70 % was used for training and the remainder for testing. With percentage split the KStar managed to classify 3 900 (51.98 %) of the test instances correctly. 3 603 (48.02 %) instances of the test set were classified incorrectly.


Chapter 10

WEKA tests

10.1 SMO tests

In this section we have tested WEKA's SMO algorithm with different option adjustments. We have tested normalization (and similar techniques), complexity constants and cross-validation.

The tests of the SVM in R in section 14.2 were performed before these tests of the SMO algorithm. In the R tests we learned that combining different successful options in a final test may lead to poor results. We have therefore decided to test the SMO algorithm a bit differently. Instead of combining options in a final stage, we have tested different options together.

All the SVMs have been trained with a data set containing 10 input values and 25.010 instances. We didn't use the data set with 85 inputs, because of the huge time consumption - earlier described in subsection 9.1.4.

In the tests the cross-validation values varied between 2, 5 and 10. The complexity constant was set to 1, 5 and 10. Finally the N-value has been set to both 'normalization' & 'standardization'.

In the table the column showing time consumption presents the time it took to build the SVM models, which is displayed in WEKA. However, the training with cross-validation is not shown. In some cases this training took much more time than the initial model building. For example when we trained an SVM with normalization, the complexity constant set to 10 and 10-fold cross-validation, the initial training took about ten minutes. The cross-validation session, however, finished after about two hours of training. The column showing time consumption may therefore be a bit misleading (ref. table 10.1). A column approx. time has been added to show the approximate time used in hours.

Explanations - table 10.1

1. id: The row’s id.

2. cv: Cross-validation folds.

3. cc: Complexity constant.

4. n/s: normalization/standardization.


5. t: Time consumption - model.

6. at: Approximate total time consumption in hours.

7. err: Error percentage.

8. cp: The number of different classes predicted during the run.

Table 10.1: SMO tests

id cv cc n/s t at err cp
1 2 1.0 n 00:00:37 < 1h 50.0% 1
2 5 1.0 n 00:00:32 < 1h 50.0% 1
3 10 1.0 n 00:00:32 < 1h 50.0% 1
4 2 5.0 n 00:04:02 < 1h 50.0% 1
5 5 5.0 n 00:04:05 < 1h 50.0% 1
6 10 5.0 n 00:04:05 < 1h 50.5% 2
7 2 10.0 n 00:08:42 < 1h 50.0% 1
8 5 10.0 n 00:09:02 1h 50.0% 1
9 10 10.0 n 00:09:27 2h 50.0% 1
10 2 1.0 s 00:05:14 < 1h 53.4% 2
11 5 1.0 s 00:06:05 < 1h 54.0% 2
12 10 1.0 s 00:04:54 < 1h 50.0% 1
13 2 5.0 s 00:54:43 2h 50.0% 1
14 5 5.0 s 00:48:11 4h 50.0% 1
15 10 5.0 s 00:47:12 7h 50.0% 1
16 2 10.0 s 01:31:45 2h 50.0% 1
17 5 10.0 s 01:30:54 9h 50.0% 1
18 10 10.0 s 01:43:50 17h 50.0% 1

A table showing information of tests performed with the SMO algorithm.

Table 10.1 contains the results of the tests with the SMO algorithm. Almost every classifier generated was only able to classify the instances as one class - class zero - resulting in an error rate of about 50 percent.

10.2 KStar tests

The KStar algorithm does not contain as many options as the SMO algorithm. In subsection 9.3.2 we mentioned the options that are available in WEKA. For the tests in this chapter, we used cross-validation with 5, 10 and 20 folds and set the manual blend to 0, 20 and 50. The entropic auto blend was set to false for all tests, since we gained better results this way. We used the data set with 25 010 instances with 10 input values.

Explanations - table 10.2

1. id: The row’s id.

2. cv: Cross-validation folds.


3. gb: Manual blend constant.

4. t: Time consumption - model.

5. at: Approximate total time consumption in hours.

6. err: Error percentage.

7. cp: The number of different classes predicted during the run.

Table 10.2: KStar tests

id cv gb t at err cp
1 5 0 00:00:00 < 1h 48.97% 5
2 10 0 00:00:00 < 1h 48.79% 5
3 20 0 00:00:00 < 1h 48.64% 5
4 5 20 00:00:00 < 1h 47.64% 6
5 10 20 00:00:00 < 1h 47.54% 6
6 20 20 00:00:00 < 1h 47.41% 6
7 5 50 00:00:00 < 1h 43.93% 3
8 10 50 00:00:00 < 1h 43.79% 3
9 20 50 00:00:02 < 1h 43.61% 3

A table showing information of tests performed with the KStar algorithm.

Table 10.2 shows that the percentage of incorrectly classified instances is lower with a global blend of 50, but the number of different classes that the classifier manages to predict is also lower. With the global blend set to 20, the classifier managed to predict 6 different classes. It should be noted that it only managed to classify a few instances from classes 3 to 6; most of the instances that are classified correctly are from class 1 and 2 (nothing and one pair). This is also the case with the global blend set to 50, where it manages to classify more instances from class 1 and 2 correctly, resulting in a lower error percentage.

10.3 Naive Bayes tests

We also chose to do some more advanced tests with the Naive Bayes classifier. In subsection 9.2.3 we have explained which options are available in Weka. All of the options can either be set to true or false. The option that gives different results is the "useKernelEstimator" option. With this option set to false the classifier only manages to classify instances from class 1, and all the other classes are classified as class 1 as well. With the option set to true it manages to classify instances from class 1 and 2, but none from the remaining classes.

For these tests we have used the data set with 25 010 instances with 10 inputs. The tests were performed with 5, 10 and 20 folds cross-validation. We also set the "useKernelEstimator" option to true or false, since this was the only option that gave us some different results.


Explanations - table 10.3

1. id: The row’s id.

2. cv: Cross-validation folds.

3. ke: Kernel estimator, either true or false.

4. t: Time consumption - model.

5. at: Approximate total time consumption in hours.

6. err: Error percentage.

7. cp: The number of different classes predicted during the run.

Table 10.3: Naive Bayes tests

id cv ke t at err cp
1 5 false 0.12 seconds < 1min 50.04% 1
2 10 false 0.05 seconds < 1min 50.05% 1
3 20 false 0.05 seconds < 1min 50.05% 1
4 5 true 0.16 seconds < 1min 50.99% 2
5 10 true 0.16 seconds < 1min 50.93% 2
6 20 true 0.05 seconds < 1min 50.75% 2

A table showing information of tests performed with the Naive Bayes algorithm.

Later in the paper we have conducted some tests with the Naive Bayes in R, which has more options available.

10.4 Random Forest tests

10.4.1 Initial test

We ran some initial tests to determine if we should use the data set with 85 or 10 inputs. We set the number of trees to 10 and tested with both data sets. The results showed that the data set with 85 inputs managed to get a lower percentage of incorrectly classified instances: 42.6 % with 85 inputs and 43.3 % with 10 inputs. Both sets mostly classified instances from class 1 and 2. The set with 85 inputs and 25 010 instances is also used later in chapter 16, where we test the random forest classifier in R. Therefore we will use the data set with 25 010 instances and 85 inputs to do some advanced tests.

10.4.2 Advanced tests

Weka has four options for the random forest classifier that are adjustable. The options are listed below.

1. maxDepth, which is the maximum depth for the tree; 0 means it's unlimited.


2. numFeatures, the number of attributes which are going to be used in the random selection.

3. numTrees, the number of trees that are going to be built.

4. seed, the random number seed to be used.

We performed the tests in chapter 16 before doing the tests in Weka, and we wanted to use the same criteria in both tests. But we soon figured out that Weka could not handle building 500 trees, and it often had trouble generating 100 trees as well. The problem was a memory error. We experienced this problem both with the 10 and 85 input data sets. We therefore chose to lower the number of trees generated and continued our testing. The data split was set to 70 % for training and 30 % for testing.

Explanations - table 10.4

1. id: The row’s id.

2. nt: number of trees.

3. nf: number of features.

4. t: Time consumption - model.

5. err: Error percentage.

6. cp: The number of different classes predicted during the run.

Table 10.4: Random Forest tests

id nt nf t err cp
1 10 0 7.81 sec 42.62 % 4
2 20 0 49.93 sec 40.84 % 3
3 50 0 107.12 sec 38.26 % 2
4 100 0 62.86 sec 37.16 % 2
5 10 40 126.03 sec 37.94 % 4
6 20 40 198.71 sec 35.24 % 5
7 50 40 369.62 sec 30.19 % 5
8 100 40 298.42 sec 28.10 % 4
9 10 85 72.05 sec 36.80 % 4
10 20 85 140.47 sec 33.89 % 4
11 50 85 358.17 sec 29.44 % 5
12 100 85 719.15 sec 27.19 % 5

A table showing information of tests performed with the Random Forest algorithm.

We experienced the same kind of results with random forest as we did with the KStar classifier: it mostly managed to classify instances from the first and second class. When it managed to classify instances from other classes, it was only a few - most of the time between 1 and 10 instances were classified, and these were from class 3 and 4.


Chapter 11

Test of tools and comparison of results

In this chapter we have been asked to compare the techniques presented in this part with each other. A part of the task is also to compare the techniques with earlier tests with neural nets, Cubist and C5.0. Since we haven't performed more than a couple of tests with C5.0, and none with Cubist, with the poker set, we have decided to test these tools more extensively before we compare the runs with the WEKA tests. We started by testing the Cubist tool, which is a regression tool, and probably not suitable for our task. We have then performed tests with C5.0 and some of the options available in this tool. Finally we have compared all the results with each other, which is this chapter's main task.

11.1 Cubist test

The Cubist test was performed in Linux, since the Windows-version of the program is a demo which doesn't permit a run with the full data set. Oracle's VirtualBox with Ubuntu installed was used for this purpose. The data set in the test contained 1.025.010 instances.

Cubist was downloaded from Rulequest Research's site [53], and an executable file was created by calling the make-command, as presented in figure 11.1.

We tried to classify the instances by first changing the line describing the class in the names-file to the following:

class: continuous. | 0: nothing, 1: onePair, 2: twoPairs,
3: threeOfAKind, 4: straight, 5: flush, 6: fullHouse,
7: fourOfAKind, 8: straightFlush, 9: royalFlush

Then we tested the tool by calling the command:

./cubist -f poker

Figure 11.2 shows examples of rules generated by Cubist. The rules do not classify the instances by outputting whole class numbers. Instead the tool outputs decimal numbers.


Figure 11.1: A test of the data mining tool Cubist.

It is also stated at the tool's site that Cubist is meant as a complementary tool to C5.0, and outputs values instead of classes [53]. We therefore abandoned the tool.

Figure 11.2: An example of rules generated by Cubist.

11.2 C5.0 tests

In the first project we tested the Windows-version of C5.0, See5, with the Iris data set. In project 2 we had selected a new data set, and had only performed some simple tests of the C5.0-tool in Linux. In project 2's chapter 12 we present our tests with C5.0 and 10-fold cross-validation - with and without winnowing.

However, it would not be fair to compare the results from the other tools with C5.0 if we didn't perform any more tests. In this section we have tested more options and have used a test set to validate the results. The test set was a data set containing 25.010 instances, which is a separate set at the UCI Machine Learning Repository. The data set containing 1.000.000 instances was used for


the training part (when cross-validation was enabled, the data set's test set was abandoned in favor of the different folds).

To adjust the available options we have based our choices on a survey by Patil and Bichkar [46]. The options have been selected from the overview of the algorithm presented at RuleQuest's site [54].

Initially we have performed tests with or without pruning, boosting and softening. Cross-validation was set to none and 10. The initial runs are shown in table 11.1.

The following bullet list explains the names used for table columns:

• id: The id of the current test.

• p: 'y' if pruning is enabled. 'n' if not. The option '-g' was set to disable global pruning.

• b: 'y' if boosting is enabled. 'n' if not. The option '-b' was set to enable boosting.

• s: 'y' if softening is enabled. 'n' if not. The option '-p' was set to enable softening.

• cv: The number of cross-validations. The option '-X 10' was set to enable 10-fold cross-validations.

• t: Time consumed in the format "hh:mm:ss".

• e: Error percentage for the classification of the test set.

• cp: The number of classes predicted by the current classifier.

In this section we have also tested the combination of boosting and cross-validation, although this may be unnecessarily time-consuming. The reason is that the cross-validation finally outputs the median error rate for the different folds.

All option selections are shown in figure 11.3.

Figure 11.3: Option selections when running C5.0 in terminal.


Table 11.1: C5.0 tests

id p b s cv t e cp
1 n n n none 00:00:47 25.2 8
2 n n n 10 00:09:06 26.3 8
3 y n n none 00:00:50 25.2 8
4 y n n 10 00:09:50 25.9 8
5 n y n none 00:06:29 15.7 7
6 n y n 10 01:26:17 16.3 7
7 y y n none 00:08:24 15.6 7
8 y y n 10 01:09:06 16.4 7
9 n n y none 00:01:14 24.6 8
10 n n y 10 00:12:45 25.7 8
11 y n y none 00:00:57 24.6 8
12 y n y 10 00:12:13 25.3 8
13 n y y none 00:09:40 14.6 7
14 n y y 10 01:19:49 15.9 7
15 y y y none 00:09:19 15.2 7
16 y y y 10 01:58:48 16.0 7

A table with tests performed with C5.0.

Winnowing was not chosen to be among the options, since the technique has been tested in project 2, with somewhat unsatisfactory results. The test with winnowing performed better than without, but winnowing omitted all the type attributes - making it impossible to classify hands with flush, straight flush and royal flush. Nevertheless, we have performed a simple test with enabled winnowing and the options for the best test in table 11.1 (with id 13).

The following command was written in the terminal:

./c5.0 -f poker -w -b -p -g

The test's output is shown in figure B.3 in section B.3. As earlier stated this test winnowed all the type attributes, but the error percentage was as low as 10.2 percent for the test set.

To avoid all the classifier's mistakes when classifying the rarest classes, we then tried to make incorrect classifications less beneficial. To achieve our goal we created a costs-file.

Since there are decreasingly fewer instances of the rarest classes, we created a costs-file with increasing cost the rarer the class is. The file was organized in the following way:

1, 0: 1
2, 0: 1
...
6, 9: 10
7, 9: 10
9, 9: 10

Although the rarest classes should be prioritized in this test run, the results showed that only seven classes were correctly classified. None of the instances


of the flush, straight flush and royal straight flush classes were classified. Therefore we ultimately tried a test with costs and winnowing, since we were not able to classify any of the rarest classes anyway (although we could have spent more time adjusting costs).

The final test results are shown in figure B.4 in section B.3. It shows that the costs-values improved the test results, now classifying only 9.9 percent of the instances incorrectly. The correctly classified instances were classified as six classes.

11.3 Comparison of techniques

In this part we have tested four algorithms in WEKA - the random forest algorithm, WEKA's SMO, the naive Bayesian algorithm and lastly the k-star algorithm.

Of all the tests the naiveBayes- and SMO-algorithms performed poorest. Neither algorithm managed to classify more than 50.0% of the instances correctly. The instances were not classified into more than two classes in any of the tests with these algorithms.

SMO was also a time consuming choice. In one case it took approximately 17 hours to build a model with the algorithm - a model which classified all instances as one class, and therefore got no more than 50% of the instances right.

The k-star algorithm performed better than naiveBayes and SMO. In the best test the classifier had an error percentage of 43.61%, and was able to predict three classes.

The randomForest-algorithm performed best of our WEKA-tests in this chapter. In one case it was able to classify the instances into 5 classes. The error percentage for this particular test was 27.19%. Like the algorithms mentioned above, the randomForest-algorithm also had problems with classifying the rarest classes.

But in comparison with our C5.0 tests, in section 11.2, the randomForest algorithm performed poorly. In the best test of C5.0 the algorithm was able to classify 90.1% of the instances correctly, which is much better than the randomForest-algorithm performed in any of its tests (but one remark to these tests is that the randomForest algorithm used a different test set than C5.0, making comparison harder).

If we at last compare all these tests with the ANN-tests performed in Matlab in project 2, none of them are nearly as good as the neural networks. Although it took more time creating neural networks than it took for any of the k-star- and randomForest-tests performed with WEKA, the neural networks were able to classify a much higher percentage of poker hands. In one case the error percentage was as low as 0.7% for a test set from the data set containing 1.025.010 instances.


Chapter 12

Two more tasks

In this last chapter of the WEKA part we try to solve two more tasks, given in the first and the second project. The first task is to check how sensitive the WEKA techniques are to missing values, which was a task given for trees in the first project. The next is to discuss future improvements, which was a task in the second project.

12.1 How sensitive the techniques are to missing values

To perform this task we separated the data set into a training and a test set. The test set was the one with 7.503 instances. The training set was a set with 17.507 instances, and both sets had 11 attributes. We didn't use the larger set because the random forest algorithm couldn't handle a data set of this size - resulting in a memory error.

After splitting up the set we created a Java-program to replace values in an arff-file with question marks, which represent missing values. We used this program to remove values from the training set, containing 17.507 instances. We chose to remove one random value from 50% of all instances, removing a total of 8.753 values. The program is presented in subsection B.2.1.
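The Java program itself is given in subsection B.2.1. Purely as an illustration of the idea, the same corruption could be sketched in R as follows (the file names and the assumption of ten input columns are ours, not taken from the appendix):

# Replace one randomly chosen input value with NA in 50% of the instances.
# Illustrative sketch only - the program actually used was written in Java.
poker <- read.csv("poker_train.csv")        # assumed: 10 inputs + class column
n        <- nrow(poker)
hit.rows <- sample(n, size = n %/% 2)       # 50% of the instances
for (r in hit.rows) {
  col <- sample(1:10, 1)                    # pick one of the 10 input columns
  poker[r, col] <- NA                       # NA corresponds to "?" in an arff-file
}
write.csv(poker, "poker_train_missing.csv", row.names = FALSE)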

Then we performed tests of all the WEKA algorithms tested in this part to check how sensitive they are to missing values. For each of the algorithms one test was first performed with a training set with no missing values. Then a second test was performed with the set with missing values. Every algorithm was tested with the test set. Table 12.1 shows the results of the tests (the column names are self-explanatory).

All tests in the table were performed with basic option adjustments, and without cross-validations. The following adjustments were used:

1. Random forest: nt = 100, nf = 10.

2. Naive Bayesian: ke = false.

3. K-star: gb = 50.

4. SMO: cc= 10, normalization enabled.


Table 12.1: WEKA - missing values

id algorithm missing values hh:mm:ss error percentage classes predicted
1 naiveBayes 0 00:00:00 50.2 1
2 naiveBayes 8.753 00:00:05 50.2 1
3 smo 0 00:05:24 50.2 1
4 smo 8.753 00:04:42 50.2 1
5 kStar 0 00:00:00 44.5 3
6 kStar 8.753 00:00:00 45.1 3
7 randomForest 0 00:00:31 38.3 4
8 randomForest 8.753 00:01:01 40.8 4

A table with tests of a data set with missing values (the kStar-algorithm used more time classifying the instances than stated. The given time is the time consumption for the building of the model).

In the tests shown in table 12.1 the algorithms naiveBayes and smo weren't capable of classifying more than 49.8% of the instances correctly. Both for the test with missing values and the one without, the algorithms classified all instances as one class. We were therefore not capable of discovering any significant disadvantages of running tests with missing values for these particular algorithms (although the naive Bayesian classifier used a lot more time creating the model when values were missing).

The tests performed with the k-star and the random forest algorithms were different. Both performed slightly better without missing values, but the difference was in both cases relatively small.

12.2 Future improvements

In this chapter we have tried only a few of WEKA's available functions. We have for example only tested a simple Bayesian classifier with the NaiveBayes-algorithm. In the future we could also try WEKA's BayesNet and NaiveBayesMultinomial, and try to fit these to our needs. Other functions, in other packages, could also have been tried out.

All algorithms could also have been tested more to achieve better classifications. We could for example have tested with modified data sets, as we did in project 2, to check if this approach could improve the classifications. On the other hand we could have tuned the algorithms better, but this would have taken too much time in this project.

We instead leave all these suggestions as proposals for future improvements.


Part III

R


Chapter 13

R - Intro

We have decided to perform a simple test of machine learning (ML) algorithms available in R. To make this possible we started by installing the R environment. Then we typed the following command in the R console:

> install.packages(’e1071’,dependencies=TRUE)

This command installs the e1071 package, which contains implementations of ML algorithms.

To use the package, we had to write the following in the R console:

> library(class)
> library(e1071)

To perform file handling we have followed the tutorial "Data Import" [62]. To read a csv file into R, the following commands were used (the csv file was generated by adding column headers to the .data file containing the poker hand data set):

> FILE <- "C:\\Users\\amund_000\\Desktop\\Maskinlæring\\Project3\\data\\r\\poker.csv"
> myData = read.csv(FILE)

To check if the file was loaded we typed the variable name myData in the R console. The following was outputted:

...
9085 4 4 3 8 2 2 2 6 2 7 0
9086 3 1 3 5 3 7 1 1 1 6 1
9087 4 8 2 9 3 3 3 12 4 1 0
9088 1 7 4 5 4 6 1 11 1 2 0
9089 2 8 4 10 1 13 2 6 4 1 0
9090 1 1 1 12 2 8 2 13 3 3 0
[ reached getOption("max.print") -- omitted 15920 rows ]


Chapter 14

R’s SVM

14.1 An initial test

In this chapter we have tested the support vector machine algorithm included in the e1071 package (SVMs in general are described in subsection 9.1.1). This algorithm is able to perform both regression and classification [17]. In our case we wanted to perform a classification of poker hand instances.

To make a simple test of R's SVM functionality we first followed a tutorial online [38] - describing how to split the set into a test and a training set. We were also inspired by a script in the R documentation, showing how to use an SVM to predict the outcome of other instances [36].

We ran into some minor problems when we were about to build an SVM model. After a while, by printing the summary of our SVM model, we found out that the model had been built as a regression type instead of a classification type. The following line was then written to build a simple test model (type 'C' stands for classification):

model<-svm(hand ~ ., training, type="C")

The complete first version of the R script is presented in subsection C.1.1. By using this script we were able to print the confusion matrix for our test set. Figure 14.1 presents this matrix.

The matrix shows that the SVM classified the instances as either class 0 or 1 (nothing or one pair). A print of the prediction's summary makes this even clearer (figure 14.2).

In total there were 7.503 instances in the test set. The SVM was able to classify 4.163 of them correctly - with an error percentage of approximately 44.5%.
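The appendix script is not reproduced here, but a minimal sketch of the procedure described above - a 70/30 split, a classification-type SVM and a confusion matrix - could look like the following (variable and file names are assumptions, not the exact appendix code):

# Minimal sketch of the initial e1071 SVM test (not the script in subsection C.1.1).
library(e1071)

myData <- read.csv("poker.csv")                 # assumed: 10 inputs + class "hand"
myData$hand <- factor(myData$hand)              # treat the class as a factor

index     <- 1:nrow(myData)
testIndex <- sample(index, trunc(length(index) * 30 / 100))
test      <- myData[testIndex, ]
training  <- myData[-testIndex, ]

model <- svm(hand ~ ., training, type = "C")    # "C" forces classification
pred  <- predict(model, test[, -11])            # column 11 is the class
table(pred, test[, 11])                         # confusion matrix for the test set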


Figure 14.1: Confusion matrix for the first test of R’s SVM.

Figure 14.2: The first SVM’s predictions.

14.2 Advanced tests

In our initial test of the e1071's SVM in section 14.1 we didn't adjust the algorithm's options when building the model. Instead we ran the algorithm with the default options selected.

In this section we have first tested how manipulation of the cost variable affects the SVM model. We have also set the number of cross-validations to ten for each of the test runs. This was done by changing the value of the cross-variable [62].

Then we have tested how scaling affects the SVM, and chosen a kernel by testing different kernel types.

Finally we include the best results in a single test run. This approach, with testing both kernel type, scaling and cost, is proposed in the article "A Practical Guide to Support Vector Classification" by Hsu, Chang and Lin [13].

During the tests we also wanted to time the building of the SVM models. For this purpose we used the time-function, as described by Joseph Adler [1].
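The option search in the following subsections was carried out manually. As a side note, e1071 also ships a tune() helper that can run this kind of grid search automatically; a sketch in the spirit of Hsu, Chang and Lin could look like this (the parameter grid is an illustrative assumption, not the procedure we actually followed):

# Sketch of an automated grid search over the cost parameter with 10-fold
# cross-validation; not the manual procedure used below.
library(e1071)

obj <- tune(svm, hand ~ ., data = training,
            type = "C", kernel = "radial",
            ranges = list(cost = c(1, 2, 4, 8, 16, 32)),
            tunecontrol = tune.control(cross = 10))
summary(obj)           # cross-validation error for each cost value
obj$best.parameters    # the cost value with the lowest error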

14.2.1 Test of cost

The following changes were made to the original script (the cost-variable was changed in the different tests):

> Sys.time() ; model<-svm(hand ~ ., training, type="C", cost=1, cross=10) ; Sys.time()

Table 14.1 shows information about the models built, and predictions performed with the training and the test data sets.


Table 14.1: SVM tests with cost

Index Cost Cross-validation Time Test err Train err Number of SVs
1 1 10 00:11:48 44.7% 41.9% 15.935
2 2 10 00:11:22 44.0% 39.6% 15.797
3 4 10 00:13:19 43.9% 37.1% 15.660
4 8 10 00:19:07 44.2% 34.8% 15.467
5 16 10 00:28:14 44.8% 31.6% 15.237
6 32 10 00:46:29 45.8% 27.7% 15.048

A table showing information of tests performed with the SVM included in R's e1071 package. In these tests the cost variable has been changed.

In table 14.1 we see that the training error decreases when the cost variable is increased. The same is not the case for the test error percentage. The SVM model which performed best with the training set performed worse than all the others with the test set. This may have been caused by overfitting.

On the other hand, the SVM with a cost value of 32 tried to classify more classes than the others, as illustrated in figure 14.3. In comparison the SVM with the cost value set to 1 didn't try to classify the instances into any other classes than 0 and 1 (a confusion matrix for an SVM with cost = 1 is shown in figure 14.1, since the default cost value is 1).

Figure 14.3: Confusion matrices showing results for testing and training of an SVM with cost value set to 32.


14.2.2 Test of kernel type

In this test we have tried different kernel types and compared the results in a table. The cost value is set to 1, and the cross-validation is set to ten in these tests. The modifications to the R script are shown below (the kernel is changed in every test):

> Sys.time() ;model<-svm(hand ~ ., trainingset, type="C", kernel="linear", cross=10) ;Sys.time()

Since the previous test in subsection 14.2.1 showed that some SVMs were classifying more classes than others, we have added a column 'class pred' to the table in this subsection. The number of classes predicted by the different SVMs is written in this column. We have excluded the column cross-validations, since the value is 10 for all SVMs anyway.

Table 14.2: SVM tests with different kernels

Index Kernel Time Test err Train err Number of SVs Class pred
1 linear 06:05 50.0% 50.0% 17.205 1
2 polynomial 06:58 50.0% 50.0% 17.121 1
3 radial 14:08 44.7% 41.9% 15.935 2
4 sigmoid 09:28 54.6% 57.4% 11.263 4

A table showing information of tests performed with the SVM included in R's e1071 package. In these tests the kernel has been changed.

Table 14.2 shows that the "radial basis" kernel performed best in the test for both sets, classifying the instances into two classes. The error rate when classifying the training instances was 41.9 percent, and 44.7 percent for the test set's instances.

14.2.3 Test of scale

Table 14.3: SVM tests with scale on and off

Index scale Time Test err Train err Number of SVs Class pred
1 TRUE 11:23 44.7% 41.9% 15.935 2
2 FALSE 24:10 42.1% 18.5% 16.392 5

A table showing information of tests performed with the SVM included in R's e1071 package. In these tests the scale value has been changed.

In this subsection we are examining how changing the scale may affect the SVM models. We start with setting the scale value to TRUE, which is the default value. Then we turn it off. The command below shows this first choice:

> Sys.time() ;model<-svm(hand ~ ., trainingset, type="C", scale=TRUE, cross=10) ;Sys.time()


The results of the tests are shown in table 14.3.

14.2.4 Combined test

In this section we have combined the options from each of the previous tests into a final test with both kernel, scale and cost selections set. We have calculated the confidence interval of the SVMs with the best test results, and chosen specific options on the basis of the calculations.

To calculate confidence intervals we have used the Java program we created in project 2 for this purpose. The program calculates confidence intervals with 95% certainty, and throws an exception if the confidence interval can't be calculated. It outputs the maximum and minimum border of the calculated confidence interval.
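The Java program is not reproduced here. Assuming it uses the usual normal approximation for a proportion, the 95% confidence interval for an observed error rate p measured on n test instances is:

$$p \pm 1.96 \sqrt{\frac{p(1-p)}{n}}$$

and a competing option is considered genuinely worse when its error rate falls above the upper border of this interval.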

Confidence interval of scale tests

Figure 14.4: Confidence interval calculated on the basis of the best scale test.

On the basis of the results shown in figure 14.4, the test results are better for the test with scale set to 'FALSE' than with scale set to 'TRUE'. 44.7% is more than the maximum calculated confidence interval value. Therefore we will conduct the final test with scale set to 'FALSE'.

Confidence interval of kernel test

On the basis of the confidence interval calculated in figure 14.5, we have chosen to use the "radial basis" kernel. None of the other test results was within the confidence interval of the best test.


Figure 14.5: Confidence interval calculated on the basis of the best kernel test.

Confidence interval of cost test

Figure 14.6: Confidence interval calculated on the basis of the best cost test.

Five of the SVMs generated in the cost test have test results within the calculated confidence interval. However, we have chosen to use the cost of the SVM with the best test results - cost = 4.

Final test

The following R command was used in the final test:

> Sys.time(); model<-svm(hand ~ ., trainingset, type="C", scale=FALSE,cost=4, kernel="radial", cross=10);Sys.time()


The final test was not able to classify as many instances as we had hoped. The model was made in 45 minutes and 11 seconds. The final error percentage for the test set was 44.9 percent. The error percentage for the training set was 1.5 percent. The commands written to perform the test, and the test output, are presented in subsection C.1.3.


Chapter 15

R’s naiveBayes

15.1 About naiveBayes

Earlier in this project naive Bayesian classifiers have been described in section 9.2. In this chapter we have tested the e1071 package's implementation of the algorithm.

The algorithm has more available options than WEKA's implementation [17]. The following parameters may/must be used with R's "naiveBayes":

1. x: A matrix or data frame of variables [17].

2. y: A class vector [17].

3. formula: A formula of the form class ~ x1 + x2 [17].

4. data: A table, data frame or predictors [17].

5. laplace: This option may toggle the Laplace smoothing. The default is 0, which disables the smoothing algorithm [17]. Laplace smoothing prevents zero probabilities and accounts for features not present in the learning samples [33]. A standard formulation is shown after this list.

6. subset: An index vector with cases to be used in the training [17].

7. na.action: This is an option which may be used to handle missing values [17]. In our case it is not necessary to handle such values, since the data set is complete.

8. object: This is an object of the class "naiveBayes" [17].

9. newdata: A data frame with new predictors, matched against the training data [17].

10. type: If this option is set to "raw", the conditional a-posteriori probabilities for each class are returned [17].

11. threshold: This value may be used to replace cells with 0 probabilities [17].
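As referenced in item 5 above, Laplace smoothing of a categorical attribute is usually written as follows (textbook formulation, not taken from the package documentation). With smoothing parameter α, the estimated probability of observing value v for class c becomes:

$$P(x = v \mid c) = \frac{\mathrm{count}(x = v,\, c) + \alpha}{\mathrm{count}(c) + \alpha \, |V|}$$

where |V| is the number of distinct values the attribute can take, so no value gets probability zero.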


15.2 An initial test

In this section we have created a simple test of the naiveBayes algorithm. However, the algorithm will be thoroughly tested in the next section.

Before we conducted the test, we made some changes to the .csv-file used. Instead of letting R read a file with numeric classes, we added text to each class for every instance in the file. The word "hand" was added. The reason why we didn't use numeric class values was to make it easier for R to intuitively understand that our task is a classification task, and not a regression task. The Java code is presented in subsection C.2.1.

Figure 15.1 shows our initial test of the naiveBayes algorithm. The test was performed without any adjustments of options, and with a percentage split of the set. A simple approach was chosen, since this was an initial test.

One line is missing in the figure - it was executed before the commands shown in the GUI:

> classifier<-naiveBayes(training[,1:10], training[,11])

Figure 15.1: The initial test of R’s naiveBayes.

15.3 Advanced tests

In these tests we have used the complete data set containing 1.025.010 instances. We have tested how the laplace value (described in section 15.1) and the threshold value affect the classifications.

The columns used in table 15.1 are as follows:


1. id: The tests’ id.

2. laplace: The laplace value used.

3. thresh: The threshold value used.

4. time: The total time consumption of the classifier generation.

5. predTime: Time consumption of the predictions.

6. err: The percentage error while classifying the test set.

7. cp: The number of classes predicted in the test set.

The commands written to perform the tests were formed as the set of commands below:

%Creation of the classifier (Sys.time() is used for timing):
> Sys.time() ; classifier<-naiveBayes(training[,-11], training[,11], laplace=0, threshold=0) ; Sys.time()

%A final classification of a test set:
> Sys.time() ; table(predict(classifier, test[,-11]),test[,11]) ; Sys.time()

Table 15.1: naiveBayes tests

id laplace thresh time predTime err(%) cp
1 0 0 00:00:07 00:03:04 50.0 1
2 1 0 00:00:06 00:02:47 50.0 1
3 2 0 00:00:09 00:02:41 50.0 1
4 4 0 00:00:12 00:02:37 50.0 1
5 8 0 00:00:06 00:02:38 50.0 1
6 0 0.001 00:00:06 00:02:40 50.0 1
7 1 0.001 00:00:07 00:02:38 50.0 1
8 2 0.001 00:00:06 00:02:40 50.0 1
9 4 0.001 00:00:07 00:02:37 50.0 1
10 8 0.001 00:00:17 00:02:38 50.0 1
11 0 0.01 00:00:07 00:02:45 50.0 1
12 1 0.01 00:00:07 00:02:38 50.0 1
13 2 0.01 00:00:06 00:02:44 50.0 1
14 4 0.01 00:00:13 00:02:38 50.0 1
15 8 0.01 00:00:07 00:02:37 50.0 1
16 0 0.1 00:00:07 00:02:44 50.0 1
17 1 0.1 00:00:07 00:02:42 50.0 1
18 2 0.1 00:00:06 00:02:44 50.0 1
19 4 0.1 00:00:07 00:02:41 50.0 1
20 8 0.1 00:00:07 00:02:48 50.0 1

A table showing information of tests performed with e1071’s naiveBayes algorithm.


All the instances were classified as class 0 in every test of the naiveBayes-algorithm. We didn't expect this outcome; we had hoped to be able to classify the instances better. Therefore we decided to test the algorithm further.

15.4 Tests of caret's naive Bayesian classifier

R's classification and regression training library - 'caret' - provides a version of the naive Bayesian algorithm. The library has a train function with 'method' as a parameter. This method may be set to 'nb' - which is the library's naive Bayesian classification option [32].

This library also provides cross-validation options. We tried e1071's tune.control in section 15.3, without any luck [37]. In this section we will try caret's trainControl to test cross-validation on the data set [32].

The following commands are examples used to train the naive Bayesian classifiers with the caret library (the whole data set, containing 1.025.010 instances, was used during the training):

%To use the caret library:
> library(caret)

%Training of the naive Bayesian classifier with 10-fold cross-validation.
> Sys.time() ; model<- train(myData[,-11],myData[,11],"nb", trControl=trainControl(method="cv", number=10)) ; Sys.time()

In this section the tests of the algorithm only explore the use of cross-validation with a naive Bayesian classifier. The following abbreviations are used in table 15.2:

1. id: The current classifier’s id.

2. cv: Number of cross-validation folds.

3. t: Time used to build the predictive model.

4. err: The error percentage.

5. cp: The number of classes predicted by the current classifier.

The classifiers were all trained with the complete data set, as mentioned before. Afterwards the set containing 25.010 instances was used for testing.


Table 15.2: Naive Bayesian algorithm with cross-validations

id cv t err (%) cp
1 2 02:29:40 50.0 1
2 5 02:15:54 50.0 1
3 10 02:06:01 50.0 1
4 20 02:06:53 50.0 1

A table showing information of tests performed with caret’s naive Bayesianalgorithm, and cross-validation.

The results in table 15.2 show that all instances were classified as one class - class zero (nothing in hand). None of the cross-validation adjustments affected the result.


Chapter 16

R’s random forest

16.1 About the random forest algorithm

Leo Breiman and Adele Cutler give a description of how the random forest algorithm works, and how it is implemented in the 'randomForest' package [12].

The algorithm creates a number of classification trees. Each of the trees classifies the objects and then votes; the algorithm ends up with the classification having the most votes among all the trees in the forest [12].

When creating random forest trees there is, according to the authors, no need for performing cross-validations. It is also unnecessary to divide the data set into a test and a training set, since the error is estimated internally during a run [12].

All the trees in the random forest algorithm grow as large as they can get. There is therefore no pruning implemented in the algorithm [12].

16.2 Initial tests

In this section we have tested CRAN's package 'randomForest' [34]. The algorithm has been tested with the poker data set containing eleven attributes (ten attributes and one class) and with 86 attributes (85 attributes and one class). By doing this we wanted to find out if the binary data set could outperform the one with ordinary numbers.

The library was loaded in the following way:

> library(randomForest)

After loading the library, we tested the algorithm with the data set containing eleven attributes and 1.025.010 instances. The data set was split into a test and a training set. The test set contained 30% of the total number of instances in the data set.

The following code was written to perform a simple test with the data set containing eleven attributes:

> rf <- randomForest(hand ~ ., training, ntree=10)
> table(predict(rf, test[,-11]),test[,11])


The test took less than a minute and created a model able to classify 68.7% of the instances correctly - into seven classes. The error percentage for the test performed with eleven attributes was therefore 31.3%.

In the next test we tried the data set containing binary attributes. The data set was converted to csv file format from arff, and we used the Java-program in subsection C.2.1 to modify the class attribute of the instances.

The following commands were written to make the binary data set ready to use in R (the last two commands check if the sizes of the test and training sets are correct):

> FILE3 <- "C:\\Users\\amund_000\\Desktop\\Maskinlæring\\Project3\\data\\r\\binary_complete2.csv"
> pSet = read.csv(FILE3)
> pIndex<-1:nrow(pSet)
> pTestIndex<-sample(pIndex, trunc(length(pIndex)*30/100))
> pTest <- pSet[pTestIndex,]
> pTrain <- pSet[-pTestIndex,]
> nrow(pTrain)
[1] 717507
> nrow(pTest)
[1] 307503
>

After splitting the set into a training and a test set we tested the randomForest algorithm with the following commands:

> Sys.time() ; rf <- randomForest(pokerHand ~ ., pTrain, ntree=10) ; Sys.time()
[1] "2014-04-02 10:27:02 CEST"
Error: cannot allocate vector of size 232.7 Mb

But it turned out that the randomForest algorithm didn't manage to create ten trees for classifying the instances - the data set was too large. We repeated the test with two trees, but got the same error. We therefore decided to repeat test one and two, but with the data set containing 25.010 instances.

The following code was written for the smallest set:

> FILE3 <- "C:\\Users\\amund_000\\Desktop\\Maskinlæring\\Project3\\data\\r\\poker2.csv"
> pSet = read.csv(FILE3)
> pIndex<-1:nrow(pSet)
> pTestIndex<-sample(pIndex, trunc(length(pIndex)*30/100))
> pTrain <- pSet[-pTestIndex,]
> pTest <- pSet[pTestIndex,]
> nrow(pTrain)
[1] 17507
> nrow(pTest)
> Sys.time() ; rf <- randomForest(hand ~ ., pTrain, ntree=10) ; Sys.time()
[1] "2014-04-02 10:36:49 CEST"
[1] "2014-04-02 10:36:49 CEST"
> table(predict(rf, pTest[,-11]),pTest[,11])


The first test created a model able to classify the poker instances into four classes. The error percentage for the test set was 44.0%.

Then we repeated the process, testing the randomForest algorithm with the binary data set, but with 25.010 instances.

The algorithm used approximately five seconds to generate the model. It was able to classify the instances into four classes, but fewer instances were classified as hands containing anything other than one pair or nothing. The error percentage for the test set was 40.6%.

To decide which method to use, we calculated the confidence interval for the best run - to see if it was a coincidence that this run performed better than the first. This was done using the Java program created in project 2. Figure 16.1 shows the upper and lower limit of the confidence interval.

Figure 16.1: Upper and lower limit for the confidence interval calculated with the results of the second test run.

Since the upper limit was lower than the error of the first test with 25.010 instances, we decided to use the data set containing binary data in this chapter's advanced tests. We didn't want to use the data set with 1.025.010 instances, although the results were better with this set without the use of binary data. The reason for this choice was the lack of a binary data test to compare the performed test with.

16.3 Advanced tests

Since the random forest algorithm doesn't need cross-validation, as stated in section 16.1, we have not tested the algorithm with this kind of option. Although the tree doesn't need a test set to check the results of every run, we have decided to split the data into a test and a training set. The reason is that this approach makes it easier for us to control the performance of each run.

The algorithm has several adjustable options. Stephanie Shih at Stanford University gives a description of how this kind of algorithm should be tested, although she is describing another package - R's party package with the cforest-algorithm [56].

Shih proposes that the algorithm's parameters controlling the number of trees and the number of randomly preselected predictor variables (mtry) should be changed during testing. According to Shih the mtry value should be set to the square root of the number of values in each instance [56].

In the randomForest algorithm, tested in this section, these parameters may be controlled. Here ntree is the variable controlling the number of trees, and mtry works in the same way as for the party package's random forest algorithm [34].

We have therefore decided to test the algorithm by changing both options. The tests are shown in table 16.1. Here we have chosen to vary the mtry value


with the numbers 1, 5 and 9. 9 is approximately the square root of the number of values in our instances, 5 is the default value in the party package, and 1 is a smaller value.

The number of trees starts at 10 and is increased in the later tests.

The following commands were written to test the algorithm (the mtry- and ntree-values were changed in each run):

> Sys.time() ; rf <- randomForest(pokerHand ~ ., pTrain, ntree=500, mtry=5) ; Sys.time()
> table(predict(rf, pTest[,-86]),pTest[,86])

The following column names are used in table 16.1:

• id: The id of the test.

• mtry: The mtry-value described earlier.

• ntree: The number of trees generated in each test run.

• t: Time used to create the random forest decision tree.

• err (%): The error percentage with the test set.

• cp: The number of classes classified in the current test.

Table 16.1: Tests of the random forest algorithm

id  mtry  ntree  t         err (%)  cp
1   1     10     00:00:02  49.9     2
2   5     10     00:00:03  45.1     3
3   9     10     00:00:04  41.6     3
4   1     50     00:00:07  49.9     1
5   5     50     00:00:15  38.7     2
6   9     50     00:00:17  34.4     2
7   1     100    00:00:14  49.9     1
8   5     100    00:00:30  36.6     2
9   9     100    00:00:32  32.5     2
10  1     500    00:01:08  49.9     1
11  5     500    00:02:41  35.6     2
12  9     500    00:02:50  29.7     2

A table showing information of tests performed with CRAN’s randomForest.

Table 16.1 shows that the test with 500 trees and mtry set to 9 performed best, classifying more than 70% of the instances correctly. However, when we tried to test with 1000 trees, we got a memory error. Therefore we decided to test the random forest algorithm with even bigger numbers, but with the data set containing ordinary digits.

With the ordinary data set we were able to perform a test with 1.000 trees, but got a memory error when we tried 1.500. The test classified 37.5% wrong, but tried to classify the instances into five different classes.

All in all we got the best results with the binary set. But, as the last test with ordinary numbers shows, the binary tests only managed to predict two classes, while the ordinary tests classified the instances into more classes.


Chapter 17

R’s neural networks

In this chapter we have decided to test a neural network in R. We have tried both the algorithms implemented in R's 'neuralnet'-package and the ones implemented in the 'RSNNS'-package. The 'nnet'-package was not considered, since the feed-forward net implemented in the package only permits one hidden layer.

The packages were installed and loaded with the following commands:

> install.packages('neuralnet')
> library(neuralnet)

> install.packages("RSNNS")
> library(RSNNS)

17.1 Simple tests of the ANNs in both packages

The neuralnet-package has implemented different kinds of neural nets (ANNs). The ANNs may be trained with backpropagation, resilient backpropagation with or without weight backtracking, or with the modified globally convergent version [22].

Since we haven't tried the modified globally convergent version earlier, we decided to test this algorithm in this chapter.

The modified globally convergent algorithm (GRprop) is a variant of the resilient propagation (Rprop) algorithm. This modification has, according to the authors, improved learning speed compared to the original Rprop algorithm. The algorithm uses unconstrained minimization theory to speed up the learning [4].

We started out testing the algorithm with the neuralnet package and the complete data set containing 1.025.010 instances and 86 attributes. Instead of having ten binary output values, we tried with a single numeric output for each of the instances.

Sadly the memory usage was too high for this algorithm to work with our data set. Therefore we reduced it to the one containing 25.010 instances.

After several tests we didn't manage to get the output we wanted with the neural net included in the neuralnet-package, and experienced that the R console crashed on several occasions. But figure 17.1 shows a plot of a very simple ANN created with the data set. This particular ANN has only one hidden node.
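For reference, a minimal sketch of the kind of call that produces such a net is shown below. The column name pokerHand, the file name and the selection of the globally convergent variant via algorithm="sag" are assumptions based on the package documentation [22], not a copy of our exact commands:

library(neuralnet)
pTrain <- read.csv("binary_small.csv")    # assumed file name for the 25.010-instance set
# older neuralnet versions do not accept the "." shorthand, so the formula is built explicitly
f <- as.formula(paste("pokerHand ~", paste(names(pTrain)[-86], collapse = " + ")))
nn <- neuralnet(f, data = pTrain,
                hidden = 1,               # the single hidden node seen in figure 17.1
                algorithm = "sag",        # one of the modified globally convergent (GRprop) variants
                linear.output = FALSE)
plot(nn)                                  # draws the network as in figure 17.1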


Figure 17.1: A simple neural network created with the ANN included in R's neuralnet-package.

We decided to try the RSNNS-package instead.

'RSNNS' is short for R's Stuttgart Neural Network Simulator, which is a library containing several neural net implementations in R. The package contains both a low-level and a high-level interface. The high-level interface is meant for the most common neural net learning algorithms and network topologies [11].

To test the neural net package we used a high-level approach, the multi-layer perceptron (mlp), described at Christoph Bergmeir's website [10]. We started out testing the data set containing 1.025.010 instances. But this set turned out to be too large, and we experienced the same memory errors as we did with the neuralnet-package's ANN.

We decided to use the smaller set containing 25.010 instances instead, which worked smoothly. The script used for testing is presented in subsection C.1.2.

During the initial test we experienced that the ANN generated with the RSNNS-package was capable of classifying some of our instances. We also noticed that the package had functionality to display the neural net and information about the classified data in a sensible way.

With the function 'confusionMatrix' we could display the confusion matrix for the predictive model. The matrices for the training and test set were displayed in the following way:

> confusionMatrix(dataset$targetsTrain, fitted.values(model))
       predictions
targets     1     2
     1  10253   332
     2    427  8575
     3      6  1044
     4      6   447
     5     73     4
     6     42     3
     7      0    32
     8      0     6
     9      3     0
     10     4     1

> confusionMatrix(dataset$targetsTest, predictions)
       predictions
targets    1    2
     1  1846   60
     2    34 1524
     3     0  173
     4     0   79
     5    16    0
     6    10    0
     7     0    7
     8     0    2
     9     1    0

One advantage of the confusion matrix implementation in the RSNNS-package is that the confusion matrices don't display rows or columns for values that don't exist. For the test data the confusion matrix doesn't display any instances of class ten, since there weren't any in the test set.

Several other displaying functions are also included in the package. As an example the function 'plotIterativeError' may be used to plot the error for each of a run's iterations. A test of the functionality is shown in figure 17.2.

Figure 17.2: A test of the RSNNS’ plotIterativeError-function.
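The call itself is a single line. A minimal sketch, assuming model is the mlp object fitted by the script in subsection C.1.2:

plotIterativeError(model)   # plots the error per training iteration for the fitted net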

17.2 Advanced tests

In the advanced tests we decided to test the mlp network more extensively. To do this we have performed several tests of the network with more than one hidden layer.


The following option was specified to create more than one hidden layer:

size=c(layerX, layerY, layerZ)

To be sure that the neural net could run for many iterations, we set the number of epochs to 500 with the option:

maxit=500

We have chosen to test different training functions: 'SCG', 'Rprop' and 'Quickprop'. The following shows the selection of the Rprop algorithm:

learnFunc="Rprop"

The mlp was trained as follows (but with different numbers of hidden layers/nodes and different training functions):

model <- mlp(dataset$inputsTrain, dataset$targetsTrain, size=c(85,65,45,25),
             maxit=500, initFunc="Randomize_Weights", learnFunc="SCG",
             inputsTest = dataset$inputsTest, targetsTest = dataset$targetsTest)
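The err(%) values in table 17.1 were read from our test runs. As a sketch of one way such a percentage can be derived from a fitted model (not necessarily the exact code we used):

pred <- predict(model, dataset$inputsTest)
predClass <- max.col(pred)                     # column with the highest output activation
trueClass <- max.col(dataset$targetsTest)      # column holding the 1 in the binary class coding
errPct <- 100 * mean(predClass != trueClass)   # percentage of wrongly classified test instances
errPct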

The following bullet list explains table 17.1:

• id: Test-id.

• function: Training function used.

• hidden: Number of hidden layers/nodes.

• t: Time used on the form "hh:mm:ss".

• err(%): Error percentage for test set.

• cp: Number of classes predicted.

Table 17.1: Tests of RSNNS

id  function   hidden    t         err(%)  cp
1   Rprop      85        00:32:09  18.5    5
2   Rprop      85,65     00:34:56  39.2    6
3   Rprop      85,65,45  00:49:26  43.4    6
4   SCG        85        00:21:29  7.5     2
5   SCG        85,65     00:36:59  7.4     2
6   SCG        85,65,45  00:28:40  51.0    1
7   Quickprop  85        00:22:35  6.7     4
8   Quickprop  85,65     00:37:30  7.9     3
9   Quickprop  85,65,45  00:51:06  7.3     2

A table showing information of tests performed with the RSNNS-package.


After completing the ANN tests we got mixed results. Some neural nets performed well, classifying approximately 93-94% of the instances correctly. Others performed much worse, with error percentages as high as 51%.

Surprisingly, the deep neural networks performed the poorest in most cases, with error percentages of 51% and 43.4% with the SCG and the Rprop training functions.

But since the data set is much smaller than the one used in the MATLAB tests, the poor results may be due to overfitting. In project one we read that overfitting may be more common when classifiers are trained with small data sets. Therefore we ran a couple of tests with the Rprop function again: the one with one hidden layer and the one with three hidden layers. Then we wanted to find out if the test with three layers was able to classify more instances of the training set, compared to the first mentioned one. After that we wanted to check if the test results followed the same pattern as in table 17.1.

The first test with the Rprop training function and one hidden layer performed better than the one in table 17.1. Only 10.7% of the instances were classified incorrectly with the test set. 7.5% were classified incorrectly with the training set.

For the ANN with three hidden layers the error percentage for the test set was 39.2% in the second test, a little better than in table 17.1. The training error percentage was however much lower, 5.9%.

The results above therefore indicate a high degree of overfitting, especially for the neural network with three hidden layers.

However, we didn't get the same results every time we trained with similar parameters, although we used the same data set. The reason may be the randomization of weights, which in most cases should lead to different results each time an algorithm is run.
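If reproducible runs are wanted, one possibility (an assumption on our part, not something we did in these tests) is to fix R's random seed before the data is shuffled and split:

set.seed(1)   # fixes the R-side randomness, e.g. the shuffling and the train/test split;
              # whether it also controls the SNNS weight initialization we have not verified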


Chapter 18

Comparison with Weka

In this project we have tested most of the functions in a similar way in another part, where we use the WEKA tool instead of R. It may seem unnecessary to compare these parts with each other, since the WEKA functions should be accessible in R through the 'RWeka'-package [24]. R is also a language, while Weka is a tool written in a language, Java.

However, we haven't tested this particular package in this project. Instead we have used the 'randomForest'-, the 'e1071'- and the 'caret'-packages, among others. This comparison will therefore be between the included packages in R and the tested algorithms in WEKA. R may instead be viewed as a tool making it possible to load packages.

18.1 Naive Bayesian classifiers

All naive Bayesian classifiers created in the project had an error percentage of about 50%. They were trained and tested with the complete data set containing 1.025.010 instances.

In R we tried two packages' implementations of the algorithm, but that didn't improve our test scores.

However, while the R algorithms were only able to classify the instances into one class, the WEKA implementation classified instances into more than one class.

18.2 Random forest classifiers

The random forest algorithms were tested with the data set containing 25.010 instances and 86 attributes. The best result was achieved in WEKA with 100 constructed trees and the number of features set to 85. This test had an error percentage of 27.19%.

In R the best test was trained with 500 trees and the mtry value set to 9. This test had an error percentage of 29.7%.


18.3 SVM classifiers

In WEKA we tested the SMO algorithm, which didn't perform better than an error percentage of about 50%. On one occasion we tried to train the SMO with 85 input attributes during 25 hours of model building and prediction, but the results were worse than a test with 10 attributes.

In R the svm algorithm performed better than WEKA's SMO. We achieved the best results with an SVM with the scaling option disabled. This test had an error percentage of 42.1% and could classify the instances into five classes.

We don't know why the SVMs performed worse than algorithms such as C5.0, random forest and others. One reason may be that the svm algorithm, which uses hyperplanes, supported by SVs, to classify instances, isn't designed to solve problems like the poker problem. The hyperplanes may be good for training sets where thresholds easily may be set to predict classes, but not when classifying complex combinations of poker cards.


Bibliography

[1] Joseph Adler. R in a Nutshell. O'Reilly Media, Inc., 2010.

[2] Gennady Agre and Stanimir Peev. On supervised and unsupervised discretization. http://www.cit.iit.bas.bg/CIT_02/v2-2/43-57.pdf, 2002.

[3] S. Marsili Libelli & P. Alba. Adaptive mutation in genetic algorithms. http://www.dsi.unifi.it/~marsili/Papers/Mutation_GA.pdf, 2000.

[4] Aristoklis D. Anastasiadis, George D. Magoulas, and Michael M. Vrahatis. New globally convergent training scheme based on the resilient propagation algorithm. http://www.dcs.bbk.ac.uk/~gmagoulas/New%20globally%20convergent%20training%20scheme.pdf, 2005.

[5] Konstantine Arkoudas and S. Bringsjord. Automatic programming. http://kryten.mm.rpi.edu/PRES/NACAP08/ka_sb_autprogatNACAP08.pdf.

[6] Dr. Alka Arora. Introduction to WEKA - a toolkit for machine learning. http://iasri.res.in/ebook/win_school_aa/notes/WEKA.pdf.

[7] AskDefine. Define hyperplane. http://hyperplane.askdefine.com/.

[8] Asa Ben-Hur and Jason Weston. A user's guide to support vector machines. http://www.cs.colostate.edu/~asa/pdfs/howto.pdf, 2010.

[9] Michael J. Benton. Vertebrate Palaeontology. Blackwell Publishing Company, 2005.

[10] Christoph Bergmeir and José M. Benítez. Examples of high-level API. http://dicits.ugr.es/software/RSNNS/index.php?view=Examples%20of%20high-level%20API.

[11] Christoph Bergmeir and José M. Benítez. Package 'RSNNS'. http://cran.r-project.org/web/packages/RSNNS/RSNNS.pdf, 2013.

[12] Leo Breiman and Adele Cutler. Random forests - classification description. http://stat-www.berkeley.edu/users/breiman/RandomForests/cc_home.htm#intro.

[13] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin. A practical guide to support vector classification. http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf, 2010.


[14] Stephen Crosbie and David Corliss. Are you dense? Using kernel density estimation (KDE) to connect the dots amidst uncertainty. http://www.mwsug.org/proceedings/2012/DV/MWSUG-2012-DV06.pdf, 2012.

[15] OpenCV dev team. Introduction to support vector machines. http://docs.opencv.org/doc/tutorials/ml/introduction_to_svm/introduction_to_svm.html, 2013.

[16] Grant Dick and Peter Whigham. The behavior of genetic drift in a spatially-structured evolutionary algorithm. http://infosci.otago.ac.nz/assets/staff/pwhigham/publications/CEC2005.pdf.

[17] David Meyer et al. Misc functions of the Department of Statistics (e1071), TU Wien. http://cran.r-project.org/web/packages/e1071/e1071.pdf, 2014.

[18] Remco R. Bouckaert et al. Weka manual for version 3-6-10. file:///C:/Users/amund_000/Downloads/WekaManual-3-6-10.pdf, 2013.

[19] Winfried Just & Fang Zhu. Effects of genetic architecture on evolution of multiple traits. http://www.ohio.edu/people/just/PAPERS/wjfzfin.pdf.

[20] Freddy Bugge Christiansen & Tom Fenchel. Den forudsigelige vilkårlighed. Aarhus University Press, 2009.

[21] Neural Net Forecasting. SVM support vectors. http://www.neural-forecasting.com/support_vector_machines.htm, 2005.

[22] Stefan Fritsch, Frauke Guenther, and Marc Suling. Package 'neuralnet'. http://cran.r-project.org/web/packages/neuralnet/neuralnet.pdf, 2013.

[23] Karl Hahn. Lagrange multiplier method for finding optimums. http://www.karlscalculus.org/pdf/lagrange.pdf, 2008.

[24] Kurt Hornik, Christian Buchta, Torsten Hothorn, Alexandros Karatzoglou, David Meyer, and Achim Zeileis. Package 'RWeka'. http://cran.r-project.org/web/packages/RWeka/RWeka.pdf, 2013.

[25] T. Kuo & S.Y. Hwang. A genetic algorithm with disruptive selection. http://www.ncbi.nlm.nih.gov/pubmed/18263031, 1996.

[26] National Instruments. Support vector machines. http://zone.ni.com/reference/en-XX/help/372916J-01/nivisionconcepts/supportvectormachines/, 2010.

[27] Aditya Wibowo & Peter Jamieson. Using simple ancestry to deter inbreeding for persistent genetic algorithm search. http://www.users.muohio.edu/jamiespa/html_papers/gem_12.pdf.

[28] Bartosz Jaworski, Lukasz Kuczkowski, Roman Smierzchalski, and Piotr Kolendo. Extinction event concepts for the evolutionary algorithms. http://www.red.pe.org.pl/articles/2012/10b/65.pdf, 2012.


[29] George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. http://www.cs.iastate.edu/~jtian/cs573/Papers/John-UAI-95.pdf, 1995.

[30] Mohan P. Arora & Chander Kania. Organic evolution. Global Media, 2009.

[31] Thiemo Krink and René Thomsen. Self-organized criticality and mass extinction in evolutionary algorithms. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.391.3730&rep=rep1&type=pdf.

[32] Max Kuhn. Classification and regression training. http://cran.r-project.org/web/packages/caret/caret.pdf, 2014.

[33] Scikit Learn. 1.7 Naive Bayes. http://scikit-learn.org/stable/modules/naive_bayes.html, 2013.

[34] Andy Liaw. Package 'randomForest'. http://cran.r-project.org/web/packages/randomForest/randomForest.pdf, 2013.

[35] Edgar Galván López, Riccardo Poli, Ahmed Kattan, Michael O'Neill, and Anthony Brabazon. Neutrality in evolutionary algorithms... what do we know? http://cswww.essex.ac.uk/staff/poli/papers/GalvanLopezPoliKattanONeillBrabazon_Evolving_Systems_2011.pdf, 2011.

[36] David Meyer. Predict method for support vector machines. http://ugrad.stat.ubc.ca/R/library/e1071/html/predict.svm.html.

[37] David Meyer. tune.control. http://www.inside-r.org/packages/cran/e1071/docs/tune.control.

[38] Author missing. SVM implementation step by step with R: ice-cream sales prediction. http://sivaanalytics.wordpress.com/2013/06/16/svm-implementation-step-by-step-with-r-ice-cream-sales-prediction/, 2013.

[39] T.M. Mitchell. Machine Learning. McGraw Hill Education (India), 1997.

[40] IBM SPSS Modeler. Oracle SVM expert options. http://pic.dhe.ibm.com/infocenter/spssmodl/v15r0m0/index.jsp?topic=2012.

[41] M. Negnevitsky. Artificial Intelligence - a Guide to Intelligent Systems. Pearson Education Limited, 2011.

[42] Nikolaus Hansen, Dirk Arnold, and Anne Auger. Evolution strategies. https://www.lri.fr/~hansen/es-overview-2014.pdf, 2013.

[43] University of California Museum of Paleontology. What comes after mass extinctions? http://evolution.berkeley.edu/evolibrary/news/120901_afterextinction, 2012.

[44] OECD Glossary of Statistical Terms. Stochastic. https://stats.oecd.org/glossary/detail.asp?ID=3848, 2002.


[45] Roland Olsson and Brock Wilcox. Self-improvement for the ADATE automatic programming system. http://www-ia.hiof.no/~rolando/sig-adate.pdf.

[46] Dipak V. Patil and R. S. Bichkar. Issues in optimization of decision tree learning: A survey. http://research.ijais.org/volume3/number5/ijais12-450512.pdf, 2012.

[47] Pentaho. Classifiers - introduction. http://wiki.pentaho.com/display/DATAMINING/Classifiers.

[48] John C. Platt. Fast training of support vector machines using sequential minimal optimization. http://research.microsoft.com/en-us/um/people/jplatt/smo-book.pdf, 2000.

[49] David Poole and Alan Mackworth. 6.1.3.2 Bayes' rule. http://artint.info/html/ArtInt_144.html, 2010.

[50] David Poole and Alan Mackworth. 7.3.3 Bayesian classifiers. http://artint.info/html/ArtInt_181.html, 2010.

[51] L. Magnus Rattray. The dynamics of a genetic algorithm under stabilizing selection. https://www.complex-systems.com/pdf/09-3-3.pdf, 1995.

[52] UCI Machine Learning Repository. Poker hand data set. http://archive.ics.uci.edu/ml/datasets/Poker+Hand, 2007.

[53] Rulequest Research. Data mining with Cubist. https://www.rulequest.com/cubist-info.html, 2012.

[54] Rulequest Research. C5.0: An informal tutorial. http://www.rulequest.com/see5-unix.html#SUMMARY, 2013.

[55] C. Rich and R. C. Waters. Approaches to automatic programming. http://www.merl.com/publications/docs/TR92-04.pdf, 1992.

[56] Stephanie Shih. Random forests for classification trees and categorical dependent variables: an informal quick start R guide. http://www.stanford.edu/~stephsus/R-randomforest-guide.pdf, 2011.

[57] SourceForge. Class KStar. http://weka.sourceforge.net/doc.dev/weka/classifiers/lazy/KStar.html.

[58] SourceForge. Class NaiveBayes. http://weka.sourceforge.net/doc.dev/weka/classifiers/bayes/NaiveBayes.html.

[59] SourceForge. Class SMO. http://weka.sourceforge.net/doc.dev/weka/classifiers/functions/SMO.html.

[60] Geir Vattekar. ADATE user manual. http://www-ia.hiof.no/~rolando/ML/ADATE/AdateManual.pdf, 2006.

[61] Zan Wang, Jin-Ian Liu, and Xue Yu. Self-fertilization based genetic algorithm for university timetabling problem. http://dl.acm.org/citation.cfm?id=1543993, 2009.


[62] Chi Yau. Data import. http://www.r-tutor.com/r-introduction/data-frame/data-import, 2014.

[63] Chen Ying-Ping. Linkage learning genetic algorithm. http://link.springer.com/chapter/10.1007%2F11339380_4, 2006.


Appendix A

Automatic programming

A.1 ADATE file outputs

A.1.1 Output of individual in the .log-file

----------------------------------------------------------------------
Individual id = 0_32D867  Trace info = embedding for 0_1F03C7 0
Individual fp = ~0.491199440649

Ancestor fps = [ ~0.491199440649, ~0.257919360247, 0.0399303737852, 0.535145037339,~0.0417986583654, ~0.0221375496725, 0.248381233341, 0.466126339023, 0.466126339023,0.258061258919, 0.00281294591169, 0.039110207166, ~0.00657606774544, 0.398032476397,0.523457120119, 0.523457120119, 5.16632293668E~4 ]

Max cost limit chosen = 0

Max cost limit done = 03072 0 3768 13063366 723545 [ 249.201223776, 246.738259839, 263.766710701 ]15D8A11E7FF2EF7 ~0.491199440649 2

Time limit = 65536Test eval value =2338 0 3595 0 536248 [ 249.201223776, 246.738259839, 263.766710701 ]19B569B3B8B1CE5 ~0.491199440649 2

3072 trainAndTest 2338 249.201223776 ~0.491199440649

fun f Xs =case Xs of

nill => (raise NA_A05C3)| cons( VA05C4 as card( VA05C5, VA05C6 ), VA05C7 ) =>case VA05C7 of

nill => flush| cons( VA05C8 as card( VA05C9, VA05CA ), VA05CB ) =>case ( VA05C9 = VA05C5 ) of

false => (


caselet

fun g1028218( V37AE09C, V1028219 ) =case V1028219 of

ace => VA05C6| s( V102820F ) =>case g1028218( V37AE09C, V102820F ) of

ace => VA05CA| s( V1024386 ) => V1024386

ing1028218( false, VA05CA )

end oface => three

| s( V18E5B0F ) =>case VA05C6 of

ace =>let

fun g93ED587 V93ED588 =case V93ED588 of

nill => nothing| cons(

V93EA3F5 as card( V93EA3F6, V93EA3F7 ),V93EA3F8) =>

case V93EA3F7 oface => twopairs

| s( V93EDAF5 ) => g93ED587( V93EA3F8 )in

g93ED587( VA05CB )end

| s( V6E6C8F3 ) =>case VA05CB of

nill => straight| cons( V93EA3F9 as card( V93EA3FA, V93EA3FB ), V93EA3FC ) =>

f(cons(

VA05C8,cons( card( VA05C5, V93EA3FB ), V93EA3FC ))

))

| true => f( cons( VA05C4, VA05CB ) )

Local trf history =

CASE-DIST [ 2, 2, 1, 2, 1 ] [ 2, 2, 1, 2, 2 ] [ 2, 2, 1, 2 ]ABSTR [ 2, 2, 1, 2, 1 ]g93ED587 : V93ED588rec_arg_type_existsR


Top poses = [ [ 2, 2, 1, 2, 1, 0, 2 ] ]Bottom poses = [ ]Bottom labels =Synted exp = case V93EA3F7 of ace => twopairs | s( V93EDAF5 ) => V93EDA6FNot activated symbols = [ ]

................

Global trf history =

Creation time = 665305.534982++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

A.1.2 pe1g0s0-grid from the trace-file

Grid for pe1g0s0:

Column for time limit 655360 0 0 9223372036854775806 27360 [ 5, 5, 4.975 ] A09D39708D7132 5.16632293668E~4 11001 0 5839 9223372036854775806 27360 [ 5, 5, 4.975 ] 3057E65E60DD215 0.47865127876 11001 0 5839 15780029 34200 [ 12.8244284354, 12.8244284354, 14.9358778578 ] 3057E65E60DD215 ~0.0111056253496 11041 0 5799 15798602 47880 [ 26.0059782249, 26.5101048009, 34.867449476 ] 65411D4231B605D 0.191921028217 11047 0 5793 15800154 47880 [ 28.7011236433, 25.8833221245, 34.8705833894 ] 576DF2F828B2AEB 0.216243976923 11209 0 5631 15842155 178484 [ 34.9862942793, 35.7165556784, 39.8214172216 ] 66784E2D9D070B 0.197467648593 11824 0 5016 15359333 61560 [ 43.8188093177, 45.0791257577, 59.7746043712 ] 3A8EE2A089D38DD ~0.0794469518104 11999 0 4841 13615437 97764 [ 53.121053467, 54.6075047301, 64.7269624763 ] 5E2EC25FAD63BA9 0.0351944830915 11999 0 4813 13610870 109400 [ 64.1502762894, 65.8887908405, 94.6705560458 ] 60898CD499E81E8 ~0.00956530981444 12061 0 4779 13619943 103596 [ 66.8347744637, 68.5732890148, 79.6571335549 ] 402F5EF8F196B1D 0.363893149645 12082 0 4758 14410251 148091 [ 70.7302683641, 72.7208462032, 124.636395769 ] 2C8173E8E7C43F8 0.293919561224 12093 0 4747 14394786 136338 [ 75.9785892504, 77.9432386245, 84.6102838069 ] 486BE3576C90975 ~0.388270632491 12102 0 4738 13613529 250732 [ 77.9198773111, 79.6843203271, 119.601578398 ] 235470EC651A1A3 ~0.00347733743652 12119 0 4721 13623498 109015 [ 80.6639726777, 82.6545505168, 94.5867272474 ] 181B74E2BF59B8E ~0.403571297671 12143 0 4697 13621658 257860 [ 81.9001933655, 83.8907712046, 124.580546144 ] 313097550C5FC0 ~0.599985040958 12143 0 4697 13615668 161625 [ 90.0522729357, 92.2949140628, 139.53852543 ] 9D627BAE7C79D1 ~0.277691166651 12169 0 4671 13596858 294912 [ 91.1875801677, 93.6822845828, 134.531689077 ] 410EEADFD7196E 0.513780148476 12198 0 4642 13610027 234242 [ 97.8543767383, 100.297224223, 114.498513879 ] 179CB83EA007471 0.286964403706 22243 0 4597 13617007 277953 [ 99.8778340211, 102.146403613, 149.489267982 ] 5E734591F4B9D3 ~0.199183570052 12347 0 4493 13598258 757128 [ 103.325086357, 106.446338311, 119.467969308 ] 511838E88272ADB ~0.426876905013 12374 0 4466 13567139 262784 [ 113.92354651, 116.670314214, 129.416648429 ] 1C25390DA85691D 0.407983118293 12543 0 4297 13475631 761312 [ 117.154284571, 120.527599813, 134.397563001 ] 20F6939623FB351 0.36362793558 12659 0 4181 14757860 818526 [ 123.209823816, 126.208654808, 144.368956726 ] 5ED9AC257F32F08 0.609259541693 12659 0 4181 13462014 848299 [ 131.601616718, 131.530582902, 144.342347085 ] 5ED9AC257F32F08 ~0.250690071587 12659 0 4181 13458885 607812 [ 132.55777932, 135.782745134, 149.321086274 ] 5ED9AC257F32F08 ~0.0828626556839 12892 0 3948 13271075 827986 [ 133.935554831, 136.960314287, 184.315198429 ] 5358B8770B39800 ~0.0989325291018 12960 0 3880 13204126 827986 [ 139.579411021, 142.604170477, 184.286979148 ] 56A1DB0CBD0925 ~0.322270524849 13715 0 3125 13031338 2185608 [ 140.206608768, 143.153582829, 184.284232086 ] 1EEFAA55CAC6FC2 ~0.318469205116 14004 0 2009 11176244 2483938 [ 145.850464958, 148.797439019, 184.256012805 ] 3F57110AB2F1AE2 ~0.0270900952408 1


4716 0 2124 10120965 2484168 [ 151.494321148, 154.441295209, 184.227793524 ] 1449C5EE68283F6 ~0.311437169739 14831 0 2009 9943286 2484168 [ 157.138177338, 160.085151399, 184.199574243 ] 3248A368BF27F1 ~0.0253650554096 14831 0 2009 9855203 2766961 [ 171.288415561, 174.48745291, 199.127562735 ] 4CC0C93C6044E72 ~0.0640020823971 14831 0 2009 9834539 2518368 [ 172.570621957, 173.673866034, 199.13163067 ] 3248A368BF27F1 ~0.0639346138437 14831 0 2009 9392119 2543024 [ 173.174200136, 174.277444213, 199.128612779 ] 4CC0C93C6044E72 ~0.0724684162812 14831 0 2009 9192866 2572375 [ 177.56955369, 178.89893259, 204.105505337 ] 4CC0C93C6044E72 ~0.444583136625 14831 0 2009 9180466 2825817 [ 187.324438359, 188.679745725, 214.056601271 ] 4CC0C93C6044E72 ~0.00870365864295 14831 0 2009 9115108 5541136 [ 190.652100491, 192.76359772, 219.036383011 ] 4CC0C93C6044E72 ~0.124515742688 14877 0 1963 9792567 2511528 [ 193.197278518, 187.438784734, 214.062806076 ] 53169616DCD917A 0.0194207908286 14982 0 1858 8802509 2355631 [ 199.605202781, 203.508573064, 223.982457135 ] 28A1B9F377178F 0.276305736073 15481 0 1244 7081812 2642686 [ 204.332161876, 208.539452377, 238.957302738 ] 150A50DD028BE5E ~0.0484456465223 15481 0 1244 7077619 2169535 [ 207.553122056, 211.986547381, 243.940067263 ] 150A50DD028BE5E 0.205060187522 15596 0 1244 6862486 2642801 [ 209.976018066, 214.183308567, 238.929083457 ] 155641763FB7B02 ~0.309907550045 15714 0 1011 5546020 2559278 [ 217.503372749, 221.484528427, 263.892577358 ] 553AC3BEC26E0E2 ~0.228128314147 15714 0 1011 5483318 2579379 [ 221.898726303, 226.106016804, 268.869469916 ] 553AC3BEC26E0E2 0.732337241613 15829 0 1011 5317887 2559524 [ 228.847668657, 232.828824335, 263.835855878 ] 53C69B2FA0A9277 ~0.661056070967 15829 0 1011 5255236 1958609 [ 232.68798147, 236.895271971, 268.81552364 ] 1AE8A496D06309 ~0.471256748758 15829 0 1011 5254484 2579589 [ 233.243022211, 237.450312712, 268.812748436 ] 1AE8A496D06309 ~0.471723509354 15829 0 1011 4809329 2587877 [ 243.559253909, 247.792472875, 278.761037636 ] 1AE8A496D06309 0.346746754515 15829 0 1011 4640489 2623922 [ 249.399763605, 253.859117394, 283.730704413 ] 1AE8A496D06309 ~0.356680303092 15845 0 995 4897771 2129609 [ 252.856581642, 247.854277723, 268.760728611 ] 96C504B09E098E ~0.225977025609 15845 0 995 4897005 2579589 [ 253.026332227, 248.024028308, 268.759879858 ] 96C504B09E098E ~0.237506460408 15886 0 954 4813463 2575639 [ 254.295537443, 258.52875641, 278.707356218 ] E5B462131DB13C ~0.446732941118 16017 0 708 4026395 2590710 [ 255.622265846, 250.872025215, 283.745639874 ] 572D2BD1DE73B14 ~0.00814140481526 16131 0 594 3600790 2160096 [ 259.82194035, 255.323763006, 288.723381185 ] 3E58C9BE35827FC 0.2362855426 16131 0 594 3600269 2682358 [ 261.498501761, 256.496197842, 288.717519011 ] 3E58C9BE35827FC 0.554107953228 16132 0 708 3795248 2590920 [ 266.966561754, 262.216321122, 283.688918394 ] 164057FC2FABE3B 0.56692200103 16246 0 594 3368722 2621523 [ 271.335986843, 266.837809499, 288.665810953 ] F89BC3F96A1DA9 ~0.580676273738 16246 0 594 3366018 2594689 [ 280.90679128, 276.408613937, 298.61795693 ] F89BC3F96A1DA9 ~0.0861201083238 16247 0 593 3373078 2647476 [ 283.976656087, 276.408613937, 298.61795693 ] 13EFBDAB10D0946 ~0.086106018361 16257 0 583 3067263 2670069 [ 284.409156202, 279.406852283, 288.602965739 ] 502538F25864048 ~0.353309315778 16271 0 569 3279039 2633543 [ 285.383131573, 281.137017518, 303.594314912 ] 9F708B4D26F306 0.0271253192127 16280 0 560 3012952 2696181 [ 287.745741525, 282.995500894, 303.585022496 ] 30987DC85B5539B 
0.0439648984997 16282 0 558 3132647 2767212 [ 296.744046364, 292.24586902, 318.538770655 ] 261211C215BF02D ~0.559243620169 16301 0 539 2914611 2722769 [ 299.189003464, 294.438762833, 303.527806186 ] 2672E105BB4F236 ~0.482701495133 16309 0 531 2915709 2697962 [ 301.792886255, 297.294708912, 318.513526455 ] 28A091D2E875C9 0.255656834422 16317 0 523 2930038 2764021 [ 306.098688224, 301.826645704, 323.490866771 ] 5419AC62BD89C68 ~0.237835180749 16321 0 519 2907577 5351308 [ 310.3013967, 306.255489003, 328.468722555 ] 354512A8AA57DBB 0.00394527224405 16323 0 517 2872049 2708423 [ 316.046217972, 311.800103916, 333.44099948 ] 2E6B659B14543F9 0.00467049415012 16324 0 516 2907489 2863816 [ 320.348462121, 316.328482889, 338.418357586 ] 1959D42938EA71F 0.0178149186435 16324 0 516 2850277 2772016 [ 330.370896184, 326.855043528, 348.365724782 ] 368D741ECD21D4F ~0.353145930954 16327 0 513 2852514 2733028 [ 333.50787568, 326.443960106, 348.367780199 ] 17BDC57C0C66E70 ~0.0254716900504 16327 0 513 2824900 2786925 [ 341.820450395, 338.078462916, 363.309607685 ] 123A3FCF908DBB4 0.130757292844 16331 0 509 2811113 2812428 [ 342.415890914, 338.673903435, 363.306630483 ] 4E85C8BABD037A1 0.139172584735 16334 0 506 2795295 2802545 [ 347.995241152, 344.253253672, 363.278733732 ] 2BC1739A1245D66 ~0.617809656063 16339 0 501 2815299 2816139 [ 357.130257305, 353.640333114, 378.231798334 ] 48BE505EDA74678 0.370529280137 16339 0 501 2781175 3077931 [ 366.853371123, 363.589581755, 383.182052091 ] 48BE505EDA74678 ~0.2620503776 16341 0 499 2845784 2834346 [ 372.569568735, 369.331707832, 393.153341461 ] 38868273962DB85 ~0.302903099465 16344 0 496 2842816 5161285 [ 374.414103052, 369.993241321, 393.150134293 ] 48FCD6713932D7B 0.254923627328 16372 0 468 2745205 7327705 [ 386.359264179, 375.798672834, 408.121107136 ] 2CDF96B8D6FC331 0.276393757936 1


6372 0 468 2744922 5207503 [ 397.554518352, 390.541989925, 428.04739055 ] 2CDF96B8D6FC331 0.589010086932 16372 0 468 2718762 4476640 [ 403.369358512, 389.990965649, 423.050145672 ] 2CDF96B8D6FC331 9.2764028921E~4 16378 0 462 2793290 7265981 [ 404.095626349, 390.717233485, 423.046514333 ] 3DCE036AC7B2920 0.379612179299 16423 0 417 2475631 3827181 [ 411.952191167, 399.730870666, 428.001345647 ] 43506EB3B9D0B0C ~0.00996617868652 16423 0 417 2474945 4511877 [ 412.115534519, 399.894214018, 428.00052893 ] 43506EB3B9D0B0C 0.0318069361826 16423 0 417 2473296 4434124 [ 413.80152276, 402.336392123, 432.988318039 ] 5C4181CD875707 0.0766992546122 16423 0 417 2458562 4454227 [ 418.423970595, 407.184974781, 437.964075126 ] 50BED0BFF5D7893 0.442057844349 16449 0 391 2358119 4463096 [ 424.619257535, 411.310396915, 447.943448015 ] 2274DCECE9F16AD 0.242444670094 16458 0 382 2307953 3441515 [ 430.74550658, 419.784502519, 462.901077487 ] 3B66DDD3026DAC2 0.114263364202 16462 0 378 2324615 4480102 [ 450.246715154, 432.641854904, 467.836790725 ] 426A4BD9A166535 ~0.41273067117 16462 0 378 2324511 4509655 [ 452.564135198, 435.715464811, 472.821422676 ] 426A4BD9A166535 ~0.259724300051 16471 0 369 2282414 4084168 [ 458.05934644, 444.532604148, 472.777336979 ] 41D43F47388F1F2 ~0.042113071556 16478 0 362 2246345 4488770 [ 465.8578742, 449.261267101, 487.753693664 ] 52042FFB6A6F2AE 0.204577888186 16479 0 361 2231707 4105332 [ 476.945561015, 463.114898506, 487.684425507 ] 256D5D4F759CE0A ~0.0354306768997 16482 0 358 2240496 4617310 [ 478.626495963, 459.464150634, 497.702679247 ] 5CA48C4D216AB4B 0.534037006345 16487 0 353 2198224 4528233 [ 483.510342949, 464.095934332, 502.679520328 ] 5A5463ECE81AC02 0.550874035297 16487 0 353 2194933 3965260 [ 487.57845442, 474.555838705, 502.627220806 ] 272826D1BF73F78 0.546991504302 16487 0 353 2194835 4050293 [ 491.845148557, 478.266549336, 502.608667253 ] 272826D1BF73F78 ~0.0443534152821 16528 0 312 1990878 3908677 [ 493.243935535, 480.421526177, 507.597892369 ] 556431C48D9E0D ~0.259764905129 16548 0 292 1893799 4826713 [ 497.945772783, 484.845371672, 512.575773142 ] 56BEB2F0B438C41 0.459968593741 16551 0 289 1864944 4801581 [ 512.60060009, 499.752262268, 527.501238689 ] 38A2DA015E0E198 0.0429202627617 16566 0 274 1812844 4949233 [ 517.807512709, 505.61165089, 532.472142746 ] 6697A6C9BA1C641 0.127180353874 16566 0 274 1811415 4954258 [ 534.471687614, 520.432095812, 547.398040521 ] 6697A6C9BA1C641 ~0.0693777088382 16591 0 249 1661082 5555158 [ 553.170702736, 535.861039769, 562.320694801 ] 11E3AD712DF8044 0.272695683839 16592 0 248 1655034 5560992 [ 573.808022472, 552.984478204, 582.235077609 ] 23C88ACA4346337 0.0183619111125 1

A.1.3 Program with the best validation results

fun f Xs =

case Xs ofnill => straight

| cons( VA05C4 as card( VA05C5, VA05C6 ), VA05C7 ) =>let

fun g2D072C9 V2D072CA =case VA05C7 of

nill => f( VA05C7 )| cons( VF4AF2D2 as card( VF4AF2D3, VF4AF2D4 ), VF4AF2D5 ) =>case V2D072CA of

nill => f( VA05C7 )| cons( V131B044F as card( V131B0450, V131B0451 ), V131B0452 ) =>case

(( case ( V131B0450 = VA05C5 ) of

false => V131B0450| true => VF4AF2D3 ) =VA05C5) of

false => (case g2D072C9( V131B0452 ) of


V131B0453 =>case V131B0453 of

V131B0454 =>case

letfun g131B0455 V131B0456 =

case V131B0456 oface => VA05C6

| s( V131B0457 ) =>case g131B0455( V131B0457 ) of

ace => s( V131B0451 )| s( V131B0458 ) => V131B0458

ing131B0455( V131B0451 )

end oface => (

case V131B0454 ofnothing => onepair

| onepair => twopairs| twopairs => three| three => house| straight => onepair| flush => onepair| house => four| four => V131B0454| straightflush => (raise NA_131B0459)| royalflush => (raise NA_131B045A))

| s( V131B045B ) =>case

letfun g131B045C V131B045D =

case V131B045D oface => V131B0451

| s( V131B045E ) =>case g131B045C( V131B045E ) of

ace => V131B045B| s( V131B045F ) => V131B045F

ing131B045C( g131B045C( g131B045C( s( VA05C6 ) ) ) )

end oface => (

case V131B0454 ofnothing => V131B0454

| onepair => onepair| twopairs => twopairs| three => V131B0454| straight => V131B0453| flush => straight| house => V131B0454


| four => V131B0454| straightflush => (raise NA_131B0460)| royalflush => (raise NA_131B0461))

| s( V131B0462 ) =>case V131B0454 of

nothing => nothing| onepair => V131B0454| twopairs => V131B0454| three => three| straight => (

case V131B0462 oface => straight

| s( V131B0463 ) =>case V131B0463 of

ace => V131B0454| s( V131B0464 ) =>case V131B0464 of

ace => (case VF4AF2D5 of

nill => straight| cons(

V131B0465 as card( V131B0466, V131B0467 ),V131B0468) =>f( cons( VF4AF2D2, nill ) )

)| s( V131B0469 ) => nothing)

| flush => straight| house => V131B0454| four => V131B0454| straightflush => (raise NA_131B046A)| royalflush => (raise NA_131B046B))

| true =>case VF4AF2D5 of

nill => flush| cons( V131B046C as card( V131B046D, V131B046E ), V131B046F ) =>

g2D072C9( V131B0452 )in

g2D072C9( VA05C7 )end

A.2 ADATE screen dumps


Figure A.1: Generation of a .spec-file from a C5.0 .names- and .data-file.

Figure A.2: ADATE-ML code in the generated .spec-file.

Figure A.3: Creation of a poker.spec.sml-file.


Appendix B

WEKA

B.1 WEKA text output

B.1.1 The first initial test of the SMO algorithm

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances    11581    46.3055 %
Incorrectly Classified Instances  13429    53.6945 %
Kappa statistic                   0.0001
Mean absolute error               0.163
Root mean squared error           0.2775
Relative absolute error           143.3848 %
Root relative squared error       116.4067 %
Total Number of Instances         25010

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.518    0.517    0.5        0.518   0.509      0.501     0
0.482    0.483    0.423      0.482   0.451      0.5       1
0        0        0          0       0          0.498     2
0        0        0          0       0          0.496     3
0        0        0          0       0          0.547     4
0        0        0          0       0          0.688     5
0        0        0          0       0          0.564     6
0        0        0          0       0          0.785     7
0        0        0          0       0          0.576     8
0        0        0          0       0          0.711     9

Weighted Avg. 0.463 0.463 0.429 0.463 0.445 0.501

=== Confusion Matrix ===

a b c d e f g h i j <-- classified as


6471  6022  0  0  0  0  0  0  0  0 |  a = 0
5489  5110  0  0  0  0  0  0  0  0 |  b = 1
 592   614  0  0  0  0  0  0  0  0 |  c = 2
 275   238  0  0  0  0  0  0  0  0 |  d = 3
  48    45  0  0  0  0  0  0  0  0 |  e = 4
  37    17  0  0  0  0  0  0  0  0 |  f = 5
  22    14  0  0  0  0  0  0  0  0 |  g = 6
   2     4  0  0  0  0  0  0  0  0 |  h = 7
   3     2  0  0  0  0  0  0  0  0 |  i = 8
   2     3  0  0  0  0  0  0  0  0 |  j = 9

B.1.2 The second initial test of the SMO algorithm

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances    11546    46.1655 %
Incorrectly Classified Instances  13464    53.8345 %
Kappa statistic                   -0.004
Mean absolute error               0.163
Root mean squared error           0.2775
Relative absolute error           143.3864 %
Root relative squared error       116.4121 %
Total Number of Instances         25010

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.531    0.536    0.497      0.531   0.513      0.499     0
0.464    0.467    0.422      0.464   0.442      0.498     1
0        0        0          0       0          0.499     2
0        0        0          0       0          0.513     3
0        0        0          0       0          0.639     4
0        0        0          0       0          0.556     5
0        0        0          0       0          0.575     6
0        0        0          0       0          0.541     7
0        0        0          0       0          0.328     8
0        0        0          0       0          0.742     9

Weighted Avg. 0.462 0.466 0.427 0.462 0.444 0.5

=== Confusion Matrix ===

   a     b    c  d  e  f  g  h  i  j   <-- classified as
6630  5857    0  0  0  0  0  3  3  0 |  a = 0
5673  4916    0  0  0  0  0  4  3  3 |  b = 1
 657   546    0  0  0  0  0  1  0  2 |  c = 2
 266   247    0  0  0  0  0  0  0  0 |  d = 3


  53    39    0  0  0  0  0  0  0  1 |  e = 4
  32    22    0  0  0  0  0  0  0  0 |  f = 5
  18    17    0  0  0  0  0  1  0  0 |  g = 6
   5     1    0  0  0  0  0  0  0  0 |  h = 7
   2     3    0  0  0  0  0  0  0  0 |  i = 8
   3     2    0  0  0  0  0  0  0  0 |  j = 9

B.1.3 The first test of the naiveBayes algorithm

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances    12493    49.952 %
Incorrectly Classified Instances  12517    50.048 %
Kappa statistic                   0
Mean absolute error               0.1136
Root mean squared error           0.2384
Relative absolute error           99.9801 %
Root relative squared error       100.0106 %
Total Number of Instances         25010

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
1        1        0.5        1       0.666      0.51      0
0        0        0          0       0          0.496     1
0        0        0          0       0          0.512     2
0        0        0          0       0          0.5       3
0        0        0          0       0          0.644     4
0        0        0          0       0          0.484     5
0        0        0          0       0          0.528     6
0        0        0          0       0          0.591     7
0        0        0          0       0          0.464     8
0        0        0          0       0          0.68      9

Weighted Avg. 0.5 0.5 0.25 0.5 0.333 0.505

=== Confusion Matrix ===

    a  b  c  d  e  f  g  h  i  j   <-- classified as
12493  0  0  0  0  0  0  0  0  0 |  a = 0
10599  0  0  0  0  0  0  0  0  0 |  b = 1
 1206  0  0  0  0  0  0  0  0  0 |  c = 2
  513  0  0  0  0  0  0  0  0  0 |  d = 3
   93  0  0  0  0  0  0  0  0  0 |  e = 4
   54  0  0  0  0  0  0  0  0  0 |  f = 5
   36  0  0  0  0  0  0  0  0  0 |  g = 6
    6  0  0  0  0  0  0  0  0  0 |  h = 7
    5  0  0  0  0  0  0  0  0  0 |  i = 8
    5  0  0  0  0  0  0  0  0  0 |  j = 9


B.1.4 The second test of naiveBayes

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances    12493    49.952 %
Incorrectly Classified Instances  12517    50.048 %
Kappa statistic                   0
Mean absolute error               0.1137
Root mean squared error           0.2384
Relative absolute error           100 %
Root relative squared error       100 %
Total Number of Instances         25010

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
1        1        0.5        1       0.666      0.5       0
0        0        0          0       0          0.5       1
0        0        0          0       0          0.499     2
0        0        0          0       0          0.498     3
0        0        0          0       0          0.489     4
0        0        0          0       0          0.478     5
0        0        0          0       0          0.467     6
0        0        0          0       0          0.3       7
0        0        0          0       0          0.25      8
0        0        0          0       0          0.25      9

Weighted Avg. 0.5 0.5 0.25 0.5 0.333 0.5

=== Confusion Matrix ===

    a  b  c  d  e  f  g  h  i  j   <-- classified as
12493  0  0  0  0  0  0  0  0  0 |  a = 0
10599  0  0  0  0  0  0  0  0  0 |  b = 1
 1206  0  0  0  0  0  0  0  0  0 |  c = 2
  513  0  0  0  0  0  0  0  0  0 |  d = 3
   93  0  0  0  0  0  0  0  0  0 |  e = 4
   54  0  0  0  0  0  0  0  0  0 |  f = 5
   36  0  0  0  0  0  0  0  0  0 |  g = 6
    6  0  0  0  0  0  0  0  0  0 |  h = 7
    5  0  0  0  0  0  0  0  0  0 |  i = 8
    5  0  0  0  0  0  0  0  0  0 |  j = 9

B.1.5 The third test of naiveBayes

=== Stratified cross-validation ===


=== Summary ===

Correctly Classified Instances    12272    49.0684 %
Incorrectly Classified Instances  12738    50.9316 %
Kappa statistic                   -0.0063
Mean absolute error               0.1128
Root mean squared error           0.2387
Relative absolute error           99.2354 %
Root relative squared error       100.1511 %
Total Number of Instances         25010

=== Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.922    0.93     0.497      0.922   0.646      0.503     0
0.072    0.076    0.408      0.072   0.122      0.497     1
0        0        0          0       0          0.508     2
0        0        0          0       0          0.474     3
0        0        0          0       0          0.664     4
0        0        0          0       0          0.44      5
0        0        0          0       0          0.555     6
0        0        0          0       0          0.587     7
0        0        0          0       0          0.418     8
0        0        0          0       0          0.683     9

Weighted Avg. 0.491 0.497 0.421 0.491 0.374 0.501

=== Confusion Matrix ===

    a    b  c  d  e  f  g  h  i  j   <-- classified as
11514  979  0  0  0  0  0  0  0  0 |  a = 0
 9841  758  0  0  0  0  0  0  0  0 |  b = 1
 1129   77  0  0  0  0  0  0  0  0 |  c = 2
  480   33  0  0  0  0  0  0  0  0 |  d = 3
   87    6  0  0  0  0  0  0  0  0 |  e = 4
   52    2  0  0  0  0  0  0  0  0 |  f = 5
   34    2  0  0  0  0  0  0  0  0 |  g = 6
    6    0  0  0  0  0  0  0  0  0 |  h = 7
    4    1  0  0  0  0  0  0  0  0 |  i = 8
    4    1  0  0  0  0  0  0  0  0 |  j = 9

B.2 WEKA - code

B.2.1 Java program removing values

MissingMain.java

package missingValues;

import java.util.ArrayList;


public class MissingMain {

    static final String FOLDER_PATH = "C:\\Users\\amund_000\\Desktop\\Maskinlæring"
            + "\\Project3\\data\\weka\\missing\\";
    static final String RAW_FILE = "training_none_missing.arff";
    static final String MISSING_FILE = "training.arff";

    private static FileOrganizer fo;
    private static MissingValuesCreator missingCreator;

    private static ArrayList<ArrayList<String>> wekaInfo;
    private static ArrayList<ArrayList<String>> infoWithMissingValues;
    private static ArrayList<String> finalList;

    public static void main(String[] args) {
        init();

        wekaInfo = fo.getArffData(FOLDER_PATH + RAW_FILE);

        infoWithMissingValues = missingCreator.removeValues(wekaInfo);

        finalList = new ArrayList<String>();

        // collect the header section and the data section into one list
        for (int i = 0; i < infoWithMissingValues.get(0).size(); i++)
            finalList.add(infoWithMissingValues.get(0).get(i));

        for (int i = 0; i < infoWithMissingValues.get(1).size(); i++)
            finalList.add(infoWithMissingValues.get(1).get(i));

        fo.saveMissingValues(finalList, FOLDER_PATH + MISSING_FILE);
    }

    private static void init() {
        fo = new FileOrganizer();
        missingCreator = new MissingValuesCreator();
    }

}

MissingValuesCreator.java

package missingValues;

import java.util.ArrayList;

public class MissingValuesCreator {

    private ArrayList<ArrayList<String>> info;

    public MissingValuesCreator() {
        info = new ArrayList<ArrayList<String>>();
        info.add(new ArrayList<String>());
        info.add(new ArrayList<String>());
    }

    public ArrayList<ArrayList<String>> removeValues(ArrayList<ArrayList<String>> wekaInfo) {

        for (int i = 0; i < wekaInfo.get(0).size(); i++)
            this.info.get(0).add(wekaInfo.get(0).get(i));

        for (int i = 0; i < wekaInfo.get(1).size(); i++) {
            if (i % 2 == 0)
                info.get(1).add(wekaInfo.get(1).get(i));
            else
                info.get(1).add(getMissingValue(wekaInfo.get(1).get(i)));
        }

        return info;
    }

    private String getMissingValue(String string) {
        String instance = "";

        String[] arr = string.split(",");

        int randomIndex = (int) Math.floor(Math.random() * (arr.length - 1));

        arr[randomIndex] = "?";

        for (int i = 0; i < arr.length; i++) {
            instance += arr[i];

            if (i != (arr.length - 1))
                instance += ",";
        }

        return instance;
    }

}


FileOrganizer.java

package missingValues;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;

public class FileOrganizer {

    private ArrayList<ArrayList<String>> info;

    public FileOrganizer() {
        info = new ArrayList<ArrayList<String>>();
        info.add(new ArrayList<String>());
        info.add(new ArrayList<String>());
    }

    public ArrayList<ArrayList<String>> getArffData(String string) {
        readFile(string);
        return info;
    }

    private void readFile(String string) {
        boolean dataRead = false;

        FileReader fr;
        BufferedReader br;
        String line = "";

        try {
            fr = new FileReader(string);
            br = new BufferedReader(fr);

            line = br.readLine();

            while (line != null) {
                if (!dataRead)
                    this.info.get(0).add(line);
                else
                    this.info.get(1).add(line);

                if (line.contains("@DATA"))
                    dataRead = true;

                line = br.readLine();
            }

            br.close();
        } catch (FileNotFoundException e) {
            System.out.println("Couldn't find file.");
        } catch (IOException e) {
            System.out.println("Couldn't read file.");
        }
    }

    public void saveMissingValues(ArrayList<String> finalList, String path) {
        try {
            PrintWriter writer = new PrintWriter(path, "UTF-8");

            for (int i = 0; i < finalList.size(); i++) {
                writer.write(finalList.get(i));
                writer.println();
            }

            writer.close();
        } catch (FileNotFoundException | UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

}

B.3 WEKA screen dumps


Figure B.1: Option selection for the first test of WEKA’s SMO algorithm.

Figure B.2: Option selection for the first test of the naiveBayes algorithm.


Figure B.3: A screen shot of the test run with winnowing in c5.0.

Figure B.4: A screen shot of the test run with winnowing and costs-file in c5.0.


Appendix C

R

C.1 R scripts

C.1.1 First script to test the SVM included in package e1071

# installs the e1071 package
install.packages('e1071', dependencies=TRUE)

library(class)
library(e1071)

# file path
FILE <- "C:\\Users\\amund_000\\Desktop\\Maskinlæring\\Project3\\data\\r\\poker.csv"

# reads the file into dataset
dataset <- read.csv(FILE)

# assigns indexes to rows
index <- 1:nrow(dataset)

# 70/30 training/test
testindex <- sample(index, trunc(length(index)*30/100))

# creates test set
testset <- dataset[testindex,]

# creates training set
trainingset <- dataset[-testindex,]

test <- data.frame(testset)
training <- data.frame(trainingset)

# building of SVM with hand as class, and training as data set (C: classification)
model <- svm(hand ~ ., training, type="C")


# prediction with test set
prediction <- predict(model, test[, -11])

# table with prediction of test set and true classes in test set
tab <- table(pred=prediction, class=test[,11])

# prints the table
tab

C.1.2 Script to test RSNNS's mlp

library(RSNNS)
FILE <- "C:\\Users\\amund_000\\Desktop\\...\\binary_small.csv"
dataset <- read.csv(FILE)
dataset <- dataset[sample(1:nrow(dataset), length(1:nrow(dataset))), 1:ncol(dataset)]
pokerInputs <- dataset[,-86]
pokerTargets <- dataset[,86]
pokerDecTargets <- decodeClassLabels(pokerTargets)
dataset <- splitForTrainingAndTest(pokerInputs, pokerDecTargets, ratio = 0.15)
dataset <- normTrainingAndTestSet(dataset)
model <- mlp(dataset$inputsTrain, dataset$targetsTrain, size=20, maxit=100,
             inputsTest = dataset$inputsTest, targetsTest = dataset$targetsTest)
predictions <- predict(model, dataset$inputsTest)

C.1.3 Final test of R's SVM (e1071)

> Sys.time() ; model <- svm(hand ~ ., trainingset, type="C", scale=FALSE, cost=4, kernel="radial", cross=10) ; Sys.time()
[1] "2014-03-19 20:19:35 CET"
[1] "2014-03-19 21:04:46 CET"
> summary(model)

Call:
svm(formula = hand ~ ., data = trainingset, type = "C", cost = 4,
    kernel = "radial", cross = 10, scale = FALSE)

Parameters:
   SVM-Type:  C-classification
 SVM-Kernel:  radial
       cost:  4
      gamma:  0.1

Number of Support Vectors: 16194

( 2 3 7118 7732 852 33 362 66 23 3 )


Number of Classes: 10

Levels:
 0 1 2 3 4 5 6 7 8 9

10-fold cross-validation on training data:

Total Accuracy: 54.91518
Single Accuracies:
55.31429 56.36779 55.39692 55.2 55.56825 54.1976 53.02857 55.05425 55.2827 53.74072

> prediction <- predict(model, testset[,-11])
> tab <- table(pred=prediction, class=testset[,11])
> tab

    class
pred    0    1    2    3    4    5    6    7    8    9
   0 2449 1426  122   17    0   14    1    0    0    0
   1 1262 1667  205  120   23    7   10    1    1    3
   2   25   66   18    7    3    0    2    1    1    0
   3    7   23    8    6    1    0    0    0    0    0
   4    1    1    1    1    0    0    0    0    0    0
   5    1    1    0    0    0    0    0    0    0    0
   6    0    0    0    0    0    0    0    1    0    0
   7    0    0    0    0    0    0    0    0    0    0
   8    0    0    0    0    0    0    0    0    0    0
   9    0    0    0    0    0    0    0    0    0    0

> prediction <- predict(model, trainingset[,-11])
> tab <- table(pred=prediction, class=trainingset[,11])
> tab

    class
pred    0    1    2    3    4    5    6    7    8    9
   0 8664  165    3    0    0    1    0    0    0    0
   1   84 7250    9    5    0    0    0    0    0    0
   2    0    0  840    0    1    0    0    0    0    0
   3    0    0    0  357    0    0    0    0    0    0
   4    0    0    0    0   65    0    0    0    0    0
   5    0    0    0    0    0   32    0    0    0    0
   6    0    0    0    0    0    0   23    0    0    0
   7    0    0    0    0    0    0    0    3    0    0
   8    0    0    0    0    0    0    0    0    3    0
   9    0    0    0    0    0    0    0    0    0    2

>


C.2 Java code

C.2.1 Code written to change data in csv-files

package ClassifyCreator;

import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;

public class ClassifyHelper {

    static final String
            FOLDER_PATH = "C:\\Users\\amund_000\\Desktop\\Maskinlæring\\"
                    + "\\Project3\\data\\r\\",
            INPUT_FILE_NAME = "poker.csv",
            OUTPUT_FILE_NAME = "poker2.csv";

    static ArrayList<String> fileStrings;

    public static void main(String[] args) {
        init();
        if (readDataFromFile()) {
            convertData();
            saveDataToFile();
        }
    }

    private static void saveDataToFile() {
        try {
            PrintWriter writer = new PrintWriter(FOLDER_PATH + OUTPUT_FILE_NAME, "UTF-8");

            for (int i = 0; i < fileStrings.size(); i++) {
                writer.write(fileStrings.get(i));
                writer.println();
            }

            writer.close();
        } catch (FileNotFoundException | UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }

    private static void convertData() {
        for (int i = 0; i < fileStrings.size(); i++) {
            String[] arr = fileStrings.get(i).split(",");

            if (i != 0)
                arr[arr.length - 1] = "hand" + arr[arr.length - 1];

            String newLine = "";

            for (int j = 0; j < arr.length; j++) {
                newLine += arr[j];

                if (j != arr.length - 1)
                    newLine += ",";
            }

            fileStrings.set(i, newLine);
        }
    }

    private static boolean readDataFromFile() {
        FileReader fr;
        BufferedReader br;
        String line;

        try {
            fr = new FileReader(FOLDER_PATH + INPUT_FILE_NAME);
            br = new BufferedReader(fr);

            line = br.readLine();

            while (line != null) {
                fileStrings.add(line);
                line = br.readLine();
            }

            br.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            return false;
        } catch (IOException e) {
            e.printStackTrace();
            return false;
        }

        return true;
    }

    private static void init() {
        fileStrings = new ArrayList<String>();
    }
}

102