The Weka: The Weka is a well-known bird of New Zealand. WEKA stands for W(aikato) E(nvironment) for K(nowledge) A(nalysis), developed by the University of Waikato.

Jan 11, 2016


Charles Preston

The Weka

The Weka is a well-known bird of New Zealand.

W(aikato) E(nvironment) for K(nowledge) A(nalysis)

Developed by the University of Waikato in New Zealand

It is a comprehensive suite of Java class libraries

It implements many state-of-the-art machine learning and data mining algorithms

It supports data files like CSV (Comma-Separated Values) and ARFF (Attribute-Relation File Format)…

Collection of ML(Machine Learning) algorithms – open-source Java package

Schemes for classification include: decision trees, rule learners, naive Bayes, decision tables, locally weighted regression, SVMs, instance-based learners, logistic regression, voted perceptrons, multi-layer perceptron

Schemes for numeric prediction include: linear regression, model tree generators, locally weighted regression, instance-based learners, decision tables, multi-layer perceptron

Meta-schemes include: bagging, boosting, stacking, regression via classification, classification via regression, cost-sensitive classification

Schemes for clustering: EM and Cobweb

49 data preprocessing tools

76 classification/regression algorithms

8 clustering algorithms

15 attribute/subset evaluators + 10 search algorithms for feature selection

3 algorithms for finding association rules

3 graphical user interfaces:

“The Explorer” (exploratory data analysis)

“The Experimenter” (experimental environment)

“The Knowledge Flow” (new process-model-inspired interface)

ARFF files require declarations of @RELATION, @ATTRIBUTE and @DATA

@RELATION declaration associates a name with the dataset

Syntax: @RELATION <relation-name>

E.g. @RELATION stud

@ATTRIBUTE declaration specifies the name and type of an attribute

Syntax: @ATTRIBUTE <attribute-name> <datatype>

Datatype can be numeric, nominal, string or date

E.g.

@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}

@DATA declaration is a single line denoting the start of the data segment

Missing values are represented by ?

@DATA
5.1, 3.5, 1.4, 0.2, Iris-setosa
4.9, ?, 1.4, ?, Iris-versicolor
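
To make the three declarations concrete, here is a minimal Java sketch of a reader that walks an ARFF text, records the relation name and attribute names, and collects the data rows. The class `ArffSketch` and its methods are hypothetical helpers written for illustration; they are not Weka's own loader, and the sketch ignores quoting, sparse rows and other details that a real parser must handle.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Weka: scans ARFF text line by line. */
final class ArffSketch {
    String relation;
    final List<String> attributeNames = new ArrayList<>();
    final List<String[]> rows = new ArrayList<>();

    /** Case-insensitive prefix test, since ARFF keywords may be upper or lower case. */
    private static boolean hasKeyword(String line, String keyword) {
        return line.regionMatches(true, 0, keyword, 0, keyword.length());
    }

    static ArffSketch parse(String text) {
        ArffSketch arff = new ArffSketch();
        boolean inData = false;
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // blank line or comment
            if (inData) {
                // after @DATA every line is one instance, values separated by commas
                arff.rows.add(line.split("\\s*,\\s*"));
            } else if (hasKeyword(line, "@relation")) {
                arff.relation = line.substring("@relation".length()).trim();
            } else if (hasKeyword(line, "@attribute")) {
                // the attribute name is the first token after the keyword
                String rest = line.substring("@attribute".length()).trim();
                arff.attributeNames.add(rest.split("\\s+", 2)[0]);
            } else if (hasKeyword(line, "@data")) {
                inData = true;
            }
        }
        return arff;
    }

    /** Counts "?" cells, the ARFF marker for a missing value. */
    int missingCount() {
        int n = 0;
        for (String[] row : rows)
            for (String cell : row)
                if (cell.equals("?")) n++;
        return n;
    }
}
```

Calling `ArffSketch.parse(text)` on a file with the two data rows shown above would yield two rows, and `missingCount()` would count the two `?` cells in the second row.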

In addition to nominal and numeric attributes, exemplified by the weather data, the ARFF format has two further attribute types: string attributes and date attributes. String attributes have values that are textual. Suppose you have a string attribute that you want to call description. In the block defining the attributes, it is specified as follows:

@attribute description string

Then, in the instance data, include any character string in quotation marks (to include quotation marks in your string, use the standard convention of preceding each one by a backslash, \). Strings are stored internally in a string table and represented by their address in that table. Thus two strings that contain the same characters will have the same value.
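
The string-table idea can be sketched in a few lines of Java. `StringTableSketch` is a hypothetical class written only to illustrate the description above (each distinct string is stored once and an instance holds its position in the table); it is not Weka's internal implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical string table, not Weka's actual class. */
final class StringTableSketch {
    private final List<String> values = new ArrayList<>();
    private final Map<String, Integer> index = new HashMap<>();

    /** Returns the table position of s, adding it on first use. */
    int intern(String s) {
        Integer pos = index.get(s);
        if (pos == null) {
            pos = values.size();   // next free slot in the table
            values.add(s);
            index.put(s, pos);
        }
        return pos;
    }

    /** Recovers the text stored at a table position. */
    String lookup(int pos) {
        return values.get(pos);
    }
}
```

Two calls to `intern` with strings that contain the same characters return the same position, which is the sense in which such strings "have the same value".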

Date attributes are strings with a special format and are introduced like this:

@attribute today date

(for an attribute called today). Weka, the machine learning software discussed in Part II of this book, uses the ISO-8601 combined date and time format yyyy-MM-dd'T'HH:mm:ss, with four digits for the year, two each for the month and day, then the letter T followed by the time with two digits for each of hours, minutes, and seconds. In the data section of the file, dates are specified as the corresponding string representation of the date and time, for example, 2004-04-03T12:00:00. Although they are specified as strings, dates are converted to numeric form when the input file is read. Dates can also be converted internally to different formats, so you can have absolute timestamps in the data file and use transformations to forms such as time of day or day of the week to detect periodic behavior.
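
As an illustration of the conversion described above, the following Java sketch parses a date string in this format into a number and derives the day of the week and the hour of the day from it. `ArffDateSketch` is a hypothetical helper built on the standard `java.time` classes; interpreting the timestamp as UTC is an assumption made here for reproducibility and says nothing about how Weka itself handles time zones.

```java
import java.time.DayOfWeek;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/** Hypothetical helper showing string-to-number date conversion; not Weka code. */
final class ArffDateSketch {
    /** Milliseconds since 1970-01-01T00:00:00, treating the value as UTC (an assumption). */
    static long toEpochMillis(String value) {
        // LocalDateTime.parse accepts the ISO-8601 form yyyy-MM-ddTHH:mm:ss
        return LocalDateTime.parse(value).toInstant(ZoneOffset.UTC).toEpochMilli();
    }

    /** Day of the week, useful for detecting weekly patterns. */
    static DayOfWeek dayOfWeek(String value) {
        return LocalDateTime.parse(value).getDayOfWeek();
    }

    /** Hour of the day (0 to 23), useful for detecting daily patterns. */
    static int hourOfDay(String value) {
        return LocalDateTime.parse(value).getHour();
    }
}
```

For the example value 2004-04-03T12:00:00, `dayOfWeek` returns `SATURDAY` and `hourOfDay` returns 12.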

Sparse ARFF files are similar to ARFF files, except that data values of 0 are not represented

Non-zero attributes are specified by attribute number and value

For examples of ARFF files see $WEKAHOME/data

Standard ARFF:

@data
0, X, 0, Y, "class A"
0, 0, W, 0, "class B"

Sparse ARFF:

@data
{1 X, 3 Y, 4 "class A"}
{2 W, 4 "class B"}
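
The mapping between the two @data blocks above can be sketched as a small Java function that drops every 0 value and writes the rest as "attribute-number value" pairs. `SparseRowSketch` is a hypothetical helper for illustration only, assuming attribute numbers start at 0 (as in the example) and that values containing a space are quoted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical converter from a dense row to sparse ARFF notation; not Weka code. */
final class SparseRowSketch {
    /** Renders "index value" pairs for every cell that is not "0". */
    static String toSparse(String[] dense) {
        List<String> pairs = new ArrayList<>();
        for (int i = 0; i < dense.length; i++) {
            String v = dense[i];
            if (v.equals("0")) continue;               // zero values are left out
            boolean needsQuotes = v.indexOf(' ') >= 0; // quote values with spaces
            pairs.add(i + " " + (needsQuotes ? "\"" + v + "\"" : v));
        }
        return "{" + String.join(", ", pairs) + "}";
    }
}
```

Applied to the row `0, X, 0, Y, "class A"`, the function produces `{1 X, 3 Y, 4 "class A"}`, matching the sparse form in the example.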

-t <training file> Specify training file

-T <test files> Specify test file; if none, cross-validation (CV) is performed on the training data

-x <number of folds> Number of folds for cross-validation

-s <random number seed> Random number seed for CV

-l <input file> Use saved model

-d <output file> Output model to file
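
To show how a fold count and a random seed interact in cross-validation, here is a minimal Java sketch that shuffles instance indices with a seed and deals them into folds. `FoldSketch` and its parameters are hypothetical and only mirror the roles of the -x and -s options; Weka's own procedure may differ, for example by stratifying on the class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical illustration of seeded k-fold splitting; not Weka's implementation. */
final class FoldSketch {
    /** Shuffles indices 0..n-1 with the seed and deals them round-robin into test folds. */
    static List<List<Integer>> split(int n, int folds, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) order.add(i);
        Collections.shuffle(order, new Random(seed)); // same seed gives the same order

        List<List<Integer>> result = new ArrayList<>();
        for (int f = 0; f < folds; f++) result.add(new ArrayList<>());
        for (int i = 0; i < n; i++) result.get(i % folds).add(order.get(i));
        return result;
    }
}
```

Each fold serves once as the test set while the remaining folds form the training set. Repeating a run with the same seed reproduces the same partition, which is why the seed is exposed as an option.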

Internal variables are private; they should have protected or package-level access

SparseInstance for Strings requires dummy at index 0

Problem: Strings are mapped into internal indices to an array. The String at position 0 is mapped to value "0". When written out as a SparseInstance, it will not be written (0 value). If read back in, the first String is missing from the Instances.

Solution: Put a dummy string in position 0 when writing a SparseInstance with strings. The dummy will be ignored while writing, and the actual instance will be written properly.
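
The problem and the workaround can be modelled in a few lines of Java. `SparseStringSketch` is a hypothetical, simplified model with a single string attribute; it does not use Weka's `SparseInstance` class and only imitates the behaviour described above, where a stored index of 0 is skipped like any other zero value.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the index-0 problem; not Weka's SparseInstance code. */
final class SparseStringSketch {
    /** Sparse output for one string attribute: a stored index of 0 is omitted. */
    static String writeSparse(List<String> table, String s) {
        int stored = table.indexOf(s); // internal index of the string
        if (stored < 0) throw new IllegalArgumentException("string not in table: " + s);
        return stored == 0 ? "{}" : "{0 \"" + s + "\"}";
    }

    /** Copies the strings into a table whose position 0 holds a placeholder. */
    static List<String> withDummy(List<String> strings) {
        List<String> table = new ArrayList<>();
        table.add("dummy");   // occupies index 0 so that no real string maps to 0
        table.addAll(strings);
        return table;
    }
}
```

Without the placeholder, the first string in the table is written as an empty sparse row and is lost on reading; with the placeholder, every real string has a non-zero index and is written out.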
