Transcript
1
Weka: Practical Machine Learning Tools and Techniques with Java Implementations
Proceedings of the ICONIP/ANZIIS/ANNES'99 Workshop on Emerging Knowledge Engineering and Connectionist-
Based Information Systems, pages 192-196, 1999. Dunedin, New Zealand.
Ian H. Witten, Eibe Frank, Len Trigg, Mark Hall, Geoffrey Holmes, and Sally Jo Cunningham.
Reporter: Jin-huei Dai
2
OUTLINE
1. Introduction
2. The command-line interface
3. The Explorer
4. The Knowledge Flow interface
5. The Experimenter
6. Conclusions
7. References
3
1. Introduction
Data mining is an experimental science. Machine learning provides the technical basis of data mining.
The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It is designed so that users can quickly try out existing methods on new datasets in flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing the input data and the result of learning.
Weka was developed at the University of Waikato in New Zealand, and the name stands for Waikato Environment for Knowledge Analysis.
4
1. Introduction (cont.)
Weka is freely available on the World-Wide Web and accompanies a new text on data mining that documents and fully explains all the algorithms it contains. The Weka software is written entirely in Java, so applications built with the Weka class libraries run on any computer with a Web browsing capability; this lets users apply machine learning techniques to their own data regardless of computer platform.
The primary learning methods in Weka are “classifiers”, and they induce a rule set or decision tree that models the data. Weka also includes algorithms for learning association rules and clustering data.
9
2. The command-line interface
11
3. The Explorer (p. 375)

=== Run information ===
Scheme:       weka.classifiers.trees.J48 -C 0.25 -M 2
Relation:     weather
Instances:    14
Attributes:   5
              outlook
              temperature
              humidity
              windy
              play
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

J48 pruned tree
---------------

outlook = sunny
|   humidity <= 75: yes (2.0)
|   humidity > 75: no (3.0)
outlook = overcast: yes (4.0)
outlook = rainy
|   windy = TRUE: no (2.0)
|   windy = FALSE: yes (3.0)

Number of Leaves  : 5
Size of the tree  : 8
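The pruned tree that J48 prints can be read as nested if/else tests. A minimal sketch in plain Java (the `WeatherTree` class and method names are illustrative, not part of Weka):

```java
// Hand-coded version of the J48 pruned tree for the weather data.
// The class and its API are illustrative only, not Weka's.
public class WeatherTree {
    /** Returns the predicted value of the "play" attribute. */
    static String classify(String outlook, double humidity, boolean windy) {
        if (outlook.equals("sunny")) {
            return humidity <= 75 ? "yes" : "no";
        } else if (outlook.equals("overcast")) {
            return "yes";
        } else { // rainy
            return windy ? "no" : "yes";
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("sunny", 70, false)); // humidity <= 75 branch
        System.out.println(classify("rainy", 80, true));  // windy = TRUE branch
    }
}
```

Each path from the root to a leaf becomes one branch of the conditional, which is exactly how a learned decision tree models the data.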
12
Time taken to build model: 0.08 seconds

=== Stratified cross-validation ===
=== Summary ===
Correctly Classified Instances           9               64.2857 %
Incorrectly Classified Instances         5               35.7143 %
Kappa statistic                          0.186
Mean absolute error                      0.2857
Root mean squared error                  0.4818
Relative absolute error                 60      %
Root relative squared error             97.6586 %
Total Number of Instances               14

=== Detailed Accuracy By Class ===
TP Rate   FP Rate   Precision   Recall   F-Measure   Class
0.778     0.6       0.7         0.778    0.737       yes
0.4       0.222     0.5         0.4      0.444       no

=== Confusion Matrix ===
 a b   <-- classified as
 7 2 | a = yes
 3 2 | b = no
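The accuracy and Kappa statistic in the summary follow directly from the confusion-matrix counts; a minimal sketch (the `KappaDemo` class is illustrative, not part of Weka):

```java
// Recompute accuracy and the Kappa statistic from the confusion matrix
// (rows = actual classes, columns = predicted classes).
public class KappaDemo {
    /** Returns {accuracy, kappa} for a square confusion matrix. */
    static double[] accuracyAndKappa(int[][] cm) {
        int n = cm.length;
        double total = 0, correct = 0;
        double[] rowSum = new double[n], colSum = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++) {
                total += cm[i][j];
                rowSum[i] += cm[i][j];
                colSum[j] += cm[i][j];
                if (i == j) correct += cm[i][j];
            }
        double observed = correct / total;
        double expected = 0; // agreement expected by chance
        for (int i = 0; i < n; i++)
            expected += rowSum[i] * colSum[i] / (total * total);
        return new double[]{observed, (observed - expected) / (1 - expected)};
    }

    public static void main(String[] args) {
        int[][] cm = {{7, 2}, {3, 2}}; // counts from the weather-data run
        double[] r = accuracyAndKappa(cm);
        System.out.printf("Accuracy: %.4f %%  Kappa: %.3f%n", 100 * r[0], r[1]);
    }
}
```

With these counts, observed agreement is 9/14 and chance agreement is (9*10 + 5*4)/196, giving Kappa = 0.186 as reported.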
15
The per-class measures are derived from the confusion-matrix counts (TP, FP, TN, FN):

TP Rate = TP / (TP + FN):    yes: 7 / (7 + 2) = 0.778      no: 2 / (2 + 3) = 0.4
FP Rate = FP / (FP + TN):    yes: 3 / (3 + 2) = 0.6        no: 2 / (2 + 7) = 0.222
Precision = TP / (TP + FP):  yes: 7 / (7 + 3) = 0.7        no: 2 / (2 + 2) = 0.5
F-Measure = 2PR / (P + R):   yes: 2(0.7)(0.778) / (0.7 + 0.778) = 0.737
                             no:  2(0.5)(0.4) / (0.5 + 0.4) = 0.444
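The same per-class calculations can be sketched in a few lines of Java (the `ClassMetrics` class is illustrative, not part of Weka):

```java
// Per-class TP rate, FP rate, precision and F-measure from the
// confusion-matrix counts of the weather-data run.
public class ClassMetrics {
    /** Returns {tpRate, fpRate, precision, fMeasure} for one class. */
    static double[] metrics(int tp, int fn, int fp, int tn) {
        double tpRate = (double) tp / (tp + fn);
        double fpRate = (double) fp / (fp + tn);
        double precision = (double) tp / (tp + fp);
        double f = 2 * precision * tpRate / (precision + tpRate);
        return new double[]{tpRate, fpRate, precision, f};
    }

    public static void main(String[] args) {
        double[] yes = metrics(7, 2, 3, 2); // class yes
        double[] no  = metrics(2, 3, 2, 7); // class no
        System.out.printf("yes: TPR=%.3f FPR=%.3f P=%.3f F=%.3f%n",
                yes[0], yes[1], yes[2], yes[3]);
        System.out.printf("no:  TPR=%.3f FPR=%.3f P=%.3f F=%.3f%n",
                no[0], no[1], no[2], no[3]);
    }
}
```

Note how the counts swap roles between the two classes: the "no" instances misclassified as "yes" are false negatives for "no" but false positives for "yes".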
19
=== Run information ===
Scheme:       weka.associations.Apriori -N 10 -T 0 -C 0.9 -D 0.05 -U 1.0 -M 0.1 -S -1.0
Relation:     weather.symbolic
Instances:    14
Attributes:   5

=== Associator model (full training set) ===

Size of set of large itemsets L(1): 12
Size of set of large itemsets L(2): 47
Size of set of large itemsets L(3): 39
Size of set of large itemsets L(4): 6
Association rules. Weka contains an implementation of the Apriori learner for generating association rules, a commonly used technique in market basket analysis. This algorithm does not seek rules that predict a particular class attribute, but rather looks for any rules that capture strong associations between different attributes.
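Apriori prunes candidate rules using two measures, support and confidence, which can be computed directly; a minimal sketch over a toy transaction list (the data and the `RuleStats` class are illustrative, not Weka's API):

```java
import java.util.List;
import java.util.Set;

// Illustrative support/confidence computation for one candidate rule
// over a toy market-basket transaction list (not Weka's Apriori code).
public class RuleStats {
    /** Fraction of transactions containing all the given items. */
    static double support(List<Set<String>> txns, Set<String> items) {
        long n = txns.stream().filter(t -> t.containsAll(items)).count();
        return (double) n / txns.size();
    }

    public static void main(String[] args) {
        List<Set<String>> txns = List.of(
                Set.of("bread", "milk"),
                Set.of("bread", "butter"),
                Set.of("bread", "milk", "butter"),
                Set.of("milk"));
        // Rule: bread => milk
        double supp = support(txns, Set.of("bread", "milk"));  // 2/4 = 0.5
        double conf = supp / support(txns, Set.of("bread"));   // 0.5 / 0.75
        System.out.printf("bread => milk: support=%.2f confidence=%.3f%n", supp, conf);
    }
}
```

Support measures how often the items occur together; confidence is the support of the whole rule divided by the support of its antecedent.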
Clustering. Clustering methods likewise do not seek rules that predict a particular class, but rather try to divide the data into natural groups or "clusters." Weka includes an implementation of the EM algorithm, which can be used for unsupervised learning; it makes the assumption that all attributes are independent random variables.
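The iteration that EM performs can be illustrated on a one-dimensional two-component Gaussian mixture; this toy sketch is not Weka's implementation, and all names in it are illustrative:

```java
// Minimal EM for a two-component 1-D Gaussian mixture: the E step assigns
// each point a responsibility under each component, the M step re-estimates
// the component parameters from those responsibilities.
public class EmSketch {
    static double density(double x, double mean, double var) {
        return Math.exp(-(x - mean) * (x - mean) / (2 * var))
                / Math.sqrt(2 * Math.PI * var);
    }

    /** Returns the two component means after fitting the mixture. */
    static double[] fit(double[] data, int iterations) {
        double[] mean = {data[0], data[data.length - 1]}; // crude initialisation
        double[] var = {1, 1};
        double[] weight = {0.5, 0.5};
        for (int it = 0; it < iterations; it++) {
            double[] respSum = new double[2], wx = new double[2], wxx = new double[2];
            for (double x : data) {
                // E step: responsibility of each component for x
                double p0 = weight[0] * density(x, mean[0], var[0]);
                double p1 = weight[1] * density(x, mean[1], var[1]);
                double r0 = p0 / (p0 + p1), r1 = 1 - r0;
                respSum[0] += r0; respSum[1] += r1;
                wx[0] += r0 * x;  wx[1] += r1 * x;
                wxx[0] += r0 * x * x; wxx[1] += r1 * x * x;
            }
            for (int k = 0; k < 2; k++) {
                // M step: weighted mean, variance and mixing weight
                mean[k] = wx[k] / respSum[k];
                var[k] = Math.max(1e-6, wxx[k] / respSum[k] - mean[k] * mean[k]);
                weight[k] = respSum[k] / data.length;
            }
        }
        return mean;
    }

    public static void main(String[] args) {
        double[] data = {0.9, 1.0, 1.1, 1.2, 4.8, 5.0, 5.1, 5.2};
        double[] means = fit(data, 50);
        System.out.printf("component means: %.2f %.2f%n", means[0], means[1]);
    }
}
```

With two well-separated groups the means converge to roughly the two group centres; Weka's EM generalises this to many attributes by treating each attribute as an independent random variable, as noted above.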
• LIBSVM is integrated software for support vector classification (C-SVC, nu-SVC), regression (epsilon-SVR, nu-SVR), and distribution estimation (one-class SVM). It supports multi-class classification.
• Since version 2.8, it implements an SMO-type algorithm proposed in this paper: R.-E. Fan, P.-H. Chen, and C.-J. Lin. Working set selection using second order information for training SVM. Journal of Machine Learning Research 6, 1889-1918, 2005.
6. Conclusions
As the technology of machine learning continues to develop and mature, learning algorithms need to be brought to the desktops of people who work with data and understand the application domain from which it arises. It is necessary to get the algorithms out of the laboratory and into the work environment of those who can use them. Weka is a significant step in the transfer of machine learning technology into the workplace.
60
6. Conclusions (cont.)
Weka has three separate interactive interfaces. The primary one is the Explorer, which gives access to all of Weka's facilities using menu selection and form filling.
The Knowledge Flow interface allows users to design configurations for streamed data processing. The Experimenter lets users set up automated experiments that run selected machine learning algorithms with different parameter settings on a corpus of datasets, collect performance statistics, and perform significance tests on the results.
61
7. References
1. Ian H. Witten & Eibe Frank. 2005. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition. Morgan Kaufmann, San Francisco.
2. Zdravko Markov & Ingrid Russell. 2006. An Introduction to the WEKA Data Mining System. ITiCSE '06: Proceedings of the 11th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education.
3. LIBSVM, by Professor Chih-Jen Lin (cjlin), Department of Computer Science and Information Engineering, National Taiwan University.