Data Mining Lab Introduction to Weka 11/12/2012
WEKA: the software
“Waikato Environment for Knowledge Analysis”
Data mining software written in Java
– a collection of machine learning algorithms for data mining tasks
– http://www.cs.waikato.ac.nz/ml/weka/
Includes tools for
– data pre-processing, classification, regression, clustering, association rules, and visualization
How to install WEKA
Download WEKA from
– http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html
– the course materials homepage
How to install WEKA
Run the installer and click through: Next, I Agree, Next, Next, Install
WEKA GUI Chooser
– Explorer
– Experimenter
– KnowledgeFlow
– Command Line Interface
WEKA Explorer: Open file
– Open file: brings up a dialog box for browsing to the data file on the local file system
– Open URL: asks for the Uniform Resource Locator (URL) address where the data is stored
– Open DB: reads data from a database
– Generate: generates artificial data from a variety of DataGenerators
Data can be imported from files in various formats: ARFF, CSV, C4.5
ARFF format
An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.
ARFF files have two distinct sections:
– Header: the relation name and the attribute declarations
– Data: the instances
ARFF data – a sample
Header:
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
Data:
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
...
ARFF data – header section
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
Relation: @relation <relation-name>
– The relation name is declared on the first line of the ARFF file
Attribute: @attribute <attribute-name> <datatype>
– Each @attribute statement uniquely defines the name of one attribute and its data type
– Data types:
numeric (integer and real are both treated as numeric)
<nominal-specification>: the list of allowed values in braces, e.g. { female, male}
string
ARFF data – data section
Data: @data
– A single line marking the start of the data section of the file
– Each following line is one instance: attribute values separated by commas, in the order the attributes were declared
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
...
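The header and data rules above are simple enough to read by hand. The following is a minimal Python sketch, not Weka's own ARFF loader; it assumes dense data, unquoted names and values, and '%' comment lines, and it maps missing values ('?') to None. The function name parse_arff is hypothetical.

```python
def parse_arff(text):
    """Parse a small ARFF document into (relation, attributes, rows).

    Sketch assumptions: dense data only, no quoted names or values,
    '%' starts a comment line, and '?' (missing value) becomes None.
    """
    relation, attributes, rows = None, [], []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and comments
        keyword = line.split(None, 1)[0].lower()
        if in_data:
            # one instance per line, values in attribute declaration order
            values = [v.strip() for v in line.split(",")]
            row = []
            for (_, kind), v in zip(attributes, values):
                if v == "?":
                    row.append(None)
                elif kind == "numeric":
                    row.append(float(v))
                else:
                    row.append(v)  # nominal and string values stay as text
            rows.append(row)
        elif keyword == "@relation":
            relation = line.split(None, 1)[1]
        elif keyword == "@attribute":
            name, dtype = line.split(None, 2)[1:]
            dtype = dtype.strip()
            if dtype.startswith("{"):
                # nominal-specification: allowed values listed inside braces
                kind = [v.strip() for v in dtype.strip("{}").split(",")]
            elif dtype.lower() in ("numeric", "integer", "real"):
                kind = "numeric"  # integer and real are treated as numeric
            else:
                kind = dtype.lower()  # e.g. string
            attributes.append((name, kind))
        elif keyword == "@data":
            in_data = True
    return relation, attributes, rows


SAMPLE = """@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
"""

relation, attrs, rows = parse_arff(SAMPLE)
print(relation)  # heart-disease-simplified
print(rows[0])   # [63.0, 'male', 'typ_angina', 233.0, 'no', 'not_present']
```

Keeping nominal values as plain strings and storing each nominal attribute's allowed values in the header list mirrors the split between header and data sections.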
Open an ARFF file: weather
– 14 samples
– 4 attributes plus the class attribute (play)
– binary class (yes / no)
No.  outlook   temperature  humidity  windy  play
1    sunny     85           85        FALSE  no
2    sunny     80           90        TRUE   no
3    overcast  83           86        FALSE  yes
4    rainy     70           96        FALSE  yes
5    rainy     68           80        FALSE  yes
6    rainy     65           70        TRUE   no
7    overcast  64           65        FALSE  yes
8    sunny     72           95        FALSE  no
9    sunny     69           70        FALSE  yes
10   rainy     75           80        TRUE   yes
11   sunny     75           70        TRUE   yes
12   overcast  72           90        TRUE   yes
13   overcast  81           75        FALSE  yes
14   rainy     71           91        TRUE   no
File information
– attribute & class information
File information – Visualize All
– Shows the class distribution over the values of each attribute
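For a nominal attribute, the class-colored bars in Visualize All are simply class counts per attribute value. As an illustration only (plain Python, not Weka code), the counts for outlook and windy in the weather table can be reproduced like this; class_distribution is a hypothetical helper name.

```python
from collections import Counter, defaultdict

# Nominal columns of the weather table: (outlook, windy, play)
WEATHER = [
    ("sunny", "FALSE", "no"), ("sunny", "TRUE", "no"), ("overcast", "FALSE", "yes"),
    ("rainy", "FALSE", "yes"), ("rainy", "FALSE", "yes"), ("rainy", "TRUE", "no"),
    ("overcast", "FALSE", "yes"), ("sunny", "FALSE", "no"), ("sunny", "FALSE", "yes"),
    ("rainy", "TRUE", "yes"), ("sunny", "TRUE", "yes"), ("overcast", "TRUE", "yes"),
    ("overcast", "FALSE", "yes"), ("rainy", "TRUE", "no"),
]


def class_distribution(rows, attr_index, class_index=-1):
    """Count class labels for each value of one nominal attribute."""
    dist = defaultdict(Counter)
    for row in rows:
        dist[row[attr_index]][row[class_index]] += 1
    return dist


dist = class_distribution(WEATHER, 0)
print(dict(dist["sunny"]))  # {'no': 3, 'yes': 2}
```

Numeric attributes such as temperature would first need to be grouped into ranges before counting, which is what the histogram view does for them.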
Classify section
– Select a classifier
– Set the test options
– Select the class attribute
Choose a classifier for classification
– NaiveBayes (under bayes in the classifier tree)
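To show what a Naive Bayes classifier computes on the weather data, here is a textbook sketch in plain Python: class priors, Laplace-smoothed value counts for the nominal attributes, and a normal (Gaussian) density per class for the numeric ones. It is not Weka's NaiveBayes implementation, so its probabilities may differ slightly from Weka's output; the helper names train, gaussian, and predict are hypothetical.

```python
import math
from collections import Counter
from statistics import mean, stdev

# Weather table: (outlook, temperature, humidity, windy, play)
WEATHER = [
    ("sunny", 85, 85, "FALSE", "no"), ("sunny", 80, 90, "TRUE", "no"),
    ("overcast", 83, 86, "FALSE", "yes"), ("rainy", 70, 96, "FALSE", "yes"),
    ("rainy", 68, 80, "FALSE", "yes"), ("rainy", 65, 70, "TRUE", "no"),
    ("overcast", 64, 65, "FALSE", "yes"), ("sunny", 72, 95, "FALSE", "no"),
    ("sunny", 69, 70, "FALSE", "yes"), ("rainy", 75, 80, "TRUE", "yes"),
    ("sunny", 75, 70, "TRUE", "yes"), ("overcast", 72, 90, "TRUE", "yes"),
    ("overcast", 81, 75, "FALSE", "yes"), ("rainy", 71, 91, "TRUE", "no"),
]
NOMINAL = {0: 3, 3: 2}  # column index -> number of possible values (outlook, windy)
NUMERIC = (1, 2)        # temperature, humidity


def train(rows):
    """Estimate class priors, per-class value counts, and per-class mean/stdev."""
    class_counts = Counter(r[-1] for r in rows)
    model = {"priors": {c: n / len(rows) for c, n in class_counts.items()},
             "nominal": {}, "numeric": {}}
    for c, n in class_counts.items():
        subset = [r for r in rows if r[-1] == c]
        for i in NOMINAL:
            model["nominal"][(c, i)] = (Counter(r[i] for r in subset), n)
        for i in NUMERIC:
            values = [r[i] for r in subset]
            model["numeric"][(c, i)] = (mean(values), stdev(values))
    return model


def gaussian(x, mu, sigma):
    """Normal probability density at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))


def predict(model, instance):
    """Return (most probable class, normalized class probabilities)."""
    scores = {}
    for c, prior in model["priors"].items():
        score = prior
        for i, n_values in NOMINAL.items():
            counts, n = model["nominal"][(c, i)]
            score *= (counts[instance[i]] + 1) / (n + n_values)  # Laplace smoothing
        for i in NUMERIC:
            mu, sigma = model["numeric"][(c, i)]
            score *= gaussian(instance[i], mu, sigma)
        scores[c] = score
    total = sum(scores.values())
    probs = {c: s / total for c, s in scores.items()}
    return max(probs, key=probs.get), probs


model = train(WEATHER)
label, probs = predict(model, ("sunny", 66, 90, "TRUE"))
print(label)  # no
```

The "naive" assumption is visible in predict: each attribute contributes an independent factor, so the score is just the prior multiplied by one term per attribute.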
Classifier’s information & options
– NaiveBayes: click the classifier name to open its options dialog
– Capabilities: click this button to see which data types the algorithm accepts for attributes and for the class
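k-fold cross-validation (here k = 3)
Set test options: Cross-validation, Folds = 3
– The data set is split into k folds
– In each of the k rounds, k-1 folds form the training set and the remaining fold is the test set, so every instance is tested exactly once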
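The fold split can be sketched in a few lines of plain Python. This shuffles the instance indices once and deals them into k folds of nearly equal size; Weka's Explorer additionally stratifies the folds for a nominal class, which this sketch does not do. The function name k_fold_indices and its seed parameter are hypothetical.

```python
import random


def k_fold_indices(n, k, seed=1):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once, then dealt into k folds of nearly equal size.
    Each fold is the test set exactly once; the other k-1 folds form the
    training set. (No stratification in this sketch.)
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test


# 14 weather instances, k = 3
for train, test in k_fold_indices(14, 3):
    print(len(train), len(test))
# 9 5
# 9 5
# 10 4
```

Dealing shuffled indices round-robin keeps fold sizes within one instance of each other even when n is not divisible by k.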
Training & test
Click the ‘Start’ button to train the classifier and evaluate it with the chosen test option
Prediction Accuracy
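Weka's classifier output reports the number of correctly classified instances and a confusion matrix whose rows are the actual classes and whose columns are the predicted classes. A minimal plain-Python sketch of both figures follows; the function name evaluate and the five actual/predicted labels are made up for illustration and are not Weka results.

```python
from collections import Counter


def evaluate(actual, predicted, labels):
    """Return (accuracy, confusion matrix) with rows = actual, columns = predicted."""
    pairs = Counter(zip(actual, predicted))
    matrix = [[pairs[(a, p)] for p in labels] for a in labels]
    correct = sum(pairs[(label, label)] for label in labels)  # diagonal entries
    return correct / len(actual), matrix


# Hypothetical labels, for illustration only
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
acc, matrix = evaluate(actual, predicted, ["yes", "no"])
print(acc)     # 0.6
print(matrix)  # [[2, 1], [1, 1]]
```

Accuracy is the sum of the diagonal divided by the total, so the matrix carries strictly more information: it shows which classes are confused with which.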
Choose other classifiers
– MultilayerPerceptron (under functions) and J48 (under trees; Weka's implementation of the C4.5 decision tree)
Prediction result: Naïve Bayes, Multilayer Perceptron, J48
Exercise & homework #3
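This assignment is meant to be completed and submitted during the lab session on November 12. Students who could not attend the class may do the lab on their own and submit it.
Download the iris data set from the UCI machine learning repository, use Weka to perform data mining tasks such as classification and clustering, and copy the results into your report.
(Print the report and submit it at the next class.)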
Exercise & homework #3
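Run Cluster and Associate the same way as Classify.
classification: try other classifiers.
cluster
– In the clustering algorithm's options, set the number of clusters to match the number of classes in the data (3 for iris).
associate
– The Apriori algorithm cannot handle numeric values, so numeric attributes must first be converted to nominal ones (discretized).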
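Because Apriori needs nominal attributes, numeric columns are usually discretized first; in Weka this can be done in the Preprocess tab with the weka.filters.unsupervised.attribute.Discretize filter, which by default uses equal-width bins. The plain-Python sketch below shows the equal-width idea only; the function name equal_width_bins and the 'bin1'..'binN' labels are hypothetical and differ from the range labels Weka produces.

```python
def equal_width_bins(values, bins):
    """Assign each numeric value to one of `bins` equal-width intervals
    between the minimum and maximum; returns labels 'bin1'..'binN'."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    labels = []
    for v in values:
        b = int((v - lo) / width) if width > 0 else 0
        labels.append(f"bin{min(b, bins - 1) + 1}")  # the maximum goes in the last bin
    return labels


# temperature column of the weather table
temperature = [85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71]
print(equal_width_bins(temperature, 3))
```

With 3 bins the temperature range 64 to 85 is cut into intervals of width 7: [64, 71), [71, 78), and [78, 85]. Each label is then an ordinary nominal value that Apriori can use in rules.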
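WEKA: not enough memory
If WEKA runs out of memory, start it from the command prompt (cmd.exe) with a larger Java heap, for example java -Xmx1024m -jar weka.jar run from the WEKA installation directory (-Xmx sets the maximum heap size).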