Source: cse.cnu.ac.kr/~cheonghee/lectures/14dm/intro_weka.pdf · 2016-01-19

Data Mining Lab Introduction to Weka 11/12/2012

Data mining with WEKA

WEKA : the software

“Waikato Environment for Knowledge Analysis”

Data mining software in Java
– a collection of machine learning algorithms for data mining tasks
– http://www.cs.waikato.ac.nz/ml/weka/

Includes
– data pre-processing, classification, regression, clustering, association rules, visualization

How to install WEKA

Download WEKA from
– http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html

How to install WEKA (continued)

Click through the installer in this order: Next, I Agree, Next, Next, Install

WEKA GUI chooser

– Explorer
– Experimenter
– KnowledgeFlow
– Command Line Interface

WEKA Explorer : Open file

Open file: brings up a dialog box allowing you to browse for the data file on the local file system.

Open URL: asks for a Uniform Resource Locator address for where the data is stored.

Open DB: reads data from a database.

Generate: enables you to generate artificial data from a variety of DataGenerators.

Data can be imported from a file in various formats: ARFF, CSV, C4.5.

Open arff file

weather
– 14 samples
– 4 attributes
– binary class

No.  outlook   temperature  humidity  windy  class
1    sunny     85           85        FALSE  no
2    sunny     80           90        TRUE   no
3    overcast  83           86        FALSE  yes
4    rainy     70           96        FALSE  yes
5    rainy     68           80        FALSE  yes
6    rainy     65           70        TRUE   no
7    overcast  64           65        FALSE  yes
8    sunny     72           95        FALSE  no
9    sunny     69           70        FALSE  yes
10   rainy     75           80        TRUE   yes
11   sunny     75           70        TRUE   yes
12   overcast  72           90        TRUE   yes
13   overcast  81           75        FALSE  yes
14   rainy     71           91        TRUE   no

File information

attribute & class information

File information – Visualize All

You can check the class distribution for each attribute value.
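
The same per-value class counts can be reproduced outside the GUI. The following is a minimal Python sketch, not WEKA code: the rows are the 14 weather samples from the earlier slide, and the names `WEATHER` and `class_distribution` are made up for this illustration.

```python
from collections import Counter, defaultdict

# The 14 weather samples as (outlook, temperature, humidity, windy, class).
WEATHER = [
    ("sunny", 85, 85, False, "no"),
    ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"),
    ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"),
    ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, False, "yes"),
    ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"),
    ("rainy", 75, 80, True, "yes"),
    ("sunny", 75, 70, True, "yes"),
    ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"),
    ("rainy", 71, 91, True, "no"),
]


def class_distribution(rows, attr_index, class_index=-1):
    """Count class labels for each distinct value of one attribute."""
    dist = defaultdict(Counter)
    for row in rows:
        dist[row[attr_index]][row[class_index]] += 1
    return {value: dict(counts) for value, counts in dist.items()}


# Class distribution for the 'outlook' attribute (column 0).
outlook_dist = class_distribution(WEATHER, attr_index=0)
for value, counts in outlook_dist.items():
    print(value, counts)
```

Changing `attr_index` to 3 gives the distribution for `windy`. Numeric attributes such as temperature would first need to be binned, which this sketch does not do.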

Classify Section

Select a classifier

Test Option

Select class attribute

Choose a classifier for classification

NaiveBayes

Set test options

k-fold cross-validation (here k = 2)
– the data set is divided into k folds
– in each round, k-1 folds form the training set and 1 fold is the test set
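
To make the fold bookkeeping concrete, here is a minimal Python sketch of a k-fold split over sample indices. It is a plain random split written for illustration; the function name `k_fold_splits` and the `seed` parameter are assumptions, and WEKA's own fold assignment may differ (for example, it may stratify by class).

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # All remaining folds together form the training set for this round.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# 14 samples (the weather data) with k = 2, as on the slide.
splits = list(k_fold_splits(n_samples=14, k=2))
for train, test in splits:
    print(len(train), "training /", len(test), "test")
```

Every sample lands in the test set exactly once across the k rounds, which is why the reported accuracy covers the whole data set.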

Training & test

Click the ‘Start’ Button

Prediction Accuracy

Choose other classifiers

Multi Layer Perceptron & DecisionStump

Prediction result

Naïve Bayes

Multi Layer Perceptron

Decision Stump

ARFF

ARFF format

An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.

ARFF files have two distinct sections
– Header: relation, attributes
– Data

ARFF data

Header

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

Data

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
.
.
.

ARFF data – header section

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

Relation: @relation <relation-name>
– The relation name is defined on the first line of the ARFF file

Attribute: @attribute <attribute-name> <datatype>
– Each @attribute statement uniquely defines the name of that attribute
– Data type
    numeric (integer and real are treated as numeric)
    <nominal-specification>
    string
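
As an illustration of how the header declarations map to a data structure, here is a minimal Python sketch that reads the `@relation` and `@attribute` lines of the example above. It is a simplified reader written for this lab note, not WEKA's parser: the name `parse_arff_header` is hypothetical, and quoted names and other ARFF features are deliberately not handled.

```python
import re

HEADER = """\
@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
"""


def parse_arff_header(text):
    """Return (relation_name, [(attribute_name, datatype), ...]).

    A nominal attribute gets the list of its allowed values as datatype;
    any other attribute keeps its lowercased type keyword (e.g. 'numeric').
    """
    relation = None
    attributes = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # '%' starts an ARFF comment line
            continue
        lower = line.lower()
        if lower.startswith("@relation"):
            relation = line.split(None, 1)[1]
        elif lower.startswith("@attribute"):
            _, name, datatype = line.split(None, 2)
            nominal = re.fullmatch(r"\{(.*)\}", datatype)
            if nominal:
                # <nominal-specification>: comma-separated values inside braces
                datatype = [v.strip() for v in nominal.group(1).split(",")]
            else:
                datatype = datatype.lower()
            attributes.append((name, datatype))
        elif lower.startswith("@data"):
            break  # the header ends where the data section begins
    return relation, attributes


relation, attributes = parse_arff_header(HEADER)
print(relation)
for name, datatype in attributes:
    print(name, datatype)
```

The order of the returned attributes matters, because each data row lists its values in the same order as the `@attribute` declarations.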

ARFF data – data section

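@data is a single line denoting the start of the data segment in the file. Each following line is one instance, with comma-separated values in the order the attributes were declared.

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
.
.
.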
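
Reading the data segment can be sketched in a few lines of Python. This is a simplified illustration, not WEKA's loader: the names `parse_arff_data` and `IS_NUMERIC` are hypothetical, the numeric flags are written out by hand to match the header of the example, and quoted values and sparse rows are not handled.

```python
DATA_SECTION = """\
@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
"""

# One flag per attribute, in header order: True marks a numeric attribute
# (age and cholesterol in the heart-disease example).
IS_NUMERIC = [True, False, False, True, False, False]


def parse_arff_data(text, is_numeric):
    """Convert the lines after @data into a list of instances (one list per row)."""
    rows = []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("%"):  # skip blank and comment lines
            continue
        if not in_data:
            # Everything before the @data line belongs to the header.
            in_data = line.lower().startswith("@data")
            continue
        values = [v.strip() for v in line.split(",")]
        if len(values) != len(is_numeric):
            raise ValueError(
                f"expected {len(is_numeric)} values, got {len(values)}: {line!r}"
            )
        row = []
        for value, numeric in zip(values, is_numeric):
            if value == "?":  # '?' marks a missing value in ARFF
                row.append(None)
            elif numeric:
                row.append(float(value))
            else:
                row.append(value)
        rows.append(row)
    return rows


instances = parse_arff_data(DATA_SECTION, IS_NUMERIC)
for instance in instances:
    print(instance)
```

The length check catches rows whose value count does not match the number of declared attributes, a common mistake when writing ARFF files by hand.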

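Lab assignment

Download a data set from the “UCI machine learning repository”.

Describe the characteristics of the data.

Choose two classification algorithms and compare their performance.

Run clustering. Submit the results as a report on the day of the final exam.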
