Data Mining Lab Introduction to Weka 11/12/2012 Data mining with WEKA

Data mining with WEKA · cse.cnu.ac.kr/~cheonghee/lectures/14dm/intro_weka.pdf · 2016-01-19

Apr 02, 2020



dariahiddleston

Data mining with WEKA


WEKA: the software

“Waikato Environment for Knowledge Analysis”

Data mining software written in Java
– A collection of machine learning algorithms for data mining tasks
– http://www.cs.waikato.ac.nz/ml/weka/

Includes tools for
– data pre-processing, classification, regression, clustering, association rules, and visualization


How to install WEKA

Download WEKA from
– http://www.cs.waikato.ac.nz/ml/weka/index_downloading.html


How to install WEKA

Run the installer and click through: Next, I Agree, Next, Next, Install.


WEKA GUI chooser

The GUI Chooser launches four applications:
– Explorer
– Experimenter
– KnowledgeFlow
– Command Line Interface


WEKA Explorer: Open file

– Open file: brings up a dialog box for browsing to the data file on the local file system
– Open URL: asks for the Uniform Resource Locator (URL) address where the data is stored
– Open DB: reads data from a database
– Generate: generates artificial data using one of a variety of DataGenerators

Data can be imported from a file in various formats: ARFF, CSV, C4.5


Open ARFF file

weather
– 14 samples
– 4 attributes
– binary class

No.  outlook   temperature  humidity  windy  class
1    sunny     85           85        FALSE  no
2    sunny     80           90        TRUE   no
3    overcast  83           86        FALSE  yes
4    rainy     70           96        FALSE  yes
5    rainy     68           80        FALSE  yes
6    rainy     65           70        TRUE   no
7    overcast  64           65        FALSE  yes
8    sunny     72           95        FALSE  no
9    sunny     69           70        FALSE  yes
10   rainy     75           80        TRUE   yes
11   sunny     75           70        TRUE   yes
12   overcast  72           90        TRUE   yes
13   overcast  81           75        FALSE  yes
14   rainy     71           91        TRUE   no


File information

attribute & class information


File information – Visualize All

You can check the class distribution for each attribute value.
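Weka draws these per-value class distributions as bar charts. As a rough illustration of what the bars count, here is a stdlib Python sketch (not Weka code) that tallies class labels for each value of a nominal attribute in the 14 weather instances from the "Open ARFF file" slide. The names `WEATHER` and `class_distribution` are hypothetical; numeric attributes such as temperature would first need to be binned, which this sketch does not do.

```python
from collections import Counter, defaultdict

# The 14 weather instances from the slide: (outlook, temperature, humidity, windy, class)
WEATHER = [
    ("sunny", 85, 85, False, "no"), ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"), ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"), ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, False, "yes"), ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"), ("rainy", 75, 80, True, "yes"),
    ("sunny", 75, 70, True, "yes"), ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"), ("rainy", 71, 91, True, "no"),
]


def class_distribution(rows, attr_index, class_index=-1):
    """Count class labels for each distinct value of one nominal attribute."""
    dist = defaultdict(Counter)
    for row in rows:
        dist[row[attr_index]][row[class_index]] += 1
    return dist


# Attribute 0 is outlook; print the class counts per outlook value
outlook_dist = class_distribution(WEATHER, 0)
for value, counts in sorted(outlook_dist.items()):
    print(value, dict(counts))
```

For outlook this gives overcast: 4 yes; rainy: 3 yes, 2 no; sunny: 2 yes, 3 no. In other words, every overcast day in this sample is a yes.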


Classify Section

– Select a classifier
– Set test options
– Select the class attribute


Choose a classifier for classification

NaiveBayes
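To make the idea behind NaiveBayes concrete, here is a minimal Python sketch (not Weka's implementation) of a naive Bayes classifier with Laplace (add-one) smoothing. To stay short it uses only the two nominal weather attributes, outlook and windy, and ignores temperature and humidity, which Weka's NaiveBayes would also model. `WEATHER`, `train_nb`, and `predict_nb` are hypothetical names.

```python
from collections import Counter, defaultdict

# The 14 weather instances from the slide: (outlook, temperature, humidity, windy, class)
WEATHER = [
    ("sunny", 85, 85, False, "no"), ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"), ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"), ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, False, "yes"), ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"), ("rainy", 75, 80, True, "yes"),
    ("sunny", 75, 70, True, "yes"), ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"), ("rainy", 71, 91, True, "no"),
]

NOMINAL = (0, 3)  # outlook and windy; numeric attributes are skipped in this sketch


def train_nb(rows, attrs=NOMINAL):
    """Collect the counts a nominal naive Bayes model needs."""
    class_counts = Counter(row[-1] for row in rows)
    # value_counts[(attr, class)][value] = rows with that attribute value and class
    value_counts = defaultdict(Counter)
    values = {a: {row[a] for row in rows} for a in attrs}
    for row in rows:
        for a in attrs:
            value_counts[(a, row[-1])][row[a]] += 1
    return class_counts, value_counts, values


def predict_nb(model, x, attrs=NOMINAL):
    """Return the best class and the unnormalized score of every class."""
    class_counts, value_counts, values = model
    n = sum(class_counts.values())
    scores = {}
    for c, c_count in class_counts.items():
        score = c_count / n  # prior P(c)
        for a in attrs:
            # Laplace-smoothed likelihood P(x_a | c)
            score *= (value_counts[(a, c)][x[a]] + 1) / (c_count + len(values[a]))
        scores[c] = score
    return max(scores, key=scores.get), scores


model = train_nb(WEATHER)
print(predict_nb(model, ("sunny", None, None, True)))
```

For a sunny, windy day the sketch scores no at 10/98 ≈ 0.102 and yes at 9/154 ≈ 0.058, so it predicts no. The scores are unnormalized products of the prior and the smoothed likelihoods; dividing each by their sum would turn them into probabilities.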


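Set test options

k-fold cross-validation (here set as k = 2)
– The data set is divided into k folds
– k-1 folds: training set, 1 fold: test set; this is repeated k times so that each fold is used once as the test set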
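The splitting rule can be sketched in a few lines of Python. This hypothetical `k_fold_indices` helper yields contiguous, unshuffled folds whose sizes differ by at most one; Weka's own cross-validation additionally randomizes the data and, for a nominal class, stratifies the folds so that each keeps roughly the original class proportions.

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists for k-fold cross-validation (no shuffling)."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of instances")
    # Spread n instances over k folds whose sizes differ by at most one
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))  # this round's held-out fold
        train = list(range(0, start)) + list(range(start + size, n))  # the other k-1 folds
        yield train, test
        start += size


# The 14 weather instances with k = 2: two rounds, each training on 7 and testing on 7
for train, test in k_fold_indices(14, 2):
    print(len(train), len(test))
```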


Training & test

Click the ‘Start’ button to train and test the classifier.


Prediction Accuracy
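Prediction accuracy is the fraction of test instances whose predicted class matches the actual class; Weka's summary reports it as "Correctly Classified Instances", both as a count and as a percentage. A small Python sketch of the computation follows; the `accuracy` helper and the example labels are made up for illustration and are not Weka output.

```python
def accuracy(actual, predicted):
    """Fraction of instances whose predicted class equals the actual class."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty label lists of the same length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


# Hypothetical labels: 3 of the 4 predictions match, so accuracy is 0.75 (75%)
print(accuracy(["no", "no", "yes", "yes"], ["no", "yes", "yes", "yes"]))
```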


Choose other classifiers

MultilayerPerceptron and DecisionStump
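DecisionStump is a one-level decision tree: it tests a single attribute and predicts a class for each outcome. The Python sketch here is a simplified stand-in, not Weka's algorithm: it makes a multiway split on one nominal weather attribute and keeps the attribute with the fewest training errors, whereas Weka's DecisionStump chooses its split with an entropy-based criterion. `WEATHER`, `CANDIDATES`, `fit_stump`, and `predict_stump` are hypothetical names.

```python
from collections import Counter

# The 14 weather instances from the slide: (outlook, temperature, humidity, windy, class)
WEATHER = [
    ("sunny", 85, 85, False, "no"), ("sunny", 80, 90, True, "no"),
    ("overcast", 83, 86, False, "yes"), ("rainy", 70, 96, False, "yes"),
    ("rainy", 68, 80, False, "yes"), ("rainy", 65, 70, True, "no"),
    ("overcast", 64, 65, False, "yes"), ("sunny", 72, 95, False, "no"),
    ("sunny", 69, 70, False, "yes"), ("rainy", 75, 80, True, "yes"),
    ("sunny", 75, 70, True, "yes"), ("overcast", 72, 90, True, "yes"),
    ("overcast", 81, 75, False, "yes"), ("rainy", 71, 91, True, "no"),
]

# Nominal attributes this sketch may split on (index -> name); numeric ones are skipped
CANDIDATES = {0: "outlook", 3: "windy"}


def majority(labels):
    """Most frequent label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, candidates=CANDIDATES):
    """Pick the single attribute whose one-level split makes the fewest training errors."""
    best = None
    for index in candidates:
        branches = {}
        for row in rows:
            branches.setdefault(row[index], []).append(row[-1])
        # Each branch predicts the majority class of the training rows that reach it
        rule = {value: majority(labels) for value, labels in branches.items()}
        errors = sum(rule[row[index]] != row[-1] for row in rows)
        if best is None or errors < best[2]:
            best = (index, rule, errors)
    return best


def predict_stump(stump, x, default):
    index, rule, _ = stump
    return rule.get(x[index], default)  # values unseen in training fall back to `default`


stump = fit_stump(WEATHER)
print(CANDIDATES[stump[0]], stump[1], stump[2])
```

On the weather data the sketch picks outlook (sunny → no, overcast → yes, rainy → yes), which misclassifies 4 of the 14 training instances; splitting on windy would misclassify 5.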


Prediction results

NaiveBayes

MultilayerPerceptron

DecisionStump


ARFF


ARFF format

An ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of instances sharing a set of attributes.

ARFF files have two distinct sections:
– Header: the relation name and the attribute declarations
– Data: the instances


ARFF data

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
...

Header: the @relation and @attribute lines
Data: the @data line and the instances that follow it


ARFF data – header section

@relation heart-disease-simplified

@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}

Relation: @relation <relation-name>
– The relation name is defined in the first line of the ARFF file

Attribute: @attribute <attribute-name> <datatype>
– Each @attribute statement uniquely defines the name of one attribute and its data type
– Data types:
  – numeric (integer and real are treated as numeric)
  – <nominal-specification>: the possible values listed in braces, e.g. { female, male}
  – string
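A rough Python sketch of how the header rules could be read mechanically. `parse_arff_header` is a hypothetical helper, not part of Weka; it handles @relation, numeric/integer/real, string, and brace-delimited nominal attributes, treats keywords case-insensitively, and skips blank lines and `%` comment lines. Quoted names and other data types are not handled.

```python
def parse_arff_header(lines):
    """Return (relation_name, attributes) from ARFF header lines.

    Each attribute is (name, dtype): dtype is "numeric", "string",
    or a list of nominal values.
    """
    relation, attributes = None, []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        parts = line.split(None, 1)
        keyword = parts[0].lower()
        rest = parts[1] if len(parts) > 1 else ""
        if keyword == "@relation":
            relation = rest
        elif keyword == "@attribute":
            if "{" in rest:
                # Nominal: @attribute name {v1, v2, ...}
                name, spec = rest.split("{", 1)
                values = [v.strip() for v in spec.rstrip("}").split(",")]
                attributes.append((name.strip(), values))
            else:
                name, dtype = rest.split()
                dtype = dtype.lower()
                if dtype in ("integer", "real"):
                    dtype = "numeric"  # integer and real are treated as numeric
                attributes.append((name, dtype))
        elif keyword == "@data":
            break  # the header ends where the data section starts
    return relation, attributes


# Header lines copied from the heart-disease example
HEADER = """\
@relation heart-disease-simplified
@attribute age numeric
@attribute sex { female, male}
@attribute chest_pain_type { typ_angina, asympt, non_anginal, atyp_angina}
@attribute cholesterol numeric
@attribute exercise_induced_angina { no, yes}
@attribute class { present, not_present}
"""

relation, attrs = parse_arff_header(HEADER.splitlines())
print(relation)
for name, dtype in attrs:
    print(name, dtype)
```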


ARFF data – data section

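@data: a single line denoting the start of the data segment in the file; each following line is one instance, with values separated by commas in the order the attributes were declared

@data
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
...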
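A matching sketch for the data section, under the assumption that the attribute declarations are already known (they are hard-coded here from the heart-disease header). The hypothetical `parse_arff_data` turns each comma-separated line into a typed row, converting numeric fields to float and rejecting nominal values that were not declared. It treats `?` as a missing value, which is ARFF's marker for missing data, but it does not handle quoted values.

```python
# Attribute declarations from the heart-disease header, as (name, type) pairs
ATTRIBUTES = [
    ("age", "numeric"),
    ("sex", ["female", "male"]),
    ("chest_pain_type", ["typ_angina", "asympt", "non_anginal", "atyp_angina"]),
    ("cholesterol", "numeric"),
    ("exercise_induced_angina", ["no", "yes"]),
    ("class", ["present", "not_present"]),
]


def parse_arff_data(lines, attributes):
    """Parse comma-separated data lines into rows typed by the attribute declarations."""
    rows = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and % comments
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != len(attributes):
            raise ValueError(f"expected {len(attributes)} values, got {len(fields)}: {line}")
        row = []
        for (name, dtype), value in zip(attributes, fields):
            if value == "?":
                row.append(None)  # '?' marks a missing value in ARFF
            elif dtype == "numeric":
                row.append(float(value))
            elif isinstance(dtype, list) and value not in dtype:
                raise ValueError(f"{value!r} is not a declared value of {name}")
            else:
                row.append(value)
        rows.append(row)
    return rows


# The four instances shown on the slide
DATA = """\
63,male,typ_angina,233,no,not_present
67,male,asympt,286,yes,present
67,male,asympt,229,yes,present
38,female,non_anginal,230,no,not_present
"""

rows = parse_arff_data(DATA.splitlines(), ATTRIBUTES)
for row in rows:
    print(row)
```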


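Lab assignment

– Download a dataset from the “UCI machine learning repository”.
– Describe the characteristics of the data.
– Choose two classification algorithms and compare their performance.
– Run clustering.
Submit the results as a report on the day of the final exam.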