Classification: Data Mining Experiment, Department of Computer Science, Shenzhen Graduate School, Harbin Institute of Technology

Dec 29, 2015

Ashlee May

Classification

Data Mining Experiment

Department of Computer Science

Shenzhen Graduate School

Harbin Institute of Technology

Data Mining Resources on the Web

1. A comprehensive site for many KDD resources: http://www.kdnuggets.com/

2. Tutorial-type articles on currently hot topics: http://www.sigkdd.org/

3. The KDD Cup (1997~2010): http://www.sigkdd.org/kddcup/index.php

4. UCI Dataset: http://archive.ics.uci.edu/ml/

5. Conferences, journals, and organizations: SIGKDD, ICDM, SIGMOD, SDM, PAKDD; IEEE Transactions on Knowledge and Data Engineering; Data Mining Group

Tools

Clementine

Clementine is a data mining platform originally developed by ISL (Integral Solutions Limited). SPSS acquired ISL at the end of 1998 and continued to develop Clementine, which became one of its flagship products. IBM in turn acquired SPSS in 2009.

It is a data mining and text analytics workbench used to build predictive models. It has a visual interface which allows users to leverage statistical and data mining algorithms without programming.

Workflow1

Dataset1

1. Led7
   1. attribute#1, attribute#2, ..., attribute#7, label
   2. 3200 instances
   3. All attribute values are either 0 or 1
   4. Each attribute indicates whether the corresponding light is on or not for the decimal digit
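
To make the record layout above concrete, here is a minimal Python sketch of how a decimal digit maps to seven 0/1 attributes plus a label. The segment order (top, upper-left, upper-right, middle, lower-left, lower-right, bottom) and the names `SEGMENTS` and `to_record` are assumptions for illustration; the actual file may order its attributes differently and may contain noisy values.

```python
# Assumed segment order: top, upper-left, upper-right, middle,
# lower-left, lower-right, bottom. 1 means the light is on.
SEGMENTS = {
    0: (1, 1, 1, 0, 1, 1, 1),
    1: (0, 0, 1, 0, 0, 1, 0),
    2: (1, 0, 1, 1, 1, 0, 1),
    3: (1, 0, 1, 1, 0, 1, 1),
    4: (0, 1, 1, 1, 0, 1, 0),
    5: (1, 1, 0, 1, 0, 1, 1),
    6: (1, 1, 0, 1, 1, 1, 1),
    7: (1, 0, 1, 0, 0, 1, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def to_record(digit):
    """Return one noise-free record: seven 0/1 attributes, then the label."""
    return list(SEGMENTS[digit]) + [digit]


if __name__ == "__main__":
    for d in range(10):
        print(to_record(d))
```

Because all ten patterns are distinct, a classifier could in principle reach perfect accuracy on noise-free data; errors on the real file would come from noise in the attribute values.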

Load the file

Operations

Partitions
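
For the programming part of the experiment, the partition step can be imitated outside the tool. The sketch below shuffles the records with a fixed seed and splits them into a training and a testing set. The 70/30 ratio, the seed, and the function name `partition` are assumptions; the slide does not state the settings used in Clementine.

```python
import random


def partition(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into (train, test)."""
    rng = random.Random(seed)      # private generator, so the split is repeatable
    shuffled = list(rows)          # copy, so the caller's data is left untouched
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    train, test = partition(range(3200))
    print(len(train), len(test))
```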

C5.0
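
C5.0, like its predecessor C4.5, grows a decision tree by choosing splits with an information-based criterion. As a rough illustration of that idea (not the tool's actual implementation), the sketch below computes the entropy of a label list and the information gain of splitting on one attribute. The function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting the labels on one attribute."""
    total = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy of the subsets produced by the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    print(information_gain([0, 0, 1, 1], labels))  # attribute separates the classes
    print(information_gain([0, 1, 0, 1], labels))  # attribute carries no information
```

A tree builder would evaluate this quantity for every candidate attribute at a node and split on the best one; C4.5 additionally normalizes the gain by the split information (gain ratio).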

View the model

Model analysis

CHAID
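
CHAID (chi-squared automatic interaction detection) selects split attributes using chi-square tests of independence between each predictor and the label. The sketch below computes only the Pearson chi-square statistic for one attribute; it is an illustration under simplifying assumptions and omits the category merging and p-value adjustment that a full CHAID implementation performs. The function name is hypothetical.

```python
from collections import Counter


def chi_square(values, labels):
    """Pearson chi-square statistic for the attribute-by-label contingency table."""
    n = len(labels)
    observed = Counter(zip(values, labels))   # missing cells count as 0
    row_totals = Counter(values)
    col_totals = Counter(labels)
    stat = 0.0
    for v, r in row_totals.items():
        for y, c in col_totals.items():
            expected = r * c / n              # count expected under independence
            stat += (observed[(v, y)] - expected) ** 2 / expected
    return stat


if __name__ == "__main__":
    labels = ["a", "a", "b", "b"]
    print(chi_square([0, 0, 1, 1], labels))
    print(chi_square([0, 1, 0, 1], labels))
```

A larger statistic (for the same table size) indicates a stronger association between the attribute and the label, which makes the attribute a better split candidate.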

View model

Dataset2

Listing of attributes:

label: >50K, <=50K.

Age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country
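
When working with this dataset in code, each line of the data file has to be turned into named fields. The sketch below assumes a comma-separated file with the attributes in the order listed above and the label as the last field; the column order, the set of numeric columns, the sample line, and the name `parse_record` are assumptions for illustration.

```python
COLUMNS = [
    "age", "workclass", "fnlwgt", "education", "education-num",
    "marital-status", "occupation", "relationship", "race", "sex",
    "capital-gain", "capital-loss", "hours-per-week", "native-country",
    "label",
]
# Columns assumed to hold integers; everything else is kept as a string.
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain",
           "capital-loss", "hours-per-week"}


def parse_record(line):
    """Split one comma-separated line into a dict keyed by column name."""
    fields = [f.strip() for f in line.split(",")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    record = dict(zip(COLUMNS, fields))
    for name in NUMERIC:
        record[name] = int(record[name])
    return record


if __name__ == "__main__":
    # Illustrative line in the assumed format.
    sample = ("39, State-gov, 77516, Bachelors, 13, Never-married, "
              "Adm-clerical, Not-in-family, White, Male, 2174, 0, 40, "
              "United-States, <=50K")
    print(parse_record(sample))
```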

Flow

Setting

Partitions

C5.0 Analysis

CHAID Analysis

Data cleaning
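
The slide does not say which cleaning steps were applied in the tool. As one possible approach, the sketch below shows two common treatments of missing values: dropping incomplete records, or filling a column with its most frequent value. It assumes records are dicts and that missing values are marked with "?"; both the marker and the function names are assumptions.

```python
from collections import Counter


def drop_incomplete(records, missing="?"):
    """Keep only the records that contain no missing marker."""
    return [r for r in records if missing not in r.values()]


def fill_with_mode(records, column, missing="?"):
    """Replace missing values in one column with its most frequent known value."""
    known = [r[column] for r in records if r[column] != missing]
    if not known:                      # nothing to learn the mode from
        return list(records)
    mode = Counter(known).most_common(1)[0][0]
    # Build new dicts so the input records are not modified.
    return [{**r, column: mode} if r[column] == missing else r for r in records]


if __name__ == "__main__":
    rows = [
        {"workclass": "Private", "label": ">50K"},
        {"workclass": "?", "label": "<=50K"},
        {"workclass": "Private", "label": "<=50K"},
    ]
    print(drop_incomplete(rows))
    print(fill_with_mode(rows, "workclass"))
```

Dropping records is simpler but discards data; filling keeps every record at the cost of introducing guessed values.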

Partition

Flow

C5.0 and CHAID

You can do other data preprocessing according to your requirements.

Programming

• Programming
  – Use C4.5 or Bayes classifier
  – Dataset
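
As a possible starting point for the Bayes option, here is a minimal naive Bayes classifier for categorical attributes with Laplace (add-one) smoothing. Reading "Bayes classifier" as naive Bayes is an assumption, and the class and method names are hypothetical; this is a sketch, not a reference solution.

```python
from collections import Counter
from math import log


class NaiveBayes:
    """Naive Bayes for categorical attributes with add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        n_features = len(rows[0])
        # value_counts[label][i][value] = number of training rows with that value
        self.value_counts = {
            c: [Counter() for _ in range(n_features)] for c in self.class_counts
        }
        # Distinct values seen per attribute, used as the smoothing denominator.
        self.values = [set() for _ in range(n_features)]
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[y][i][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        total = sum(self.class_counts.values())
        best_label, best_score = None, float("-inf")
        for c, n_c in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = log(n_c / total)
            for i, v in enumerate(row):
                k = len(self.values[i])
                score += log((self.value_counts[c][i][v] + 1) / (n_c + k))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


if __name__ == "__main__":
    rows = [(1, 0), (1, 1), (0, 0), (0, 1)]
    labels = ["a", "a", "b", "b"]
    model = NaiveBayes().fit(rows, labels)
    print(model.predict((1, 0)), model.predict((0, 1)))
```

Working in log space avoids numeric underflow when many attribute probabilities are multiplied together, and the smoothing prevents a single unseen value from zeroing out a class.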

Compare your result with the tool.
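
One simple basis for the comparison is accuracy on the same testing partition. The helper below is a sketch with a hypothetical name; the accuracy reported by the tool would be read from its analysis output and compared by hand.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    print(accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"]))
```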
