Page 1: Classification Data Mining Experiment Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.

Classification

Data Mining Experiment

Department of Computer Science

Shenzhen Graduate School

Harbin Institute of Technology

Data Mining Resources on the Web

1. A comprehensive site with many KDD resources: http://www.kdnuggets.com/

2. Tutorial-type articles on currently hot topics: http://www.sigkdd.org/

3. The KDD Cup (1997-2010): http://www.sigkdd.org/kddcup/index.php

4. UCI datasets: http://archive.ics.uci.edu/ml/

5. Conferences, journals, and organizations:
   – SIGKDD, ICDM, SIGMOD, SDM, PAKDD
   – IEEE Transactions on Knowledge and Data Engineering
   – Data Mining Group

Tools

Clementine

Clementine is a data mining platform originally developed by ISL (Integral Solutions Limited). SPSS acquired ISL in 1999 and continued to develop Clementine, which became one of SPSS's flagship products. IBM acquired SPSS in 2009.

It is a data mining and text analytics workbench for building predictive models. Its visual interface lets users apply statistical and data mining algorithms without programming.

Tools

Clementine

Workflow 1

Dataset 1

1. Led7
   1. Attributes: attribute#1, attribute#2, …, attribute#7, label
   2. 3200 instances
   3. All attribute values are either 0 or 1
   4. Each attribute indicates whether the corresponding light (segment) of the LED display is on or off for the decimal digit
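Each row therefore encodes one seven-segment display pattern together with its digit. The Python sketch below builds noise-free rows under an assumed segment order (a to g: top, upper right, lower right, bottom, lower left, upper left, middle); `SEGMENTS`, `to_record` and `decode` are hypothetical names, and the column order of the real Led7 file may differ.

```python
# Hypothetical segment order a..g: top, upper right, lower right, bottom,
# lower left, upper left, middle. The real file's column order may differ.
SEGMENTS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def to_record(digit):
    """Return one noise-free row: seven 0/1 attributes followed by the label."""
    return list(SEGMENTS[digit]) + [digit]


def decode(bits):
    """Return the digit whose pattern matches the bits exactly, or None."""
    for digit, pattern in SEGMENTS.items():
        if tuple(bits) == pattern:
            return digit
    return None


print(to_record(7))  # [1, 1, 1, 0, 0, 0, 0, 7]
```

If a row has been corrupted (for example by noise), `decode` finds no exact match and returns `None`, which is one reason a learned classifier is useful rather than a fixed lookup table.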

Load the file

Operations

Partitions
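Partitioning splits the records into a training set used to build the model and a testing set used to evaluate it. A minimal sketch of a random split in Python, assuming a 70/30 ratio and a fixed seed (both hypothetical choices; the ratio used in the experiment is not stated here):

```python
import random


def partition(rows, train_fraction=0.7, seed=1):
    """Shuffle the rows and split them into (training, testing) lists.

    train_fraction and seed are hypothetical defaults for illustration.
    """
    rng = random.Random(seed)   # fixed seed gives a reproducible split
    shuffled = list(rows)       # copy, so the caller's data is not reordered
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


train, test = partition(range(3200))
print(len(train), len(test))  # 2240 960
```

With 3200 instances, as in Led7, this assumed ratio gives 2240 training and 960 testing rows; fixing the seed makes results comparable across runs.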

C5.0
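C5.0 is the successor of Quinlan's C4.5, which chooses the attribute to split on by information gain ratio. The sketch below computes that C4.5-style criterion for one categorical attribute; it illustrates the idea and is not the tool's internal implementation. `entropy` and `gain_ratio` are hypothetical helper names.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` on attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected entropy after the split, weighted by branch size.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information penalises attributes with many distinct values.
    split_info = entropy(values)
    return gain / split_info if split_info > 0 else 0.0


# Toy data: one binary attribute that perfectly separates two classes.
print(gain_ratio([0, 0, 1, 1], ["a", "a", "b", "b"]))  # 1.0
```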

View the model

Model analysis
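Model analysis typically reports how many test records were classified correctly and how the errors are distributed across classes. A minimal sketch computing accuracy and a confusion matrix from actual and predicted labels (the function `analyse` is hypothetical, not part of Clementine):

```python
from collections import Counter


def analyse(actual, predicted):
    """Return accuracy and a confusion matrix {(actual, predicted): count}."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    matrix = Counter(zip(actual, predicted))
    correct = sum(count for (a, p), count in matrix.items() if a == p)
    return correct / len(actual), matrix


acc, matrix = analyse([0, 1, 2, 2], [0, 1, 2, 1])
print(acc)             # 0.75
print(matrix[(2, 1)])  # 1
```

The same function can score both the tool's predictions and those of your own program on the same test set.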

CHAID
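CHAID (Chi-squared Automatic Interaction Detection) selects the predictor whose chi-square test of independence against the target is most significant. The sketch below computes only the Pearson chi-square statistic for one contingency table; the category merging and p-value adjustment of full CHAID are omitted, and `chi_square` is a hypothetical helper.

```python
def chi_square(table):
    """Pearson chi-square statistic for a contingency table (list of rows).

    Rows are categories of a candidate predictor; columns are target classes.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


# Predictor with two categories versus a binary target.
print(chi_square([[30, 10], [10, 30]]))  # 20.0
```

A larger statistic (for the same degrees of freedom) means a stronger association, so that predictor is a better split candidate.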

View model

Dataset 2

Listing of attributes:

label: >50K or <=50K

age, workclass, fnlwgt, education, education-num, marital-status, occupation, relationship, race, sex, capital-gain, capital-loss, hours-per-week, native-country
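A sketch of reading Dataset 2 in Python, assuming a comma-separated file whose columns are the 14 attributes above in the listed order followed by the label. The column order, the set of six attributes converted to integers, and the sample row are all assumptions for illustration.

```python
import csv
import io

# Assumed layout: the 14 listed attributes, then the label as the last column.
COLUMNS = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week", "native-country",
           "label"]
NUMERIC = {"age", "fnlwgt", "education-num", "capital-gain",
           "capital-loss", "hours-per-week"}


def read_records(text):
    """Parse comma-separated rows into dicts, converting numeric columns."""
    records = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        record = dict(zip(COLUMNS, row))
        for name in NUMERIC:
            record[name] = int(record[name])
        records.append(record)
    return records


# Illustrative row in the assumed layout.
sample = ("39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
          "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n")
print(read_records(sample)[0]["label"])  # <=50K
```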

Flow

Setting

Partitions

C5.0 Analysis

CHAID Analysis

Data cleaning
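One simple cleaning strategy is to trim stray whitespace and drop records with missing values. The sketch assumes missing values are written as `?` (an assumption about this file) and uses a hypothetical `clean` helper; imputing a value such as the most frequent category is a common alternative when dropping rows would discard too much data.

```python
def clean(rows, missing="?"):
    """Strip whitespace from every field and drop rows with a missing marker.

    The '?' marker is an assumption about how missing values are encoded.
    """
    cleaned = []
    for row in rows:
        fields = [field.strip() for field in row]
        if missing not in fields:
            cleaned.append(fields)
    return cleaned


rows = [[" 25", " Private", " <=50K"], [" 40", " ?", " >50K"]]
print(clean(rows))  # [['25', 'Private', '<=50K']]
```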

Partition

Flow

C5.0 and CHAID

You can apply other data preprocessing steps according to your requirements.

Programming

• Programming
   – Use a C4.5 or Bayes classifier
   – Dataset
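One common reading of "Bayes classifier" in this kind of exercise is the naive Bayes classifier, which assumes attributes are conditionally independent given the class. A minimal sketch for categorical attributes with add-one (Laplace) smoothing; the class name `NaiveBayes` and the toy data are hypothetical:

```python
from collections import Counter, defaultdict
from math import log


class NaiveBayes:
    """Categorical naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.n = len(y)
        # counts[(attribute index, class)][value] = frequency in training data
        self.counts = defaultdict(Counter)
        self.values = defaultdict(set)  # distinct values seen per attribute
        for row, label in zip(X, y):
            for i, v in enumerate(row):
                self.counts[(i, label)][v] += 1
                self.values[i].add(v)
        return self

    def predict_one(self, row):
        best, best_score = None, float("-inf")
        for label, c in self.class_counts.items():
            score = log(c / self.n)  # log prior P(class)
            for i, v in enumerate(row):
                k = len(self.values[i])
                # Smoothed log P(attribute i = v | class)
                score += log((self.counts[(i, label)][v] + 1) / (c + k))
            if score > best_score:
                best, best_score = label, score
        return best

    def predict(self, X):
        return [self.predict_one(row) for row in X]


X = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (0, 0, 1)]
y = ["a", "a", "b", "b"]
model = NaiveBayes().fit(X, y)
print(model.predict([(1, 1, 0), (0, 0, 1)]))  # ['a', 'b']
```

Working in log space avoids numerical underflow when many attribute probabilities are multiplied. For the Led7 data, each row's seven 0/1 attributes could be passed as `X` and the digits as `y`.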

Programming

Compare your results with those of the tool.
