Intelligent Database Systems Presenter: CHANG, SHIH-JIE Authors: Luca Cagliero, Paolo Garza 2013.DKE. Improving classification models with taxonomy information

Jan 19, 2016

Albert Fields
Transcript
Intelligent Database Systems Lab

Presenter: CHANG, SHIH-JIE

Authors: Luca Cagliero, Paolo Garza

2013.DKE.

Improving classification models with taxonomy information

Outline

• Motivation
• Objectives
• Methodology
• Experiments
• Conclusions
• Comments

Motivation

• A number of different approaches to building accurate classifiers have been proposed, but the integration of taxonomy information into the data used for classifier training has not been investigated so far.

Objectives

• This paper presents a general-purpose strategy to improve the accuracy of structured data classifiers by enriching the data with knowledge provided by a taxonomy built over the data items.

Definition. Aggregation tree

Definition. Multiple-taxonomy

Let $T = \{t_1, \ldots, t_n\}$ be a set of attributes. A multiple-taxonomy $\Theta = \{AT_1, \ldots, AT_m\}$ is a forest of aggregation trees defined on the domains of the attributes in $T$.
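
One way to picture the definition above is as a mapping from each attribute to a child-to-parent table, one table per aggregation tree. The sketch below is illustrative only: the `Location` hierarchy, the place names, and the helper names `generalizations` and `is_a` are assumptions made for the example and do not come from the paper.

```python
# Hypothetical multiple-taxonomy: one aggregation tree per attribute,
# each stored as a child -> parent mapping (root values have no entry).
MULTI_TAXONOMY = {
    "Location": {
        "Turin": "Italy",
        "Rome": "Italy",
        "Paris": "France",
        "Italy": "Europe",
        "France": "Europe",
    },
}


def generalizations(attribute, value, taxonomy=MULTI_TAXONOMY):
    """Return the ancestors of `value` in the attribute's tree, nearest first."""
    parent = taxonomy.get(attribute, {})  # attributes without a tree have no ancestors
    path = []
    while value in parent:
        value = parent[value]
        path.append(value)
    return path


def is_a(attribute, value, candidate, taxonomy=MULTI_TAXONOMY):
    """True if `candidate` equals `value` or is one of its generalizations."""
    return candidate == value or candidate in generalizations(attribute, value, taxonomy)


print(generalizations("Location", "Turin"))  # ['Italy', 'Europe']
print(is_a("Location", "Rome", "Europe"))    # True
```

Storing only parent links keeps each tree compact and makes the upward walk from a data item to its generalizations a simple loop.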

Methodology

Methodology – Multiple-taxonomy over data items in D

Two-step process:

(i) Generalized classification rule mining.
Example: {(Location, Italy)} ⇒ (User category, Entrepreneur) (s = 50%, c = 100%)
(1) An extended version of the training dataset is generated first.
(2) An FP-tree-like representation of the extended dataset is then generated. Only frequent items are included in the FP-tree.

(ii) Rule selection by means of lazy pruning.
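
Step (i) can be sketched on toy data as follows. This is a minimal illustration, not the paper's implementation: the taxonomy, the four training records, and the function names are hypothetical, and the FP-tree itself is not built (only the dataset extension, the frequent-item filter, and the support and confidence of one generalized rule are shown). The toy records were chosen so that the example rule reaches s = 50% and c = 100%.

```python
from collections import Counter

# Hypothetical taxonomy and training records, invented for illustration.
TAXONOMY = {"Location": {"Turin": "Italy", "Rome": "Italy",
                         "Paris": "France", "Lyon": "France"}}
RECORDS = [
    ({"Location": "Turin"}, "Entrepreneur"),
    ({"Location": "Rome"}, "Entrepreneur"),
    ({"Location": "Paris"}, "Student"),
    ({"Location": "Lyon"}, "Student"),
]


def extend(record, taxonomy):
    """Return the record's (attribute, value) items plus all generalized items."""
    items = set()
    for attr, value in record.items():
        items.add((attr, value))
        parent = taxonomy.get(attr, {})
        while value in parent:  # climb the aggregation tree
            value = parent[value]
            items.add((attr, value))
    return items


def frequent_items(extended, min_support):
    """Items whose relative frequency in the extended dataset is >= min_support."""
    counts = Counter(item for items in extended for item in items)
    total = len(extended)
    return {item for item, count in counts.items() if count / total >= min_support}


def support_confidence(antecedent, label, extended, labels):
    """Support and confidence of the rule antecedent => label."""
    covered = [y for items, y in zip(extended, labels) if antecedent <= items]
    correct = sum(y == label for y in covered)
    support = correct / len(labels)
    confidence = correct / len(covered) if covered else 0.0
    return support, confidence


EXTENDED = [extend(record, TAXONOMY) for record, _ in RECORDS]
LABELS = [label for _, label in RECORDS]

print(frequent_items(EXTENDED, 0.5))
print(support_confidence({("Location", "Italy")}, "Entrepreneur", EXTENDED, LABELS))
```

On this toy dataset no single city reaches the 50% threshold, while the generalized items (Location, Italy) and (Location, France) do, which is the kind of effect the extension step is meant to expose.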

Methodology – lazy pruning

(1) Rules that only misclassify training data are pruned.

(2) Rules that correctly classify at least one training record are grouped in the Level I rule set, while rules that remain unused during the training phase are kept in Level II.
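
The two pruning rules above can be sketched as a single pass over the ranked rules. This is a hedged illustration rather than the authors' code: the rule and record encodings are hypothetical, the rules are assumed to arrive already sorted by rank, and the sketch assumes that training records are removed from further consideration only when a rule covering them is kept in Level I.

```python
def lazy_prune(rules, training):
    """Split ranked rules into Level I and Level II, dropping harmful ones.

    rules:    list of (antecedent, label), antecedent a frozenset of items,
              assumed sorted best-first.
    training: list of (items, label), items a set of (attribute, value) pairs.
    """
    remaining = list(training)
    level1, level2 = [], []
    for antecedent, label in rules:
        matched = [(items, y) for items, y in remaining if antecedent <= items]
        if not matched:
            # Never used on the training data: kept in reserve.
            level2.append((antecedent, label))
        elif any(y == label for _, y in matched):
            # Correctly classifies at least one record: Level I.
            level1.append((antecedent, label))
            # Assumption: records covered by a kept rule are set aside.
            remaining = [rec for rec in remaining if not (antecedent <= rec[0])]
        # Otherwise the rule only misclassifies training data and is pruned.
    return level1, level2


# Hypothetical training records and rules.
training = [
    ({("Location", "Italy")}, "Entrepreneur"),
    ({("Location", "France")}, "Student"),
]
r1 = (frozenset({("Location", "Italy")}), "Entrepreneur")
r2 = (frozenset({("Location", "France")}), "Entrepreneur")
r3 = (frozenset({("Location", "Spain")}), "Student")

level1, level2 = lazy_prune([r1, r2, r3], training)
print(level1)
print(level2)
```

Here `r1` lands in Level I, `r2` is pruned because it only misclassifies, and `r3` goes to Level II because it never matches a training record.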

Methodology – The G−L3 algorithm

Methodology

Methodology – G−L3 class prediction

When a new test case r_t has to be classified, G−L3 considers the sorted rule sets in Level I and Level II, starting with Level I.

If none of the Level I rules match r_t, then the top-ranked rule in Level II matching r_t is considered.

If none of the rules belonging to the two model sets match r_t, the default class label is assigned to r_t.
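
The prediction procedure above amounts to a two-level fallback. The sketch below is an assumption-laden illustration: the function name `predict`, the rule encoding, and the sample rules are hypothetical, and each level is assumed to be sorted best-first so that the first matching rule is the top-ranked one.

```python
def predict(items, level1, level2, default_label):
    """Return the label of the first matching rule, trying Level I before Level II.

    items:   set of (attribute, value) pairs describing the test case,
             assumed to include its generalized items.
    level1,
    level2:  lists of (antecedent, label), each assumed sorted best-first.
    """
    for rules in (level1, level2):
        for antecedent, label in rules:
            if antecedent <= items:
                return label
    return default_label  # no rule in either level matches


# Hypothetical rule sets.
LEVEL1 = [(frozenset({("Location", "Italy")}), "Entrepreneur")]
LEVEL2 = [(frozenset({("Location", "France")}), "Student")]

print(predict({("Location", "Turin"), ("Location", "Italy")}, LEVEL1, LEVEL2, "Unknown"))
print(predict({("Location", "Paris"), ("Location", "France")}, LEVEL1, LEVEL2, "Unknown"))
print(predict({("Location", "Madrid")}, LEVEL1, LEVEL2, "Unknown"))
```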

Experiments – Dataset characteristics

Experiments – Accuracy comparison (baseline vs. extended)

Experiments – Accuracy comparison

Experiments –

Experiments –

Experiments –

Experiments – execution time comparison

Conclusions

– Taxonomy integration is shown to yield significant accuracy improvements.

Comments

• Advantages
– More accurate.

• Applications
– Classification, data mining.