Building Hierarchical Classifiers Using Class Proximity
Ke Wang, Senqiang Zhou, Shiang Chen Liew
National University of Singapore


Dec 21, 2015

Transcript
Page 1

Building Hierarchical Classifiers Using Class Proximity

Ke Wang

Senqiang Zhou

Shiang Chen Liew

National University of Singapore

Page 2

VLDB99, Sept 6-10, Edinburgh

Hierarchical classification

• Given
  – a class hierarchy
  – a collection of pre-classified documents
  – a document is a set of terms

• Build
  – a classifier that assigns a relevant class to a new document

• Key
  – extract features of classes
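The inputs listed above can be pictured with plain Python data structures. This is a minimal sketch, assuming the class hierarchy is stored as a child-to-parent map and each pre-classified document is a set of terms paired with its class label; the names `PARENT`, `TRAINING` and `ancestors` and all of the toy data are hypothetical.

```python
# Hypothetical toy data: class hierarchy as a child -> parent map (the root maps to None).
PARENT = {
    "Yahoo": None,
    "recreation": "Yahoo",
    "science": "Yahoo",
    "automotive": "recreation",
    "sports": "recreation",
}

# Hypothetical pre-classified documents: each document is a set of terms plus its class.
TRAINING = [
    (frozenset({"car", "engine", "repair"}), "automotive"),
    (frozenset({"team", "score", "league"}), "sports"),
]

def ancestors(cls, parent=PARENT):
    """Return the path from cls up to the root, starting with cls itself."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = parent[cls]
    return path

print(ancestors("sports"))  # ['sports', 'recreation', 'Yahoo']
```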

Page 3

Yahoo classes

Yahoo
• recreation
  – automotive
  – sports
    – skating
    – cycling
• science

Page 4

ACM classes

• Level 1: Hardware
• Level 2: General, Memory_structure (under Hardware)
• Level 3: General, Design_style (under Memory_structure)
• Level 4: Cache_memories (under Design_style)

Page 5

Existing local approaches

• build one classifier at each split of the class hierarchy

• determine features locally at each node

• classify a document by going through a path of classifiers starting from the root
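The local scheme above can be sketched as a descent from the root in which each internal node owns its own classifier. This is a hypothetical sketch: the per-node classifiers are stand-in keyword rules rather than trained models, and `CHILDREN`, `LOCAL` and `classify_top_down` are illustrative names only.

```python
from typing import Callable, Dict, FrozenSet, List

Doc = FrozenSet[str]

# Hypothetical class hierarchy: internal node -> its children.
CHILDREN: Dict[str, List[str]] = {
    "Yahoo": ["recreation", "science"],
    "recreation": ["automotive", "sports"],
}

# One local classifier per split, each choosing a child from features picked at that node.
# These keyword rules are stand-ins for real trained classifiers.
LOCAL: Dict[str, Callable[[Doc], str]] = {
    "Yahoo": lambda d: "science" if {"theory", "experiment"} & d else "recreation",
    "recreation": lambda d: "automotive" if "car" in d else "sports",
}

def classify_top_down(doc: Doc, root: str = "Yahoo") -> List[str]:
    """Follow a path of local classifiers from the root until a leaf class is reached."""
    path = [root]
    node = root
    while node in CHILDREN:
        node = LOCAL[node](doc)
        path.append(node)
    return path

print(classify_top_down(frozenset({"car", "race"})))  # ['Yahoo', 'recreation', 'automotive']
```

Note that a wrong decision at a high level cannot be corrected further down the path.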

Page 6

Diminishing high-level structure

• local approaches rely on correct classification at high levels

• but high-level structures are usually weak, i.e., their topics diverge

• e.g., “car” is a feature at Recreation: Automotive, but not at Recreation

Page 7

Bias of misclassification

• sibling classes vs. nephew classes

• misclassification at high levels vs. at low levels

• specialisation vs. generalisation
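One simple way to make the sibling/nephew contrast concrete is to measure how far apart two classes lie in the hierarchy. The sketch below counts the edges on the tree path between two classes via their lowest common ancestor; this is an assumed illustration of class proximity, not the measure defined in the paper, and the hierarchy is hypothetical.

```python
# Hypothetical hierarchy (child -> parent), loosely based on the Yahoo example.
PARENT = {
    "Yahoo": None,
    "recreation": "Yahoo",
    "science": "Yahoo",
    "automotive": "recreation",
    "sports": "recreation",
    "skating": "sports",
    "cycling": "sports",
}

def path_to_root(cls):
    """List cls and all of its ancestors, nearest first."""
    path = []
    while cls is not None:
        path.append(cls)
        cls = PARENT[cls]
    return path

def tree_distance(a, b):
    """Number of edges between classes a and b, passing through their lowest common ancestor."""
    depth_in_a = {c: i for i, c in enumerate(path_to_root(a))}
    for j, c in enumerate(path_to_root(b)):
        if c in depth_in_a:  # the first shared ancestor is the lowest common ancestor
            return depth_in_a[c] + j
    raise ValueError("classes are not in the same tree")

print(tree_distance("automotive", "sports"))   # 2 (siblings)
print(tree_distance("automotive", "cycling"))  # 3 (nephew)
```

A measure like this lets a classifier treat errors between nearby classes differently from errors between distant ones.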

Page 8

Features should be

• determined with respect to the target class

• determined at all concept levels

• correlated

The solution: generalised association rules (SA95, HF95), e.g.

{sql, IO} → DB

{language, performance} → CS
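In a generalised association rule, the antecedent may name a term at any level of a term hierarchy. Following the usual definition from SA95, a document supports such a term if it contains that term or one of its descendants. A minimal sketch, assuming the term hierarchy is a child-to-parent map; the hierarchy entries and the document are hypothetical.

```python
# Hypothetical term hierarchy (child term -> parent term); top-level terms have no entry.
TERM_PARENT = {
    "select": "sql",
    "join": "sql",
    "disk": "IO",
    "buffer": "IO",
}

def extend(doc):
    """Add every ancestor of every term, so the document covers all concept levels."""
    extended = set(doc)
    for term in doc:
        while term in TERM_PARENT:
            term = TERM_PARENT[term]
            extended.add(term)
    return extended

def matches(antecedent, doc):
    """A generalised rule T -> C applies to doc if every term of T occurs in extend(doc)."""
    return set(antecedent) <= extend(doc)

doc = {"join", "buffer", "index"}
print(matches({"sql", "IO"}, doc))        # True: join is under sql, buffer is under IO
print(matches({"sql", "language"}, doc))  # False
```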

Page 9

Our approach

• class proximity

• global classifier

• term hierarchy

• use the “best” generalised association rule T → C to determine the class
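Classification with a single best rule can be sketched as follows: among the rules whose antecedent T is contained in the document, take the top-ranked one and return its class C. This is a hedged sketch, not the paper's implementation. It assumes the documents have already been extended with ancestor terms from the term hierarchy and that rules are ranked by a single score. The rules and ConfB values are taken from the R0 to R4 example in this deck, while the tie-breaking by list order and the fallback class "Arts" are assumptions. The `Rule`, `best_rule` and `classify` names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

@dataclass(frozen=True)
class Rule:
    name: str
    antecedent: FrozenSet[str]  # the term set T
    target: str                 # the class C
    score: float                # ranking measure, e.g. a biased confidence

# Example rules scored by their ConfB values; equal scores keep list order (assumption).
RULES: List[Rule] = [
    Rule("R0", frozenset({"author", "story"}), "Literature", 1.0),
    Rule("R1", frozenset({"author"}), "Literature", 1.0),
    Rule("R2", frozenset({"story"}), "Literature", 0.67),
    Rule("R4", frozenset({"hall"}), "Music", 0.4),
    Rule("R3", frozenset({"States"}), "A_Literature", 0.33),
]

def best_rule(doc: FrozenSet[str], rules: List[Rule]) -> Optional[Rule]:
    """Return the highest-scoring rule whose antecedent is contained in doc (already extended)."""
    applicable = [r for r in rules if r.antecedent <= doc]
    # max() returns the first of several equally scored rules, so list order breaks ties.
    return max(applicable, key=lambda r: r.score, default=None)

def classify(doc: FrozenSet[str], rules: List[Rule], default: str = "Arts") -> str:
    """Class of the best applicable rule; the default class is a hypothetical fallback."""
    rule = best_rule(doc, rules)
    return rule.target if rule is not None else default

print(classify(frozenset({"hall", "story"}), RULES))  # Literature (R2 outranks R4)
print(classify(frozenset({"hall"}), RULES))           # Music
```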

Page 10

Rank association rules

• Biased confidence

• Biased J-measure
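The slide names biased variants of two standard rule-ranking measures but does not define them. For reference, the unbiased versions can be computed from document counts: confidence is P(C|T), and the J-measure (Smyth and Goodman) is P(T) times the relative entropy between the distributions of C with and without conditioning on T. This sketch covers only these standard forms. The biased versions, which presumably incorporate class proximity, are not reproduced here.

```python
import math

def confidence(n_tc: int, n_t: int) -> float:
    """P(C | T): fraction of documents containing T that are labelled C."""
    return n_tc / n_t

def _term(p: float, q: float) -> float:
    # p * log2(p / q), using the convention 0 * log(0 / q) = 0
    return 0.0 if p == 0 else p * math.log2(p / q)

def j_measure(n_tc: int, n_t: int, n_c: int, n: int) -> float:
    """Standard (unbiased) J-measure of the rule T -> C from document counts.

    n_tc: documents containing T and labelled C; n_t: documents containing T;
    n_c: documents labelled C; n: all documents. Assumes n_t > 0 and 0 < n_c < n.
    """
    p_t = n_t / n
    p_c = n_c / n
    p_c_given_t = n_tc / n_t
    return p_t * (_term(p_c_given_t, p_c) + _term(1 - p_c_given_t, 1 - p_c))

print(confidence(10, 10))                  # 1.0
print(round(j_measure(10, 10, 20, 100), 4))  # 0.2322
```

Confidence alone cannot tell a rule covering 10 documents from one covering 1. The J-measure's P(T) factor rewards rules that apply to more documents.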

Page 11

An example

Figure: a term hierarchy (terms include author, story, writer, editor, fiction, poem, ...) and a class hierarchy (classes include Arts, Music, Literature, A_Music, A_Literature, ...).

Page 12

Term hierarchy (T) = Yes, class proximity (B) = Yes

• R0: author, story → Literature (ConfB=1, Clist=d6,d7)

• R1: author → Literature (ConfB=1)

• R2: story → Literature (ConfB=0.67, Wlist=d5(1))

• R4: hall → Music (ConfB=0.4, Clist=d1,d2, Wlist=d3(1))

• R3: States → A_Literature (ConfB=0.33, Clist=d4,d5)

Page 13

Page 14

Experiment I

• http://www.acm.org/dl/toc.html/

• 26,515 papers, 78 classes, 14,754 terms

• class hierarchy: level-1 and level-2 categories

• term hierarchy: level-3 and level-4 categories

• document: title and level-4 categories

Page 15

Best rules found by (B,T)

• CSO:
  – vector, stream, processor, parallel → Processor_Architectures
  – multiple_instruction_stream → Processor_Architectures
  – data_flow, architectur → Processor_Architectures
  – internet, architectur → Computer_Communication_Networks
  – mode, atm → Computer_Communication_Networks
  – network, circuit_switching → Computer_Communication_Networks
  – tecniqu, model, attribut → Performance_of_Systems

• Software:
  – program, function, application → Programming_Techniques
  – object_oriented_programming → Programming_Techniques
  – reusable_software → Software_Engineering
  – software, methodologie → Software_Engineering
  – organization, distributed_system → Operating_Systems

Page 16

Figure: comparison of the configurations (), (T), (B), (B,T), (CDAR97,T) and (CDAR97).

Page 17

Page 18

Experiment II

• http://dir.yahoo.com/recreation/sports

• 7,550 documents

• 367 classes, 7 levels

• 10,747 terms

• 90% of the terms occur in no more than 10 documents and many documents contain only such terms
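The sparsity figure above (the share of terms occurring in no more than 10 documents) is a document-frequency statistic and can be computed as sketched below. Only the threshold of 10 comes from the slide; the toy corpus is made up and `rare_term_fraction` is a hypothetical helper name.

```python
from collections import Counter
from typing import Iterable, Set

def rare_term_fraction(docs: Iterable[Set[str]], max_df: int = 10) -> float:
    """Fraction of distinct terms whose document frequency is at most max_df."""
    df: Counter = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term at most once per document
    rare = sum(1 for count in df.values() if count <= max_df)
    return rare / len(df)

# Toy corpus: "bike" occurs in 12 documents, each "termN" in exactly one.
corpus = [{"bike", f"term{i}"} for i in range(12)]
print(rare_term_fraction(corpus))  # 12/13, about 0.923
```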

Page 19

Best rules found by (B,T)

• Sports:Cycling:
  – page, mountain → Mountain_Biking
  – product, bike → Mountain_Biking
  – mtb, mountain → Mountain_Biking
  – held, bicycl → Races
  – classic, bicycl → Races
  – trip, tour → Travelogues
  – trip, canada → Travelogues
  – bicycl, alaska → Travelogues

• Sports:Auto_Racing:
  – team, result, driver → Formula_one
  – model, featur → Tracks_and_Speedways
  – oval → Tracks_and_Speedways
  – raceway → Tracks_and_Speedways

Page 20

Page 21