A Fuzzy Self Constructing Feature Clustering Algorithm for Text Classification

2010 / 03 / 17 Yi - Xian Lin 1

A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification

Jung-Yi Jiang, Ren-Jia Liou, and Shie-Jue Lee

Accepted by IEEE Transactions on Knowledge and Data Engineering

Reporter: Yi-Xian Lin

National University of Tainan


Outline

• Motivation & Objective

• Feature Reduction

• Feature Clustering

• Fuzzy Feature Clustering

• Text Classification

• Experimental results

• Advantages


Motivation & Objective

• In text classification, the dimensionality of the feature vector is usually huge

• Problems with existing feature clustering methods

  - The desired number of extracted features has to be specified in advance

  - When calculating similarities, the variance of the underlying cluster is not considered

• How can the dimensionality of feature vectors for text classification be reduced while also running faster?


Feature Reduction

• Purpose

  - Reduce the classifier's computation load

  - Increase data consistency

• Techniques

  - Eliminate redundant data

  - Find representative data

  - Reduce the dimensions of the feature sets

  - Find the set of vectors that best separates the patterns

• Two ways of doing feature reduction: feature selection and feature extraction


Feature Reduction

• Feature selection

  - Let the word set $W = \{w_1, w_2, \ldots, w_m\}$ be the feature vector of the document set

  - Find a new word set $W' = \{w'_1, w'_2, \ldots, w'_k\}$, $k < m$

  - Then $W'$ is used as input for classification tasks

• Feature extraction

  - Extracted features are obtained by a projection process through algebraic transformations

  - Let a corpus of documents be represented as an $m \times n$ matrix $X \in \mathbb{R}^{m \times n}$

  - Find an optimal transformation matrix $F^* \in \mathbb{R}^{m \times k}$


Feature Clustering

• Feature clustering is an efficient approach for feature reduction

• It groups all features into clusters such that the features in a cluster are similar to each other

• Let D be the matrix consisting of all the original documents with m features, and D' be the matrix consisting of the converted documents with k new features

• The new feature set $W' = \{w'_1, w'_2, \ldots, w'_k\}$ corresponds to a partition $\{W_1, W_2, \ldots, W_k\}$ of the original feature set W


Fuzzy Feature Clustering

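• A document set D of n documents $d_1, d_2, \ldots, d_n$

• Feature vector W of m words $w_1, w_2, \ldots, w_m$

• p classes $c_1, c_2, \ldots, c_p$

• Construct one word pattern for each word $w_i$ in W

$$x_i = \langle x_{i1}, x_{i2}, \ldots, x_{ip} \rangle = \langle P(c_1 \mid w_i), P(c_2 \mid w_i), \ldots, P(c_p \mid w_i) \rangle$$

where

$$P(c_j \mid w_i) = \frac{\sum_{q=1}^{n} d_{qi} \times \delta_{qj}}{\sum_{q=1}^{n} d_{qi}}, \quad 1 \le j \le p$$

Here $d_{qi}$ is the number of occurrences of $w_i$ in document $d_q$, and $\delta_{qj}$ is 1 if $d_q$ belongs to class $c_j$ and 0 otherwise.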



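• Example for word $w_6$ (two classes)

$$x_6 = \langle P(c_1 \mid w_6), P(c_2 \mid w_6) \rangle$$

$$P(c_2 \mid w_6) = \frac{1 \times 0 + 2 \times 0 + 0 \times 0 + 1 \times 0 + 1 \times 1 + 1 \times 1 + 1 \times 1 + 1 \times 1 + 0 \times 1}{1 + 2 + 0 + 1 + 1 + 1 + 1 + 1 + 0} = 0.50$$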
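The arithmetic of the example above can be checked with a short Python sketch. The function name `word_pattern` and the input layout (a list of per-document occurrence counts plus a parallel list of 0-based class indices) are assumptions made for illustration; the nine counts and the split of four documents in c1 and five in c2 are read off the example.

```python
def word_pattern(counts, doc_classes, num_classes):
    """Return <P(c_1|w), ..., P(c_p|w)> for a single word w.

    counts[q]      -- occurrences of the word in document q
    doc_classes[q] -- 0-based class index of document q
    """
    total = sum(counts)
    if total == 0:
        # The ratio is undefined for a word that never occurs;
        # returning zeros is an assumption of this sketch.
        return [0.0] * num_classes
    pattern = [0.0] * num_classes
    for count, cls in zip(counts, doc_classes):
        pattern[cls] += count  # only the document's own class receives the count
    return [value / total for value in pattern]


# Occurrences of w6 in the nine documents of the example:
# the first four documents belong to c1, the last five to c2.
counts_w6 = [1, 2, 0, 1, 1, 1, 1, 1, 0]
doc_classes = [0, 0, 0, 0, 1, 1, 1, 1, 1]
x6 = word_pattern(counts_w6, doc_classes, num_classes=2)
print(x6)  # [0.5, 0.5]
```

Both components come out as 0.5 because w6 occurs four times in each class out of eight occurrences in total.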


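Fuzzy Feature Clustering

• Let G be a cluster containing q word patterns $x_1, x_2, \ldots, x_q$

• Let $x_j = \langle x_{j1}, x_{j2}, \ldots, x_{jp} \rangle$, $1 \le j \le q$

• The mean $m = \langle m_1, m_2, \ldots, m_p \rangle$

$$m_i = \frac{\sum_{j=1}^{q} x_{ji}}{|G|}$$

• The deviation $\sigma = \langle \sigma_1, \sigma_2, \ldots, \sigma_p \rangle$

$$\sigma_i = \sqrt{\frac{\sum_{j=1}^{q} (x_{ji} - m_i)^2}{|G|}}, \quad 1 \le i \le p$$

• The fuzzy similarity of a word pattern $x = \langle x_1, x_2, \ldots, x_p \rangle$ to cluster G

$$\mu_G(x) = \prod_{i=1}^{p} \exp\left[-\left(\frac{x_i - m_i}{\sigma_i}\right)^2\right]$$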
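The mean and deviation of a cluster can be computed directly from its member patterns. The following Python sketch assumes the deviation is the square root of the averaged squared differences; the function name `cluster_stats` and the two sample patterns are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import math


def cluster_stats(patterns):
    """Return the (mean, deviation) vectors of a cluster given as a list of word patterns."""
    size = len(patterns)      # |G|, the number of patterns in the cluster
    dims = len(patterns[0])   # p, the number of classes
    mean = [sum(x[i] for x in patterns) / size for i in range(dims)]
    deviation = [
        math.sqrt(sum((x[i] - mean[i]) ** 2 for x in patterns) / size)
        for i in range(dims)
    ]
    return mean, deviation


# Hypothetical cluster of two word patterns over two classes.
mean, deviation = cluster_stats([[0.2, 0.8], [0.6, 0.4]])
print(mean, deviation)
```

For these two patterns the mean is approximately <0.4, 0.6> and each deviation component is approximately 0.2.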



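• A word pattern close to the mean of a cluster is regarded as very similar to this cluster, i.e., $\mu_G(x) \approx 1$

• Suppose $m_1 = \langle 0.4, 0.6 \rangle$ and $\sigma_1 = \langle 0.3, 0.5 \rangle$; then for $x = \langle 0.2, 0.8 \rangle$

$$\mu_{G_1}(x) = \exp\left[-\left(\frac{0.2 - 0.4}{0.3}\right)^2\right] \times \exp\left[-\left(\frac{0.8 - 0.6}{0.5}\right)^2\right] = 0.6412 \times 0.8521 = 0.5464$$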
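The membership computation is a product of one Gaussian-shaped term per class. A minimal Python sketch, with `fuzzy_similarity` as an illustrative name rather than anything defined on the slides, reproduces the numbers of the example above:

```python
import math


def fuzzy_similarity(x, mean, deviation):
    """Membership degree of pattern x in a cluster with the given mean and deviation."""
    result = 1.0
    for x_i, m_i, s_i in zip(x, mean, deviation):
        result *= math.exp(-(((x_i - m_i) / s_i) ** 2))
    return result


mu = fuzzy_similarity([0.2, 0.8], mean=[0.4, 0.6], deviation=[0.3, 0.5])
print(round(mu, 4))  # 0.5464
```

A pattern that coincides with the mean gets membership exactly 1, and the value decays toward 0 as the pattern moves away, faster along dimensions with a small deviation.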



• A predefined threshold ρ, $0 \le \rho \le 1$

• If $\mu_{G_j}(x_i) \ge \rho$, $x_i$ passes the similarity test on cluster $G_j$

• If the user intends to have larger clusters, give a smaller threshold

• Two cases may occur

  - No existing fuzzy clusters on which $x_i$ has passed the similarity test

    - Create a new cluster $G_h$, $h = k + 1$ (k is the number of currently existing clusters), with $m_h = x_i$ and $\sigma_h = \sigma_0$

    - $\sigma_0 = \langle \sigma_0, \ldots, \sigma_0 \rangle$ is a user-defined constant vector
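The similarity test and the creation of a new cluster can be sketched in Python as follows. The dictionary layout for a cluster (`size`, `mean`, `deviation`) and the function names are assumptions of this sketch; the values 0.5 for the initial deviation and 0.64 for the threshold are example settings, not recommendations.

```python
import math


def fuzzy_similarity(x, mean, deviation):
    """Membership degree of pattern x in a cluster with the given mean and deviation."""
    result = 1.0
    for x_i, m_i, s_i in zip(x, mean, deviation):
        result *= math.exp(-(((x_i - m_i) / s_i) ** 2))
    return result


def passing_clusters(x, clusters, rho):
    """Indices of the clusters on which x passes the similarity test (membership >= rho)."""
    return [
        j for j, c in enumerate(clusters)
        if fuzzy_similarity(x, c["mean"], c["deviation"]) >= rho
    ]


def new_cluster(x, sigma0):
    """Cluster created from a single pattern: mean = x, deviation = <sigma0, ..., sigma0>."""
    return {"size": 1, "mean": list(x), "deviation": [sigma0] * len(x)}


clusters = []          # no clusters exist at the beginning
x = [1.0, 0.0]
if not passing_clusters(x, clusters, rho=0.64):
    clusters.append(new_cluster(x, sigma0=0.5))
print(clusters)
```

Starting the deviation at a nonzero constant keeps the membership function well defined for a cluster that holds a single pattern, whose own deviation would otherwise be zero.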



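• If there are existing clusters on which $x_i$ has passed the similarity test, let cluster $G_t$ be the cluster with the largest membership degree

$$t = \arg\max_{1 \le j \le k} \mu_{G_j}(x_i)$$

• Modification to cluster $G_t$

$$m_{tj} = \frac{S_t \times m_{tj} + x_{ij}}{S_t + 1}, \qquad \sigma_{tj} = \sqrt{A - B} + \sigma_0$$

$$A = \frac{(S_t - 1)(\sigma_{tj} - \sigma_0)^2 + S_t \times m_{tj}^2 + x_{ij}^2}{S_t}, \qquad B = \frac{S_t + 1}{S_t} \left( \frac{S_t \times m_{tj} + x_{ij}}{S_t + 1} \right)^2$$

for $1 \le j \le p$, and $S_t = S_t + 1$, where the right-hand sides use the values of $m_{tj}$, $\sigma_{tj}$, and $S_t$ before the update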
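The incremental update of the winning cluster can be sketched in Python as follows. The dictionary layout for a cluster and the function names are assumptions of this sketch, and the clamp to zero before the square root is an added safeguard against floating-point rounding, not part of the update rule itself.

```python
import math


def most_similar(memberships):
    """Index t of the largest membership degree (ties go to the lowest index)."""
    return max(range(len(memberships)), key=lambda j: memberships[j])


def update_cluster(cluster, x, sigma0):
    """Fold pattern x into the cluster in place, using only the stored statistics."""
    s = cluster["size"]  # S_t before the update
    for j, x_j in enumerate(x):
        m_old = cluster["mean"][j]
        d_old = cluster["deviation"][j] - sigma0
        m_new = (s * m_old + x_j) / (s + 1)
        a = ((s - 1) * d_old ** 2 + s * m_old ** 2 + x_j ** 2) / s
        b = (s + 1) / s * m_new ** 2
        # Clamp tiny negative differences caused by rounding before taking the root.
        cluster["deviation"][j] = math.sqrt(max(a - b, 0.0)) + sigma0
        cluster["mean"][j] = m_new
    cluster["size"] = s + 1


# A size-1 cluster at <1.00, 0.00> with initial deviation 0.5 receives an identical pattern.
g1 = {"size": 1, "mean": [1.0, 0.0], "deviation": [0.5, 0.5]}
update_cluster(g1, [1.0, 0.0], sigma0=0.5)
print(g1)  # {'size': 2, 'mean': [1.0, 0.0], 'deviation': [0.5, 0.5]}
```

Because only the size, mean, and deviation are stored, a pattern can be absorbed without revisiting the patterns already in the cluster.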



• The order in which the word patterns are fed in influences the clusters obtained

• Sort all the patterns, in decreasing order, by their largest components

  - Let $x_1 = \langle 0.1, 0.3, 0.6 \rangle$, $x_2 = \langle 0.3, 0.3, 0.4 \rangle$, $x_3 = \langle 0.8, 0.1, 0.1 \rangle$

  - The largest components in these word patterns are 0.6, 0.4, and 0.8

  - The sorted list is 0.8, 0.6, 0.4

  - The order of feeding is $x_3$, $x_1$, $x_2$
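The feeding order in this example amounts to a descending sort keyed on each pattern's largest component. A small Python sketch, in which the dictionary of named patterns is just a convenient layout for illustration:

```python
patterns = {
    "x1": [0.1, 0.3, 0.6],
    "x2": [0.3, 0.3, 0.4],
    "x3": [0.8, 0.1, 0.1],
}

# Sort the pattern names by their largest component, largest first.
feeding_order = sorted(patterns, key=lambda name: max(patterns[name]), reverse=True)
print(feeding_order)  # ['x3', 'x1', 'x2']
```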



• The order of feeding: $x_5, x_7, x_{10}, x_1, x_4, x_9, x_2, x_3, x_8, x_6$

• No clusters exist at the beginning, k = 0

• Set $\sigma_0 = 0.5$, $\rho = 0.64$

• Create $G_1$

cluster   Size S   mean m             deviation σ
G1        1        < 1.00 , 0.00 >    < 0.5 , 0.5 >



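• Feeding $x_7$: $\mu_{G_1}(x_7) = 1 > \rho$

$$m_{11} = \frac{1 \times 1.00 + 1.00}{1 + 1} = 1.00, \qquad m_{12} = \frac{1 \times 0.00 + 0.00}{1 + 1} = 0.00$$

$$m_1 = \langle 1.00, 0.00 \rangle$$

$$A_{11} = \frac{(1 - 1)(0.5 - 0.5)^2 + 1 \times 1.00^2 + 1.00^2}{1}, \qquad B_{11} = \frac{1 + 1}{1} \left( \frac{1 \times 1.00 + 1.00}{1 + 1} \right)^2$$

$$A_{12} = \frac{(1 - 1)(0.5 - 0.5)^2 + 1 \times 0.00^2 + 0.00^2}{1}, \qquad B_{12} = \frac{1 + 1}{1} \left( \frac{1 \times 0.00 + 0.00}{1 + 1} \right)^2$$

$$\sigma_{11} = \sqrt{A_{11} - B_{11}} + 0.5 = 0.5, \qquad \sigma_{12} = \sqrt{A_{12} - B_{12}} + 0.5 = 0.5$$

$$\sigma_1 = \langle 0.5, 0.5 \rangle, \qquad S_1 = 1 + 1 = 2$$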



• After self-constructing clustering

• Similarities of patterns to clusters



• Data transformation

$$D' = DT$$

$$D = [d_1\ d_2\ \cdots\ d_n]^T, \qquad D' = [d'_1\ d'_2\ \cdots\ d'_n]^T$$

• H-FFC (hard weighting)

  - Each word is only allowed to belong to one cluster, so it contributes to only one new extracted feature

$$t_{ij} = \begin{cases} 1, & \text{if } j = \arg\max_{1 \le \alpha \le k} \mu_{G_\alpha}(x_i) \\ 0, & \text{otherwise} \end{cases}$$
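Hard weighting and the data transformation can be sketched in Python with plain nested lists. The sketch assumes D holds one row per document with m word counts and T holds one row per word with k cluster weights, so that D' = DT has one row per document with k features; the membership values and documents below are hypothetical.

```python
def hard_weights(memberships):
    """Weight 1 for the cluster with the largest membership of each word, else 0."""
    weights = []
    for row in memberships:
        best = max(range(len(row)), key=lambda j: row[j])
        weights.append([1.0 if j == best else 0.0 for j in range(len(row))])
    return weights


def transform(documents, weights):
    """Matrix product D' = D T for D of shape n x m and T of shape m x k."""
    k = len(weights[0])
    return [
        [sum(doc[i] * weights[i][j] for i in range(len(doc))) for j in range(k)]
        for doc in documents
    ]


# Hypothetical memberships of three words in two clusters, and two documents.
memberships = [[0.9, 0.1], [0.2, 0.7], [0.6, 0.5]]
documents = [[1, 2, 0], [0, 1, 3]]
T = hard_weights(memberships)
reduced = transform(documents, T)
print(T)        # [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
print(reduced)  # [[1.0, 2.0], [3.0, 1.0]]
```

With hard weights, each extracted feature is simply the sum of the counts of the words assigned to that cluster.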

2010 / 03 / 17 Yi - Xian Lin 18


H-FFC :



• S-FFC (soft weighting)

  - Each word is allowed to contribute to all new extracted features, with the degrees depending on the values of the membership functions

$$t_{ij} = \mu_{G_j}(x_i)$$

• M-FFC (mixed weighting)

  - A combination of the hard-weighting approach and the soft-weighting approach

$$t_{ij} = \gamma \times t^H_{ij} + (1 - \gamma) \times t^S_{ij}$$

  - γ is a user-defined constant lying between 0 and 1
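The three weighting schemes differ only in how the weights are derived from the membership degrees. A Python sketch under the assumption that `memberships[i][j]` holds the membership of word i in cluster j; the function names and sample values are illustrative.

```python
def hard_weights(memberships):
    """Weight 1 for the cluster with the largest membership of each word, else 0."""
    weights = []
    for row in memberships:
        best = max(range(len(row)), key=lambda j: row[j])
        weights.append([1.0 if j == best else 0.0 for j in range(len(row))])
    return weights


def soft_weights(memberships):
    """The weight of a word for a cluster is its membership degree in that cluster."""
    return [list(row) for row in memberships]


def mixed_weights(memberships, gamma):
    """Convex combination of hard and soft weights, with gamma between 0 and 1."""
    hard = hard_weights(memberships)
    soft = soft_weights(memberships)
    return [
        [gamma * h + (1 - gamma) * s for h, s in zip(hard_row, soft_row)]
        for hard_row, soft_row in zip(hard, soft)
    ]


# Hypothetical memberships of two words in two clusters.
memberships = [[0.9, 0.1], [0.2, 0.7]]
mixed = mixed_weights(memberships, gamma=0.5)
print(mixed)
```

Setting gamma to 1 reduces the mixed scheme to hard weighting and setting it to 0 reduces it to soft weighting, so gamma controls how sharply each word is tied to its best cluster.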

2010 / 03 / 17 Yi - Xian Lin 20


S-FFC :



M-FFC :


Text Classification

[Flow diagram] Training: training document data set → feature reduction → training data sets for class 1, …, class p → train 1st, …, p-th classifier (SVM). Classification: unknown pattern → feature reduction.

• p classifiers are constructed.


Text Classification

• Training data set and target sets for SVMs

Class   Target 1   Target 2
C1      +1         -1
C1      +1         -1
C1      +1         -1
C1      +1         -1
C2      -1         +1
C2      -1         +1
C2      -1         +1
C2      -1         +1
C2      -1         +1

Target 1 is the training target set for class C1; Target 2 is the training target set for class C2.
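The target sets follow a one-classifier-per-class encoding: a document is labeled +1 for its own class and -1 for every other class. A Python sketch, with `target_sets` as an illustrative name, applied to the nine training documents listed above:

```python
def target_sets(labels, classes):
    """One +1/-1 target list per class, aligned with the order of the training documents."""
    return {c: [1 if label == c else -1 for label in labels] for c in classes}


# Four documents of class C1 followed by five documents of class C2.
labels = ["C1"] * 4 + ["C2"] * 5
targets = target_sets(labels, ["C1", "C2"])
print(targets["C1"])  # [1, 1, 1, 1, -1, -1, -1, -1, -1]
print(targets["C2"])  # [-1, -1, -1, -1, 1, 1, 1, 1, 1]
```

Each target list is paired with the same reduced training data to train the classifier for the corresponding class.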


Text Classification

• Training classifiers

  - $D'_H$ + target 1 → training classifier (SVM1)

  - $D'_H$ + target 2 → training classifier (SVM2)

• Feature reduction for unknown pattern

  - Unknown pattern → unknown pattern after feature reduction


Text Classification

• Classify the unknown pattern

  - Trained classifier (SVM1) outputs -1; trained classifier (SVM2) outputs +1

  - The unknown pattern d is classified to class C2


Experimental results

• Performance measures

  - $TP_i$: true positives with respect to the i-th class

  - $TN_i$: true negatives with respect to the i-th class

  - $FP_i$: false positives with respect to the i-th class

  - $FN_i$: false negatives with respect to the i-th class

  - p: number of classes

$$MicroP = \frac{\sum_{i=1}^{p} TP_i}{\sum_{i=1}^{p} (TP_i + FP_i)}, \qquad MicroR = \frac{\sum_{i=1}^{p} TP_i}{\sum_{i=1}^{p} (TP_i + FN_i)}$$

$$MicroF1 = \frac{2 \times MicroP \times MicroR}{MicroP + MicroR}, \qquad MicroAcc = \frac{\sum_{i=1}^{p} (TP_i + TN_i)}{\sum_{i=1}^{p} (TP_i + TN_i + FP_i + FN_i)}$$
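The micro-averaged measures pool the per-class counts before forming the ratios. A Python sketch with hypothetical counts for two classes; the tuple layout `(TP, TN, FP, FN)` and the function name are assumptions of this sketch, and it does not guard against empty denominators.

```python
def micro_measures(counts):
    """Micro-averaged precision, recall, F1, and accuracy.

    counts -- list of (TP, TN, FP, FN) tuples, one per class
    """
    tp = sum(c[0] for c in counts)
    tn = sum(c[1] for c in counts)
    fp = sum(c[2] for c in counts)
    fn = sum(c[3] for c in counts)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"MicroP": precision, "MicroR": recall, "MicroF1": f1, "MicroAcc": accuracy}


# Hypothetical confusion counts for two classes.
scores = micro_measures([(8, 10, 2, 0), (9, 8, 1, 2)])
print(scores)
```

For these counts the pooled totals are TP = 17, TN = 18, FP = 3, and FN = 2, which gives a precision of 17/20, a recall of 17/19, an F1 of 34/39, and an accuracy of 35/40.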



• 20 Newsgroups data set

Number of classes                   20
Number of documents                 20000
Proportion of training documents    2/3
Proportion of testing documents     1/3
Number of features                  25718



Execution time (sec) of different methods on 20 Newsgroup data



Microaveraged accuracy (%) of different methods on 20 Newsgroup data



Microaveraged F1 (%) of M-FFC with different γ values for 20 Newsgroups data


Advantages

• A fuzzy self-constructing feature clustering (FFC) algorithm, an incremental clustering approach that reduces the dimensionality of the features in text classification

• Determines the number of features automatically

• Matches membership functions closely with the real distribution of the training data

• Runs faster

• Extracts better features than other methods
