A novel clustering algorithm based on weighted support and its application
Authors: Xiang-Rong Yang, Jun-Yi Shen, Qiang Liu. Graduate: Chien-Ming Hsiao

Jan 03, 2016

Page 1: A novel clustering algorithm based on weighted support and its application

A novel clustering algorithm based on weighted support and its application

Authors: Xiang-Rong Yang, Jun-Yi Shen, Qiang Liu

Graduate: Chien-Ming Hsiao

Page 2

Outline

Motivation
Objective
Introduction
Description of some Terms
Algorithm and Analysis
Experimental results
Conclusions
Personal opinion

Page 3

Motivation

Many efficient clustering algorithms have been proposed, but most of them focus on numerical data.

Page 4

Objective

To present WeiSC, a novel and efficient algorithm for clustering categorical data.

Page 5

Introduction

Clustering is an important KDD problem.
Objective: to group data into sets such that
Intra-cluster similarity is maximized
Inter-cluster similarity is minimized

Most existing work focuses on numerical data, whose inherent geometric properties can be exploited naturally to define distance functions between data points.

Page 6

Introduction

The basic idea of WeiSC
It reads tuples from the dataset one by one
When the first tuple arrives, it forms a cluster by itself
Each subsequent tuple is either put into an existing cluster or, if it is rejected by all existing clusters, forms a new cluster, according to a given similarity function defined between a tuple and a cluster

It makes only one scan over the dataset
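
The single-scan idea can be sketched as follows. This is a minimal illustration, not the authors' code: the names `one_pass_cluster` and `match_fraction` are hypothetical, the placeholder similarity is not the weighted-support measure of WeiSC, and picking the most similar cluster when several would accept a tuple is an assumption the slides do not state.

```python
from typing import Callable, Hashable, List, Sequence

Row = Sequence[Hashable]


def one_pass_cluster(
    tuples: Sequence[Row],
    similarity: Callable[[List[Row], Row], float],
    threshold: float,
) -> List[List[Row]]:
    """Scan the data once; each tuple joins the best cluster or starts a new one."""
    clusters: List[List[Row]] = []
    for t in tuples:
        best, best_sim = None, float("-inf")
        for c in clusters:
            s = similarity(c, t)
            if s > best_sim:
                best, best_sim = c, s
        if best is not None and best_sim >= threshold:
            best.append(t)        # accepted by the most similar existing cluster
        else:
            clusters.append([t])  # rejected by all clusters (or first tuple)
    return clusters


def match_fraction(cluster: List[Row], t: Row) -> float:
    """Placeholder similarity (assumption): mean fraction of attributes equal to t's."""
    total = sum(sum(x == y for x, y in zip(member, t)) for member in cluster)
    return total / (len(cluster) * len(t))


data = [("red", "round"), ("red", "oval"), ("blue", "flat"), ("red", "round")]
clusters = one_pass_cluster(data, match_fraction, threshold=0.5)
print(clusters)
```

With this toy data the three "red" tuples end up together and the "blue" tuple forms its own cluster. Because the data is read once, the result can depend on the order in which tuples arrive.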

Page 7

Description of some Terms

Let $D(A_1, A_2, \ldots, A_m)$ be a set of tuples, where $A_i$ ($i = 1, \ldots, m$) are categorical attributes with domains $D_1, \ldots, D_m$.

Let $TID$ be the set of unique IDs of every tuple.

For each $tid \in TID$, the value for attribute $A_i$ of the corresponding tuple is represented as $val(tid, A_i)$.

Page 8

Description of some Terms

DEFINITION 1

A cluster is a subset of $TID$: $Cluster = \{ tid \mid tid \in TID \}$.

DEFINITION 2

Given a cluster $C$, the set of attribute values on $A_i$ with respect to $C$ is defined as:

$$CVAL_i(C) = \{ val(tid, A_i) \mid tid \in C \}$$

DEFINITION 3

Let $CONT(A_i) = |D_i|$, i.e. the count of distinct attribute values of $A_i$, and let $SUM\_CONT = \sum_{i=1}^{m} CONT(A_i)$. The weight of attribute $A_i$ is:

$$WEI(A_i) = \frac{CONT(A_i)}{SUM\_CONT}$$
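
As a small worked example of Definition 3, the sketch below computes the attribute weights for a toy table. The function name `attribute_weights` and the toy rows are hypothetical, and the domain $D_i$ is assumed to be the set of values actually observed in the data.

```python
from typing import Hashable, List, Sequence


def attribute_weights(tuples: Sequence[Sequence[Hashable]]) -> List[float]:
    """WEI(A_i) = CONT(A_i) / SUM_CONT, CONT(A_i) = number of distinct values of A_i."""
    m = len(tuples[0])
    # Assumption: the domain of each attribute is the set of observed values.
    cont = [len({t[i] for t in tuples}) for i in range(m)]
    sum_cont = sum(cont)
    return [c / sum_cont for c in cont]


toy = [
    ("red", "round", "yes"),
    ("red", "oval", "no"),
    ("blue", "flat", "no"),
    ("green", "round", "no"),
]
weights = attribute_weights(toy)
print(weights)  # [0.375, 0.375, 0.25]
```

Here the first two attributes have three distinct values each and the third has two, so SUM_CONT is 8 and the weights sum to 1. Attributes with more distinct values receive more weight.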

Page 9

Description of some Terms

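DEFINITION 4

Given a cluster $C$, let $a_i \in D_i$. The weighted support of $a_i$ in $C$ with respect to $A_i$ is defined as:

$$\mathrm{wei\_sp}(a_i) = WEI(A_i) \times \left| \{ tid \mid tid.A_i = a_i,\ tid \in C \} \right|$$

Here $tid.A_i$ denotes $val(tid, A_i)$.

DEFINITION 5

Given a cluster $C$, the summary for $C$ is defined as:

$$Summary = \{ CID, VS_i \mid 1 \le i \le m \}$$

where $VS_i = \{ (a_i, Cont(a_i), \mathrm{wei\_sp}(a_i)) \mid tid.A_i = a_i,\ tid \in C \}$.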
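
Definitions 4 and 5 can be illustrated with a short sketch that builds a summary for one cluster. The name `build_summary`, the toy cluster, and the weight values are hypothetical. The slides do not define $Cont(a_i)$; it is assumed here to be the number of tuples in $C$ whose value on $A_i$ is $a_i$.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

ValueSummary = Dict[Hashable, Tuple[int, float]]


def build_summary(
    cid: int,
    cluster: Sequence[Sequence[Hashable]],
    weights: Sequence[float],
) -> Tuple[int, List[ValueSummary]]:
    """Return (CID, [VS_1, ..., VS_m]); VS_i maps a value a to (Cont(a), wei_sp(a))."""
    vs: List[ValueSummary] = []
    for i, w in enumerate(weights):
        counts = Counter(t[i] for t in cluster)
        # Assumption: Cont(a) is the occurrence count of a within the cluster,
        # and wei_sp(a) = WEI(A_i) * Cont(a).
        vs.append({a: (n, w * n) for a, n in counts.items()})
    return cid, vs


toy_weights = [0.375, 0.375, 0.25]  # hypothetical WEI(A_1..A_3)
toy_cluster = [("red", "round", "yes"), ("red", "oval", "no")]
cid, vs = build_summary(0, toy_cluster, toy_weights)
print(vs[0])  # {'red': (2, 0.75)}
```

The summary stores one small table per attribute, so later similarity computations never need to revisit the tuples themselves.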

Page 10

Algorithm and Analysis

Overview
Initially, the first tuple in the database is read and a cluster is constructed. Then the subsequent tuples are read iteratively.

The similarity between the new tuple and each existing cluster is computed according to

$$\mathrm{sim}(C, tid) = \frac{1}{|C|} \sum_{i=1}^{m} \mathrm{wei\_sp}(a_i), \quad a_i = tid.A_i,$$

where $|C|$ is the count of tuples in cluster $C$.

The similarity must be above the threshold, denoted as σ, for the tuple to join a cluster.
When computing the similarity, we use the clusters' summaries instead of the clusters themselves, since the information needed is contained in the summaries.
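
Putting the similarity and the summaries together, the procedure might look like the sketch below. It is an illustration under stated assumptions, not the authors' implementation: `ClusterSummary` and `weisc_sketch` are hypothetical names, the weights are toy values, a tuple joins the most similar cluster when that similarity is at least σ (the slides only say "above the threshold"), and the summary keeps raw counts and multiplies by the weight on the fly, which gives the same value as storing wei_sp.

```python
from typing import Dict, Hashable, List, Sequence

Row = Sequence[Hashable]


class ClusterSummary:
    """Per-attribute value counts plus the cluster size |C|."""

    def __init__(self, m: int) -> None:
        self.size = 0
        self.counts: List[Dict[Hashable, int]] = [{} for _ in range(m)]

    def add(self, t: Row) -> None:
        """Absorb a tuple by updating the counts; the tuple itself is not stored."""
        self.size += 1
        for i, a in enumerate(t):
            self.counts[i][a] = self.counts[i].get(a, 0) + 1

    def sim(self, t: Row, weights: Sequence[float]) -> float:
        """sim(C, tid) = (1/|C|) * sum_i WEI(A_i) * count of tid.A_i in C."""
        total = sum(
            w * self.counts[i].get(a, 0)
            for i, (a, w) in enumerate(zip(t, weights))
        )
        return total / self.size


def weisc_sketch(tuples: Sequence[Row], weights: Sequence[float], sigma: float) -> List[int]:
    """One scan over the data; returns the cluster index assigned to each tuple."""
    summaries: List[ClusterSummary] = []
    labels: List[int] = []
    for t in tuples:
        best, best_sim = -1, -1.0
        for k, s in enumerate(summaries):
            v = s.sim(t, weights)
            if v > best_sim:
                best, best_sim = k, v
        if best < 0 or best_sim < sigma:  # rejected by every existing cluster
            summaries.append(ClusterSummary(len(weights)))
            best = len(summaries) - 1
        summaries[best].add(t)
        labels.append(best)
    return labels


rows = [
    ("red", "round", "yes"),
    ("red", "oval", "no"),
    ("blue", "flat", "no"),
    ("green", "round", "no"),
]
labels = weisc_sketch(rows, weights=[0.375, 0.375, 0.25], sigma=0.3)
print(labels)  # [0, 0, 1, 0]
```

Since the weights sum to 1 and each count is at most $|C|$, the similarity lies between 0 and 1, which makes σ easy to interpret as a fraction.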

Page 11
Page 12

Computational complexities

The time and space complexities of the WeiSC algorithm depend on

The size of the dataset (|D|)
The number of attributes (m)
The number of clusters (p), f(σ)
The size of each cluster, g(σ)

Time complexity: O(|D| * m * f(σ))
Space complexity: O(|D| + m * f(σ) * g(σ))
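
One way to read the time bound is sketched below. It assumes that each similarity evaluation performs one constant-time summary lookup per attribute, which the slides do not state explicitly.

```latex
\begin{align*}
T &= \underbrace{|D|}_{\text{tuples, each read once}}
     \times \underbrace{f(\sigma)}_{\text{summaries compared per tuple}}
     \times \underbrace{m}_{\text{attribute lookups per comparison}} \\
  &= O\bigl(|D| \cdot m \cdot f(\sigma)\bigr)
\end{align*}
```

Under the same reading, the space bound would combine a $|D|$ term for recording which cluster each tuple belongs to with the summaries, which hold at most $g(\sigma)$ entries for each of the $m$ attributes in each of the $f(\sigma)$ clusters. This interpretation is an inference from the formulas, not something the slides spell out.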

Page 13

Experimental results

The experimental results on the performance of WeiSC

Compare the clustering result with ROCK’s on the same data set

Page 14

Quality of clustering results with real-life datasets

Mushroom dataset (real-life), obtained from the UCI Machine Learning Repository
Corresponds to 23 species of gilled mushrooms
Each species is identified as definitely edible or definitely poisonous

Has 21 attributes and 8124 tuples
The number of edible tuples is 4208
The number of poisonous tuples is 3916

Page 15
Page 16

The effect of σ

The parameter σ
Is the only parameter needed by the WeiSC algorithm
Affects both the clustering results and the speed of the algorithm

The percentage of misclassified tuples can be used to measure the effect, since each tuple has already been labeled “edible” or “poisonous”
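
A possible way to compute the percentage of misclassified tuples is sketched below. The slides do not say how a cluster is mapped to a class, so this sketch assumes each cluster takes the majority label of its members and every other member counts as misclassified. The function name and the toy assignments are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def misclassified_percentage(
    cluster_ids: Sequence[Hashable], classes: Sequence[Hashable]
) -> float:
    """Percentage of tuples whose class differs from their cluster's majority class."""
    by_cluster: Dict[Hashable, Counter] = defaultdict(Counter)
    for cid, cls in zip(cluster_ids, classes):
        by_cluster[cid][cls] += 1
    # Assumption: a cluster is labeled with its most frequent class.
    wrong = sum(sum(c.values()) - max(c.values()) for c in by_cluster.values())
    return 100.0 * wrong / len(classes)


cluster_ids = [0, 0, 0, 1, 1, 2]
classes = ["edible", "edible", "poisonous", "poisonous", "poisonous", "edible"]
pct = misclassified_percentage(cluster_ids, classes)
print(round(pct, 2))  # 16.67
```

Running this for several values of σ would show how the threshold trades the number of clusters against their purity.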

Page 17
Page 18
Page 19

Conclusions

The WeiSC algorithm is robust and efficient
This is supported by both analysis and experiments
It reads the dataset only once

Used in IDS
It is fast and efficient

Page 20

Personal Opinion

We can compare the WeiSC algorithm with our own algorithm.