Intelligent Database Systems Lab
National Yunlin University of Science and Technology
A k-means type clustering algorithm for subspace clustering of mixed numeric and categorical datasets
Presenter: Keng-Yu Lin
Author: Amir Ahmad, Lipika Dey
PRL, 2011
Outlines
· Motivation
· Objectives
· Methodology
· Experiments
· Conclusions
· Comments
Motivation
· Almost all subspace clustering algorithms proposed so far are designed for numeric datasets.
Objectives
· This paper presents a k-means type clustering algorithm that finds clusters in data subspaces of mixed numeric and categorical datasets.
Methodology
· k-means clustering algorithm
1. Place K points into the space represented by the objects that are being clustered. These points represent initial group centroids.
2. Assign each object to the group that has the closest centroid.
3. When all objects have been assigned, recalculate the positions of the K centroids.
4. Repeat Steps 2 and 3 until the centroids no longer move. This produces a separation of the objects into groups from which the metric to be minimized can be calculated.
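The four steps above can be sketched in a few lines of Python. This is only an illustration of plain k-means on numeric tuples with squared Euclidean distance, not the paper's subspace algorithm for mixed data. The function names `kmeans` and `sse`, the `max_iter` cap, the empty-cluster rule, and the toy points are all assumptions made for this example.

```python
def kmeans(points, initial_centroids, max_iter=100):
    """Plain k-means on numeric tuples; returns (centroids, labels)."""
    # Step 1: the caller places K points in the space as initial centroids.
    centroids = [tuple(c) for c in initial_centroids]
    k = len(centroids)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Step 2: assign each object to the group with the closest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Step 3: recalculate each centroid as the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous centroid.
                new_centroids.append(centroids[j])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


def sse(points, centroids, labels):
    """Sum of squared distances to the assigned centroids (the minimized metric)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, centroids[lab]))
        for p, lab in zip(points, labels)
    )


# Toy data: two well-separated groups in 2-D.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
centroids, labels = kmeans(points, initial_centroids=[(0.0, 0.0), (10.0, 10.0)])
total = sse(points, centroids, labels)
# labels -> [0, 0, 0, 1, 1, 1]
```

Passing the initial centroids explicitly keeps the run deterministic; a random choice in Step 1 can converge to a different local minimum.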
Experiments
· Vote dataset
error rate: 4.8%
Zaki et al. error rate: 3.8%
Experiments
· Mushroom dataset
error rate: 4.1%
Zaki et al. error rate: 0.3%
Experiments
· DNA dataset
error rate: 17%
Experiments
· Australian credit data
error rate: 13.9%
Huang et al. (2005) error rate: 15%
Conclusions
· This paper presented a k-means type algorithm for subspace clustering of mixed numeric and categorical data.