Information-theoretic distance measures for clustering validation: Generalization and normalization. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, TKDE (2009). Presenter: Lin, Shu-Han. Authors: Ping Luo, Hui Xiong, Guoxing Zhan, Junjie Wu, and Zhongzhi Shi. Intelligent Database Systems Lab, N.Y.U.S.T. I.M.
Intelligent Database Systems Lab
N.Y.U.S.T. I.M.
Information-theoretic distance measures for clustering validation:
Generalization and normalization
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, TKDE (2009)
Presenter: Lin, Shu-Han
Authors: Ping Luo, Hui Xiong, Guoxing Zhan, Junjie Wu, and Zhongzhi Shi
External criteria for clustering validation: information-theoretic distance measures are used to compare the clustering output with the “true” partition.
Clustering ability of algorithms: compare different clustering algorithms, given a dataset.
Clustering difficulty of datasets: compare different datasets, given an algorithm.
      A    B    C
1    30    0    1
2     2   20    0
3     0    2   15

σ: the “true” partition
π: the clustering output
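As an illustration of how such a table can be turned into an information-theoretic distance, the following minimal sketch computes the Shannon conditional entropies and their sum (the variation of information) from the counts above. It assumes, which the slide does not state, that rows 1 to 3 are the clusters of π and columns A to C are the classes of σ. The function names are hypothetical, and the measure is the standard Shannon one, not necessarily the generalized measure of the paper.

```python
import math

# Contingency counts from the slide.
# ASSUMPTION: rows = clusters of pi, columns = classes of sigma.
TABLE = [
    [30, 0, 1],
    [2, 20, 0],
    [0, 2, 15],
]

def conditional_entropy(table):
    """Shannon H(columns | rows) in bits for a table of joint counts."""
    n = sum(sum(row) for row in table)
    h = 0.0
    for row in table:
        row_total = sum(row)
        for count in row:
            if count > 0:
                # p(row, col) * log2(p(row) / p(row, col))
                h += (count / n) * math.log2(row_total / count)
    return h

def transpose(table):
    """Swap the roles of rows and columns."""
    return [list(col) for col in zip(*table)]

def variation_of_information(table):
    """H(columns | rows) + H(rows | columns)."""
    return conditional_entropy(table) + conditional_entropy(transpose(table))

if __name__ == "__main__":
    print("H(sigma|pi) =", round(conditional_entropy(TABLE), 4))
    print("H(pi|sigma) =", round(conditional_entropy(transpose(TABLE)), 4))
    print("VI          =", round(variation_of_information(TABLE), 4))
```

A table with all counts on the diagonal gives zero for both conditional entropies, which matches the intuition that a perfect clustering has zero distance to the true partition.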
Objectives
The dimension, size, and sparseness of the data, as well as the scales of the attributes, differ from dataset to dataset, so the range of a distance measure also differs. A fair comparison therefore requires distance normalization.
  A    B    C
120  120  120

 A   B   C   D   E   F   G
12  23  30  24   5  90  20
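The effect of class structure on the range of an entropy-based measure can be sketched as follows. Reading the two rows of numbers above as class sizes of two different datasets (an assumption; the slide gives no labels), the Shannon entropy H(σ) of the class distribution, which upper-bounds H(σ|π), differs between them. Dividing by such a bound is shown only as one simple way to map values into [0, 1]; it is not claimed to be the normalization proposed in the paper, and the helper names are hypothetical.

```python
import math

def entropy(sizes):
    """Shannon entropy (bits) of the distribution given by group sizes."""
    n = sum(sizes)
    return -sum((s / n) * math.log2(s / n) for s in sizes if s > 0)

def normalized(value, upper_bound):
    """Scale a measure into [0, 1] by a known upper bound (hypothetical helper)."""
    return value / upper_bound if upper_bound > 0 else 0.0

# ASSUMPTION: the slide's two rows are class sizes of two different datasets.
balanced = [120, 120, 120]
skewed = [12, 23, 30, 24, 5, 90, 20]

if __name__ == "__main__":
    for name, sizes in (("balanced", balanced), ("skewed", skewed)):
        h = entropy(sizes)
        # H(sigma | pi) <= H(sigma), so H(sigma) bounds the raw measure's range.
        print(f"{name}: H(sigma) = {h:.4f} bits, "
              f"log2(#classes) = {math.log2(len(sizes)):.4f} bits")
```

Because the two bounds differ, the same raw conditional-entropy value would mean different things on the two datasets, which is the motivation for normalizing before comparing.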
Methodology – Conditional Entropy
The equality C1=C2 yields the Shannon entropy
π: group label
σ: class label
Methodology – Quasi-Distance
Minimum reachable: d(π, σ) reaches its minimum over both π and σ iff π = σ
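The minimum-reachable property can be checked numerically for one concrete measure. The sketch below uses the variation of information, H(π|σ) + H(σ|π), as a stand-in; it is not necessarily one of the quasi-distances defined in the paper, and the label vectors are made-up toy data.

```python
import math
from collections import Counter

def cond_entropy(a, b):
    """Shannon H(a | b) in bits for two equal-length label sequences."""
    n = len(a)
    joint = Counter(zip(a, b))
    marg_b = Counter(b)
    return sum((c / n) * math.log2(marg_b[y] / c) for (_, y), c in joint.items())

def vi(a, b):
    """Variation of information between two labelings."""
    return cond_entropy(a, b) + cond_entropy(b, a)

# Toy labelings (hypothetical data, not from the slides).
sigma = ["A", "A", "A", "B", "B", "C"]
pi_same = [1, 1, 1, 2, 2, 3]  # same partition as sigma, different label names
pi_diff = [1, 1, 2, 2, 3, 3]  # a different partition

if __name__ == "__main__":
    print("d(pi_same, sigma) =", vi(pi_same, sigma))
    print("d(pi_diff, sigma) =", vi(pi_diff, sigma))
```

For this stand-in measure the distance is zero when the two labelings induce the same partition and positive otherwise, regardless of how the groups are named.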