Transcript
Page 1: High dimensional data (FAST clustering ALG) PPT

Presented by

DEEPAN SHAKARAVARTHY V M.Tech.,

Page 2: High dimensional data (FAST clustering ALG) PPT

The FAST algorithm is used to identify a feature subset.

Based on a fast clustering-based feature selection algorithm (FAST), which is experimentally evaluated.

For efficiency and effectiveness, it adopts the efficient minimum spanning tree (MST) clustering method.

Page 3: High dimensional data (FAST clustering ALG) PPT

Feature subset selection is an effective way of reducing dimensionality.

Removing irrelevant data.

Increasing learning accuracy.

Improving results.

Page 4: High dimensional data (FAST clustering ALG) PPT

The accuracy of the learning algorithms is not guaranteed.

The number of selected features is limited, and the computational complexity is large.

Many irrelevant and redundant features may remain.

Page 5: High dimensional data (FAST clustering ALG) PPT

The number of selected features is limited.

The computational complexity is large.

The accuracy of the learning algorithms is not guaranteed.

Page 6: High dimensional data (FAST clustering ALG) PPT

Clusters are formed by using graph-theoretic clustering methods.

The selection algorithm effectively eliminates irrelevant features.

It achieves a significant reduction of dimensionality.

Page 7: High dimensional data (FAST clustering ALG) PPT

It provides good feature subset selection.

It efficiently deals with both irrelevant and redundant features.

It completely identifies duplicate data in the dataset.

It takes less time to produce results.

Page 8: High dimensional data (FAST clustering ALG) PPT

Page 9: High dimensional data (FAST clustering ALG) PPT

Distributed clustering
Subset selection algorithm
Time complexity
Microarray data
Data source
Irrelevant feature

Page 10: High dimensional data (FAST clustering ALG) PPT

Words are clustered into groups.

The cluster evaluation measure is based on distance.

Even compared with other feature selection methods, the obtained accuracy is lower.

Page 11: High dimensional data (FAST clustering ALG) PPT

Irrelevant features, along with redundant features, affect the accuracy of learning.

The goal is to identify and remove as much of the irrelevant data as possible.

Good feature subsets contain features highly correlated with the target class (one concrete correlation measure is sketched below).
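
The slides do not name the correlation measure, but the base FAST paper quantifies feature-class correlation with symmetric uncertainty, SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)). The following is a minimal Python sketch under that assumption, for discrete-valued features; the function names are illustrative, not from the slides:

```python
import numpy as np
from collections import Counter

def entropy(values):
    """Shannon entropy H(X) of a discrete sequence, in bits."""
    n = len(values)
    return -sum((c / n) * np.log2(c / n) for c in Counter(values).values())

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * IG(X | Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant, so no shared information
    # Conditional entropy H(X | Y), accumulated over the values of Y
    n = len(y)
    hx_given_y = sum(
        (c / n) * entropy([xi for xi, yi in zip(x, y) if yi == v])
        for v, c in Counter(y).items()
    )
    return 2.0 * (hx - hx_given_y) / (hx + hy)
```

A feature then counts as relevant when SU(feature, class) exceeds a chosen threshold, and as redundant when its SU with another feature dominates its SU with the class.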

Page 12: High dimensional data (FAST clustering ALG) PPT

Complexity is calculated in terms of the number of instances in a given dataset.

In the first part, features are selected as relevant ones.

A complete graph is constructed from the relevant features.

The MST is partitioned and the representative features are chosen, within the stated complexity (a sketch of the graph step follows).
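
A minimal sketch of the graph step, reusing the symmetric_uncertainty function above (an assumption, since the slides do not fix the weight function): build the complete weighted graph over the relevant features, then reduce it to a spanning tree with SciPy.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

def build_feature_mst(features, su):
    """Complete graph over relevant features, reduced to a minimum spanning tree.

    features : list of 1-D discrete sequences (the relevant features)
    su       : pairwise correlation measure, e.g. symmetric_uncertainty
    Returns the MST as a SciPy CSR matrix of retained edge weights.
    """
    k = len(features)
    graph = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            # SciPy treats 0 entries as "no edge"; a pair with SU exactly 0
            # is uncorrelated, so losing that edge is harmless in a sketch.
            graph[i, j] = su(features[i], features[j])
    return minimum_spanning_tree(graph)
```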

Page 13: High dimensional data (FAST clustering ALG) PPT

It is used to identify the length of the data.

It manages a searchable index.

Feature subset selection has been improved.

FAST again ranks first in terms of the proportion of selected features.

Page 14: High dimensional data (FAST clustering ALG) PPT

The purpose is to evaluate the performance and effectiveness of the proposed FAST algorithm.

Some data sets have more than 10,000 features.

A hospitality dataset is used (an evaluation sketch follows).
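
The actual data sources are not included in this transcript, so the sketch below evaluates a selected subset on synthetic high-dimensional data as a stand-in, using scikit-learn; the metrics mirror the slides (proportion of selected features, classification accuracy), and the selected indices are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in for a high-dimensional dataset (the slides mention
# data sets with more than 10,000 features; 2,000 keeps the demo fast).
X, y = make_classification(n_samples=200, n_features=2000,
                           n_informative=10, random_state=0)

selected = [0, 1, 2, 3, 4]  # placeholder: indices a feature selector returned
proportion = len(selected) / X.shape[1]
accuracy = cross_val_score(GaussianNB(), X[:, selected], y, cv=5).mean()
print(f"proportion of selected features: {proportion:.4f}")
print(f"5-fold cross-validated accuracy: {accuracy:.3f}")
```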

Page 15: High dimensional data (FAST clustering ALG) PPT

The right relevance measure is selected; the method then proceeds in three steps (steps 2 and 3 are sketched below):

1. Construction of the minimum spanning tree

2. Partitioning of the MST into clusters

3. Selection of representative features from the clusters
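
Putting steps 2 and 3 together: the sketch below partitions the tree from build_feature_mst above and keeps one representative per cluster. The edge-removal rule (drop an edge when its weight is below both endpoints' relevance to the class) follows the base paper and is an assumption here, since the slides do not spell it out.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def select_representatives(mst, features, target, su):
    """Partition the MST into clusters and keep one feature per cluster.

    An edge (Fi, Fj) is removed when its weight SU(Fi, Fj) is smaller
    than both relevances SU(Fi, C) and SU(Fj, C); each remaining
    connected component is a cluster, represented by its most
    class-relevant feature.
    """
    k = len(features)
    relevance = [su(f, target) for f in features]  # SU(Fi, C) per feature
    coo = mst.tocoo()
    kept = [(i, j, w) for i, j, w in zip(coo.row, coo.col, coo.data)
            if w >= min(relevance[i], relevance[j])]
    rows = [i for i, _, _ in kept]
    cols = [j for _, j, _ in kept]
    data = [w for _, _, w in kept]
    forest = csr_matrix((data, (rows, cols)), shape=(k, k))
    n_clusters, labels = connected_components(forest, directed=False)
    # One representative per cluster: the feature most relevant to the class
    return sorted(
        max((i for i in range(k) if labels[i] == c), key=relevance.__getitem__)
        for c in range(n_clusters)
    )
```

Chaining the pieces, reps = select_representatives(build_feature_mst(feats, symmetric_uncertainty), feats, labels, symmetric_uncertainty) yields the final feature subset, where feats and labels are hypothetical stand-ins for the relevant features and the class column.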

Page 16: High dimensional data (FAST clustering ALG) PPT
Page 17: High dimensional data (FAST clustering ALG) PPT
Page 18: High dimensional data (FAST clustering ALG) PPT
Page 19: High dimensional data (FAST clustering ALG) PPT

The conclusion of the project is a subset of good features with respect to the target concepts.

Feature selection is used to cluster the related data in databases.

Feature subset selection is an effective way of reducing dimensionality, removing irrelevant data, and increasing learning accuracy.

Page 20: High dimensional data (FAST clustering ALG) PPT


Page 21: High dimensional data (FAST clustering ALG) PPT