Feature Selection with Kernel Class Separability
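Advisor: 王振興
Institute of Electrical Engineering, N28961523, 林哲偉
Institute of Electrical Engineering, N26974164, 曾信輝
Institute of Electrical Engineering, N26974172, 吳俐瑩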
Date: 2009.01.14
Lei Wang, “Feature selection with kernel class separability,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 30, no. 9, pp. 1534-1546, 2008.
112/04/21 1
Outline
• Introduction
• Feature Selection
• Feature Selection Criterion
• Characteristic Analysis
• Experimental Results
• Conclusions
• Future work
Introduction
• Classification can often benefit from efficient feature selection.
• A class separability criterion is developed in a high-dimensional kernel space.
• The criterion is applied to a variety of selection modes using different search strategies.
Feature Selection
• Feature selection often consists of a selection criterion and a search strategy.
• In this paper, the author compares five selection criteria and three search strategies.
• The author executed 30 trials for each.
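To make the split between "selection criterion" and "search strategy" concrete, here is a minimal sketch of one common search strategy, sequential forward selection, driven by a pluggable criterion function. The slides do not name the three strategies used, so the choice of forward selection is an assumption for illustration only, and `forward_selection`, `toy_scores`, and `toy_criterion` are hypothetical names.

```python
from typing import Callable, List, Sequence


def forward_selection(
    n_features: int,
    k: int,
    criterion: Callable[[Sequence[int]], float],
) -> List[int]:
    """Greedily pick k feature indices, maximizing `criterion` at each step."""
    selected: List[int] = []
    remaining = list(range(n_features))
    while remaining and len(selected) < k:
        # Score every candidate subset "selected + one more feature".
        best = max(remaining, key=lambda f: criterion(selected + [f]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy criterion: a fixed score per feature, summed over the subset.
toy_scores = [0.1, 0.9, 0.3, 0.7]


def toy_criterion(subset: Sequence[int]) -> float:
    return sum(toy_scores[f] for f in subset)


chosen = forward_selection(n_features=4, k=2, criterion=toy_criterion)
print(chosen)  # [1, 3]
```

Any of the criteria discussed in these slides could in principle be passed as `criterion`; the search loop does not need to know which one it is.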
Flow Chart
[Figure: flow chart. Recoverable labels: "30 randomly chosen data"; axis ticks from 10 to 50 in steps of 5.]
Feature Selection Criterion
• Correlation coefficient
– A higher value indicates higher relevance
– Cannot handle linearly nonseparable data
• Kolmogorov-Smirnov test
– A lower probability or a higher test value indicates higher relevance
– Needs a sufficient number of samples
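The limitation of the correlation coefficient can be seen on a tiny example. The sketch below computes the Pearson correlation between one feature and a 0/1 class label; the data are made up for illustration, and `pearson` is a hypothetical helper, not code from the paper.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = math.fsum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = math.fsum((a - mx) ** 2 for a in x)
    var_y = math.fsum((b - my) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant sequence carries no linear information
    return cov / math.sqrt(var_x * var_y)


feature = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Linearly separable labeling: class 1 when the feature is large.
labels_linear = [0, 0, 0, 1, 1]
# Linearly nonseparable labeling: class 1 when the feature is far from zero.
labels_nonlinear = [1, 0, 0, 0, 1]

r_linear = pearson(feature, labels_linear)
r_nonlinear = pearson(feature, labels_nonlinear)
print(r_linear, r_nonlinear)
```

In the second labeling the feature determines the class completely, yet its correlation with the label is essentially zero, so a correlation-based criterion would rank it as irrelevant.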
Feature Selection Criterion
• Class separability (non-kernel)
– Simple
– Cannot handle linearly nonseparable data
• Radius-margin bound
– Handles linearly nonseparable data well
– Not computationally efficient
• Kernel class separability
– Performs better than the criteria above
Characteristic Analysis
• In the “class separability” approach, the criterion is $\mathrm{tr}(S_B)/\mathrm{tr}(S_W)$.
– $\mathrm{tr}(\cdot)$ denotes the trace of a matrix: $\mathrm{tr}(A) = \sum_{i=1}^{n} a_{ii}$.
– $S_W = \sum_{i=1}^{c} \sum_{j=1}^{n_i} (x_{ij} - m_i)(x_{ij} - m_i)^T, \quad S_B = \sum_{i=1}^{c} n_i (m_i - m)(m_i - m)^T$.
• In the “kernel-based class separability” approach, the criterion is $T^{\Phi} = \mathrm{tr}(S_B^{\Phi})/\mathrm{tr}(S_W^{\Phi})$.
– $T^* = \max(T^{\Phi})$
• Using the Gaussian kernel function: $K(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$.
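The two criteria can be sketched with one function, because the traces of the scatter matrices can be written purely in terms of kernel values: with $K$ the kernel matrix, $\mathrm{tr}(S_W^{\Phi}) = \sum_i K_{ii} - \sum_c \frac{1}{n_c}\sum_{i,j \in c} K_{ij}$ and $\mathrm{tr}(S_B^{\Phi}) = \sum_c \frac{1}{n_c}\sum_{i,j \in c} K_{ij} - \frac{1}{n}\sum_{i,j} K_{ij}$. A linear kernel then gives the non-kernel criterion. The code below is a minimal illustration under these identities, not the paper's implementation; the function names, the toy data, and the grid of $\sigma$ values are assumptions, as is reading $T^*$ as a maximum over the kernel width.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_kernel(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def separability(
    X: List[Vector],
    labels: Sequence[int],
    kernel: Callable[[Vector, Vector], float],
) -> float:
    """Return tr(S_B) / tr(S_W) in the feature space induced by `kernel`."""
    n = len(X)
    K = [[kernel(a, b) for b in X] for a in X]
    diag_sum = sum(K[i][i] for i in range(n))
    total_term = sum(map(sum, K)) / n
    class_term = 0.0
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        class_term += sum(K[i][j] for i in idx for j in idx) / len(idx)
    tr_sb = class_term - total_term  # between-class scatter
    tr_sw = diag_sum - class_term    # within-class scatter
    return tr_sb / tr_sw


# Toy 1-D data: class 1 lies outside, class 0 inside (linearly nonseparable).
X = [[-2.0], [2.0], [-0.5], [0.5]]
labels = [1, 1, 0, 0]

t_linear = separability(X, labels, linear_kernel)
t_gauss = separability(X, labels, gaussian_kernel)

# Assumed reading of T* = max(T^Phi): maximize over a hypothetical sigma grid.
sigma_grid = (0.25, 0.5, 1.0, 2.0)
t_star = max(
    separability(X, labels, lambda a, b, s=s: gaussian_kernel(a, b, s))
    for s in sigma_grid
)
print(t_linear, t_gauss, t_star)
```

On this toy data both class means coincide in the input space, so the linear score is zero, while the Gaussian-kernel score is clearly positive. This matches the slides' point that the non-kernel criterion cannot handle linearly nonseparable data.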
Experimental Results
• Synthetic dataset: 600 data points, 52 features, 2 classes
Implementation
Time Cost
SVM Classifier
• Use the SVM test error to evaluate the significance of the kernel class separability measure (KCSM) and the radius-margin bound (RMB).
Conclusions and Discussions
• In our simulations, the proposed kernel-based class separability measure was the best of the five measures compared for feature selection.
• However, its time cost increases dramatically as the number of data points grows.
Future work
• US Postal Service (USPS) dataset: 7291 training samples and 2007 test samples, each characterized by 256 features.
• We will try to apply our implementation to the USPS dataset for further investigation.