A Technique for Advanced Dynamic Integration of Multiple Classifiers
Alexey Tsymbal*, Seppo Puuronen**, Vagan Terziyan*
*Department of Artificial Intelligence and Information Systems, Kharkov State Technical University of Radioelectronics, UKRAINE; e-mail: [email protected], [email protected]
**Department of Computer Science and Information Systems, University of Jyvaskyla, FINLAND; e-mail: [email protected]
STeP'98 - Finnish AI Conference, 7-9 September, 1998
[Map: University of Jyväskylä, Finland, and State Technical University of Radioelectronics, Kharkov, Ukraine]
Metaintelligence Laboratory: Research Topics
• Knowledge and metaknowledge engineering;
• Multiple experts;
• Context in Artificial Intelligence;
• Data Mining and Knowledge Discovery;
• Temporal Reasoning;
• Metamathematics;
• Semantic Balance and Medical Applications;
• Distance Education and Virtual Universities.
Contents
• What is Knowledge Discovery ?
• The Multiple Classifiers Problem
• A Sample (Training) Set
• A Sliding Exam of Classifiers as Learning Technique
• A Locality Principle
• Nearest Neighbours and Distance Measure
• Weighting Neighbours, Predicting Errors and Selecting Classifiers
• Data Preprocessing
• Some Examples
What is Knowledge Discovery ?
• Knowledge discovery in databases (KDD) is a combination of data warehousing, decision support, and data mining, and it is an innovative approach to information management.
• KDD is an emerging area that considers the process of finding previously unknown and potentially interesting patterns and relations in large databases*.
• * Fayyad, U., Piatetsky-Shapiro, G., Smyth, P., Uthurusamy, R., Advances in Knowledge Discovery and Data Mining, AAAI/MIT Press, 1996.
The Research Problem
During the past several years, in a variety of application domains, researchers in machine learning, computational learning theory, pattern recognition, and statistics have combined efforts to learn how to create and integrate an ensemble of classifiers.
The primary goal of combining several classifiers is to obtain a more accurate prediction than can be obtained from any single classifier alone.
Approaches to Integrate Multiple Classifiers
[Taxonomy diagram:]
• Selection
  • Global (Static)
  • Local (Dynamic)
• Combination
  • Global (Voting-Type)
  • Local ("Virtual" Classifier)
  • Decontextualization
Classification Problem
Given: n training pairs (xi, yi) with xi ∈ R^p and yi ∈ {1, …, J} denoting class membership.
Goal: given a new x0, select a classifier for x0 and predict its class y0.
(J classes, n training observations, p object features)
[Diagram: training set of classified vectors → classifiers → class membership]
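The setting above can be sketched as data structures; the sizes and random data below are illustrative placeholders, not the paper's training set.

```python
import numpy as np

# Toy instance of the classification problem: n training pairs (x_i, y_i),
# x_i in R^p, y_i in {1, ..., J} denoting class membership.
rng = np.random.default_rng(0)
n, p, J = 100, 2, 3
X = rng.normal(size=(n, p))          # n training observations, p object features
y = rng.integers(1, J + 1, size=n)   # class labels in {1, ..., J}

x0 = rng.normal(size=p)              # a new, unclassified observation

# Goal of dynamic integration: for this x0, select a classifier
# and use it to predict the class y0.
print(X.shape, y.min(), y.max())
```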
A Sample (Training) Set
[Scatter plot: training points plotted in the (X1, X2) plane; each point Pi has coordinates (x1i, x2i) and class label Ci]
P1: (x11, x21); C1
P2: (x12, x22); C2
...
Pn: (x1n, x2n); Cn
Classifiers Used in Example
• Classifier 1: LDA - Linear Discriminant Analysis;
• Classifier 2: k-NN - Nearest Neighbour Classification;
• Classifier 3: DANN - Discriminant Adaptive Nearest Neighbour Classification.
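As a minimal sketch of one of these base classifiers, here is a from-scratch k-NN (Classifier 2); LDA and DANN would be plugged in analogously. The data are synthetic placeholders.

```python
import numpy as np

# A minimal k-NN classifier: majority vote among the k nearest
# training points under Euclidean distance.
def knn_predict(X, y, x0, k=3):
    d = np.linalg.norm(X - x0, axis=1)       # distances from x0 to all points
    idx = np.argsort(d)[:k]                  # indices of the k nearest neighbours
    classes, counts = np.unique(y[idx], return_counts=True)
    return classes[np.argmax(counts)]        # majority vote

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [1.1, 0.9]])
y = np.array([1, 1, 2, 2])
print(knn_predict(X, y, np.array([0.05, 0.0])))   # query near the class-1 cluster
```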
A Sliding Exam of Classifiers (Jackknife Method):
We apply all the classifiers to the training-set points and check the correctness of their classifications.
[Scatter plot in the (X1, X2) plane: each training point is tagged with an error vector (LDA; k-NN; DANN), where 1 marks an incorrect classification by the corresponding classifier; e.g. (1;1;0) means LDA - incorrect classification, k-NN - incorrect classification, DANN - correct classification]
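The sliding exam can be sketched as a leave-one-out loop that fills the error matrix shown in the figure; a single 1-NN stand-in plays the role of the classifier list here, and the data are illustrative.

```python
import numpy as np

# 1-NN stand-in for a base classifier: fit on (X, y), predict for x0.
def one_nn(X, y, x0):
    return y[np.argmin(np.linalg.norm(X - x0, axis=1))]

# Sliding exam (jackknife): each classifier is applied to every training
# point with that point left out; errors[i, j] = 1 marks that classifier
# j misclassified point i, matching the slide's error vectors.
def sliding_exam(X, y, classifiers):
    n = len(y)
    errors = np.zeros((n, len(classifiers)), dtype=int)
    for i in range(n):
        mask = np.arange(n) != i                 # leave point i out
        for j, clf in enumerate(classifiers):
            pred = clf(X[mask], y[mask], X[i])
            errors[i, j] = int(pred != y[i])
    return errors

X = np.array([[0.0], [0.1], [1.0], [1.1], [5.0]])
y = np.array([1, 1, 2, 2, 1])
E = sliding_exam(X, y, [one_nn])
print(E.ravel())   # the isolated point at 5.0 is the only leave-one-out error
```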
A Locality Principle
[Scatter plot: as above, training points tagged with (LDA; k-NN; DANN) error vectors in the (X1, X2) plane]
We assume that also in the neighbourhood of a point we may expect the same classification result:
• A suitable number l of nearest neighbours should be selected for each training-set point; these neighbours are used to classify cases related to that point.
• We have used l = max(3, n div 50) for all training-set points in the example, where n is the number of cases in the training set.
• Open question: should an appropriate l value be selected locally?
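The slide's neighbourhood-size heuristic is a one-liner; "div" is taken to mean integer division.

```python
# Neighbourhood size from the slide: l = max(3, n div 50),
# with "div" read as integer (floor) division.
def neighbourhood_size(n):
    return max(3, n // 50)

# At least 3 neighbours are always used; larger training sets get more.
print(neighbourhood_size(100), neighbourhood_size(1000))
```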
Brief Review of Distance Functions According to D. Wilson and T. Martinez (1997)
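The table of distance functions from this slide did not survive extraction; as an illustrative sketch only, here are two of the standard functions Wilson and Martinez compare, not their full catalogue.

```python
import math

# Euclidean distance: square root of summed squared feature differences.
def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

# Manhattan (city-block) distance: summed absolute feature differences.
def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))

print(euclidean([0, 0], [3, 4]), manhattan([0, 0], [3, 4]))
```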
Weighting Neighbours
[Scatter plot: the l = 3 nearest neighbours NN1, NN2, NN3 of a point Pi, at distances d1, d2, d3; dmax denotes the distance to the farthest selected neighbour]
The values of the distance measure are used to derive the weight wk for each of the selected neighbours k = 1, …, l, using for example a cubic function:
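The cubic function itself is cut off in this transcript; a common cubic choice consistent with the figure's dmax is the tricube kernel, shown here as an assumed reconstruction rather than the slide's exact formula.

```python
# Assumed cubic weighting function (tricube kernel), NOT taken verbatim
# from the slide:  w_k = (1 - (d_k / d_max)^3)^3  for k = 1, ..., l.
# Nearer neighbours get weights close to 1; the weight falls to 0 at d_max.
def cubic_weight(d_k, d_max):
    return (1.0 - (d_k / d_max) ** 3) ** 3

weights = [cubic_weight(d, 2.0) for d in (0.5, 1.0, 2.0)]
print(weights)
```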