FEATURE SELECTION USING ANT COLONY OPTIMIZATION: APPLICATIONS IN HEALTH … · 2012-02-14
S. M. Vieira1, S. N. Finkelstein2,3, A. S. Fialho1,2, F. Cismondi1,2, S. R. Reti3 and M. D. Howell3
1 Technical University of Lisbon, Instituto Superior Técnico, Dept. of Mechanical Engineering, CIS/IDMEC – LAETA, Av. Rovisco Pais, 1049-001 Lisbon, Portugal
2 Massachusetts Institute of Technology, Engineering Systems Division, 77 Massachusetts Avenue, 02139 Cambridge, MA, USA
3 Division of Clinical Informatics, Department of Medicine, Beth Israel Deaconess Medical Centre, Harvard Medical School, Boston, MA, USA
Motivation
Knowledge discovery process
20 September 2010 Eindhoven, the Netherlands 2
[Figure: knowledge discovery process. Stages: data, target data, preprocessed data, reduced data, patterns, knowledge. Steps: data acquisition, preprocessing, feature selection, modeling, interpretation.]
From G. Piatetsky-Shapiro, U. Fayyad and P. Smyth. From data mining to knowledge discovery in databases. Artificial Intelligence Magazine, 17(3):37-54, 1996.
Outline
Motivation
Modeling
  Neural networks
  Fuzzy sets and systems
  Fuzzy modeling
Feature selection
Ant colony optimization
Ant feature selection
Application: predicting outcomes of sepsis patients
S. Haykin. Neural Networks - A Comprehensive Foundation. Prentice Hall, 1999.
J.-S. Jang, C.-T. Sun and E. Mizutani. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, New Jersey, 1997.
Andries P. Engelbrecht. Computational Intelligence: An Introduction. John Wiley, Chichester, 2002.
Michael Negnevitsky. Artificial Intelligence: A Guide to Intelligent Systems. Addison-Wesley, Pearson Education, 2002.
FUZZY SETS
Basic Concepts
Introduction
How to simplify very complex systems?
Allow some degree of uncertainty in their description!
How to deal mathematically with uncertainty?
Using probability theory (stochastic).
Using the theory of fuzzy sets (non-stochastic). Proposed in 1965 by Lotfi Zadeh (Fuzzy Sets, Information and Control, 8, pp. 338-353).
Imprecision or vagueness in natural language does not imply a loss of accuracy or meaningfulness!
Classical set
Example: set of old people $A = \{\text{age} \mid \text{age} \geq 70\}$
[Figure: membership $\mu_A$ versus age (50 to 100), membership axis from 0 to 1.]
Logic propositions
“Nick is old” ... true or false
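Nick’s age:
$\mathrm{age}_{\mathrm{Nick}} = 70$, $\mu_A(70) = 1$ (true)
$\mathrm{age}_{\mathrm{Nick}} = 69.9$, $\mu_A(69.9) = 0$ (false)
[Figure: membership $\mu_A$ versus age (50 to 100), membership axis from 0 to 1.]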
Fuzzy set
Graded membership: an element belongs to a set to a certain degree.
[Figure: membership grade of $A$ versus age (50 to 100), membership axis from 0 to 1.]
Fuzzy proposition
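“Nick is old” ... degree of truth
$\mathrm{age}_{\mathrm{Nick}} = 70$, $\mu_A(70) = 0.5$
$\mathrm{age}_{\mathrm{Nick}} = 69.9$, $\mu_A(69.9) = 0.49$
$\mathrm{age}_{\mathrm{Nick}} = 90$, $\mu_A(90) = 1$
[Figure: membership grade of $A$ versus age (50 to 100), membership axis from 0 to 1.]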
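The contrast between the classical set and the fuzzy set for "old" can be sketched in a few lines of Python. This is only an illustration: the linear ramp and its breakpoints (60 and 80) are assumptions chosen so that the membership at age 70 is 0.5, and the function names are hypothetical; the slides do not state the exact shape of the membership function.

```python
def crisp_old(age: float) -> float:
    """Classical set: membership is 1 for age >= 70, otherwise 0."""
    return 1.0 if age >= 70 else 0.0


def fuzzy_old(age: float, start: float = 60.0, full: float = 80.0) -> float:
    """Fuzzy set: membership rises linearly from 0 at `start` to 1 at `full`.

    The breakpoints 60 and 80 are assumed for illustration only.
    """
    if age <= start:
        return 0.0
    if age >= full:
        return 1.0
    return (age - start) / (full - start)


# Compare the two notions of "old" for the ages used on the slides.
for age in (69.9, 70.0, 90.0):
    print(age, crisp_old(age), round(fuzzy_old(age), 3))
```

The crisp set jumps from 0 to 1 between 69.9 and 70, while the fuzzy set changes only slightly over the same interval.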
Typical linguistic values
[Figure: membership functions of the linguistic values young, middle age and old over age (0 to 100), membership grade from 0 to 1.]
Linguistic variable
x is age, with linguistic values {young, middle age, old}
[Figure: the semantic rule MX maps the linguistic values young, middle age and old to membership functions over age (0 to 100), membership grade from 0 to 1.]
Fuzzy complement
$\mu_{\bar{A}}(x) = 1 - \mu_A(x)$
[Figure: membership function of $A$ over $x$, membership axis up to 1.]
Intersection of fuzzy sets
$\mu_{A \cap B}(x) = \min(\mu_A(x), \mu_B(x))$
[Figure: membership functions of $A$ and $B$ over $x$.]
Union of fuzzy sets
$\mu_{A \cup B}(x) = \max(\mu_A(x), \mu_B(x))$
[Figure: membership functions of $A$ and $B$ over $x$.]
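The complement, intersection and union above translate directly into code. The sketch below treats a fuzzy set as a plain function from an element to a membership grade; the two example sets `mu_a` and `mu_b` are hypothetical ramps chosen only to produce easy numbers.

```python
from typing import Callable

Membership = Callable[[float], float]


def complement(mu_a: Membership) -> Membership:
    """Fuzzy complement: 1 - mu_A(x)."""
    return lambda x: 1.0 - mu_a(x)


def intersection(mu_a: Membership, mu_b: Membership) -> Membership:
    """Fuzzy intersection: min(mu_A(x), mu_B(x))."""
    return lambda x: min(mu_a(x), mu_b(x))


def union(mu_a: Membership, mu_b: Membership) -> Membership:
    """Fuzzy union: max(mu_A(x), mu_B(x))."""
    return lambda x: max(mu_a(x), mu_b(x))


def mu_a(x: float) -> float:
    """Hypothetical set A: rises from 0 at x = 0 to 1 at x = 10."""
    return max(0.0, min(1.0, x / 10.0))


def mu_b(x: float) -> float:
    """Hypothetical set B: falls from 1 at x = 10 to 0 at x = 20."""
    return max(0.0, min(1.0, (20.0 - x) / 10.0))


print(intersection(mu_a, mu_b)(5.0), union(mu_a, mu_b)(5.0), complement(mu_a)(2.5))
```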
FUZZY SYSTEMS
Linguistic variable
$\{x, \mathcal{A}, X, M_X\}$
Where:
$x$ – name of the linguistic variable
$\mathcal{A}$ – linguistic values (terms)
$X$ – universe of discourse
$M_X$ – semantic rule that associates each linguistic value with a membership function.
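One possible way to represent a linguistic variable in code is a small record holding its name, its universe of discourse, and a mapping from each linguistic value to a membership function (the semantic rule). The class name, the `trapezoid` helper and all breakpoints below are hypothetical and chosen for illustration only; they are not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


def trapezoid(a: float, b: float, c: float, d: float) -> Callable[[float], float]:
    """Membership function equal to 1 on [b, c] and 0 outside (a, d)."""
    def mu(x: float) -> float:
        if b <= x <= c:
            return 1.0
        if a < x < b:
            return (x - a) / (b - a)
        if c < x < d:
            return (d - x) / (d - c)
        return 0.0
    return mu


@dataclass
class LinguisticVariable:
    name: str                                     # name of the variable
    universe: Tuple[float, float]                 # universe of discourse
    terms: Dict[str, Callable[[float], float]]    # semantic rule: term -> membership function

    def fuzzify(self, value: float) -> Dict[str, float]:
        """Return the membership grade of `value` in every linguistic value."""
        low, high = self.universe
        if not low <= value <= high:
            raise ValueError(f"{value} is outside the universe of discourse")
        return {term: mu(value) for term, mu in self.terms.items()}


# Assumed breakpoints for the three linguistic values of "age".
age = LinguisticVariable(
    name="age",
    universe=(0.0, 100.0),
    terms={
        "young": trapezoid(0, 0, 20, 40),
        "middle age": trapezoid(20, 40, 60, 80),
        "old": trapezoid(60, 80, 100, 100),
    },
)
print(age.fuzzify(30.0))
```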
Fuzzy if-then rules
Fuzzy propositions: x is A, y is B
Linguistic (Mamdani) fuzzy if-then rule:
If x is A then y is B
Antecedent: x is A
Consequent: y is B
Rule “If x is A then y is B” is represented by a fuzzy relation defined on $X \times Y$.
Examples
If the road is slippery then brake softly.
If error is Negative big and e is Positive big then u is Negative small.
If a tomato is red then the tomato is ripe.
If the temperature is very high then reduce the heat a lot.
If the valve is closed then the pressure is high.
Linguistic (Mamdani) model
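$$R^k: \text{If } \mathbf{x} \text{ is } A^k \text{ then } \mathbf{y} \text{ is } B^k, \quad k = 1, 2, \ldots, K$$
Decomposing using conjunctive forms:
$$R^k: \text{If } x_1 \text{ is } A_1^k \text{ and } x_2 \text{ is } A_2^k \text{ and } \ldots \text{ and } x_n \text{ is } A_n^k \text{ then } y_1 \text{ is } B_1^k \text{ and } y_2 \text{ is } B_2^k \text{ and } \ldots \text{ and } y_p \text{ is } B_p^k$$
Degree of fulfillment of antecedents:
$$\beta^k = \mu_{A_1^k}(x_1) \wedge \mu_{A_2^k}(x_2) \wedge \cdots \wedge \mu_{A_n^k}(x_n), \quad k = 1, 2, \ldots, K$$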
Takagi-Sugeno fuzzy model
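$$R^k: \text{If } \mathbf{x} \text{ is } A^k \text{ then } y^k = f^k(\mathbf{x}), \quad k = 1, 2, \ldots, K$$
Affine linear form:
$$R^k: \text{If } \mathbf{x} \text{ is } A^k \text{ then } y^k = (\mathbf{a}^k)^T \mathbf{x} + b^k$$
Degree of fulfillment $\beta^k$ defined as in linguistic models
Model output given by the weighted fuzzy-mean:
$$y = \frac{\sum_{k=1}^{K} \beta^k y^k}{\sum_{j=1}^{K} \beta^j} = \frac{\sum_{k=1}^{K} \beta^k \left( (\mathbf{a}^k)^T \mathbf{x} + b^k \right)}{\sum_{j=1}^{K} \beta^j}$$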
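The Takagi-Sugeno inference above can be sketched as follows. The sketch assumes the minimum as the conjunction operator for the degree of fulfillment (in line with the min-intersection defined earlier), and the two rules, their membership functions and their consequent parameters are hypothetical values chosen to give round numbers.

```python
from typing import Callable, List, Sequence, Tuple

Membership = Callable[[float], float]
# One rule: (antecedent membership per input, consequent vector a, consequent offset b)
Rule = Tuple[Sequence[Membership], Sequence[float], float]


def degree_of_fulfillment(memberships: Sequence[Membership], x: Sequence[float]) -> float:
    """beta^k: conjunction (here: minimum) of the antecedent membership grades."""
    return min(mu(xi) for mu, xi in zip(memberships, x))


def ts_output(rules: List[Rule], x: Sequence[float]) -> float:
    """Weighted fuzzy-mean of the affine rule consequents."""
    numerator = 0.0
    denominator = 0.0
    for memberships, a, b in rules:
        beta = degree_of_fulfillment(memberships, x)
        y_k = sum(ai * xi for ai, xi in zip(a, x)) + b  # y^k = a^T x + b
        numerator += beta * y_k
        denominator += beta
    if denominator == 0.0:
        raise ValueError("no rule is fulfilled for this input")
    return numerator / denominator


def low(x: float) -> float:
    """Hypothetical antecedent set: 1 at x = 0, falling to 0 at x = 1."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x: float) -> float:
    """Hypothetical antecedent set: 0 at x = 0, rising to 1 at x = 1."""
    return max(0.0, min(1.0, x))


rules: List[Rule] = [
    ([low], [1.0], 0.0),   # If x is low  then y = 1*x + 0
    ([high], [2.0], 1.0),  # If x is high then y = 2*x + 1
]
print(ts_output(rules, [0.25]))
```

At x = 0.25 the degrees of fulfillment are 0.75 and 0.25, the rule outputs are 0.25 and 1.5, and the weighted mean is 0.5625.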
Bibliography
G. Klir and T. Folger. Fuzzy Sets, Uncertainty and Information. Prentice Hall, 1988.
J.-S. Jang, C.-T. Sun and E. Mizutani. Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall, New Jersey, 1997.
Andries P. Engelbrecht. Computational Intelligence: An Introduction. John Wiley, Chichester, 2002.
J.M.C. Sousa and U. Kaymak. Fuzzy Decision Making in Modeling and Control. World Scientific Series in Robotics and Intelligent Systems, vol. 27. World Scientific Pub. Co., Singapore, Dec. 2002.
Michael Negnevitsky. Artificial Intelligence: A Guide to Intelligent Systems. Addison-Wesley, Pearson Education, 2002.
R. Babuska. Fuzzy Modeling for Control. Kluwer Academic Publishers, 1998.
FUZZY MODELING
Kernel-based modeling
Fuzzy systems
Radial basis function networks
Support vector machines
Multi-layer perceptron
...
Fuzzy systems can be interpretable!
Fuzzy sets can close the gap between symbolic processing and numerical computations.
Fuzzy system parameters
Parameters of antecedent membership functions (shape, location, etc.)
Parameters of consequent membership functions (Mamdani systems)
Parameters of consequent functions (Takagi-Sugeno systems)
Aggregation of antecedent memberships
Implication/reasoning
Defuzzification function (Mamdani systems)
Building fuzzy models
Data-driven approach
  nonlinear mapping
  extract from input-output data:
    rules
    antecedents (membership functions)
    consequents (membership or crisp functions)
Fuzzy c-means
[Figure: three surface plots of membership functions (MF) over X and Y; all axes run from 0 to 1.]
Assumes partition matrix is fixed
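Fuzzy c-means alternates between two updates: with the partition matrix held fixed it recomputes the cluster centers, and with the centers held fixed it recomputes the partition matrix. The following is a minimal pure-Python sketch of the standard algorithm for one-dimensional data. The function name, the fuzziness exponent m = 2, the stopping tolerance and the toy data are assumptions for illustration, not settings taken from the slides.

```python
import random
from typing import List, Tuple


def fuzzy_c_means(data: List[float], c: int, m: float = 2.0, iters: int = 100,
                  tol: float = 1e-6, seed: int = 0) -> Tuple[List[float], List[List[float]]]:
    """Cluster 1-D data; returns (centers, partition matrix u[i][k])."""
    rng = random.Random(seed)
    n = len(data)
    # Random initial partition matrix; each column is normalised to sum to 1.
    u = [[rng.random() for _ in range(n)] for _ in range(c)]
    for k in range(n):
        total = sum(u[i][k] for i in range(c))
        for i in range(c):
            u[i][k] /= total
    centers = [0.0] * c
    for _ in range(iters):
        # Step 1: partition matrix fixed, update the cluster centers.
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            centers[i] = sum(wk * xk for wk, xk in zip(w, data)) / sum(w)
        # Step 2: centers fixed, update the partition matrix.
        change = 0.0
        for k in range(n):
            d = [abs(data[k] - centers[i]) for i in range(c)]
            if min(d) == 0.0:
                # Point coincides with a center: full membership in that cluster.
                hit = d.index(0.0)
                new = [1.0 if i == hit else 0.0 for i in range(c)]
            else:
                new = [1.0 / sum((d[i] / d[j]) ** (2.0 / (m - 1.0)) for j in range(c))
                       for i in range(c)]
            for i in range(c):
                change = max(change, abs(new[i] - u[i][k]))
                u[i][k] = new[i]
        if change < tol:
            break
    return centers, u


# Toy data with two well-separated groups.
centers, u = fuzzy_c_means([0.9, 1.0, 1.1, 4.9, 5.0, 5.1], c=2)
print(sorted(centers))
```

Each row of the resulting partition matrix can be read as a sampled membership function, which is how clusters are turned into antecedent fuzzy sets in clustering-based modeling.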
Modeling based on fuzzy clustering
1. Collect the data
2. Select model structure (Mamdani, Takagi-Sugeno, …)
3. Select number of clusters and clustering algorithm
6. Determine a fuzzy rule for each cluster
7. Simplify the model, if necessary
8. Validate the model
Building fuzzy models
Structure
Input and output variables. For dynamic systems also the representation of the dynamics.
Number of membership functions per variable, type of membership functions, number of rules.
Many applications have hundreds to tens of thousands of variables/features
Many are irrelevant and/or redundant.
Curse of dimensionality.
Feature selection
What is feature selection?
Remove features (inputs) X(i) to improve (or least degrade) prediction of outputs Y.
Advantages:
Selects the most relevant features
Collect/process fewer features and less data
Less complex models run faster
Models are easier to understand, verify and explain
Feature selection algorithms
Filters
  Based on general characteristics of the data to be evaluated.
  No model is involved.
Wrappers
  Use model performance to evaluate feature subsets.
  Train one classification model for each feature subset.
Hybrid methods
  Do not retrain the model at every step.
  Search feature selection space and model parameter space simultaneously.
Tree search – bottom-up
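A bottom-up tree search used as a wrapper can be sketched as greedy forward selection: start from the empty set and, at each level, add the single feature that most improves the score of a model trained on the candidate subset. The function name and the `score` callback are hypothetical; in practice `score` would train and validate a fuzzy model or neural network, whereas the toy scoring function below merely stands in for that step with made-up values.

```python
from typing import Callable, FrozenSet, List, Sequence


def bottom_up_selection(features: Sequence[str],
                        score: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Greedy forward (bottom-up) search over feature subsets."""
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining:
        # Wrapper step: score the model built on each one-feature extension.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        cand_score, cand = max(candidates)
        if cand_score <= best_score:
            break  # no extension improves the model: stop
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected


# Toy stand-in for model performance: made-up gains minus a size penalty.
GAIN = {"f1": 0.3, "f2": 0.2, "f3": 0.1, "f4": 0.0}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(GAIN[f] for f in subset) - 0.05 * len(subset)


print(bottom_up_selection(["f1", "f2", "f3", "f4"], toy_score))
```

Because every candidate subset requires a trained model, the cost grows with the number of features, which is one motivation for searching the subset space with a metaheuristic instead.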
ANT COLONY OPTIMIZATION
Biologically inspired algorithms
Artificial ant colonies: possibly the most used of the artificial life algorithms.
Introduced by Marco Dorigo (1992); well received by the academic world and starting to be used in industrial applications.
Applications: Traveling Salesman Problem, Vehicle Routing, Quadratic Assignment Problem, Internet Routing, Logistic Scheduling, clustering and data mining problems.
Ant Colony Optimization
Artificial Life algorithms: swarm, ants, wasps, bees
Ant Colony Optimization is one of the most used methods of the Artificial Life algorithms.
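To make the idea concrete, here is a minimal sketch of the classic Ant System applied to a tiny Traveling Salesman Problem. It is a generic textbook-style illustration, not the ant feature selection (AFS) method of this talk; the function name and all parameter values (number of ants, alpha, beta, evaporation rate rho) are assumptions.

```python
import math
import random
from typing import List, Tuple


def ant_system_tsp(dist: List[List[float]], n_ants: int = 10, n_iters: int = 50,
                   alpha: float = 1.0, beta: float = 2.0, rho: float = 0.5,
                   seed: int = 0) -> Tuple[List[int], float]:
    """Return the best closed tour found and its length."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[1.0] * n for _ in range(n)]  # pheromone on each edge
    best_tour: List[int] = []
    best_len = math.inf
    for _ in range(n_iters):
        tours = []
        for _ in range(n_ants):
            tour = [rng.randrange(n)]
            while len(tour) < n:
                i = tour[-1]
                options = [j for j in range(n) if j not in tour]
                # Choice weight combines pheromone and heuristic (1 / distance).
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in options]
                tour.append(rng.choices(options, weights=weights)[0])
            length = sum(dist[tour[k]][tour[(k + 1) % n]] for k in range(n))
            tours.append((tour, length))
            if length < best_len:
                best_tour, best_len = tour, length
        # Pheromone evaporation on every edge.
        for i in range(n):
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        # Pheromone deposit: shorter tours deposit more on their edges.
        for tour, length in tours:
            for k in range(n):
                a, b = tour[k], tour[(k + 1) % n]
                tau[a][b] += 1.0 / length
                tau[b][a] += 1.0 / length
    return best_tour, best_len


# Four cities on the corners of a unit square.
cities = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
dist = [[math.dist(p, q) for q in cities] for p in cities]
tour, length = ant_system_tsp(dist)
print(tour, length)
```

For feature selection the same mechanism carries over in spirit: nodes become features, a tour becomes a feature subset, and the tour length is replaced by a model-based quality measure.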
Problem
Septic shock is a common key adverse outcome in the ICU, with a ~50% mortality rate and high treatment costs.
Methods
Fuzzy Systems or Neural Networks + Feature Selection (tree search and ant colony optimization)
Goal
Predict the outcome (survival or death) of septic shock patients, for purposes of therapy management.
Sepsis
Annual mortality of sepsis in the USA: more than 220,000 deaths. Sepsis is the tenth most common cause of death.
Severe sepsis accounts for 2% to 3% of all hospital admissions. 59% of patients with sepsis require ICU care, making up 10.4% of ICU admissions.
The mortality rate for severe sepsis ranges from 13% to 50%, and is as high as 80% to 90% for septic shock and multiple organ dysfunction.
Septic shock - background
Management of sepsis is increasingly protocol-driven
Care is goal-directed and parameterized
With goal-directed therapy, care becomes similar to a control problem, with the ‘ideal’ process of care revolving around:
Setting a goal/target for a specific physiological parameter
Rapidly driving the physiologic process toward the specific goal/target
Maintaining that physiological parameter within the upper and lower limits of that goal
Septic shock - assumptions
Adequacy of control depends largely on:
Close monitoring
Early detection of change
Active management and intervention by nurses
MEDAN database
Database used as testbench (Paetza 2003)
http://141.2.16.103/datenbank/download_database.htm
Variables
The MEDAN database contains the data of 103 variables of 387 patients
Data from ICU from 1998-2002 collected by medical documentation staff
All patients have septic shock of abdominal cause
Task
Predict patient survival
Problems in the database
Selection of 387 patients and 59 variables.
Problems in the database
One of the most complete patients.
[Figure: variable versus time [hours].]
Problems in the database
Measurements for a considerable part of the variables stopped.
[Figure: variable versus time [hours].]
Problems in the database
Long periods with missing data.
[Figure: variable versus time [hours].]
Classification measures
In this example we used the following measures:
Classification accuracy (% of correct classification)
Area under the ROC Curve (AUC)
In signal detection theory, a receiver operating characteristic (ROC), or simply ROC curve, is a graphical plot of the sensitivity vs. the false positive rate (1 − specificity).
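The measures named above can be computed from labels and classifier scores as in the sketch below. The function names are hypothetical and the four-sample data set is made up for illustration. The AUC is computed through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import Dict, Sequence


def confusion_measures(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # 1 - false positive rate
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up example: two positive and two negative cases.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5
print(confusion_measures(y_true, y_pred), auc(y_true, scores))
```

Accuracy, sensitivity and specificity depend on the chosen threshold, whereas the AUC summarises the classifier over all thresholds.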
Area under the ROC curve (AUC)
Results
Classification accuracy ACC (%)

| FS method | Model | 12 features set: Num. feat. | Mean | Std | 28 features set: Num. feat. | Mean | Std |
|---|---|---|---|---|---|---|---|
| - | NN [Paetza] | 12 | 69.0 | 4.37 | - | - | - |
| Tree search | Fuzzy | 2-6 | 74.1 | 1.31 | 2-7 | 82.3 | 1.56 |
| Tree search | NN | 2-8 | 73.2 | 2.03 | 4-8 | 81.2 | 1.97 |
| AFS | Fuzzy | 2-3 | 72.8 | 1.44 | 3-9 | 78.6 | 1.44 |
| AFS | NN | 2-7 | 75.7 | 1.37 | 5-12 | 81.9 | 2.12 |
Results
Specificity

| FS method | Model | 12 features set: Num. feat. | Mean | Std | 28 features set: Num. feat. | Mean | Std |
|---|---|---|---|---|---|---|---|
| - | NN [Paetza] | 12 | 92.3 | - | - | - | - |
| Tree search | Fuzzy | 2-6 | 71.2 | 2.86 | 2-7 | 83.3 | 2.62 |
| Tree search | NN | 2-8 | 81.7 | 3.61 | 4-8 | 90.3 | 2.05 |
| AFS | Fuzzy | 2-3 | 70.5 | 0.02 | 3-9 | 78.2 | 0.03 |
| AFS | NN | 2-7 | 85.6 | 0.02 | 5-12 | 90.2 | 0.02 |
Results
Sensitivity

| FS method | Model | 12 features set: Num. feat. | Mean | Std | 28 features set: Num. feat. | Mean | Std |
|---|---|---|---|---|---|---|---|
| - | NN [Paetza] | 12 | 15.0 | - | - | - | - |
| Tree search | Fuzzy | 2-6 | 79.9 | 2.60 | 2-7 | 82.3 | 1.56 |
| Tree search | NN | 2-8 | 54.5 | 5.42 | 4-8 | 64.2 | 3.92 |
| AFS | Fuzzy | 2-3 | 76.5 | 0.03 | 3-9 | 79.2 | 0.04 |
| AFS | NN | 2-7 | 59.6 | 0.02 | 5-12 | 67.0 | 0.05 |
Results
AUC

| FS method | Model | 12 features set: Num. feat. | Mean | Std | 28 features set: Num. feat. | Mean | Std |
|---|---|---|---|---|---|---|---|
| - | NN [Paetza] | 12 | - | - | - | - | - |
| Tree search | Fuzzy | 2-6 | 75.0 | 1.06 | 2-7 | 81.8 | 1.97 |
| Tree search | NN | 2-8 | 71.9 | 1.17 | 4-8 | 80.8 | 1.28 |
| AFS | Fuzzy | 2-3 | 73.5 | 0.01 | 3-9 | 78.7 | 0.02 |
| AFS | NN | 2-7 | 72.6 | 0.01 | 5-12 | 78.1 | 0.03 |
12 features subset
[Figure: bar chart of selection frequency (0% to 100%) by feature label (1, 2, 5, 6, 8, 10, 14, 16, 17, 24, 26, 28) for BU + FM, AFS + FM, AFS + NN and BU + NN.]
Most frequent features:
8 – pH
26 – Calcium
28 – Creatinine