Building Cost-sensitive Classifiers
TNM033 - Data mining
Daniel Eriksson
Sven Glansberg
Johan Jörtsö
Outline
• Cost-sensitive classifiers
• MetaCost
• Other techniques
• Applications
Cost-Sensitive Classifiers
Evaluating a model
• Reminder: several quality measures can be used: accuracy, sensitivity, specificity, precision and recall.
• Another way is to calculate the cost of a model by defining a cost matrix and combining it with the model's confusion matrix.
Recap: What is cost?
We create a classifier M by applying algorithm L to the training set S.
We want to evaluate the model...
We test the model M on a test set T, obtaining a confusion matrix:
Confusion matrix, model M
              Predicted +   Predicted -
Actual +          150            40
Actual -           60           250
Cost matrix
C(i,j) = cost of classifying a record of actual class i as class j:
              Predicted +   Predicted -
Actual +           -1           100
Actual -            1             0
You (or a domain expert) have to define this! It is application-dependent.
Total cost
Total cost = -1·150 + 100·40 + 1·60 + 0·250 = 3910
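The total cost above is the sum, over all four cells, of the confusion-matrix count times the matching cost-matrix entry. A minimal Python sketch that reproduces it, storing both matrices as nested lists laid out as on the slides (rows: actual class, columns: predicted class); the function name `total_cost` is our own:

```python
# Matrices laid out as on the slides: rows = actual class (+, -),
# columns = predicted class (+, -).
confusion = [[150, 40],
             [60, 250]]
cost = [[-1, 100],
        [1, 0]]

def total_cost(confusion, cost):
    """Multiply each confusion-matrix count by the matching cost and sum."""
    return sum(n * c
               for row_n, row_c in zip(confusion, cost)
               for n, c in zip(row_n, row_c))

print(total_cost(confusion, cost))  # 3910
```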
MetaCost
Some definitions
• S - Training set
• L - Classification learning algorithm
• M - The model (classifier) we want to build
• i, j - Class indices
• x - A record in S
• C(i,j) - Cost matrix
• P(j|x) - Probability that x belongs to class j (not the confusion matrix!)
• R(i|x) - “Expected cost of predicting that x belongs to class i”
P(j|x)
P(j|x) ≠ “confusion matrix”! The confusion matrix counts outcomes over a whole test set (e.g. 150, 40, 60 and 250 for model M), whereas P(j|x) is the probability, estimated for a single record x, that x belongs to class j.
R(i|x)
“Expected cost of predicting that x belongs to class i”
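The slide gives only the meaning of R(i|x). Written out for the cost matrix as laid out above (rows: actual class, columns: predicted class), it is R(i|x) = Σ_j P(j|x) · C(j,i): the cost of predicting i when the actual class is j, weighted by how probable j is. Domingos [1] writes the same sum as Σ_j P(j|x) C(i,j) because he indexes C with the predicted class first. A small Python sketch under that reading, for a hypothetical record x with P(+|x) = 0.2 and P(-|x) = 0.8; the function names are our own:

```python
# Cost matrix from the slides, indexed cost[actual][predicted].
COST = {"+": {"+": -1, "-": 100},
        "-": {"+": 1, "-": 0}}

def expected_cost(i, P, cost=COST):
    """R(i|x): sum over actual classes j of P(j|x) * cost of predicting i
    when the actual class is j. P maps each class j to P(j|x)."""
    return sum(p_j * cost[j][i] for j, p_j in P.items())

def min_cost_class(P, cost=COST):
    """The class i with the lowest expected cost R(i|x)."""
    return min(cost, key=lambda i: expected_cost(i, P, cost))

P = {"+": 0.2, "-": 0.8}   # hypothetical P(j|x) for a single record x
print({i: round(expected_cost(i, P), 3) for i in COST})
print(min_cost_class(P))
```

Here R(+|x) is about 0.6 and R(-|x) is 20, so + is the cheaper prediction even though - is four times as probable.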
The algorithm – Parameters
• S - Training set
• L - The classification algorithm
• C - Cost matrix
• m - Number of resamples to generate
• n - Number of examples in each resample (number of different x), n ≤ |S|
• p - “Does L produce class probabilities?”
• q - Should all resamples be used for each example...
[Figure: training set S, with x marking a single record]
The MetaCost algorithm
1. Create m resamples Si from S
2. Create m models Mi by applying classifier L to each Si
3. For each x in S:
   1. For each class j:
      1. Calculate P(j|x)
   2. Let the class of x be argmin_i R(i|x), the class with the lowest expected cost
4. Let M be the model produced by applying L to the relabeled S
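The four steps can be sketched in Python. This is a minimal sketch under stated assumptions, not the implementation in [1] or in WEKA: resamples are drawn with replacement (bagging-style), L outputs class probabilities (p true), every model votes on every x (q true), and the base learner `knn_learner` is a hypothetical 1-D nearest-neighbour classifier chosen only so the example runs without libraries.

```python
import random
from collections import Counter

def knn_learner(data, k=3):
    """Hypothetical base learner L: k-nearest neighbours on 1-D inputs.
    Returns a model mapping x to class probabilities {class: P(class|x)}."""
    def model(x):
        nearest = sorted(data, key=lambda pair: abs(pair[0] - x))[:k]
        counts = Counter(y for _, y in nearest)
        return {y: c / len(nearest) for y, c in counts.items()}
    return model

def metacost(S, L, cost, m=10, n=None, seed=0):
    """MetaCost sketch (p = True, q = True). cost[j][i] is the cost of
    predicting i when the actual class is j, as in the slides' matrix."""
    rng = random.Random(seed)
    n = n or len(S)
    classes = list(cost)
    # Steps 1-2: m resamples S_i of n records (with replacement), one model each.
    models = [L([rng.choice(S) for _ in range(n)]) for _ in range(m)]
    relabeled = []
    for x, _ in S:
        votes = [Mi(x) for Mi in models]
        # Step 3.1: P(j|x) is the average of P(j|x, M_i) over all models.
        P = {j: sum(v.get(j, 0.0) for v in votes) / m for j in classes}
        # Step 3.2: relabel x with the class i that minimizes R(i|x).
        R = {i: sum(P[j] * cost[j][i] for j in classes) for i in classes}
        relabeled.append((x, min(classes, key=R.get)))
    # Step 4: the final model M is L applied to the relabeled training set.
    return L(relabeled), relabeled

COST = {"+": {"+": -1, "-": 100},   # slides' cost matrix, cost[actual][predicted]
        "-": {"+": 1, "-": 0}}
S = [(float(v), "-") for v in range(10)] + [(float(v), "+") for v in range(20, 30)]
M, relabeled = metacost(S, knn_learner, COST)
print(sum(y == "+" for _, y in relabeled), "of", len(S), "records relabeled '+'")
```

With the slides' cost matrix (a missed + costs 100 times a false +), even a small P(+|x) makes + the cheaper label, so borderline - records tend to be relabeled + before the final model is trained.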
[Figure: steps 1 and 2: S is resampled into S1 … Sm, and L is applied to each Si to produce the models M1 … Mm]
[Figure: step 3]
Relabel the class of x so that R(i|x) is minimized, for each x.
q? What?
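The q slide carries no text here. As we read Domingos [1], q decides which models are averaged when estimating P(j|x): with q true every model Mi is used, with q false only the models whose resample Si does not contain x. A minimal Python sketch; the function name, the toy models and the choice to return None when x occurs in every resample are our own assumptions:

```python
def class_probability(x, j, models, resamples, q=True):
    """Estimate P(j|x) by averaging P(j|x, M_i) over the chosen models.
    q=True: use every model. q=False: use only models whose resample S_i
    does not contain x, so x is an unseen record for those models."""
    used = [Mi for Mi, S_i in zip(models, resamples) if q or x not in S_i]
    if not used:
        return None  # our assumption: x occurs in every resample, no estimate
    return sum(Mi(x).get(j, 0.0) for Mi in used) / len(used)

# Three hypothetical models (functions x -> {class: probability}) and the
# resamples they were trained on, here simply lists of record values.
models = [lambda x: {"+": 1.0},
          lambda x: {"-": 1.0},
          lambda x: {"+": 0.5, "-": 0.5}]
resamples = [[1, 2], [3], [2, 3]]

print(class_probability(2, "+", models, resamples, q=True))   # 0.5
print(class_probability(2, "+", models, resamples, q=False))  # 0.0
```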
p? What?
This parameter may deserve some explanation.
p only records whether L outputs a class or class probabilities. What we want are the probabilities P(j|x, Mi):
• If L outputs a class: set P(j|x, Mi) = 1 for that class and 0 for all others.
• If L outputs probabilities: just take the probabilities as they are.
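The two cases can be written as a small Python helper. The name `to_probabilities` and the output format assumed for L (a class label, or a dict mapping classes to probabilities) are illustrative, not WEKA's API:

```python
def to_probabilities(output, classes, p):
    """Turn one output of model Mi for a record x into P(j|x, Mi) for every class j."""
    if p:
        # L outputs class probabilities: take them as they are
        # (classes the model does not mention get probability 0).
        return {j: output.get(j, 0.0) for j in classes}
    # L outputs a single class: probability 1 for it, 0 for all others.
    return {j: (1.0 if j == output else 0.0) for j in classes}

classes = ["+", "-"]
print(to_probabilities("+", classes, p=False))                 # {'+': 1.0, '-': 0.0}
print(to_probabilities({"+": 0.7, "-": 0.3}, classes, p=True))  # {'+': 0.7, '-': 0.3}
```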
[Figure: two models, M_A built with algorithm L_A and M_B built with algorithm L_B]
Classifier L?
• What kind of classification algorithm? It does not matter: any classifier can be used as L.
MetaCost
• Is available in WEKA
• Pros:
  • Independent of L (a “wrapper” algorithm)
  • Works with multiclass problems (better than, for example, stratification)
MetaCost
• Cons:
  • Takes more time to compute: L is trained m + 1 times
  • Accuracy can go down, since MetaCost minimizes cost rather than the error rate
Other Techniques
• Stratification
• Oversampling
• Undersampling
• Decision trees with minimal costs [2]
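The slide only names these techniques. One common reading of oversampling and undersampling in a cost-sensitive setting is to rebalance the training set by the cost of misclassifying each class. A minimal Python sketch under that assumption, with per-class costs taken from the two-class cost matrix earlier (a missed + costs 100, a false + costs 1); the names `MISCLASS_COST`, `oversample` and `undersample` are hypothetical:

```python
import random

# Cost of misclassifying each actual class (assumed, read off the earlier
# two-class cost matrix): a missed "+" costs 100, a false "+" costs 1.
MISCLASS_COST = {"+": 100, "-": 1}

def oversample(data, cost=MISCLASS_COST):
    """Repeat every (x, y) pair round(cost[y] / cheapest cost) times."""
    cheapest = min(cost.values())
    return [(x, y) for x, y in data for _ in range(round(cost[y] / cheapest))]

def undersample(data, cost=MISCLASS_COST, seed=0):
    """Keep every (x, y) pair with probability cost[y] / largest cost,
    so records of the cheap class are thinned out instead."""
    rng = random.Random(seed)
    largest = max(cost.values())
    return [(x, y) for x, y in data if rng.random() < cost[y] / largest]

data = [(0.1, "-"), (0.4, "-"), (0.9, "+")]
print(len(oversample(data)))  # 102: two "-" records plus 100 copies of the "+" record
print(undersample(data))
```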
Applications
Medicine
• Comparison between C4.5 (J48) and MetaCost + C4.5 in WEKA on the heart-c.arff data set [4]
Cost matrix
C(i,j) =
              Predicted +   Predicted -
Actual +            0             1
Actual -            4             0
C4.5 confusion matrix
              Predicted +   Predicted -
Actual +          138            27
Actual -           40            98
MetaCost confusion matrix
              Predicted +   Predicted -
Actual +          104            61
Actual -           21           117
Comparison
MetaCost total cost: 145
C4.5 total cost: 187
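The totals above, and the relative-cost and accuracy figures on the following slides, can be recomputed from the two confusion matrices and the heart cost matrix. A minimal Python check, with matrices laid out as on the slides (rows: actual class, columns: predicted class); the helper names are our own:

```python
# Heart-disease example: rows = actual class (+, -), columns = predicted (+, -).
HEART_COST = [[0, 1],
              [4, 0]]
CONFUSION = {"C4.5":     [[138, 27], [40, 98]],
             "MetaCost": [[104, 61], [21, 117]]}

def total_cost(confusion, cost):
    """Sum of count * cost over every (actual, predicted) cell."""
    return sum(n * c for row_n, row_c in zip(confusion, cost)
               for n, c in zip(row_n, row_c))

def accuracy(confusion):
    """Fraction of records on the diagonal (correctly classified)."""
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / sum(map(sum, confusion))

# prints: "C4.5 187 77.9%" and "MetaCost 145 72.9%"
for name, cm in CONFUSION.items():
    print(name, total_cost(cm, HEART_COST), f"{accuracy(cm):.1%}")

costs = {name: total_cost(cm, HEART_COST) for name, cm in CONFUSION.items()}
print(f"{costs['MetaCost'] / costs['C4.5']:.1%}")  # 77.5%
```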
Comparison – Cost
[Bar chart: total cost, C4.5 = 187, MetaCost = 145]
Comparison – Cost
[Bar chart: total cost relative to C4.5: C4.5 = 100 %, MetaCost = 77.5 %]
Comparison – Cost
[Bar chart: total cost relative to MetaCost: MetaCost = 100 %, C4.5 = 129 %]
Comparison – Classifications
[Stacked bar chart: correct / incorrect classifications: C4.5 = 77.9 % / 22.1 %, MetaCost = 72.9 % / 27.1 %]
Conclusions
References
[1] Pedro Domingos. MetaCost: A general method for making classifiers cost-sensitive. In KDD, pages 155–164, 1999.
[2] Charles X. Ling, Qiang Yang, Jianning Wang, and Shichao Zhang. Decision trees with minimal costs. In ICML ’04: Proceedings of the Twenty-First International Conference on Machine Learning, page 69, New York, NY, USA, 2004. ACM.
[3] Pang-Ning Tan, Michael Steinbach, and Vipin Kumar. Introduction to Data Mining (First Edition). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 2005.
[4] Heart data set. http://staffwww.itn.liu.se/~aidvi/courses/06/dm/labs/heart-c.arff, accessed 2009-12-02.