Page 1
Decision Trees with Minimal Costs
(ICML 2004, Banff, Canada)
Charles X. Ling, Univ of Western Ontario, Canada
Qiang Yang, HK UST, Hong Kong
Jianning Wang, Univ of Western Ontario, Canada
Shichao Zhang, UTS, Australia
Contact: [email protected]
Page 2
Outline
Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions
Page 3
Costs in Machine Learning
Most inductive learning algorithms: minimizing classification errors
– Different types of misclassification have different costs, e.g. FP and FN
In this talk:
– Test costs should also be considered
– Cost-sensitive learning considers a variety of costs; see survey by Peter Turney (2000)
Page 4
Applications
Medical Practice
– Doctors may ask a patient to go through a number of tests (e.g., blood tests, X-rays)
– Which of these new tests will bring about higher value?
Biological Experimental Design
– When testing a new drug, new tests are costly
– Which experiments to perform?
Page 5
Previous Work
Many previous works consider the two types of cost (misclassification and test) separately, an obvious oversight
(Turney 1995): ICET, uses a genetic algorithm to build trees that minimize the total cost
(Zubek and Dietterich 2002): a Markov Decision Process (MDP), searches in a state space for optimal policies
(Greiner et al. 2002): PAC learning
Page 6
An Example of Our Problem
Training: with ?, cannot obtain values

ID   C1: Fever   C2: X-ray   C3: Blood_1   C4: Blood_2   C5: …   D
12   101         ?           H             ?             …       Yes
23   ?           L           M             L             …       No

Test: with many ?, may obtain values at a cost

ID   C1: Fever   C2: X-ray   C3: Blood_1   C4: Blood_2   C5: …   D
45   98          ?           ?             ?             …       ?
58   ?           ?           ?             ?             …       ?

Goal 1: build a tree that minimizes the total cost
Goal 2: obtain test values at a cost to minimize the total cost
Page 7
Outline
Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions
Page 8
Building Trees with Minimal Total Costs
Assumption: binary classes, costs: FP and FN
Goal: minimize total cost
– Total cost = misclassification cost + test cost
Previous Work
– Information Gain as an attribute selection criterion
In this work, need a new attribute selection criterion
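The cost accounting above can be sketched in a few lines. This is only an illustration: the function name `total_cost`, its parameters, and all numbers are hypothetical and do not come from the talk.

```python
def total_cost(n_false_pos, n_false_neg, fp_cost, fn_cost, test_costs_paid):
    """Total cost = misclassification cost + test cost."""
    misclassification = n_false_pos * fp_cost + n_false_neg * fn_cost
    return misclassification + sum(test_costs_paid)


# Hypothetical numbers: 3 false positives at 200 each, 1 false negative at 600,
# and five tests performed at 20 each.
example = total_cost(3, 1, fp_cost=200, fn_cost=600, test_costs_paid=[20] * 5)
print(example)  # 3*200 + 1*600 + 5*20 = 1300
```

Expensive tests can outweigh the misclassification cost they help avoid, which is why the two kinds of cost are summed into one objective.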
Page 9
Attribute Selection Criterion: C4.5
Minimal total cost (C4.5: minimal entropy)
– If growing a tree has a smaller total cost
  then choose an attribute with minimal total cost
  else stop and form a leaf
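A minimal sketch of this selection rule follows, under simplifying assumptions that are not from the talk: every attribute value is known, the test cost is charged once per example reaching the node, and a split is evaluated one level deep (each branch is scored as a leaf). The helpers `leaf_cost`, `split_cost`, and `choose_attribute`, the data layout, and the numbers are hypothetical.

```python
from collections import defaultdict


def leaf_cost(labels, fp_cost, fn_cost):
    """Misclassification cost of the cheaper of the two possible leaf labels."""
    p = sum(1 for y in labels if y == 1)
    n = len(labels) - p
    return min(n * fp_cost, p * fn_cost)


def split_cost(rows, attr, test_cost, fp_cost, fn_cost):
    """Total cost of testing `attr` on every row and making each branch a leaf."""
    branches = defaultdict(list)
    for features, label in rows:
        branches[features[attr]].append(label)
    misclassification = sum(leaf_cost(ls, fp_cost, fn_cost) for ls in branches.values())
    return misclassification + test_cost * len(rows)


def choose_attribute(rows, test_costs, fp_cost, fn_cost):
    """Return (attribute, cost); attribute is None when forming a leaf is cheaper."""
    best_attr = None
    best_cost = leaf_cost([label for _, label in rows], fp_cost, fn_cost)
    for attr, cost in test_costs.items():
        candidate = split_cost(rows, attr, cost, fp_cost, fn_cost)
        if candidate < best_cost:  # split only if it strictly lowers the total cost
            best_attr, best_cost = attr, candidate
    return best_attr, best_cost


# Hypothetical toy data: A1 separates the classes perfectly, A2 does not.
rows = [
    ({"A1": "a", "A2": "x"}, 1),
    ({"A1": "a", "A2": "y"}, 1),
    ({"A1": "b", "A2": "x"}, 0),
    ({"A1": "b", "A2": "y"}, 0),
]
cheap = choose_attribute(rows, {"A1": 20, "A2": 20}, fp_cost=200, fn_cost=600)
costly = choose_attribute(rows, {"A1": 300, "A2": 300}, fp_cost=200, fn_cost=600)
print(cheap, costly)
```

With cheap tests the sketch splits on A1; once every test costs 300, no split beats the leaf and it stops.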
Page 10
Label leaf according to minimal total cost
(P, N: numbers of positive and negative examples at the leaf)
If (P×FN ≥ N×FP)
  then class = positive
  else class = negative
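The labeling rule can be sketched as follows, reading P and N as the numbers of positive and negative examples at the leaf. Labeling the leaf positive turns every negative into a false positive (cost N×FP); labeling it negative turns every positive into a false negative (cost P×FN). The function name `label_leaf` and the choice to break ties toward positive are assumptions of this sketch.

```python
def label_leaf(p, n, fp_cost, fn_cost):
    """Return (label, misclassification cost) for a leaf with p positives and n negatives."""
    cost_if_positive = n * fp_cost  # every negative example becomes a false positive
    cost_if_negative = p * fn_cost  # every positive example becomes a false negative
    if cost_if_negative >= cost_if_positive:  # tie goes to positive (assumption)
        return "positive", cost_if_positive
    return "negative", cost_if_negative


# Hypothetical costs: FP = 200, FN = 600.
print(label_leaf(20, 30, fp_cost=200, fn_cost=600))  # positive is cheaper: 6000 vs 12000
print(label_leaf(10, 40, fp_cost=200, fn_cost=600))  # negative is cheaper: 6000 vs 8000
```

A leaf can therefore be labeled positive even when positives are the minority, as long as false negatives are expensive enough.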
Page 11
Difference on ? values
First, how to handle ? values in training data
Previous work
– built a ? branch; problematic
This work
– deals with unknown values in the training set:
– no branch for ? will be built
– examples are “gathered” inside the internal nodes
Page 12
Desirable Properties
1. Effect of difference between misclassification costs and the test costs
[Figure: three trees. All test costs are 0: root A1 with two A6 subtrees. All test costs are 20: a single test on A1. All test costs are 300: a single leaf P.]
Page 13
2. Prefer attribute with smaller test costs

Test costs   A1    A2    A3    A4    A5    A6
# 1          20    20    20    20    20    20
# 2          200   20    100   100   200   200
# 3          200   100   100   100   20    200

[Figure: one tree per cost setting. # 1: root A1 with two A6 subtrees. # 2: root A2 with A1 below. # 3: root A5 with A1 below.]
Page 14
3. If test cost increases, attribute tends to be “pushed” down and “falls out” of the tree
[Figure: three trees. Cost of A1=20: root A1 with two A6 subtrees. Cost of A1=50: root A6 with A1 below. Cost of A1=80: root A6 with A2 below; A1 no longer appears.]
Page 15
Outline
Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions
Page 16
Missing values in test cases
A new patient arrives:

Blood test   X-ray result   Urine test   S-test
?            good           ?            ?
Page 17
OST: Intuition
Explain the intuition of OST here
Page 18
Four Testing Strategies
First: Optimal Sequential Test (OST)
(Simple batch test: do all tests)
Second: No test will be performed, predict with internal node
Third: No test will be performed, predict with weighted sum of subtrees
Fourth: A new tree is built dynamically for each test case using only the known attributes
[Figure: two example trees: one with root A1 and two A6 subtrees, one with a single test on A1.]
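The first two strategies can be sketched on a toy tree. This is an illustrative reading, not the authors' implementation: it assumes OST means walking the tree and paying for a test only when the tree asks for a value the case lacks, and that the second strategy stops at the first unknown value and predicts with that node's label. The `Node` class, the function names, the tree, and the costs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Node:
    label: str                  # class predicted if classification stops at this node
    attr: Optional[str] = None  # attribute tested here; None marks a leaf
    children: Dict[str, "Node"] = field(default_factory=dict)


def classify_ost(node: Node, case: dict, test_costs: dict, run_test: Callable[[str], str]):
    """Sequential testing: perform (and pay for) a test only when its value is unknown."""
    paid = 0
    while node.attr is not None:
        value = case.get(node.attr)
        if value is None:  # unknown value: perform the test at its cost
            value = run_test(node.attr)
            paid += test_costs[node.attr]
        if value not in node.children:  # value with no branch: stop at this node
            break
        node = node.children[value]
    return node.label, paid


def classify_no_test(node: Node, case: dict) -> str:
    """No tests: stop at the first unknown value and use that internal node's label."""
    while node.attr is not None and case.get(node.attr) in node.children:
        node = node.children[case[node.attr]]
    return node.label


# Hypothetical tree: A1 at the root, A6 below its "high" branch.
a6 = Node(label="P", attr="A6", children={"yes": Node("P"), "no": Node("N")})
root = Node(label="P", attr="A1", children={"low": Node("P"), "high": a6})

case = {"A1": "high"}   # A6 is unknown for this test case
hidden = {"A6": "no"}   # what the A6 test would reveal if performed
ost_result = classify_ost(root, case, {"A1": 20, "A6": 20}, lambda a: hidden[a])
no_test_result = classify_no_test(root, case)
print(ost_result, no_test_result)
```

Here sequential testing pays 20 for A6 and reaches the leaf N, while the no-test strategy stops at the A6 node and predicts its label P.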
Page 19
Outline
Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions
Page 20
Experiment - settings
Five datasets, binary-class
60/40 for training/testing, repeat 5 times
Unknown values for training/test examples are selected randomly with a specified probability
Also compare to C4.5 tree, using OST for testing
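The setup described above might be simulated as in the sketch below. The helper names, the toy rows, the seed, and the use of `None` to stand for ? are assumptions for illustration; the talk does not give the actual procedure or tooling.

```python
import random


def split_60_40(rows, rng):
    """Shuffle a copy of the rows and split it 60/40 into training and test sets."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = len(shuffled) * 6 // 10
    return shuffled[:cut], shuffled[cut:]


def mask_values(row, prob_unknown, rng):
    """Replace each attribute value by None (standing for ?) with the given probability."""
    return {a: (None if rng.random() < prob_unknown else v) for a, v in row.items()}


rng = random.Random(0)  # fixed seed so a run can be repeated
rows = [{"A1": i % 2, "A2": i % 3} for i in range(10)]  # attributes only, no class label
train, test = split_60_40(rows, rng)
masked_test = [mask_values(r, 0.4, rng) for r in test]
print(len(train), len(test))
```

Repeating the split with five different seeds would correspond to the "repeat 5 times" setting.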
Page 21
Results with different % of unknown
[Figure: line chart. x-axis: percentage of unknown attributes (20, 40, 60, 80); y-axis range: 0 to 160. Series: M1 (OST); M2 (No test, internal); M3 (No test, distributed); M4 (No test, lazy tree); C4.5 (C4.5 tree, OST).]
OST is best; M4 and C4.5 next; M3 is worst
OST does not increase with more ?; the others do overall
Page 22
Results with different test costs
[Figure: line chart. x-axis: test costs (50, 100, 200, 400); y-axis range: 0 to 600. Series: M1 (OST); M2 (No test, internal); M3 (No test, distributed); M4 (No test, lazy tree); C4.5 (C4.5 tree, OST).]
With large test costs, OST = M2 = M3 = M4
C4.5 is much worse (tree building is cost-insensitive)
Page 23
Results with unbalanced class costs
[Figure: line chart. x-axis: test costs (50, 100, 200, 400); y-axis range: 0 to 600. Series: M1 (OST); M2 (No test, internal); M3 (No test, distributed); M4 (No test, lazy tree); C4.5 (C4.5 tree, OST).]
With large test costs, OST = M2 = M4
C4.5 is much worse (tree building is cost-insensitive)
M3 is worse than M2… (M3 is used in C4.5)
Page 24
Comparing OST/C4.5 across 6 datasets
OST always outperforms C4.5
[Figure: two panels, one series per dataset (Ecoli, Breast, Heart, Thyroid, Australia). (a) x-axis: percentage of unknown attributes (20, 40, 60, 80); y-axis range: 0 to 0.9. (b) x-axis: test costs (50, 100, 200, 400); y-axis range: 0 to 1.]
Page 25
Outline
Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions
Page 26
Conclusions
New tree building algorithm for minimal costs
– Desirable properties
– Computationally efficient (similar to C4.5)
Test strategies (OST and batch) are very effective
Can solve many real-world diagnosis problems
Page 27
Future Work
More intelligent “Batch Test” methods
Consider cost of additional batch test
– Optimal sequential batch test
  batch 1 = (test 1, test 2)
  batch 2 = (test 3, test 4, test 5), …
Other learning algorithms with minimal total cost
A wrapper that works for any “black box”