Decision Trees with Minimal Costs (ICML 2004, Banff, Canada)

Charles X. Ling, Univ of Western Ontario, Canada
Qiang Yang, HK UST, Hong Kong
Jianning Wang, Univ of Western Ontario, Canada
Shichao Zhang, UTS, Australia

Contact: [email protected]

Outline

Introduction
Building Trees with Minimal Total Costs
Testing Strategies
Experiments and Results
Conclusions

Costs in Machine Learning

Most inductive learning algorithms: minimizing classification errors
– Different types of misclassification have different costs, e.g. FP and FN

In this talk:
– Test costs should also be considered
– Cost-sensitive learning considers a variety of costs; see survey by Peter Turney (2000)

Applications

Medical Practice
– Doctors may ask a patient to go through a number of tests (e.g., blood tests, X-rays)
– Which of these new tests will bring about higher value?

Biological Experimental Design
– When testing a new drug, new tests are costly
– Which experiments to perform?

Previous Work

Many previous works consider the two types of cost separately, an obvious oversight
(Turney 1995): ICET, uses a genetic algorithm to build trees that minimize the total cost
(Zubek and Dietterich 2002): a Markov Decision Process (MDP), searches in a state space for optimal policies
(Greiner et al. 2002): PAC learning

An Example of Our Problem

Training: with ?, cannot obtain values

ID   C1 (Fever)   C2 (X-ray)   C3 (Blood_1)   C4 (Blood_2)   C5 (…)   D
12   101          ?            H              ?              …        Yes
23   ?            L            M              L              …        No

Test: with many ?, may obtain values at a cost

ID   C1 (Fever)   C2 (X-ray)   C3 (Blood_1)   C4 (Blood_2)   C5 (…)   D
45   98           ?            ?              ?              …        ?
58   ?            ?            ?              ?              …        ?

Goal 1: build a tree that minimizes the total cost
Goal 2: obtain test values at a cost to minimize the total cost

Building Trees with Minimal Total Costs

Assumption: binary classes, costs: FP and FN
Goal: minimize total cost
– Total cost = misclassification cost + test cost

Previous Work
– Information Gain as an attribute selection criterion

In this work, need a new attribute selection criterion

Attribute Selection Criterion: C4.5

Minimal total cost (C4.5: minimal entropy)
– If growing a tree has a smaller total cost
    then choose an attribute with minimal total cost
    else stop and form a leaf

Label leaf according to minimal total cost
– If (P×FN ≥ N×FP)
    then class = positive
    else class = negative
  (P and N are the numbers of positive and negative examples in the leaf: labeling it positive costs N×FP, labeling it negative costs P×FN)
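The selection and labeling rules above can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names, the example representation (a feature dict plus a label), and the one-level cost estimate for a split are assumptions made for the sketch. It also assumes that examples whose value for the attribute is unknown pay no test cost, and that ties at a leaf go to the positive class.

```python
from collections import defaultdict

def leaf_cost(p, n, fp_cost, fn_cost):
    """Return (label, misclassification cost) for a leaf holding
    p positive and n negative examples."""
    cost_if_positive = n * fp_cost  # every negative becomes a false positive
    cost_if_negative = p * fn_cost  # every positive becomes a false negative
    if cost_if_negative >= cost_if_positive:
        return "positive", cost_if_positive
    return "negative", cost_if_negative

def split_cost(examples, attr, test_cost, fp_cost, fn_cost):
    """Estimated total cost of splitting on attr (assumed one-level estimate).
    examples: list of (features dict, label); None marks an unknown value."""
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, negatives]
    for features, label in examples:
        value = features.get(attr)
        counts[value][0 if label == "positive" else 1] += 1
    total = 0
    for value, (p, n) in counts.items():
        if value is not None:
            total += (p + n) * test_cost  # known values pay for the test
        total += leaf_cost(p, n, fp_cost, fn_cost)[1]
    return total

def choose_attribute(examples, test_costs, fp_cost, fn_cost):
    """Return (attribute, cost); attribute is None when forming a leaf
    is at least as cheap as every candidate split."""
    p = sum(1 for _, label in examples if label == "positive")
    n = len(examples) - p
    best_attr, best_cost = None, leaf_cost(p, n, fp_cost, fn_cost)[1]
    for attr, cost in test_costs.items():
        total = split_cost(examples, attr, cost, fp_cost, fn_cost)
        if total < best_cost:
            best_attr, best_cost = attr, total
    return best_attr, best_cost
```

With four examples that A1 separates perfectly and FP = FN = 100, a test cost of 20 makes the split worthwhile, while a test cost of 60 does not, so the node becomes a leaf.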

First, how to handle ? values in training data

Previous work
– built ? branch
– problematic

This work
– deal with unknown values in the training set:
– no branch for ? will be built,
– examples are “gathered” inside the internal nodes
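One way to picture the "gathered" examples is a partition step that builds branches only for known values. The sketch below is an assumption about the data layout (feature dicts with None for ?), and the name `partition` is hypothetical, not taken from the talk.

```python
def partition(examples, attr):
    """Split examples on attr without creating a branch for unknown values.

    examples: list of (features dict, label); None marks an unknown value.
    Returns (branches, gathered): branches maps each known value to its
    examples; gathered holds the examples that stay inside the node.
    """
    branches = {}
    gathered = []
    for features, label in examples:
        value = features.get(attr)
        if value is None:
            gathered.append((features, label))  # stays at the internal node
        else:
            branches.setdefault(value, []).append((features, label))
    return branches, gathered
```

The gathered examples remain available at the internal node, for instance to label it when a test case stops there.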

Difference on ? values

Desirable Properties

1. Effect of difference between misclassification costs and the test costs

[Figure: three example trees, with all test costs set to 20: a single leaf (P); a one-level tree splitting on A1; a two-level tree with A1 at the root and A6 below.]


2. Prefer attribute with smaller test costs

Test costs under three settings:

      A1    A2    A3    A4    A5    A6
# 1   20    20    20    20    20    20
# 2   200   20    100   100   200   200
# 3   200   100   100   100   20    200

[Figure: three trees, in slide order: A1 at the root with A6 below; A2 at the root with A1 below; A5 at the root with A1 below.]

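3. If test cost increases, attribute tends to be “pushed” down and “falls out” of the tree

[Figure: trees built as the test cost of A1 increases. Cost of A1 = 20: A1 at the root with A6 below. Cost of A1 = 50: A6 at the root with A1 below. Cost of A1 = 80: A6 at the root with A2 below; A1 no longer appears.]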

Missing values in test cases

A new patient arrives:

Blood test   X-ray result   Urine test   S-test
?            good           ?            ?

OST: Intuition

Explain the intuition of OST here

Four Testing Strategies

First: Optimal Sequential Test (OST) (Simple batch test: do all tests)

Second: No test will be performed, predict with internal node

Third: No test will be performed, predict with weighted sum of subtrees

Fourth: A new tree is built dynamically for each test case using only the known attributes

[Figure: two example trees: a two-level tree with A1 at the root and A6 below, and a one-level tree on A1.]
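The first two strategies can be sketched as tree walks. This is an illustrative reading of the slide, not the authors' code: the nested-dict tree layout, the `run_test` callback that returns a value for an attribute, and the fallback to the internal node's label when no branch matches are all assumptions. Values already known in the test case are treated as free.

```python
def classify_ost(tree, case, run_test):
    """Sequential testing: when the walk needs an unknown value, pay the
    test cost at that node and obtain the value via run_test(attr).
    Returns (predicted label, total test cost spent)."""
    known = dict(case)
    spent = 0
    node = tree
    while "attr" in node:  # internal nodes carry an "attr" key
        attr = node["attr"]
        if known.get(attr) is None:
            known[attr] = run_test(attr)
            spent += node["test_cost"]
        child = node["children"].get(known[attr])
        if child is None:  # unseen value: predict with this node
            break
        node = child
    return node["label"], spent

def classify_no_test(tree, case):
    """No tests: stop at the first node whose attribute is unknown and
    predict with that internal node's label."""
    node = tree
    while "attr" in node:
        value = case.get(node["attr"])
        child = node["children"].get(value) if value is not None else None
        if child is None:
            break
        node = child
    return node["label"]

# Hypothetical two-level tree: A1 at the root, A6 below one branch.
tree = {
    "attr": "A1", "test_cost": 20, "label": "positive",
    "children": {
        "x": {"label": "positive"},
        "y": {
            "attr": "A6", "test_cost": 20, "label": "negative",
            "children": {"u": {"label": "positive"}, "v": {"label": "negative"}},
        },
    },
}
```

For a case with A1 = y and A6 unknown, the sequential walk pays 20 for A6 and follows the resulting branch, while the no-test walk stops at the A6 node and returns its label.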

Experiment - settings

Five datasets, binary-class
60/40 for training/testing, repeat 5 times
Unknown values for training/test examples are selected randomly by a specific probability
Also compare to C4.5 tree, using OST for testing
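The setup above can be mimicked with a short helper pair. This is a sketch under assumptions: the talk does not say how the split or the masking was implemented, so the function names, the use of None for ?, and the seeded `random.Random` generator are illustrative choices.

```python
import random

def mask_values(rows, prob, rng):
    """Replace each attribute value with None (unknown) with probability prob.
    rows: list of feature dicts; class labels are assumed to be kept elsewhere."""
    return [
        {name: (None if rng.random() < prob else value)
         for name, value in row.items()}
        for row in rows
    ]

def split_60_40(rows, rng):
    """Shuffle a copy of rows and return (training 60%, testing 40%)."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = (len(shuffled) * 6) // 10
    return shuffled[:cut], shuffled[cut:]

# Example: five repetitions, each with its own seed for reproducibility.
rows = [{"A1": i, "A2": i * 2} for i in range(10)]
for repetition in range(5):
    rng = random.Random(repetition)
    train, test = split_60_40(rows, rng)
    train_masked = mask_values(train, 0.2, rng)
    test_masked = mask_values(test, 0.2, rng)
```

Passing the generator explicitly keeps each repetition reproducible and independent of global random state.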

Results with different % of unknown

[Figure: line plot. x-axis: percentage of unknown attributes (20, 40, 60, 80); y-axis ticks from 0 to 160. Series: M1 (OST); M2 (no test, internal); M3 (no test, distributed); M4 (no test, lazy tree); C4.5 (C4.5 tree, OST).]

OST is best; M4 and C4.5 next; M3 is worst
OST does not increase with more ?; others do overall

Results with different test costs

[Figure: line plot. x-axis: test costs (50, 100, 200, 400); y-axis ticks from 0 to 600. Series: M1 (OST), M2, M3, M4, C4.5. Callouts: no test, internal; C4.5 tree, OST; no test, lazy tree.]

With large test costs, OST = M2 = M3 = M4
C4.5 is much worse (tree building is cost-insensitive)

Results with unbalanced class costs

[Figure: line plot. x-axis: test costs (50, 100, 200, 400); y-axis ticks from 0 to 600. Series: M1 (OST), M2, M3, M4, C4.5. Callouts: no test, internal; C4.5 tree, OST; no test, lazy tree.]

With large test costs, OST = M2 = M4
C4.5 is much worse (tree building is cost-insensitive)
M3 is worse than M2… (M3 is used in C4.5)

Comparing OST/C4.5 across 6 datasets

OST always outperforms C4.5

[Figure: two line plots, one series per dataset (Ecoli, Breast, Heart, Thyroid, Australia). (a) x-axis: percentage of unknown attributes (20, 40, 60, 80); y-axis ticks from 0 to 0.9. (b) x-axis: test costs (50, 100, 200, 400); y-axis ticks from 0 to 1.]

Conclusions

New tree building algorithm for minimal costs
– Desirable properties
– Computationally efficient (similar to C4.5)

Test strategies (OST and batch) are very effective

Can solve many real-world diagnosis problems

Future Work

More intelligent “Batch Test” methods
Consider cost of additional batch test
– Optimal sequential batch test
    batch 1 = (test 1, test 2)
    batch 2 = (test 3, test 4, test 5), …

Other learning algorithms with minimal total cost

A wrapper that works for any “black box”
